High-Resolution DNA Binding Specificity Analysis of Yeast Transcription Factors

Genome Research (2009). Advance online publication January 21, 2009, doi:10.1101/gr.090233.108.

Cong Zhu1,*, Kelsey Byers1,*, Rachel Patton McCord1,3,*, Zhenwei Shi2, Michael F. Berger1,3, Daniel Newburger1, Katrina Saulrieta1,4, Zachary Smith1,4, Mita Shah1,6, Mathangi Radhakrishnan1,5, Anthony A. Philippakis1,3,7, Yanhui Hu2, Federico De Masi1, Marcin Pacek2, Andreas Rolfs2, TVS Murthy2, Joshua LaBaer2, Martha L. Bulyk1,3,7,8,†

*These authors contributed equally.

†To whom correspondence should be addressed. Email: mlbulyk@receptor.med.harvard.edu.

Summary

Transcription factors (TFs) regulate the expression of genes through sequence-specific interactions with DNA binding sites. However, despite recent progress in identifying in vivo TF binding sites by microarray readout of chromatin immunoprecipitation (ChIP-chip), nearly half of all known yeast TFs have unknown DNA binding specificities, and many additional predicted TFs remain uncharacterized. To address these gaps in our knowledge of yeast TFs and their cis regulatory sequences, we have determined high-resolution binding profiles for 89 known and predicted yeast TFs, across more than 2.3 million gapped and ungapped 8-bp sequences ("k-mers"). We report 50 new or significantly different direct DNA binding site motifs for yeast DNA binding proteins and motifs for 8 proteins for which only a consensus sequence was previously known; in total, this corresponds to over a 50% increase in the number of yeast DNA binding proteins with experimentally determined DNA binding specificities. Among other novel regulators, we discovered proteins that bind the PAC (Polymerase A and C) motif (GATGAG) and regulate rRNA transcription and processing, core cellular processes that are part of ribosome biogenesis. In contrast to earlier data types, these comprehensive k-mer binding data permit us to consider the regulatory potential of genomic sequence at the individual word level. These k-mer data allowed us to re-annotate in vivo TF binding targets as direct or indirect, and to examine TFs' potential effects on gene expression in ~1,700 environmental and cellular conditions. These approaches could be adapted to identify TFs and cis regulatory elements in higher eukaryotes.
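
To make the "individual word level" idea concrete, the following is a minimal illustrative sketch in Python, not the analysis code used in the study. It assumes that a k-mer and its reverse complement are treated as the same word, covers only ungapped k-mers (the gapped patterns that make up most of the 2.3 million sequences are not modeled), and uses hypothetical function names (`canonical`, `count_canonical_kmers`, `find_motif`). It counts the distinct ungapped 8-mers under that assumption and locates occurrences of the PAC motif (GATGAG) on either strand of a sequence.

```python
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Pick one representative for a k-mer and its reverse complement
    (assumption: the two orientations are treated as the same word)."""
    return min(kmer, reverse_complement(kmer))


def count_canonical_kmers(k: int) -> int:
    """Count distinct ungapped k-mers after merging reverse complements."""
    return len({canonical("".join(p)) for p in product("ACGT", repeat=k)})


def find_motif(seq: str, motif: str = "GATGAG"):
    """Return sorted (start, strand) pairs for exact motif matches.

    '+' means the motif itself occurs at `start`; '-' means its reverse
    complement occurs there. A palindromic motif would be reported on
    both strands at the same position.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping matches
    return sorted(hits)


if __name__ == "__main__":
    print(count_canonical_kmers(8))
    print(find_motif("AAGATGAGTTCTCATCAA"))
```

Under the reverse-complement assumption, the 65,536 possible ungapped 8-mers collapse to 32,896 distinct words, because the 256 palindromic 8-mers are their own reverse complements. Exact string matching is used here only for simplicity; it does not reflect the graded binding preferences that k-mer-level PBM data capture.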

PBM data are available at the PBM database.

Supplementary Files:

Supplementary Methods. (pdf)

Figures

Figure S1. (pdf)
Schematic overview of this study.
Figure S2. (pdf)
Overview of all clones tested on PBMs in this study.
Figure S3. (pdf)
Hierarchical clustering of PBM data over gapped 8-mers for 32 Zn2Cys6 TFs.
Figure S4. (pdf)
DNA binding site motifs for all TFs in this study.
Figure S5. (pdf)
EMSAs for Yer130c, Pbf1, and Pbf2.
Figure S6. (pdf)
Comparison of PBM k-mer data to ChIP-chip binding and motif data for all TFs.
Figure S7. (pdf)
Re-classification of ChIP-chip target genes according to both conservative and permissive ES thresholds.
Figure S8. (pdf)
PBM-derived function predictions are consistent with prior function information.
Figure S9. (pdf)
CRACR plots for Pbf1,2 expression array data.
Figure S10. (pdf)
Rap1 associates with promoters of its target genes in a CRACR-predicted condition-specific manner.
Figure S11. (pdf)
High-resolution version of Figure 5A, with TFs labeled.
Figure S12. (pdf)
Motifs and 8-mer binding profile correlations for all co-regulatory TFs from CRACR-predicted condition specificities.

Tables

Table S1. (xls,pdf)
Listing of all clones tested on PBMs in this study.
Table S2. (xls,pdf)
All motif matches for each of the TFs characterized in this study.
Table S3. (xls)
Experimentally determined DNA binding site motif data available for 173 known or putative yeast TFs.
Table S4. (xls,pdf)
Re-classification of direct and indirect target intergenic regions of TFs previously examined by ChIP-chip (Harbison et al., 2004; MacIsaac et al., 2006).
Table S5. (xls)
Ranked target genes for each of the 89 TFs.
Table updated October 2009 to correct an error in gene-rank assignments.
Table S6. (xls,pdf)
All over-represented functional categories of target genes for each TF examined in this study.
Table S7. (xls,pdf)
Support for TFs' potential target genes from expression data on perturbations of genes with which they exhibit genetic interactions.
Table S8. (xls,pdf)
Comparison of Rsc3,30 PBM-derived potential target genes to RSC ChIP-chip data.
Table S9. (xls,pdf)
Over-represented functional categories of genes differentially expressed in Pbf1,2 heat shock expression array data.
Table S10. (xls,pdf)
All significant specific conditions from CRACR analysis for each TF examined in this study.
Table S11. (xls,pdf)
All significant condition categories from CRACR analysis for each TF examined in this study.

Other

Figure 1. (pdf, high-resolution copy)
PBM characterization of S. cerevisiae TF binding specificities.