Diversity and Complexity in DNA Recognition by Transcription Factors
Sequence preferences of DNA-binding proteins are a primary mechanism by
which cells interpret the genome. Despite these proteins' central
importance in physiology, development, and evolution, comprehensive
DNA-binding specificities have been determined experimentally for few
proteins. Here, we used microarrays containing all 10-base-pair sequences to
examine the binding specificities of 104 distinct mouse DNA-binding proteins
representing 22 structural classes. Our results reveal a complex landscape of
binding, with virtually every protein analyzed possessing unique preferences.
Roughly half of the proteins each recognized multiple distinctly different
sequence motifs, challenging our molecular understanding of how proteins
interact with their DNA binding sites. This complexity in DNA recognition may
be important in gene regulation and in evolution of transcriptional regulatory networks.
Click here to
visit the online database of experimental results
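To give a sense of scale for arrays "containing all 10-base-pair sequences", the short sketch below counts how many 10-mers exist. It assumes, for illustration only, that a sequence and its reverse complement are treated as the same double-stranded site; the listing here does not say how the arrays handle strand orientation, and the function names are hypothetical.

```python
from itertools import product

# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def count_kmers(k):
    """Return (all k-mers, k-mers distinct up to reverse complement).

    A k-mer equal to its own reverse complement can only exist for even k,
    and there are 4**(k/2) of them. Every other k-mer pairs with a distinct
    reverse complement, so the distinct count is (total + palindromes) / 2.
    """
    total = 4 ** k
    palindromes = 4 ** (k // 2) if k % 2 == 0 else 0
    return total, (total + palindromes) // 2


def canonical_kmers(k):
    """Enumerate one representative per reverse-complement pair (small k only)."""
    seen = set()
    for letters in product("ACGT", repeat=k):
        seq = "".join(letters)
        # Keep the lexicographically smaller of the sequence and its partner.
        seen.add(min(seq, reverse_complement(seq)))
    return seen


if __name__ == "__main__":
    total, distinct = count_kmers(10)
    print(f"10-mers: {total}, distinct up to reverse complement: {distinct}")
```

Under this assumption there are 1,048,576 possible 10-mers and 524,800 distinct double-stranded sites. The brute-force `canonical_kmers` is included only as a cross-check of the closed-form count at small k.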
- All supplementary figures combined into one file.
- Figure S1:
- Cloning strategy.
- Figure S2:
- Comparison of PBM data for DBD versus full-length constructs for 5
TFs. (A) Motif logo comparisons and k-mer correlations, and
(B) k-mer PBM enrichment score scatter plots.
- Figure S3:
- Comparison of TFs overexpressed and purified from E. coli
versus expressed by in vitro transcription and translation.
(A) Motif logos, and (B) k-mer correlation plots, from
- Figure S4:
- PBM data reproducibility. (A)-(D) Clustergram of k-mers for
all PBM data prior to combining data from array designs #1 and #2,
showing that array designs #1 and #2 cluster together for each protein.
(E) Reproducibility of E-scores and Z-scores from array designs
#1 and #2.
- Figure S5:
- Agreement of PBM k-mer data with prior motif data, in general.
- Figure S6:
- Comparison of PBM data versus Kd data for the yeast TF Cbf1 and the
murine/human TF Max.
- Figure S7:
- Confirmation of PBM-derived motifs by EMSAs for three newly
characterized proteins and one recently characterized protein.
- Figure S8:
- Binding profiles of specific TF DBD structural classes. (A)
HMG/SOX, (B) AP-2, (C) ARID/BRIGHT, (D) bZIP,
(E) ZnF_C4, (F) E2F, (G) ETS, (H) Forkhead,
(I) GATA, (J) HLH, (K) homeodomain, (L)
IRF, (M) RFX, (N) SAND DNA-binding domains.
- Figure S9:
- Confirmation of secondary motifs by EMSAs for 6 TFs: Hnf4a, Nkx3.1,
Mybl1, Foxj3, Rfxdc2 and Myb.
- Figure S10:
- Primary, secondary, and tertiary Seed-N-Wobble motifs identified in
PBM data for the human POU homeodomain TF Oct-1.
- Figure S11:
- High-scoring k-mers belonging to the Jundm2 secondary motif are not
bound as well by the related bZIP protein Atf1.
- Figure S12:
- RFX protein-DNA recognition positions.
- Figure S13:
- Graphs showing log10(1-AUC) (area under ROC curve) (y-axis)
versus log10(number of positives) (x-axis) for Hnf4a.
- Figure S14:
- Enrichment of primary versus secondary motif 8-mers bound in
vitro within genomic regions bound in vivo for (A, C,
D) Hnf4a and (B, E, F) Bcl6b.
- Logos for all motifs
generated from each array design separately and from the combined array
performance plots show that a multiple motif model best captures the
binding profiles for most TFs.
- Simulated 14bp motifs
- Seed-and-Wobble primary motifs
- Seed-and-Wobble secondary motifs
- All PBM-derived PWMs (.tar.gz compressed file, ~323 KB)
- All PBM-derived PWMs (.zip compressed file, ~1.3 MB)
- Table S1:
- Number of proteins in each different TF DBD structural class that
exists in the mouse genome, and the number of those that were examined
in this paper.
- Table S2:
- TF clones, sequences, and approximate concentrations used in
- Table S3:
- Comparison of PBM k-mer data to JASPAR, TRANSFAC, and
literature-derived motifs (AUC ≥ 0.8 and Q ≤ 0.01).
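Several of the items above (Figure S13, Table S3) report an AUC, the area under the ROC curve. As a minimal sketch of what that statistic measures, the function below computes it from a list of scores and binary labels by pairwise comparison. The function name and the example values are made up for illustration; how positives and negatives were defined in the paper is not stated in this listing.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    The AUC equals the probability that a randomly chosen positive scores
    higher than a randomly chosen negative, with ties counted as one half.
    This O(P * N) form is meant for clarity, not for large inputs.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical scores: two positives (label 1) and two negatives (label 0).
    example_scores = [0.9, 0.2, 0.8, 0.1]
    example_labels = [1, 1, 0, 0]
    print(roc_auc(example_scores, example_labels))
```

An AUC of 1.0 means every positive outranks every negative, and 0.5 corresponds to random ranking, so a threshold such as AUC ≥ 0.8 selects comparisons where the ranking is well above chance.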