Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities.
This website accompanies the paper by Berger, Philippakis, et al.
We have created a novel, maximally compact, synthetic DNA sequence design for protein binding microarrays (PBMs) that represents all possible DNA sequence variants of a given length k in an overlapping fashion on a single slide. We constructed such ‘all 10-mer’ microarrays by converting high-density, single-stranded Agilent oligonucleotide arrays into double-stranded DNA arrays via on-array primer extension. Using these microarrays, we comprehensively determined the binding specificities, across the full range of affinities, of several transcription factors (TFs) of diverse structural classes from yeast, worm, mouse, and human. We also developed a novel computational method for constructing TF binding site motifs that takes full advantage of the unbiased sequence representation on these arrays: it uses all features (rather than only those above an arbitrary cutoff) to determine the optimal motif without any prior knowledge. As advances in microarray printing technology permit higher feature densities and longer features, our universal design will enable complete coverage of even longer binding sites. This unbiased, comprehensive coverage of all k-mers permits interrogation of binding site preferences, including nucleotide interdependencies, at unprecedented resolution.
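The overlapping design can be illustrated with a de Bruijn sequence (the supplementary figures refer to distinct de Bruijn sequences): a cyclic sequence of order k over {A, C, G, T} contains every k-mer exactly once. The sketch below is a minimal illustration, not the authors' design pipeline. It uses the standard Lyndon-word construction and then cuts the sequence into overlapping windows; the names `de_bruijn` and `tile_probes` and the window length `probe_len` are hypothetical, and the sketch ignores constant primer regions and any reverse-complement considerations a real array design may need.

```python
def de_bruijn(alphabet: str, k: int) -> str:
    """Cyclic de Bruijn sequence of order k over `alphabet`, built with the
    standard Lyndon-word (FKM) algorithm. Read cyclically, every k-mer
    occurs exactly once, so the length is len(alphabet) ** k."""
    n = len(alphabet)
    a = [0] * (k + 1)
    seq: list[int] = []

    def db(t: int, p: int) -> None:
        if t > k:
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in seq)


def tile_probes(seq: str, k: int, probe_len: int) -> list[str]:
    """Hypothetical tiling step: unwrap the cyclic sequence and cut it into
    windows of length probe_len that overlap by k - 1 bases, so every
    k-mer lies intact inside at least one window."""
    linear = seq + seq[:k - 1]      # append the wrap-around so no k-mer is lost
    step = probe_len - (k - 1)      # new k-mers contributed by each window
    return [linear[s:s + probe_len]
            for s in range(0, len(linear) - (k - 1), step)]


# Usage: all 4-mers tiled into 20-nt windows.
db4 = de_bruijn("ACGT", 4)
probes = tile_probes(db4, 4, 20)
```

Consecutive windows overlap by k - 1 bases, so a k-mer that straddles a window boundary still appears whole in the next window; this is what allows one de Bruijn sequence to be split across many features without losing any k-mer.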
The supplementary data files that accompany this manuscript are listed below.
The normalized signal intensities and DNA probe sequences for our two separate Agilent ‘all 10-mer’ universal microarrays are provided as a separate data file.
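One way to summarize such data per k-mer, in the spirit of the median signal intensities reported in the supplementary figures, is to take the median normalized intensity over all probes that contain each k-mer. The sketch below assumes the data file has already been parsed into (probe sequence, normalized intensity) pairs; the parsing step, the names `canonical` and `kmer_medians`, and the choice to merge each k-mer with its reverse complement are assumptions for illustration, not the authors' exact procedure.

```python
from collections import defaultdict
from statistics import median

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Assumed convention: treat a k-mer and its reverse complement as the
    same site (the probes are double-stranded) and key both by the
    lexicographically smaller string."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_medians(probes, k: int) -> dict[str, float]:
    """probes: iterable of (sequence, normalized_intensity) pairs.
    Returns {canonical k-mer: median intensity of probes containing it}.
    Each probe counts once per k-mer, even if the k-mer occurs twice."""
    hits: dict[str, list[float]] = defaultdict(list)
    for seq, signal in probes:
        present = {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}
        for km in present:
            hits[km].append(signal)
    return {km: median(vals) for km, vals in hits.items()}


# Usage with toy (made-up) probes and 3-mers.
toy = [("ACGTAC", 100.0), ("GTACCA", 300.0), ("TTTTGG", 50.0)]
m = kmer_medians(toy, 3)
```

The median is used here, rather than the mean, because it is less sensitive to a few probes with unusually high or low signal.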
Survey of binding sites in the JASPAR database.
Reproducibility of Cy3 dUTP signal intensities.
Correlation of PBM signal intensities with affinities.
Effects of binding site position and orientation on PBM signal.
Comparison of median signal intensities for 28 Zif268 variants for fixed versus variable position, orientation, and flanking sequence.
Correspondence between median signal intensities for 7-mers on distinct de Bruijn sequences.
Biacore measurements supporting interdependence between the first two positions of the Cbf1 binding site.
Minimum number of unique features on an array for different values of k.
Dependence of Cy3 dUTP incorporation upon sequence context.
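The quantity behind the minimum-number-of-features item above can be approximated with a simple counting bound. Assuming a k-mer and its reverse complement are equivalent on a double-stranded probe, there are (4^k + 4^(k/2))/2 distinct k-mers for even k (the 4^(k/2) term counts reverse-complement palindromes) and 4^k/2 for odd k. A feature whose variable region is L bases long holds at most L - k + 1 k-mers, so at least ceil(count / (L - k + 1)) features are needed. The sketch below computes that lower bound; the function names and the variable-region length `probe_len` are hypothetical, and the supplementary figure may rest on different assumptions.

```python
def kmers_up_to_revcomp(k: int) -> int:
    """Number of DNA k-mers when each k-mer is merged with its reverse
    complement. Palindromes (equal to their own reverse complement) exist
    only for even k, and there are 4 ** (k // 2) of them."""
    total = 4 ** k
    palindromes = 4 ** (k // 2) if k % 2 == 0 else 0
    return (total + palindromes) // 2


def min_features(k: int, probe_len: int) -> int:
    """Lower bound on features needed to cover every k-mer once, if each
    feature's variable region (hypothetical length probe_len) contributes
    probe_len - k + 1 overlapping k-mers."""
    per_feature = probe_len - k + 1
    if per_feature < 1:
        raise ValueError("probe_len must be at least k")
    return -(-kmers_up_to_revcomp(k) // per_feature)  # ceiling division


# Usage: k = 10 with a hypothetical 35-nt variable region.
bound = min_features(10, 35)
```

This is only a lower bound: it assumes no k-mer is repeated across features, which an actual sequence design may not achieve exactly.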