Research Overview

The ability of a cell to respond to environmental stresses, differentiate properly, and progress normally through the cell cycle requires a specific and coordinated gene expression program involving regulated transcription of thousands of genes. The initiation of transcription is in large part controlled by the binding of transcriptional activators and repressors (trans factors) to specific sequences on DNA (cis elements). Despite their importance, the sequence specificities of most transcription factors (TFs) remain unknown, underscoring the need for a rapid and universal method to discover the cis element sequences bound by specific TFs.

To address this need, we developed a novel DNA microarray-based in vitro technology, which we named protein binding microarrays (PBMs). This technology allows rapid, high-throughput characterization of the DNA binding site sequence specificities of TFs in a single day. We also developed a genome sequence-independent, universal PBM technology that can be used on TFs and other DNA-binding proteins from any organism (Mukherjee et al., Nature Genetics (2004) 36(12):1331-9; Berger et al., Nature Biotechnology (2006) 24(11):1429-1435). The universal PBM technology is depicted in the schematic below.

Universal PBM containing all possible 10-mer binding sites, bound by the S. cerevisiae TF Cbf1 expressed with a glutathione S-transferase (GST) epitope tag. Above is a schematic showing the three main stages of each experiment: primer annealing, primer extension, and protein binding. Beneath are zoomed-in images of each stage for the same microarray, scanned at different wavelengths: Cy5-labeled universal primer, Cy3-labeled dUTP, and Alexa488-conjugated α-GST antibody. Fluorescence intensities are shown in false color, with blue indicating low signal intensity, green indicating moderate signal intensity, yellow indicating high signal intensity, and white indicating saturated signal intensity. The variability observed in the Cy3-dUTP signal is due to differences in the nucleotide composition of each feature. The blank spots are single-stranded negative control probes that do not contain the universal primer sequence.
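To give a sense of how a single array can contain every possible k-mer, the sketch below builds a de Bruijn sequence, a standard construction in which every k-mer over the alphabet occurs exactly once, and cuts it into overlapping probes. This is an illustration under stated assumptions rather than a description of the actual array design: the text above does not spell out how the probes were designed, the function names and the probe length of 20 are hypothetical, and k = 4 is used instead of 10 so the example runs instantly.

```python
def de_bruijn(alphabet: str, k: int) -> str:
    """Return a cyclic de Bruijn sequence of order k over `alphabet`
    (standard recursive construction that concatenates Lyndon words)."""
    n = len(alphabet)
    a = [0] * (n * k)
    out = []

    def db(t: int, p: int) -> None:
        if t > k:
            if k % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in out)


def kmers(seq: str, k: int) -> set:
    """All distinct k-mers found in a linear sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def split_into_probes(seq: str, probe_len: int, k: int) -> list:
    """Cut `seq` into probes that overlap by k-1 bases, so that no
    k-mer is lost at a probe boundary. Hypothetical helper."""
    step = probe_len - (k - 1)
    probes = [seq[i:i + probe_len] for i in range(0, len(seq), step)]
    return [p for p in probes if len(p) >= k]


K = 4                               # toy value; the array described above uses 10-mers
cyclic = de_bruijn("ACGT", K)
linear = cyclic + cyclic[:K - 1]    # unroll the cycle into a linear sequence
probes = split_into_probes(linear, probe_len=20, k=K)
covered = set().union(*(kmers(p, K) for p in probes))
```

Because adjacent probes share k-1 bases, every k-mer of the full sequence lies entirely within at least one probe, which is what allows a modest number of features to cover the complete k-mer space.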


Using PBMs, we have identified the DNA binding site sequence specificities of hundreds of TFs from a wide range of organisms, including yeast, worm, fly, mouse, human, and the important pathogens Vibrio harveyi and Plasmodium falciparum. PBM binding specificity data also improve the prediction of genomic TF binding sites and of the combinatorial co-regulation of target genes by TFs, which in turn aids the identification of cis regulatory elements and gene regulatory networks.

In pursuit of these broader and more integrated goals, we have developed computational approaches for analyzing cis regulatory modules (CRMs), which are groups of many specific cis regulatory elements in close proximity along the genomic DNA. To identify candidate CRMs, our algorithms apply a rigorous statistical treatment of the clustering of TF binding sites, their combinatorial co-occurrence, and their cross-species conservation. We have found that our predicted CRMs are bound by the corresponding TFs in mammalian cells, and that predicted transcriptional enhancers drive temporal- and cell-specific gene expression in the developing Drosophila embryo. A newer algorithm of ours can refine a prior hypothesis about how TFs combinatorially co-regulate the expression of their target genes. These and newer strategies under development in our lab can be applied widely for analyzing metazoan transcriptional regulatory networks.
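The idea of scoring binding site clustering can be illustrated with a deliberately simplified sketch. It slides a window along a sequence, counts the predicted sites inside it, and asks how surprising that count is under a Poisson background. This is not the lab's algorithm: the Poisson model, the function names, the window size, and the toy site list are all assumptions made for illustration, and cross-species conservation is left out entirely.

```python
import math


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def scan_clusters(sites, seq_len, window=200, step=100, alpha=0.01):
    """Flag windows holding more predicted sites than expected by chance.

    sites: list of (position, tf_name) tuples (hypothetical input format).
    Returns (start, end, site_count, distinct_tfs, p_value) per flagged window.
    """
    lam = len(sites) / seq_len * window   # expected sites per window
    hits = []
    for start in range(0, seq_len - window + 1, step):
        end = start + window
        inside = [(pos, tf) for pos, tf in sites if start <= pos < end]
        p = poisson_tail(len(inside), lam)
        if p < alpha:
            tfs = sorted({tf for _, tf in inside})
            hits.append((start, end, len(inside), tfs, p))
    return hits


# Toy data: five sites clustered near position 1000, three scattered elsewhere.
toy_sites = [(1000, "A"), (1020, "B"), (1050, "A"), (1080, "C"), (1100, "B"),
             (4000, "A"), (7000, "B"), (9500, "C")]
hits = scan_clusters(toy_sites, seq_len=10_000)
```

Reporting the distinct TFs in each flagged window gives a crude handle on combinatorial co-occurrence; a fuller treatment would also correct for the number of windows tested and weight each site by its binding score.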