The ability of a cell to respond to environmental stresses, differentiate properly, and progress normally through the cell cycle requires a specific and coordinated gene expression program involving regulated transcription of thousands of genes. The initiation of transcription is in large part controlled by the binding of transcriptional activators and repressors (trans factors) to specific sequences on DNA (cis elements). Despite their importance, the sequence specificities of most transcription factors (TFs) remain unknown, underscoring the need for a rapid and universal method to discover the cis element sequences bound by specific TFs.
To address this need, we developed a novel DNA microarray-based in vitro technology, which we named protein binding microarrays (PBMs). This technology allows rapid, high-throughput characterization of the DNA binding site sequence specificities of TFs in a single day. We also developed a genome sequence-independent, universal PBM technology that can be used on TFs and other DNA-binding proteins from any organism (Mukherjee et al., Nature Genetics (2004) 36(12):1331-9; Berger et al., Nature Biotechnology (2006) 24(11):1429-1435). The universal PBM technology is depicted in the schematic below.
Using PBMs, we have identified the DNA binding site sequence specificities of hundreds of TFs from a wide range of organisms, including yeast, worm, fly, mouse, human, and the important pathogens Vibrio harveyi and Plasmodium falciparum. PBM binding specificity data also allow for improved prediction of genomic TF binding sites and of TFs' combinatorial co-regulation of target genes, which in turn aids the identification of cis regulatory elements and gene regulatory networks.
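To make the idea of predicting genomic binding sites from specificity data concrete, here is a minimal sketch of one common approach: scoring each window of a DNA sequence against a position weight matrix. This is an illustration only, not the lab's own prediction method. The matrix values, the uniform background, the threshold, and the function names are all hypothetical, and the scan covers one strand of an uppercase A/C/G/T sequence.

```python
import math

# Hypothetical base frequencies for a 4-bp motif, one dict per position.
# Illustrative numbers only; these are NOT measured PBM data.
PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds_score(site):
    """Sum of per-position log2(motif frequency / background frequency)."""
    return sum(math.log2(PFM[i][base] / BACKGROUND) for i, base in enumerate(site))


def scan(sequence, threshold):
    """Return (start, site, score) for every window scoring at or above threshold."""
    width = len(PFM)
    hits = []
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width]
        score = log_odds_score(site)
        if score >= threshold:
            hits.append((start, site, score))
    return hits


hits = scan("TTACGTTT", threshold=5.0)
```

A fuller scan would also score the reverse complement of each window and would set the threshold from the data rather than by hand.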
In pursuit of these broader and more integrated goals, we have developed computational approaches for analyzing cis regulatory modules (CRMs), which are clusters of multiple cis regulatory elements located close together on genomic DNA. To identify candidate CRMs, our algorithms apply rigorous statistical treatment to the clustering of TF binding sites, their combinatorial co-occurrence, and their cross-species conservation. We have found that our predicted CRMs are bound by the corresponding TFs in mammalian cells, and that predicted transcriptional enhancers drive gene expression with temporal and cell-type specificity in the developing Drosophila embryo. A more recent algorithm from our lab can refine a prior hypothesis about how TFs combinatorially co-regulate the expression of their target genes. These and newer strategies under development in our lab can be applied widely for analyzing metazoan transcriptional regulatory networks.
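The clustering and co-occurrence criteria can be illustrated with a deliberately simplified sketch: slide a fixed-width window over predicted binding sites and keep regions that contain enough sites from enough distinct TFs. This is not the lab's algorithm. It omits the statistical significance testing and the cross-species conservation described above, and the function name, parameters, and example positions are hypothetical.

```python
def find_site_clusters(hits, window, min_sites, min_factors):
    """Find regions where binding sites cluster (simplified, hypothetical sketch).

    hits: list of (position, factor_name) tuples for predicted sites.
    Returns a list of (start, end, set_of_factor_names), with overlapping
    qualifying windows merged into one region.
    """
    hits = sorted(hits)
    windows = []
    left = 0
    for right in range(len(hits)):
        # Shrink from the left until the span fits within the window width.
        while hits[right][0] - hits[left][0] > window:
            left += 1
        members = hits[left:right + 1]
        factors = {name for _, name in members}
        if len(members) >= min_sites and len(factors) >= min_factors:
            windows.append((members[0][0], members[-1][0], factors))

    # Merge qualifying windows that overlap; starts are already non-decreasing.
    merged = []
    for start, end, factors in windows:
        if merged and start <= merged[-1][1]:
            prev_start, prev_end, prev_factors = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_factors | factors)
        else:
            merged.append((start, end, factors))
    return merged


# Hypothetical predicted sites for two factors, "A" and "B".
example_hits = [(100, "A"), (120, "B"), (150, "A"), (900, "B"), (2000, "A"), (2010, "A")]
clusters = find_site_clusters(example_hits, window=200, min_sites=3, min_factors=2)
```

In this example only the region spanning positions 100 to 150 qualifies, because it is the only window holding at least three sites from both factors.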