Research Overview
The ability of a cell to respond to environmental stresses, differentiate properly, and progress normally through the cell cycle requires a specific and coordinated gene expression program involving regulated transcription of thousands of genes. The initiation of transcription is in large part controlled by the binding of transcriptional activators and repressors (trans factors) to specific sequences on DNA (cis elements). These interactions control critical steps in development and cell cycle control.
In-depth characterization of the DNA binding specificities of transcription factors (TFs) and features of the cis-regulatory elements that they bind is essential for understanding how transcriptional regulation is specified. Despite their importance, the sequence specificities of a large fraction of TFs remain unknown, underscoring the need for a rapid and universal method to discover the cis element sequences bound by specific TFs. In addition to determining the DNA binding specificities of various TFs, we are also interested in identifying the structural and mechanistic basis of these specificities. Furthermore, determining how mutations in either TFs or their DNA binding sites can perturb gene regulatory programs and thus lead to phenotypic variation, including disease, is of major importance for precision medicine.
Over the past decade, we and other groups have developed multiple approaches for highly parallel analysis of TF DNA binding specificities, and these approaches have been used in multiple surveys of TFs spanning a variety of DNA binding domains and species. We developed a novel DNA microarray-based in vitro technology, termed protein binding microarrays (PBMs), that allows rapid, high-throughput characterization of the DNA binding site sequence specificities of TFs in a single day. We also developed a genome-sequence independent, universal PBM technology that can be used on TFs and other DNA-binding proteins from any organism (Mukherjee et al., Nature Genetics (2004) 36(12):1331-9; Berger et al., Nature Biotechnology (2006) 24(11):1429-1435). Using PBMs, we have identified the DNA binding site sequence specificities of hundreds of TFs from a wide range of organisms, including yeast, worm, fly, mouse, human, and the important pathogens Vibrio harveyi and Plasmodium falciparum. PBM binding specificity data also allow for improved prediction of genomic TF binding sites and of TFs' combinatorial co-regulation of target genes, which aids in the identification of cis regulatory elements and gene regulatory networks.
In pursuit of these broader and more integrated goals we have developed computational approaches for analyzing cis regulatory modules (CRMs), which are groups of many specific cis regulatory elements in close proximity along the genomic DNA. We have developed algorithms that include a rigorous statistical consideration of TF binding site clustering, their combinatorial co-occurrences, and cross-species conservation in order to identify candidate CRMs. We have seen that our predicted CRMs are bound by the corresponding TFs in mammalian cells, and that predicted transcriptional enhancers drive temporal- and cell-specific gene expression in the developing Drosophila embryo. Another algorithm of ours can refine a prior hypothesis of TFs' combinatorial co-regulation of the expression of their target genes. These and newer strategies under development in our lab can be applied widely for analyzing metazoan transcriptional regulatory networks.
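As an illustration of the binding site clustering idea only, the following minimal Python sketch scans a sequence in sliding windows and asks whether a window holds more motif matches than a simple Poisson background would predict. It is not our published algorithm: it ignores combinatorial co-occurrence and cross-species conservation, searches the forward strand only, and every name and parameter in it (find_sites, score_windows, background_rate, the window and step sizes, the example motif) is a hypothetical choice made for the example.

```python
import math


def find_sites(sequence, motifs):
    """Return sorted start positions where any motif string occurs.

    Simplification: exact string matches on the forward strand only.
    """
    hits = set()
    for motif in motifs:
        start = sequence.find(motif)
        while start != -1:
            hits.add(start)
            start = sequence.find(motif, start + 1)
    return sorted(hits)


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def score_windows(sequence, motifs, window=50, step=10, background_rate=0.01):
    """Score each window by how surprising its site count is.

    background_rate is an assumed per-base probability of a chance match;
    in practice it would be estimated from background sequence.
    Returns a list of (window_start, site_count, p_value) tuples.
    """
    hits = find_sites(sequence, motifs)
    expected = background_rate * window
    results = []
    last_start = max(1, len(sequence) - window + 1)
    for start in range(0, last_start, step):
        count = sum(1 for h in hits if start <= h < start + window)
        results.append((start, count, poisson_tail(count, expected)))
    return results


def best_window(sequence, motifs, **kwargs):
    """Return the (start, count, p_value) tuple with the smallest p-value."""
    return min(score_windows(sequence, motifs, **kwargs), key=lambda r: r[2])


# Toy example: three copies of a hypothetical 8-mer site packed into a
# short region, flanked by site-free sequence.
toy = "A" * 100 + "TGACGTCATTTGACGTCAAATGACGTCA" + "A" * 100
start, count, p_value = best_window(toy, ["TGACGTCA"])
print(start, count, p_value)
```

A window whose p-value is small under these assumptions would be a candidate cluster for follow-up; a realistic analysis would also need to correct for the number of windows tested.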
We have also developed new, highly parallel technologies to assay the activities of putative cis-regulatory elements in vivo. We developed ‘enhancer-FACS-Seq’ (eFS) (Gisselbrecht et al., Nature Methods, 2013) for highly parallel identification of active, tissue-specific transcriptional enhancers in the context of whole Drosophila embryos. Using eFS, we discovered novel enhancers that drive gene expression in embryonic mesoderm. More recently, we have developed the analogous ‘silencer-FACS-Seq’ (sFS) technology (Gisselbrecht et al., Molecular Cell, 2020) for highly parallel identification of active, tissue-specific transcriptional silencers. Using sFS in Drosophila embryos, we have identified novel silencers that dampen gene expression in embryonic mesoderm. We aim to identify the transcription factors that act through these cis-regulatory elements and the features of the elements that are important for their activities.