Structural Biology

In addition to determining the DNA binding specificities of various transcription factors (TFs), we are also interested in identifying the structural and mechanistic basis of these specificities. TFs must be able to bind their DNA target sites with high affinity and distinguish true binding sites from non-target DNA. Determining the mechanisms of binding specificity requires structural studies of how the TF recognizes DNA.

TFs are grouped into families based on sequence homology within the DNA binding domain, and DNA binding domains within a family adopt similar three-dimensional structures. Representative structures of numerous TF families have been solved in complex with their cognate DNA binding sites, showing how each family recognizes DNA. Below are some examples of how we have combined our work on DNA binding specificities with this structural approach to understand the mechanistic basis of specificity more thoroughly.


The ~60 amino acid homeobox domain, or 'homeodomain', is a conserved DNA-binding protein domain best known for its role in transcriptional regulation during vertebrate development. The homeodomain binds DNA predominantly through interactions between helix 3 (the recognition helix) and the major groove. Because many homeodomains have similar DNA sequence preferences, much attention has been paid to the role of protein-protein interactions in target definition. This is despite evidence that the sequence specificity of monomers contributes to targeting specificity and that binding sequences do vary, particularly among different subtypes. It has been proposed that the DNA binding specificity of homeodomains is determined by a combinatorial molecular code among the DNA-contacting residues.

The mouse homeodomain complement, estimated at 260 distinct proteins and 275 individual homeodomains, is broadly conserved across animals. We used universal protein binding microarrays (PBMs) containing 41,944 60-mer probes, in which all possible 10-base sequences are represented, to derive DNA binding specificity profiles for 168 mouse homeodomains. Our analysis showed that most homeodomains have distinctive sequence preferences. These preferences may contribute to the strong selective pressure on homeodomain amino acid sequences, as well as to the biological specificity in target genes and the diversity in function among the homeodomain proteins.
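To make the idea of "all possible 10-base sequences on a limited number of probes" concrete, the sketch below builds a de Bruijn sequence, one well-known way to pack every k-mer into the shortest possible sequence, and cuts it into overlapping probe-length windows. This is an illustration under stated assumptions, not a description of the actual array design: the function names (`de_bruijn`, `tile_into_probes`, `kmers_in`) are hypothetical, the probes here contain only variable sequence, and the demo uses k = 4 and 10-base windows so it runs instantly.

```python
from itertools import product


def de_bruijn(alphabet: str, k: int) -> str:
    """Return a cyclic de Bruijn sequence of order k (Lyndon-word construction)."""
    size = len(alphabet)
    a = [0] * (size * k)
    out = []

    def db(t: int, p: int) -> None:
        if t > k:
            if k % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, size):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in out)


def tile_into_probes(cyclic: str, k: int, width: int) -> list:
    """Cut the sequence into windows that overlap by k-1 bases so no k-mer is lost."""
    linear = cyclic + cyclic[:k - 1]  # unroll the cycle
    step = width - k + 1              # new k-mers contributed by each window
    return [linear[i:i + width] for i in range(0, len(linear) - k + 1, step)]


def kmers_in(probes, k: int) -> set:
    """Collect every k-mer that occurs on at least one probe."""
    return {p[i:i + k] for p in probes for i in range(len(p) - k + 1)}


if __name__ == "__main__":
    k, width = 4, 10  # toy values; hypothetical, chosen only for speed
    probes = tile_into_probes(de_bruijn("ACGT", k), k, width)
    every_kmer = {"".join(t) for t in product("ACGT", repeat=k)}
    print(len(probes), "probes; all k-mers covered:", kmers_in(probes, k) == every_kmer)
```

For k = 10 the cyclic sequence has 4^10 = 1,048,576 positions, which is why tens of thousands of probes are needed. Details such as constant flanking sequence on each probe are outside the scope of this sketch.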

In addition to residues in the N-terminal arm and the base-specific contacts made by positions 47, 50, and 54 (the latter believed to be the main determinants of differences in binding specificity), we found additional recognition positions that are predictive of the differences in DNA binding specificity that we observed among related homeodomain proteins.
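The notion of a residue "code" at a few positions can be sketched as a simple grouping step: read the amino acids at positions 47, 50, and 54 of each homeodomain and group proteins that share the same combination. The code below is a minimal illustration, assuming each input is exactly 60 residues long and already in register with the conventional homeodomain numbering. The names `specificity_key`, `group_by_key`, and `toy_domain` are hypothetical, and the sequences are placeholders (poly-alanine with arbitrary letters at the three positions), not real proteins.

```python
from collections import defaultdict

# Positions named in the text as making base-specific contacts (1-based).
KEY_POSITIONS = (47, 50, 54)


def specificity_key(homeodomain: str, positions=KEY_POSITIONS) -> str:
    """Return the residues found at the given 1-based positions."""
    if len(homeodomain) != 60:
        # Assumption: input is a gap-free, 60-residue homeodomain.
        raise ValueError("expected a 60-residue homeodomain sequence")
    return "".join(homeodomain[p - 1] for p in positions)


def group_by_key(domains: dict) -> dict:
    """Map each residue combination to the sorted names of domains that carry it."""
    groups = defaultdict(list)
    for name, seq in domains.items():
        groups[specificity_key(seq)].append(name)
    return {key: sorted(names) for key, names in groups.items()}


def toy_domain(res47: str, res50: str, res54: str) -> str:
    """Build a placeholder 60-mer (not a real protein) with chosen key residues."""
    residues = ["A"] * 60
    for pos, aa in zip(KEY_POSITIONS, (res47, res50, res54)):
        residues[pos - 1] = aa
    return "".join(residues)


if __name__ == "__main__":
    toys = {
        "toy_1": toy_domain("I", "Q", "M"),
        "toy_2": toy_domain("I", "Q", "M"),
        "toy_3": toy_domain("V", "K", "A"),
    }
    for key, names in group_by_key(toys).items():
        print(key, names)
```

Domains that fall into the same group but still differ in measured binding preference are the cases where recognition positions beyond these three would have to be considered.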


Forkhead TFs are of central importance in different developmental and post-natal contexts, including organogenesis, energy metabolism, homeostasis, the cardiovascular system, liver regeneration, human female reproductive tissues, immune cells, hair follicles, and aging, and in diseases including diabetes, cancer, glaucoma, and language disorders. As described on our Transcription Factor Evolution webpage, we have recently identified multiple changes in DNA binding specificities in the forkhead TF family. Crystal structures of forkhead proteins in complex with the canonical FKH DNA binding site motif show that the forkhead DNA binding domain adopts a winged-helix fold, which presents a recognition helix to contact the major groove of the DNA. The amino acids that make base-specific contacts in these structures are well conserved in the family. We find that an alternate specificity, for the FHL motif, arose three times independently in this family.

It is currently unknown how the forkhead DNA binding domain recognizes the FHL motif. Amino acid positions that contact the bases in the FKH motif are conserved in proteins that recognize the FHL motif. Additionally, some proteins are capable of binding both the canonical forkhead motif (FKH) and the alternate FHL motif. This suggests that the forkhead DNA binding domain is capable of contacting DNA via two distinct conformations. Future structural work and mutational studies are expected to reveal which parts of the forkhead domain are important for recognizing the FHL motif and how a single domain can recognize two such different motifs.