Welcome to UniPROBE
The UniPROBE (Universal PBM Resource for Oligonucleotide Binding Evaluation) database hosts data on the in vitro DNA binding specificities of proteins, generated using universal protein binding microarray (PBM) technology. This initial release of the UniPROBE database provides a centralized resource for accessing comprehensive data on the preferences of proteins for all possible sequence variants ('words') of length k ('k-mers'), as well as position weight matrix (PWM) and graphical sequence logo representations of the k-mer data. In total, the database currently hosts DNA binding data for 574 nonredundant proteins and complexes from a diverse collection of organisms, including the prokaryote Vibrio harveyi, the eukaryotic malaria parasite Plasmodium falciparum, the parasitic apicomplexan Cryptosporidium parvum, the yeast Saccharomyces cerevisiae, the worm Caenorhabditis elegans, mouse, and human. The database's web tools (on the right) include a text-based search, a function for comparing user-entered motifs with database PWMs, and a function for locating putative binding sites in user-entered nucleotide sequences. Please click on each tool's "help" link for more information.
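To make "all possible sequence variants of length k" concrete, the sketch below enumerates every k-mer and collapses each word with its reverse complement, since a double-stranded site reads as one word on one strand and its reverse complement on the other. It is an illustration only, not UniPROBE's code: the function names are hypothetical, and keeping the alphabetically first member of each pair as the representative is an assumption made for the example.

```python
from itertools import product

# Map each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def nonredundant_kmers(k: int) -> list[str]:
    """Enumerate all 4**k words of length k, keeping one representative
    (the alphabetically first) of each word / reverse-complement pair."""
    kept = set()
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        kept.add(min(word, reverse_complement(word)))
    return sorted(kept)


if __name__ == "__main__":
    for k in (1, 2, 8):
        # Columns: k, all words on one strand, words left after collapsing reverse complements.
        print(k, 4 ** k, len(nonredundant_kmers(k)))
```

For even k the collapsed count is (4^k + 4^(k/2)) / 2, because palindromic words such as GATC are their own reverse complement and are counted once; for k = 8 this gives 32,896 distinct words out of 65,536.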
Do you have PBM data you'd like to deposit into UniPROBE? You can find our new data deposition pipeline here.
A paper summarizing several recent updates to UniPROBE has been published in Nucleic Acids Research:
Hume MA, Barrera LA, Gisselbrecht SS, Bulyk ML. UniPROBE, update 2015: new tools and content for the online database of protein-binding microarray data on protein-DNA interactions. Nucleic Acids Research 2014; doi: 10.1093/nar/gku1045
It can be found here: http://www.ncbi.nlm.nih.gov/pubmed/25378322
If you encounter any problems or issues with this site, please email firstname.lastname@example.org.
News and Updates
- We have corrected numerous minor inaccuracies and omissions in the readme.txt files available for download along with each data set. If you have any questions about the changes, or any other questions about the readme files or the datasets in general, please contact us at email@example.com.
- New human PBM data for various TFs have been integrated into UniPROBE. These data are described in "Survey of variation in human transcription factors reveals prevalent DNA binding changes", which is in press at Science.
- New Toxoplasma gondii PBM data for AP2 TFs have been integrated into UniPROBE. These data are described in "ApiAP2 transcription factor restricts development of the Toxoplasma tissue cyst", which has been published in PNAS.
- New Arabidopsis thaliana PBM data for various TFs have been integrated into UniPROBE. These data are described in "A DNA-binding-site landscape and regulatory network analysis for NAC transcription factors in Arabidopsis thaliana", which has been published in Nucleic Acids Research.
- Publications are now assigned unique, searchable accession IDs. You can retrieve all of a publication's data by searching for its ID in the quick search, or in the text search using the "All" or "By Pub" option. The accession ID for each reference can be found on the References page.
- New PBM data for AP2 TFs in Arabidopsis thaliana and Arabidopsis lyrata have been integrated into UniPROBE. These data are described in "Molecular evidence for functional divergence and decay of a transcription factor derived from whole genome duplication in Arabidopsis thaliana", which is in press at Plant Physiology.
- An upgrade to the TFBS search tool has been posted that allows searches to be restricted to one or more specific species. Predefined species categories are also available for convenience. Click on "Advanced Options" in the toolbox on the right side of the page to see more.
- BEEML-PBM motifs are now online for almost all publications in UniPROBE. The only exceptions are Pompeani et al. (2008) and De Silva et al. (2008).
- A link to our data deposition pipeline is now publicly available on the toolbar at the top of the page. If you have PBM data from a published paper that you wish to deposit in UniPROBE, click on the link and follow the instructions.
- New Arabidopsis thaliana PBM data for the TF LUX (LUX ARRHYTHMO) have been integrated into UniPROBE. These data are described in "LUX ARRHYTHMO Encodes a night time repressor of circadian gene expression in the Arabidopsis core clock", which has been published in Current Biology.
- New Caenorhabditis elegans PBM data for bHLH TFs have been integrated into UniPROBE. These data are described in "Using a structural and logics systems approach to infer bHLH DNA binding specificity determinants", which has been published in Nucleic Acids Research.
- New Mus musculus PBM data for zf-H2C2_2 TFs have been integrated into UniPROBE. These data are described in "Neural specific Sox2 input and differential Gli binding affinity provide context and positional information in Shh-directed neural patterning", which has been published in Genes & Development.
- New Drosophila melanogaster PBM data for the C2H2 zinc finger TF Lame duck (Lmd) have been integrated into UniPROBE. These data are described in "Integrative analysis of the zinc finger transcription factor Lame duck in the Drosophila myogenic gene regulatory network", which has been published in PNAS.
- New Drosophila melanogaster PBM data for the zf-H2C2_2 TF CLAMP have been integrated into UniPROBE. These data are described in "The CLAMP protein links the MSL complex to the X-chromosome during Drosophila dosage compensation", which has been published in Genes & Development.
- New PBM data for Fork_head TFs have been integrated into UniPROBE. These data are described in "DNA binding specificity changes in the evolution of forkhead transcription factors", which has been published in Proc Natl Acad Sci USA.
- We now introduce a more comprehensive negative control sequence generator tool. Instead of appearing on protein details pages and being restricted to that page's protein as before, it now appears as a standard tool throughout the site, and the user can select any number of proteins, clones, or protein complexes (from one to the entire contents of the database) as input. The generated sequence is predicted to show no residual binding by any of the chosen proteins. See the "Help" link in the upper right corner of the tool's dialog box (on the lower right of the screen) for more details.
- The protein details pages from some of the publications posted here now display sequence logos built from PWMs that were derived from the raw PBM signal data using BEEML-PBM (Zhao and Stormo, 2011), as an alternative to our Seed-and-Wobble algorithm. You can also download these BEEML-derived PWMs as frequency matrices from the relevant links on the downloads page. The raw BEEML-PBM output for all of these data can be found here.
- A new and improved version of the negative control sequence generator tool is coming soon. It will be displayed with the rest of the tools throughout the site, and will allow the user to input a customizable list of proteins from the database to generate a nonbinding sequence. In the meantime, some bugs have been found in the old tool, and it has been removed from all protein details pages. All bugs will be fixed in the new version once it is posted. We recommend waiting until then to generate any negative control sequences you need, and regenerating any sequences you may already have produced with the old tool. Thanks for your patience.
- New Drosophila melanogaster PBM data for Homeobox TFs have been integrated into UniPROBE. These data are described in "Molecular mechanism underlying the regulatory specificity of a Drosophila homeodomain protein that specifies myoblast identity", which has been published in Development. (NOTE: The published data analysis was performed using the Universal PBM Analysis Suite with a keep fraction (kf) parameter of 0.9, meaning that 90% of foreground and background features were kept while calculating enrichment scores, rather than the default of 0.5 (50%). Data files from the analysis using the default kf of 0.5 are also available for this publication on the Downloads page.)
- New Homo sapiens PBM data for various TFs have been integrated into UniPROBE. These data are described in "Notch and MAML-1 complexation do not detectably alter the DNA binding specificity of the transcription factor CSL", which has been published in PLoS ONE.
- Our blastp search feature now includes a visualization of the alignment between the query and each hit in the search results.
- The details pages for proteins from certain publications now have a negative control sequence generation feature. It produces a DNA sequence, within a user-specified length interval, that based on our PBM data is virtually guaranteed not to be bound by the protein in question. Note: this feature has not yet been integrated for all publications, but we are working to add it soon.
- New PBM data for T-box TFs have been integrated into UniPROBE. These data are described in "Modular evolution of DNA binding preference of a Tbrain transcription factor provides a mechanism for modifying gene regulatory networks", which has been published in Molecular Biology and Evolution.
- New Saccharomyces cerevisiae PBM data for various TFs have been integrated into UniPROBE. These data are described in "Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights", which has been published in Genome Biology.
- New Plasmodium falciparum PBM data for AP2 TFs have been integrated into UniPROBE. These data are described in "Identification and genome-wide prediction of DNA binding specificities for the ApiAP2 family of regulators from the malaria parasite", which has been published in PLoS Pathogens.
- Our protein pages now link to TFBSshape, a database from Remo Rohs's lab at USC that provides information about the shape of DNA at transcription factor binding sites. (Only proteins from publications included in TFBSshape have links.)
- New data have been added from the paper "Using protein design algorithms to understand the molecular basis of disease caused by protein-DNA interactions: the Pax6 example", which has been published in Nucleic Acids Research.
- Users can now download multiple files in a single zip file! More updates are coming soon.
- New mouse PBM data for ETS domain TFs have been integrated into UniPROBE. These data are described in "Genome-wide analysis of ETS family DNA-binding in vitro and in vivo", which has been published in The EMBO Journal.
- New worm PBM data for bHLH domain TFs have been integrated into UniPROBE. These data are described in "A multi-parameter network reveals extensive divergence between C. elegans bHLH transcription factors", which has been published in Cell.
- Data have been added for the Nsy-7 TF (Caenorhabditis elegans) as described in "Transcriptional regulation and stabilization of left-right neuronal identity in C. elegans", which has been published in Genes & Development.
- New mouse PBM data for 104 TFs have been integrated into UniPROBE. These data are described in "Diversity and Complexity in DNA Recognition by Transcription Factors", which has been accepted for publication in Science.
- Data have been added for the SOX4 TF (Homo sapiens) as described in "Genome-wide promoter analysis of the SOX4 transcriptional network in prostate cancer", which has been published in Cancer Research. Please click here to visit the supplementary materials website for this publication.
- Due to an oversight during the addition of the 89 yeast TFs to UniPROBE, these data were not included in the .zip files in the "All Data" section of the downloads page. The .zip files are now fully up to date, and we apologize for the inconvenience.
- New yeast PBM data for 89 TFs have been integrated into UniPROBE. These data are described in "High-Resolution DNA Binding Specificity Analysis of Yeast Transcription Factors", which is in press at Genome Research. Please click here to visit the supplementary materials website for this publication.
- A bug in the Search for TF Binding Sites tool has been corrected. Previously, the tool generated the reverse complement of the user-entered sequence incorrectly, which prevented it from finding reverse complement matches. Although the problem did not give rise to any false positive matches, many true positives may have been missed. We apologize for the inconvenience.
- The Downloads section now contains zip files holding data for every protein in the database (previously, the files were grouped by publication), as well as a zip file holding documentation and SQL code for generating many of the database's tables.
- The PBM Database has been updated, renamed, and relocated! This current version of the database is now called the UniPROBE (Universal PBM Resource for Oligonucleotide Binding Evaluation) Database, and the "Search for Similar Motifs" and "Search for TF Binding Sites" tools are fully implemented.
- The PBM database is now public for the mouse homeodomain data! We recommend that you use either Firefox or Internet Explorer to browse this site. Questions, comments, and suggestions are most welcome at the database help address.
If you wish to receive PBM Database updates, which will include the addition of new datasets and data analyses, you are encouraged to register for the website here.
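The reverse-complement bug described in the News list concerns how a binding-site search handles the opposite DNA strand. The sketch below shows one common way such a scan can work: a frequency-matrix PWM is scored against every window of a sequence and against that window's reverse complement. It is an illustration under stated assumptions, not the Search for TF Binding Sites tool's implementation: the log-odds scoring against a uniform 0.25 background, the pseudocount, the threshold, the in-memory matrix layout, and the function names (`toy_pwm`, `log_odds_score`, `scan`) are all hypothetical.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def toy_pwm(consensus, p=0.97):
    """Hypothetical frequency matrix: one dict per position mapping base -> frequency,
    with probability p on the consensus base and the remainder split evenly."""
    return [{b: (p if b == c else (1 - p) / 3) for b in BASES} for c in consensus]


def log_odds_score(pwm, word, background=0.25, pseudocount=1e-3):
    """Sum over positions of log2(frequency / background), with a small pseudocount."""
    return sum(math.log2((col[base] + pseudocount) / background)
               for col, base in zip(pwm, word))


def scan(seq, pwm, threshold):
    """Return (start, strand, site, score) for every window scoring >= threshold
    on either strand; start is the 0-based forward-strand coordinate."""
    seq = seq.upper()
    k = len(pwm)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        # Score the window as read on the forward strand and on the reverse strand.
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            score = log_odds_score(pwm, site)
            if score >= threshold:
                hits.append((i, strand, site, round(score, 2)))
    return hits


if __name__ == "__main__":
    pwm = toy_pwm("GGAA")
    # GGAA occurs on the forward strand at 2; TTCC at 8 is GGAA on the reverse strand.
    for hit in scan("TTGGAACCTTCCAA", pwm, threshold=5.0):
        print(hit)
```

Running it reports the site at position 2 on the + strand and the site at position 8 on the - strand. A scan that skipped the reverse-strand check, or computed the reverse complement incorrectly, would miss the second site without producing any false positives, which matches the failure mode described in the bug notice.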