
Add a Protein

Use the top form to enter a single new protein into UniPROBE, or use the bottom form to enter multiple proteins at once through a file.

Add Protein


You should have received this by email and on the web page after adding your publication.


e.g.: Sox4
Note: While most protein names are properly formatted as "Yfg123", C. elegans protein names are properly formatted as "YFG123".


Note: If you don't see your species here, contact the admin at uniprobe@genetics.med.harvard.edu to have it added.


e.g.: SRY (sex determining region Y)-box 4



e.g.: Transcription factor SOX-4, OTTHUMP00000039358, SRY-related HMG-box gene 4, ecotropic viral integration site 16, EVI16


e.g.: 92329


e.g.: Q06945


For the protein (not gene). If there is more than one NP identifier, use the first one with a valid link.
e.g.: NP_003098


If there is an entry for this protein, or an ortholog, in JASPAR, this column holds its ID tag.



A brief description of the gene / protein.
e.g.: This intronless gene encodes a member of the SOX (SRY-related HMG-box) family of transcription factors involved in the regulation of embryonic development and in the determination of the cell fate. The encoded protein may act as a transcriptional regulator after forming a protein complex with other proteins, such as syndecan binding protein (syntenin). The protein may function in the apoptosis pathway leading to cell death as well as to tumorigenesis and may mediate downstream effects of parathyroid hormone (PTH) and PTH-related protein (PTHrP) in bone development. The solution structure has been resolved for the HMG-box of a similar mouse protein.


Abbreviation for the DNA-binding domain type; use the Pfam abbreviation.
e.g.: HMG_box


This column holds a unique identification number / string for a particular species. This ID (and the corresponding database name) can be specified by creating a custom details page (see details.php in section 1).
e.g.: for yeast, it might be YDR310C

Add Multiple Proteins in a File


You should have received this by email and on the web page after adding your publication.

The file should list each protein on its own line, with the fields on each line separated by tabs, in the order given in the top form
(with the exception of publication), i.e.:
protein name, species, full protein name, synonyms, IHOP id, Uniprot id, RefSeq id, JASPAR id, description, domain, unique species id, has_pbm_data ("y" or "n"), has_pbm_data for whole protein ("y" or "n").

For the various definitions and requirements of each field, see the top form.

NOTE: While an Excel spreadsheet is helpful for preparing such a file, we cannot guarantee that a submitted Excel file will be read with the proper formatting, so you should submit a text file with tab-separated values.
To produce one, you can simply copy and paste your spreadsheet content directly into a text editor. However, PLEASE ENSURE THAT YOUR FILES END UP IN UNIX FORMAT.
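
If your text editor saves files with Windows or old Mac line endings, a small script can convert them. The following is a minimal Python sketch, assuming that "Unix format" here refers to LF line endings; the function names and file paths are hypothetical and not part of UniPROBE.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    # Replace CRLF first so that the remaining lone CRs can be handled separately.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(src_path: str, dst_path: str) -> None:
    """Write a copy of src_path to dst_path with Unix line endings."""
    # Binary mode keeps Python from translating newlines on its own.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_unix_newlines(data))
```

For example, convert_file("proteins_windows.txt", "proteins_unix.txt") would write a converted copy and leave the original untouched. Tabs inside each line are not changed.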

Any species listed in your input file must already be listed in the database's species file; see the top form's "Species" pulldown to see which species are currently available.
If one or more of your species are absent from this list, email uniprobe@genetics.med.harvard.edu, and we will make them available for you.

If you're leaving out certain optional fields (any but protein name and species), please use NULL for these fields.
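
If you build your file with a script, the NULL rule can be applied automatically. The following is a minimal Python sketch, not an official tool: the field keys in FIELDS are hypothetical names chosen to mirror the field order listed above, and format_protein_line is an illustrative helper.

```python
# Hypothetical keys, in the field order listed above (publication excluded).
FIELDS = [
    "protein_name", "species", "full_protein_name", "synonyms", "ihop_id",
    "uniprot_id", "refseq_id", "jaspar_id", "description", "domain",
    "unique_species_id", "has_pbm_data", "has_pbm_data_whole_protein",
]
REQUIRED = ("protein_name", "species")


def format_protein_line(record: dict) -> str:
    """Build one tab-separated line, writing NULL for omitted optional fields."""
    for name in REQUIRED:
        if not record.get(name):
            raise ValueError(f"missing required field: {name}")
    values = []
    for name in FIELDS:
        value = record.get(name)
        # Missing or empty optional fields become the literal string NULL.
        values.append("NULL" if value in (None, "") else str(value))
    return "\t".join(values)


line = format_protein_line({
    "protein_name": "Pdr3",
    "species": "Saccharomyces cerevisiae",
    "uniprot_id": "P33200",
})
```

Here every field other than the protein name, species, and UniProt ID is written as NULL.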

Be careful to enter all your data correctly and properly formatted! This means the species must exactly match one of the options listed in the pulldowns in the top form, or your upload may fail.
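
Before uploading, you may want to check the species column yourself. The following is a minimal Python sketch, assuming you have copied the species names from the "Species" pulldown into a set by hand; find_unknown_species is a hypothetical helper, and the comparison is deliberately exact and case-sensitive.

```python
def find_unknown_species(lines, allowed_species):
    """Return (line_number, species) pairs for lines that would not match.

    A species of None means the line lacks a protein name or a species.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0] or not fields[1]:
            problems.append((number, None))
            continue
        # Exact match only: no case folding or whitespace trimming.
        if fields[1] not in allowed_species:
            problems.append((number, fields[1]))
    return problems


allowed = {"Saccharomyces cerevisiae"}  # copy the real list from the pulldown
sample = [
    "Pdr3\tSaccharomyces cerevisiae\tNULL\n",
    "Pdr3\tsaccharomyces cerevisiae\tNULL\n",
    "Pdr3\n",
]
problems = find_unknown_species(sample, allowed)
```

In this sample, the second line is reported because of the lowercase "s", and the third because it has no species field.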

For example, a line in your file might look like this:

Pdr3     Saccharomyces cerevisiae        Pleiotropic Drug Resistance     AMY2, TPE2      32649   P33200  NP_009548       MA0353.1        Transcriptional activator of the pleiotropic drug resistance network, regulates expression of ATP-binding cassette (ABC) transporters through binding to cis-acting sites known as PDREs (PDR responsive elements); post-translationally up-regulated in cells lacking a functional mitochondrial genome        Zn_clus  YBL005W    y


If your dataset is very large, you may want to use our web tool to generate a template input file for you.
You will have to input a file with the proteins listed on separate lines, with each line containing a protein name and a species separated by a tab. No other fields are necessary.
The tool will mine public databases such as NCBI Entrez Gene, IHOP, and UniProtKB/SwissProt for the relevant information, and assemble it into a template file.
Not all fields will be filled in; those that are not will contain a placeholder, something like "[insert value here]".
There is also some possibility of error for those fields that are filled in, so you should manually check the results,
and make any corrections that are necessary. Nevertheless, we hope this will make it easier for you to build your file than starting from scratch.
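
When checking a generated template, it can help to list the fields that still hold a placeholder. The following is a minimal Python sketch, assuming placeholders are whole fields enclosed in square brackets, as in "[insert value here]"; the exact placeholder text produced by the tool may differ, and find_placeholders is a hypothetical helper.

```python
import re

# Assumption: a placeholder occupies the whole field and is wrapped in [ ].
PLACEHOLDER = re.compile(r"^\[.*\]$")


def find_placeholders(lines):
    """Return (line_number, column_number, text) for each unfilled field."""
    hits = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        for column, value in enumerate(fields, start=1):
            if PLACEHOLDER.match(value.strip()):
                hits.append((line_number, column, value.strip()))
    return hits


template = [
    "Pdr3\tSaccharomyces cerevisiae\t[insert value here]\tAMY2, TPE2\n",
    "Sox4\t[insert value here]\n",
]
hits = find_placeholders(template)
```

Line and column numbers are 1-based, so they can be looked up directly in a text editor or spreadsheet. This only finds unfilled fields; fields that were filled in still need a manual check.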