This page explains how to prepare your data files before uploading them, and describes helper scripts that can assist you in the process.

Instructions

Format the files according to the instructions below, running any scripts you need. When you are done, cd into your publication directory and run the following command to create the zip file you will upload to the server:

zip -r ../[folder_name].zip *
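
As a concrete illustration of the command above, here is a minimal sketch wrapped in a shell function. It assumes that [folder_name] is the name of the publication directory itself (this page does not say so explicitly) and that the zip utility is installed. make_upload_zip is a hypothetical helper name, not one of the provided scripts.

```shell
# Hypothetical helper, not one of the provided scripts.
# Assumption: [folder_name] is the name of the publication directory.
make_upload_zip() {
    dir=$(cd "$1" && pwd) || return 1   # resolve to an absolute path
    name=$(basename "$dir")
    (
        cd "$dir" || exit 1
        # Same command as above; the archive is written beside the directory.
        # Note that * does not match hidden files (names starting with a dot).
        zip -r -q "../$name.zip" *
    )
}
```

Calling make_upload_zip with the path to your directory should leave a zip file next to it; running unzip -l on that file lists its contents so you can check them before uploading.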

Data Files

Each file name should start with the name of its parent directory followed by an underscore. Not all of these files are required, and the absence of certain file types is accounted for, but the more of them you provide, the better!
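
The naming rule can be checked before zipping. The sketch below lists every file whose name does not begin with its directory name plus an underscore. It assumes that "parent directory" means the directory that directly contains each file; check_prefix is a hypothetical helper name, not one of the provided scripts.

```shell
# Hypothetical helper, not one of the provided scripts.
# Prints the path of every file whose name does not start with
# "<name of the directory containing it>_". Prints nothing if all conform.
check_prefix() {
    root=$(cd "$1" && pwd) || return 1   # resolve to an absolute path
    find "$root" -type f | while IFS= read -r path; do
        parent=$(basename "$(dirname "$path")")
        name=$(basename "$path")
        case "$name" in
            "${parent}_"*) ;;                 # name conforms
            *) printf '%s\n' "$path" ;;       # name does not conform
        esac
    done
}
```

An empty result means every file follows the convention; any paths printed are candidates for renaming.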

Helpful scripts

Download these scripts here.
NOTE: It is advisable to back up your entire publication folder in bulk (e.g., using cp -r, or by making a zip) before running any of these scripts. Please be sure you understand what each script will do before running it!
See Calling order, toward the bottom, for a template sequence in which to run them.

*Please open each script and read its comments before running it, so that you understand how to run it and what it will do. Be careful: these scripts will delete files!
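
The bulk backup recommended above can be sketched as follows. The timestamped backup name is an assumption chosen for illustration, and backup_publication is a hypothetical helper name, not one of the provided scripts. The copy is placed beside the publication directory rather than inside it, so it will not end up in the zip you upload.

```shell
# Hypothetical helper, not one of the provided scripts.
# Copies the whole publication directory to a timestamped sibling directory
# and prints the path of the copy.
backup_publication() {
    src=${1%/}                                        # strip a trailing slash
    dest="${src}_backup_$(date +%Y%m%d_%H%M%S)"       # assumed naming scheme
    cp -r "$src" "$dest" && printf '%s\n' "$dest"
}
```

Keep the printed path: it is the folder to restore from if something goes wrong, and the one to delete once everything has been verified.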

Calling order

Suppose you need to call every one of the above shell scripts for your publication directory. Here is the order in which you would do it. It isn't set in stone, but it is a good template: you can follow it for any dataset and simply omit whichever scripts you don't need to run. Here DIR_PATH is the full path to the publication directory in which you keep the data files.
NOTE: It is advisable to back up your entire publication folder in bulk (using cp -r) before running any of these scripts. You should also check that each script worked correctly after EVERY step. If step 4, 5, or 6 did not work properly, call restore_from_backup.sh DIR_PATH with the appropriate file extension (most likely pwm or txt). If any script does not perform correctly, or does not do what you need, contact the DBA (uniprobe@genetics.med.harvard.edu) so the script can be upgraded.
  1. move_misnamed_files.sh DIR_PATH NEW_STR OLD_STR (filling in new and old names; call as many times as you need)
  2. concatenate_data_files.sh DIR_PATH FILE_EXTENSION (where FILE_EXTENSION is as described above for this script; call as many times as you need)
  3. adjust_filenames.sh DIR_PATH
  4. generate_motif_line.sh DIR_PATH FORMAT (where FORMAT is 0 or 1 as described above for this script)
  5. add_headers.sh DIR_PATH
  6. add_algorithm_and_format_line.sh DIR_PATH EXTENSION ALGORITHM PROBABILITY (as needed; see script description)
  7. rearrange_8mer_columns.sh DIR_PATH ESCORE_COL MEDIAN_COL ZSCORE_COL (see script description)
  8. remove_backups.sh DIR_PATH pwm
  9. remove_backups.sh DIR_PATH txt
Then, if all is well, you can remove the backup folder you made before you started.
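
The sequence above can be partly automated. The sketch below runs steps 3 to 5 only and stops at the first failure. It rests on several assumptions: that the scripts exit with a nonzero status when they fail (not confirmed here), that they live in the directory named by SCRIPT_DIR (a hypothetical variable, defaulting to the current directory), and that only these three steps apply to your dataset. run_step and run_pipeline are hypothetical names. An exit status is no substitute for inspecting the files after every step, so the remove_backups.sh calls are deliberately left for you to run by hand.

```shell
# Hypothetical wrapper, not one of the provided scripts.
# Print the command, run it, and report a failure on stderr.
run_step() {
    printf '==> %s\n' "$*"
    "$@" || { printf 'step failed: %s\n' "$*" >&2; return 1; }
}

# Run steps 3 to 5 in order; the && chain stops at the first failing step.
run_pipeline() {
    dir_path=$1                     # DIR_PATH, the publication directory
    format=$2                       # FORMAT for generate_motif_line.sh (0 or 1)
    scripts=${SCRIPT_DIR:-.}        # assumed location of the scripts
    run_step "$scripts/adjust_filenames.sh" "$dir_path" &&
    run_step "$scripts/generate_motif_line.sh" "$dir_path" "$format" &&
    run_step "$scripts/add_headers.sh" "$dir_path"
}
```

If run_pipeline reports a failed step, inspect the directory and, where appropriate, use restore_from_backup.sh as described above before trying again.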