=========================
High Resolution Decoy Set
=========================

0. Outline
----------
1. Introduction
2. Instructions
3. Contact
4. References

1. Introduction
----------
This high resolution decoy set of 1400 proteins was initially developed for
force field development (see References). If you use these decoys for your
own research, please cite the authors in your publication(s).

2. Instructions
----------
The file HRGeneral.tar.bz2 contains some summary information about the decoy
set. To extract the appropriate files, use the commands:

    bunzip2 HRGeneral.tar.bz2
    tar -xf HRGeneral.tar

(Windows users: Use the tool of your choice to unzip and extract the file.)

The compressed file should contain

    Training1250.txt : A list of the 1250 proteins used for training
    Test150.txt      : A list of the 150 proteins used for testing
    DecoyInfo.txt    : A list of the number of decoys and number of
                       residues for each protein
    mergePDB.py      : A Python utility for reassembling decoy PDBs
                       (see below)

The decoys of the 1400 proteins are in the files HRDecoys_N.tar.bz2, where N
is 1-7. Each of these files should be extracted in the same fashion as
HRGeneral.tar.bz2. This should create 1400 subdirectories, one for each
protein.
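
For users without bunzip2 and tar (for example on Windows), the extraction
step can also be done from Python. The following is a minimal sketch using
only the standard library tarfile module. The function names
extract_archive and extract_decoy_set are hypothetical and are not part of
the distributed files; the archive names are the ones listed above.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Decompress and unpack a single .tar.bz2 archive in one step."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # The "r:bz2" mode undoes the bzip2 compression while reading the tar
    with tarfile.open(archive_path, "r:bz2") as archive:
        archive.extractall(path=dest)
    return dest


def extract_decoy_set(source_dir=".", dest_dir="."):
    """Extract HRGeneral.tar.bz2 and HRDecoys_1..7.tar.bz2 where present.

    Returns the names of the archives that were actually extracted.
    """
    names = ["HRGeneral.tar.bz2"]
    names += [f"HRDecoys_{n}.tar.bz2" for n in range(1, 8)]
    extracted = []
    for name in names:
        path = Path(source_dir) / name
        if path.is_file():  # skip archives that have not been downloaded
            extract_archive(path, dest_dir)
            extracted.append(name)
    return extracted
```

Calling extract_decoy_set("downloads", "decoys") would unpack every archive
found in a (hypothetical) downloads directory into a decoys directory.
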
In each subdirectory, there will be 5 types of files:

    .native.pdb : the native file from the protein data bank (renumbered)
    .rmsd       : the Calpha rmsd value of each conformer to the native
                  structure
    .llabel.pdb : the generic leftmost 31 characters from the conformers
    .rlabel.pdb : the generic rightmost 12 characters from the conformers
    X.coord.pdb : the specific coordinates of a given conformer number X

The decoys can be converted to a standard PDB file using the following
algorithm:

    Write line 1 of X.coord.pdb to the file .X.pdb
    Write lines 1-3 of .llabel.pdb to the file .X.pdb
    Paste together the remaining information of .llabel.pdb, .X.coord.pdb,
      and .rlabel.pdb in a columnwise form to the file .X.pdb

A short Python script (mergePDB.py) has been included to perform this task
for a single conformer. All necessary scripting to apply this or any code to
this task for all 1400 proteins is left to the user. The Python script can
be used as

    mergePDB.py name name name.X

3. Contact
----------
Questions should be directed to floudas@titan.princeton.edu

4. References
----------
(1) R. Rajgaria, S. R. McAllister, C. A. Floudas. A novel C-alpha-C-alpha
    distance dependent force field based on a high quality decoy set. 2006,
    submitted.
(2) R. Rajgaria, S. R. McAllister, C. A. Floudas. 2006, in preparation.
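
Note on the conversion algorithm in Section 2: the three steps can be
sketched in Python as below. This is an illustration only and is not the
distributed mergePDB.py. The function name merge_conformer is hypothetical,
and the sketch assumes that the .rlabel.pdb file has no header lines, so
that its first line pairs with line 4 of .llabel.pdb and line 2 of the
coordinate file. Check that assumption against the actual files before
relying on it.

```python
def merge_conformer(llabel_lines, coord_lines, rlabel_lines):
    """Rebuild the PDB lines of one conformer from its three parts.

    Each argument is the list of lines read from the corresponding file.
    """
    def strip_eol(line):
        # Remove only the line ending; keep the column padding spaces
        return line.rstrip("\r\n")

    # Step 1: line 1 of the coordinate file
    merged = [strip_eol(coord_lines[0])]
    # Step 2: lines 1-3 of the left-label file
    merged += [strip_eol(line) for line in llabel_lines[:3]]
    # Step 3: paste the remaining lines together column-wise.
    # ASSUMPTION: rlabel_lines has no header, so it is used from line 1.
    body = zip(llabel_lines[3:], coord_lines[1:], rlabel_lines)
    for left, coord, right in body:
        merged.append(strip_eol(left) + strip_eol(coord) + strip_eol(right))
    return merged
```

The input lists can be obtained with open(path).readlines() on the three
files, and the result written out with "\n".join(merged).
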