The starting model is, without doubt, the most important structure used in modeling protein chemistry. Generating the starting model requires a lot of work: getting a suitable PDB file, adding hydrogen atoms and checking all unusual features, checking existing and potential salt bridges, performing an unconstrained geometry optimization, and finally checking the resulting geometry once more for any anomalies or faults. The result of all this work should be three files: a PDB file, a normal MOPAC data set file, and a MOPAC archive file. Of these, the archive file is the most versatile, in that it contains the most useful information and can easily be edited to make a MOPAC data set. In the archive file, <file>.arc, and the MOPAC data set, <file>.mop, the data set should have the following structure:
(1) A set of lines that start with an asterisk, indicating that they are comments. These lines will be used in making the start of a PDB file. Lines of this type start with PDB-type line entries, such as:
*HEADER
*REMARK
*HELIX
*SHEET
If these lines are not wanted, either delete them or add the keyword NOCOMMENTS to the next data set.
(2) The normal keyword line. In the starting model, this will normally consist of the keyword START_RES. This keyword is generated by ADD-H; unless the labeling of the original PDB file, specifically the "TER"s, must be reproduced precisely, it can be deleted.
(3) A title line.
(4) A comment line.
(5) The complete geometry of the system. Each atom is labeled with information that allows the original PDB atom labels to be constructed.
(6) The standard blank line that indicates the end of the data set.
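The six-part layout above can be illustrated with a short Python sketch that splits a data set into its parts. This is only a minimal illustration, not part of MOPAC: the names `MopacDataSet` and `split_data_set` are hypothetical, the input is assumed to be the data-set portion of the file only, and the keyword line is assumed to occupy a single line.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MopacDataSet:
    """Hypothetical container for the parts of a data set described above."""
    pdb_comments: List[str]  # (1) lines that started with "*", asterisk removed
    keywords: str            # (2) the keyword line
    title: str               # (3) the title line
    comment: str             # (4) the comment line
    geometry: List[str]      # (5) one line per atom, kept verbatim


def split_data_set(text: str) -> MopacDataSet:
    """Split data-set text into its parts (assumes a one-line keyword line)."""
    lines = text.splitlines()
    i = 0

    # (1) Leading comment lines start with an asterisk.
    pdb_comments = []
    while i < len(lines) and lines[i].startswith("*"):
        pdb_comments.append(lines[i][1:])
        i += 1

    # (2)-(4) Keyword line, title line, comment line.
    if len(lines) < i + 3:
        raise ValueError("expected keyword, title and comment lines")
    keywords, title, comment = lines[i], lines[i + 1], lines[i + 2]
    i += 3

    # (5) The geometry runs until (6) the first blank line.
    geometry = []
    while i < len(lines) and lines[i].strip():
        geometry.append(lines[i])
        i += 1

    return MopacDataSet(pdb_comments, keywords, title, comment, geometry)
```

The recovered `pdb_comments` are the lines that would be used in making the start of a PDB file; the geometry lines are left untouched so that the atom labels survive intact.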
Because so much effort is required to make a starting model, it is important that the starting model be copied and the copy put in a safe location. Of course, the starting model should be given a very descriptive name, e.g. "Crambin 1CBN Starting Model -3530kcal per mole.arc". A safe location could be a folder that contains only starting models.
A useful step is to include the heat of formation in the name. Protein modeling assumes that, given two different conformers, the lower-energy system is the more correct. So by including the ΔHf of the starting model in its name, any potentially better model, i.e., a model with a lower ΔHf, can readily be compared with the reference starting model.
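As a small illustration of this naming convention, the following Python sketch builds a file name in the style of the Crambin example and reads the ΔHf back out of such a name. The function names and the exact name pattern are assumptions made for this example only; any consistent pattern would serve.

```python
import re
from typing import Optional

# Matches a signed number directly before "kcal per mole" (pattern assumed here).
_HOF_PATTERN = re.compile(r"(-?\d+(?:\.\d+)?)\s*kcal per mole", re.IGNORECASE)


def starting_model_name(system: str, pdb_id: str, hof_kcal_mol: float) -> str:
    """Build a descriptive archive name that records the heat of formation."""
    return f"{system} {pdb_id} Starting Model {hof_kcal_mol:.0f}kcal per mole.arc"


def hof_from_name(filename: str) -> Optional[float]:
    """Recover the heat of formation from a name, or None if it has none."""
    match = _HOF_PATTERN.search(filename)
    return float(match.group(1)) if match else None
```

With the ΔHf recoverable from the name, a candidate model can be compared with the reference starting model without opening either file.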
During modeling work, there is the possibility that the system might unexpectedly drop in energy, as a salt bridge forms or a strained structure relaxes. If this happens, then the cause of the change should be simulated using the starting model. If the ΔHf is lowered by more than, e.g., 2 or 3 kcal/mol, then the current starting model should be replaced by a new starting model that has a new name. The old starting model should not be deleted at this point; it might be needed later on.
At first sight, the possibility that many months of hard work would be invalidated by a change in the starting model might be depressing, but if a careful record has been maintained for all the interesting structures (intermediates, transition states, vibrational frequencies, etc.), then these structures can usually be easily modified to incorporate the change and then re-run.
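The replacement rule can be written down as a one-line check. The sketch below is only illustrative: the function name is hypothetical, and the default threshold of 2.0 kcal/mol is simply the lower of the two example values mentioned above, not a fixed rule.

```python
def should_replace_starting_model(
    reference_hof: float,
    candidate_hof: float,
    threshold: float = 2.0,
) -> bool:
    """Return True if the candidate lowers the heat of formation (kcal/mol)
    by more than the threshold relative to the reference starting model."""
    return (reference_hof - candidate_hof) > threshold
```

For example, a candidate at -3535 kcal/mol against a reference at -3530 kcal/mol is a drop of 5 kcal/mol and would qualify, whereas a drop of 1 kcal/mol would not.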
Of course, it is much better to have a good starting model to begin with!
The PDB and MOPAC formats are readily interconverted. To convert a MOPAC format into a PDB format, run a MOPAC data set using only the keywords 0SCF PDBOUT; this will generate the PDB format. To convert a PDB format into a MOPAC format, simply open the PDB file using MOPAC; this will automatically generate a MOPAC data set, which can be found in the file <file>.arc.
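Preparing a data set for the MOPAC-to-PDB conversion amounts to replacing the keyword line with 0SCF PDBOUT. A minimal Python sketch of that edit follows; the function name is hypothetical, and it assumes the keyword line is the first line that does not start with an asterisk and that it occupies a single line. Running MOPAC itself is left out, because the name and location of the executable depend on the installation.

```python
def pdb_conversion_data_set(text: str) -> str:
    """Return a copy of the data set whose keyword line is '0SCF PDBOUT'."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        # Leading lines that start with "*" are comments; the first line
        # after them is taken to be the keyword line.
        if not line.startswith("*"):
            lines[i] = "0SCF PDBOUT"
            break
    else:
        raise ValueError("no keyword line found")
    # splitlines() drops the final newline, so it is restored here; a blank
    # line that ended the original data set is therefore preserved.
    return "\n".join(lines) + "\n"
```

The comment lines, title, comment, and geometry are left unchanged, so the atom labels needed to reconstruct the PDB file remain in place.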