(Modeling proteins)

Comparing two proteins

Any two proteins can be compared using a utility within MOPAC.  The general form of the data set used to compare two proteins consists of two or three lines: keywords, title, and, optionally, a comment line. A minimum set of keywords would consist of GEO_DAT, GEO_REF, 0SCF and HTML. A useful keyword to add is OUTPUT; this reduces the size of the results and makes the important material easier to find.  The results of a comparison consist of a standard output file that gives a detailed description of the relationship between the two proteins (their similarities and their differences), an HTML web page, a PDB file for each of the two proteins, and other less important files.  Individual differences can easily be seen by opening the JSmol HTML web page and using the options provided.
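As a minimal sketch, assuming only the two- or three-line layout described above (keywords, title, optional comment), such a data set can be written from Python. The function name `write_comparison_data_set` is hypothetical and not part of MOPAC; it only writes the text file and does not run MOPAC.

```python
from pathlib import Path


def write_comparison_data_set(path, keywords, title, comment=None):
    """Write a two- or three-line data set: keywords, title, optional comment.

    Returns the text that was written so that it can be checked.
    """
    lines = [" ".join(keywords), title]  # line 1: all keywords, space-separated
    if comment is not None:
        lines.append(comment)  # optional line 3
    text = "\n".join(lines) + "\n"
    Path(path).write_text(text)
    return text
```

For example, `write_comparison_data_set("Compare_1EJG_and_1CBN.mop", ['GEO_DAT="1EJG.ent"', 'GEO_REF="1CBN.ent"', "0SCF", "HTML", "OUTPUT"], "Comparison of two PDB structures for Crambin")` writes a two-line data set; the quotes around each file name are kept exactly as given.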

Several types of comparison can be done; among these are:

Comparing two raw PDB files

Raw PDB files often have the suffix ".pdb".  This presents a problem because files that end in ".pdb" are generated during the comparison of two proteins; if nothing is done, these new files would overwrite the originals.  To avoid this, the input files must not end in ".pdb", so a necessary first step is to rename any original PDB files that end in ".pdb" to give them a different suffix.  A suitable new suffix is ".ent".
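The renaming step can be scripted. The sketch below uses a hypothetical helper, not part of MOPAC, that renames every "*.pdb" file in a folder to "*.ent". It first checks that no ".ent" file of the same name already exists, so that nothing is overwritten.

```python
from pathlib import Path


def rename_pdb_to_ent(directory="."):
    """Rename every *.pdb file in `directory` to *.ent without overwriting anything.

    Returns the new file names, sorted.
    """
    pairs = [(src, src.with_suffix(".ent")) for src in sorted(Path(directory).glob("*.pdb"))]
    # Check every target first, so that no file is renamed if any would be overwritten.
    clashes = [dst.name for _, dst in pairs if dst.exists()]
    if clashes:
        raise FileExistsError(f"already present, not overwritten: {clashes}")
    for src, dst in pairs:
        src.rename(dst)
    return [dst.name for _, dst in pairs]
```

Run in the folder holding 1EJG.pdb and 1CBN.pdb, this leaves 1EJG.ent and 1CBN.ent. The match on "*.pdb" is case-sensitive on most Linux file systems.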

Comparison of two closely related structures can be illustrated using the small protein Crambin, for which two high-resolution PDB entries exist: 1EJG and 1CBN.  These can be downloaded from the Protein Data Bank; they have the default names 1EJG.pdb and 1CBN.pdb.  Rename them to 1EJG.ent and 1CBN.ent, then create a data set, Compare_1EJG_and_1CBN.mop, that contains the two lines:

GEO_DAT="1EJG.ent" GEO_REF="1CBN.ent" COMPARE OUTPUT
Comparison of two PDB structures for Crambin 

(MOPAC file names can contain spaces, but it is easier to use underscores when describing files; with underscores, it is obvious where a file name starts and ends.)

When this job is run, the two proteins are compared and the results described above are written.

Comparing two hydrogenated PDB files

When two or more PDB entries represent the same system, there is naturally a strong desire to compare the various structures.  A good example is provided by the small protein Crambin, for which two high-resolution PDB entries exist: 1EJG and 1CBN.

Any comparison of the heats of formation of different structures requires that the structures have the same formula.  Examination of 1EJG and 1CBN reveals that 1CBN has a hetero-group, C2O, that is missing in 1EJG.  From the PDB remarks, this entity is identified as ethanol, and indeed when 1CBN is hydrogenated (ADD-H), six hydrogen atoms are added to it to form ethanol.  Before proceeding, therefore, the molecule that is present in one entry but not in the other must be deleted.  These two high-resolution structures are unusual in that their PDB entries include the hydrogen atoms.  Small mistakes can almost always be found in PDB entries, and these two systems are no exception; therefore, disregard the supplied hydrogen atoms and re-hydrogenate before proceeding.  This ensures that the hydrogenation is consistent.  Of course, if any special circumstances exist, such as known ionized sites or warning messages in the LOG file, small changes might be necessary to correct these errors.  The objective is to ensure that each entry has exactly the same formula.
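Before re-hydrogenating, a quick check that the two entries contain the same heavy atoms can catch problems such as an extra hetero-group. The following sketch is an illustrative pre-check, not MOPAC's own procedure. It assumes PDB-format files with element symbols in columns 77-78. It counts non-hydrogen atoms in the ATOM and HETATM records of the first model only, keeps only the first alternate location (assumed to be labeled blank or "A"), and skips any residue names passed in. Every other residue, including any water molecules, is counted. The function name is hypothetical.

```python
from collections import Counter


def heavy_atom_formula(pdb_text, exclude_residues=()):
    """Count non-hydrogen elements in the ATOM/HETATM records of PDB-format text."""
    counts = Counter()
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # only the first model is counted
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[16:17] not in (" ", "", "A"):
            continue  # keep only the first alternate location (column 17)
        if line[17:20].strip() in exclude_residues:
            continue  # residue name, columns 18-20
        element = line[76:78].strip().capitalize()  # element symbol, columns 77-78
        if not element:
            raise ValueError(f"no element symbol in columns 77-78: {line!r}")
        if element in ("H", "D"):
            continue  # supplied hydrogens are ignored; they are re-added later
        counts[element] += 1
    return counts
```

For Crambin, the counts for 1EJG.ent and 1CBN.ent should agree only when the ethanol residue of 1CBN is excluded. Its residue name should be read from the HETATM records of 1CBN.ent rather than assumed.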

To compare two structures, use the keyword GEO_REF. To use this utility, make two data sets, one for each structure.  In one data set (either one will do), add GEO_REF=<text>, where <text> identifies the other structure; also add the keyword 0SCF and, if the JSmol utility is installed, the keyword HTML.  Run the job.  It will produce a new PDB file containing the re-sequenced GEO_REF structure, optimized for overlap.
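Adding these keywords can also be scripted. This minimal sketch assumes that all keywords of the existing data set are on its first line; it does not handle keyword lines continued onto further lines. The function name and the file names in the example are hypothetical.

```python
from pathlib import Path


def add_geo_ref(data_set_path, reference, use_html=True):
    """Append GEO_REF="<reference>", 0SCF and (optionally) HTML to line 1 of a data set.

    Assumes every keyword is on the first line; keywords already present are not repeated.
    """
    path = Path(data_set_path)
    lines = path.read_text().splitlines()
    existing = [k.upper() for k in lines[0].split()]
    if any(k.startswith("GEO_REF") for k in existing):
        raise ValueError("the data set already contains GEO_REF")
    wanted = [f'GEO_REF="{reference}"', "0SCF"] + (["HTML"] if use_html else [])
    extra = [k for k in wanted if k.upper() not in existing]
    lines[0] = " ".join([lines[0].rstrip()] + extra)
    path.write_text("\n".join(lines) + "\n")  # title and geometry lines are unchanged
    return lines[0]
```

Setting `use_html=False` leaves out HTML when JSmol is not installed.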

If this operation is not done at this point, the lack of maximum overlap might cause severe problems later on.