Comparing two proteins

Any two proteins can be compared using a utility within MOPAC. The data set for a comparison consists of two or three lines: the keywords, the title, and, if wanted, a comment line. A minimal set of keywords is GEO_DAT, GEO_REF, and COMPARE. A useful keyword to add is OUTPUT; it reduces the size of the results and makes the important material easier to find. The results of a comparison consist of a standard output file that gives a detailed description of the relationship between the two proteins (their similarities and their differences), an HTML web page, a PDB file for each of the two proteins, and other, less important files. Individual differences can easily be seen by opening the JSmol HTML web page and using the options provided.
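As an illustration of that layout, the short Python sketch below assembles the text of a minimal COMPARE data set: a keyword line, a title line, and an optional comment line. The function name make_compare_dataset is hypothetical and not part of MOPAC; it only builds the text that would be saved as a .mop file, and MOPAC itself does the comparison.

```python
def make_compare_dataset(geo_dat, geo_ref, title, comment=None, extra_keywords=()):
    """Return the text of a MOPAC data set that compares two structures.

    Hypothetical helper: it only builds the input text, it does not run MOPAC.
    """
    # Minimal keywords are GEO_DAT, GEO_REF and COMPARE; OUTPUT keeps the results compact.
    keywords = [f'GEO_DAT="{geo_dat}"', f'GEO_REF="{geo_ref}"', "COMPARE", "OUTPUT"]
    keywords.extend(extra_keywords)  # e.g. NOREOR
    lines = [" ".join(keywords), title]
    if comment is not None:
        lines.append(comment)  # optional third line
    return "\n".join(lines) + "\n"
```

Calling make_compare_dataset("1EJG.pdb", "1CBN.pdb", "Comparison of two PDB structures for Crambin") reproduces the two-line Crambin data set given later in this section.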

Some differences between the two systems can be tolerated. For example, one system might have residues missing that are present in the other, or it might have a different ligand non-covalently bonded to a site. Provided the atoms common to both systems have the same PDB names (atom numbers are ignored), they will be included in the comparison.
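The idea of matching atoms by PDB name rather than by atom number can be sketched in a few lines of Python. This is not MOPAC's internal algorithm; it assumes that an atom's "PDB name" is the combination of atom name, residue name, chain letter, and residue number read from the fixed columns of ATOM and HETATM records, and it deliberately ignores the atom serial number (columns 7 to 11).

```python
def pdb_labels(pdb_text):
    """Map each atom's PDB label to its coordinates, ignoring atom serial numbers."""
    atoms = {}
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            # Assumed label: atom name, residue name, chain ID, residue number.
            label = (line[12:16].strip(), line[17:20].strip(),
                     line[21:22].strip(), line[22:26].strip())
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms[label] = xyz  # a later record with the same label overwrites an earlier one
    return atoms


def match_atoms(text_a, text_b):
    """Split atoms into those common to both systems and those found in only one."""
    a, b = pdb_labels(text_a), pdb_labels(text_b)
    common = sorted(a.keys() & b.keys())
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    return common, only_a, only_b
```

In this sketch, atoms that end up in only one of the two "only" lists, such as those of a missing residue or of a different ligand, would simply be left out of the comparison.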

Several types of comparison can be done; among these are:

Comparing two steps in a reaction mechanism or two different optimizations

Systems of this type are easy to compare because they have the same atoms, charge, and PDB atom labels. An example of such a data set is:

 compare output noreor ++ 
  geo_dat="../All - new gradients, no PRECISE/23H 30-3.arc" ++
  geo_ref="../All - new gradients, PRECISE/23H 30-3.arc" 
 Comparison of the effect of using keyword "PRECISE" and not using it.

The keyword "NOREOR" is useful here: it prevents the small changes in orientation that are normally made when systems are compared. For neatness, the data set uses the "++" option to split the keyword line over several lines, in this case three.
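Laying out a long keyword line with "++" can also be automated. In the hypothetical Python sketch below, the caller chooses which keywords go on each physical line, and every line except the last is ended with " ++", which is the layout used in the data set above.

```python
def keyword_block(groups):
    """Join groups of MOPAC keywords into physical lines linked by '++'.

    Hypothetical helper: each inner list holds the keywords for one line.
    """
    lines = [" ".join(group) for group in groups]
    # Every line except the last ends with "++" so the keyword line continues.
    return "\n".join([line + " ++" for line in lines[:-1]] + lines[-1:])
```

For example, keyword_block([["compare", "output", "noreor"], ['geo_dat="a.arc"'], ['geo_ref="b.arc"']]) produces three lines, the first two ending in "++".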

Comparing two raw PDB files

Comparison of two closely related structures can be illustrated using the small protein Crambin, for which two high-resolution PDB entries exist: 1EJG and 1CBN. These can be downloaded from the Protein Data Bank, where they have the default names 1EJG.pdb and 1CBN.pdb. Create a data set, "Compare 1EJG and 1CBN.mop", that contains the two lines:

GEO_DAT="1EJG.pdb" GEO_REF="1CBN.pdb" COMPARE OUTPUT
Comparison of two PDB structures for Crambin

When this job is run, the two proteins are compared and the results are written to the output files described at the start of this section.
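The preparation steps (downloading the two entries and writing the data set) can be scripted. The Python sketch below uses only the standard library; the RCSB download URL pattern is an assumption about the server layout, and the function names are hypothetical. The downloads are left as commented usage so that nothing touches the network unless you call fetch_pdb yourself.

```python
import pathlib
import urllib.request

JOB_NAME = "Compare 1EJG and 1CBN.mop"


def pdb_url(pdb_id):
    """RCSB download URL for a PDB entry (URL pattern assumed)."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def fetch_pdb(pdb_id, dest_dir="."):
    """Download an entry, e.g. 1EJG, to dest_dir/1EJG.pdb and return the path."""
    path = pathlib.Path(dest_dir) / f"{pdb_id.upper()}.pdb"
    urllib.request.urlretrieve(pdb_url(pdb_id), path)
    return path


def compare_job_text():
    """The two-line data set: keywords, then title."""
    return ('GEO_DAT="1EJG.pdb" GEO_REF="1CBN.pdb" COMPARE OUTPUT\n'
            "Comparison of two PDB structures for Crambin\n")


def write_compare_job(dest_dir="."):
    """Save the data set as 'Compare 1EJG and 1CBN.mop' in dest_dir."""
    path = pathlib.Path(dest_dir) / JOB_NAME
    path.write_text(compare_job_text())
    return path


# Usage (needs network access; then run MOPAC on the .mop file):
#   for pdb_id in ("1EJG", "1CBN"):
#       fetch_pdb(pdb_id)
#   write_compare_job()
```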

Comparing two hydrogenated PDB files

When two or more PDB entries represent the same system, there is naturally a strong desire to compare the various structures. Crambin is again a good example: after its two high-resolution PDB entries, 1EJG and 1CBN, have been hydrogenated and the hydrogen atom positions optimized, the two resulting ARC files can be compared using the following data set:

 geo_dat="Crambin (1EJG).arc" ++
 geo_ref="Crambin (1CBN).arc" ++
 compare output
 Compare the hydrogenated PDB files for two models of Crambin

Examination of 1EJG and 1CBN reveals that 1CBN contains a hetero-group, C₂O, that is missing from 1EJG. The PDB remarks identify this entity as ethanol, and indeed, when 1CBN is hydrogenated (ADD-H), six hydrogen atoms are added to it, forming ethanol (C₂H₆O). Because the two hydrogenated systems therefore contain different numbers of atoms, their heats of formation cannot be compared directly; although the heats of formation are present in the ARC files, they should be ignored in the results of the COMPARE job.
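Spotting such a hetero-group by eye is easy for a protein as small as Crambin; for larger systems a quick scan of the HETATM records can help. The Python sketch below is an illustrative assumption, not a MOPAC feature: it reads the residue-name field (columns 18 to 20) of HETATM records and lists the hetero-group names found in the first structure but not in the second, skipping water (HOH) by default.

```python
def hetero_residues(pdb_text):
    """Return the set of residue names used in HETATM records."""
    names = set()
    for line in pdb_text.splitlines():
        if line.startswith("HETATM"):
            names.add(line[17:20].strip())  # residue name, columns 18-20
    return names


def extra_hetero_groups(text_a, text_b, ignore=("HOH",)):
    """Hetero-group names present in the first structure but not in the second."""
    names_a = hetero_residues(text_a) - set(ignore)  # water is skipped by default
    names_b = hetero_residues(text_b)
    return sorted(names_a - names_b)
```

Passing the text of 1CBN.pdb as the first argument and that of 1EJG.pdb as the second should list the residue name of the ethanol hetero-group described above.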