The Compare function in MOPAC

Description - Procedure - Analysis - Complete worked example (Zip file, 242 Kb)

Description

A pair of protein systems can be compared to identify similarities and differences in their geometries. These systems can be PDB or ENT files, MOPAC data sets, i.e., files that end in ".mop" or ".dat", or MOPAC archive files, i.e., files that end in ".arc". The comparison is done using a MOPAC data-set that contains information on the locations of the systems to be used.  A useful practice is to have the systems in one or two folders and the data-set for the comparison in another folder at the same level, e.g., if the systems are in "Users/name/Crambin/Original PDB files" and "Users/name/Crambin/MOPAC files" then the compare files should be in "/Users/name/Crambin/Compare"

In a normal run, atoms of one system are re-arranged to match those of the-arranged to match those of the other system, so that both systems have the same atoms in the same sequence. The geometry of one system, "System A", is then rotated and translated to minimize the difference between it and that of "System B." Once the systems are in maximum coincidence, an analysis is performed; this identifies differences in the systems, differences in bond-lengths, and other quantities, such as the Root-Mean-Square and average atomic position differences.

Procedure (General - Systems - Other keywords - Line 2 - Line -3)

General

There is no key-word for the compare operation, instead it is performed using a specific combination of existing keywords.

The recommended way to do this is to use a data set, e.g. compare.mop, that contains three lines of the type:

geo_dat="System A" geo_ref="System B" 0scf geo-ok HTML output let
Text to be displayed
Second line of text to be displayed (Optional)

Systems

The text "System A" and "System B" should be replaced with the file-names for the systems of interest. For example, to compare a protein system "1A1A PDB structure.mop" and "1A1A optimized.arc" the first line would consist of:

geo_dat="1A1A optimized.arc" geo_ref="1A1A PDB structure.mop" 0scf geo-ok HTML output let
Compare original PDB structure and optimized geometry for system 1A1A 
Do not use PDB files that are in the current directory - if they were to be used, then they would be over-written by the run.  Instead, either re-name the file from <file>.pdb to <file>.ent, or have the PDB file in a different directory, and add the path or relative path to that directory, thus:
geo_ref="C:/Users/name/Original PDB files/1A1A.pdb (for Windows) or geo_ref="/Users/name/Original PDB files/1A1A.pdb" (for Linux)
or
geo_ref="../Original PDB files/1A1A.pdb" 

Other keywords used

0SCF: This prevents any semiempirical calculations being done.

GEO-OK: Although not essential, it's always better to have GEO-OK present.  In this type of job, GEO-OK allows the systems to have different numbers of atoms.  For example, files from the PDB web-site typically do not include hydrogen atoms, but optimized structures require hydrogen atoms to be present.  Without GEO-OK, systems with different formulae would result in an error being detected, but when GEO-OK is present, the sub-set of atoms common to both systems would be used.

HTML: Not essential, but simplifies understanding the differences between the two systems. An HTML web-page is generated that allows the two systems to be inspected graphically.  Data referring to the GEO_REF system will be displayed in green.

OUTPUT: Not essential, but reduced the size of the output, and allows the output of interest to be found more rapidly

LET: Not essential, but allows the job to be run more than once without having to delete files.  If LET is not present, then when the job is run once, new PDB files will be generated (assuming HTML is present).  The second time it is run, the PDB files from the previous run will be detected.  This would cause an error-message to be printed, and the job stopped.

Line 2

Optional, but useful.  This line will be displayed in the HTML web-page, and an informative description of the system can be very helpful later on when several web-pages are being used.

Line 3

Optional.  Similar to Line 1, this line will also be displayed in the HTML web-page. 

Analysis

"Differences in atoms sets" - Self-explanatory.   Have a quick look at the differences, and if nothing unexplained is seen, ignore this section.

"Number of atoms" - Self-explanatory. 

"List of atoms in GEO_REF that were swapped in order to maximize overlap" - On rare occasions the overlap of the two systems can be improved by swapping one or more pairs of atoms.  Only atoms that are equivalent, such as the two oxygen atoms in a carboxylate group, can be swapped.  Unless an error is suspected, this section is not important.

"After docking - Atoms that move a lot" - This list is in order of decreasing motion.  The first entry is the atom that shows the largest motion in going from one system to the other.  A useful operation would be to open the HTML web-page generated by the run, and look at some of the atoms that move a lot.

In the next few entries, the assumption is made that one geometry is a reference, e.g., a PDB file downloaded from the internet, and the other is a geometry optimized using MOPAC.

"Difference between GEO_DAT and GEO_REF:" - The RMS (Root Mean Square) difference is strictly comparable with the RMD quantities reported in other fields of protein structural chemistry, e.g., different PDB reports on the same system.

"Differences between bond-lengths for the two geometries" - This is important.  There are three common reasons for large (greater than 0.05 or 4%) differences in bond-length: (A) faults in the PDB structure, (B) faults in the computational method being used, e.g., PM7 with COSMO, and (C) incorrect hydrogenation resulting in a local fault in the chemical structure.  To identify where the fault lies, examine the bond using the HTML web-page.  In order to get a better view of the fault, display only the local environment of the two atoms. If the fault is identified as (A), ignore the fault - it was corrected by the geometry optimization.  If (B), also ignore the fault, but make a note that this is a fault in the method being used, and might be important.  If you think it is important, try using a different method, e.g., PM6-D3H4 with COSMO. 

IMPORTANT:   If the fault is of type (C), incorrect hydrogenation, than edit the optimized geometry to correct the fault, and re-run the geometry optimization.  Then re-run the comparison to verify that the fault has been corrected. Incorrect hydrogenation most often occurs in substrates and hetero-groups such as heme rings. A lot of grief can be avoided by looking at differences in bond-lengths and checking for hydrogenation errors.

Other global data such as the date, heats of formation, number of atoms, etc., are printed in the output and in the HTML web-page.