(Modeling proteins)

Correcting errors in the X-ray structure

By their nature, X-ray structures are a product of physics, not chemistry.  A good X-ray structure will have a resolution of ~1 Å.  X-ray structures are very good at describing secondary structure (alpha helices, beta sheets, etc.) and tertiary structure (folding and packing of alpha helices and beta sheets, etc.), but are not good at determining bond lengths and angles.  Semiempirical methods, on the other hand, are good at predicting chemical structures, but are less accurate in predicting secondary and tertiary structures.  By combining the two techniques, structures of unprecedented accuracy can be generated.

Once the positions of the hydrogen atoms have been optimized and the structure re-checked, errors in the positions of the heavy atoms can be corrected by a constrained geometry optimization.  The constraint is that the optimized structure is biased towards the starting X-ray structure.  This bias is provided by the keyword GEO_REF="SELF", which applies a bias of 3 kcal/mol/Å² in favor of the X-ray structure.  The bias has little effect on bond lengths and angles, but has a large effect on tertiary structure, particularly when the inter-chain forces are weak.
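To see why a bias of this size barely affects bond lengths but does restrain large-scale motion, the following minimal sketch evaluates a harmonic penalty.  It assumes the bias has the form of 3 kcal/mol/Å² multiplied by the sum of the squared atomic displacements from the reference structure; the exact functional form used by the program is not given here, and the function name is hypothetical.

```python
# Assumed form of the restraint: c * sum of squared atomic displacements.
# The constant is the value quoted in the text for GEO_REF="SELF".
BIAS_KCAL_PER_A2 = 3.0


def geo_ref_penalty(coords, ref_coords, c=BIAS_KCAL_PER_A2):
    """Return the assumed bias energy (kcal/mol) for Cartesian coordinates in Å."""
    if len(coords) != len(ref_coords):
        raise ValueError("structures must have the same number of atoms")
    total = 0.0
    for (x, y, z), (xr, yr, zr) in zip(coords, ref_coords):
        # Squared distance of this atom from its reference position.
        total += (x - xr) ** 2 + (y - yr) ** 2 + (z - zr) ** 2
    return c * total
```

Under this assumed form, moving one atom by 0.1 Å costs 0.03 kcal/mol, while moving it by 2 Å costs 12 kcal/mol.  Small local corrections are therefore almost free, whereas large drifts away from the X-ray structure are resisted.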

In some cases, the PDB structure errors are very large.  When severe errors are found, correct them before proceeding.

To set up a run, edit the archive file from the optimization of the positions of the hydrogen atoms to give a new data set, and call it <name_Geo_Ref>.mop.  In addition to any keywords that are specific to the system, e.g., START_RES=(text) and CHAINS=(TEXT), add the keywords GNORM=20, MOZYME, PL, PDBOUT, and GEO_REF="SELF".
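Assembling the keyword line by hand is easy to get wrong, so a small helper can be useful.  The sketch below merges system-specific keywords with the ones listed above and skips any that are already present.  The helper name and the example value CHAINS=(AB) are hypothetical; only the five added keywords come from the text.

```python
# Keywords the text says to add for the constrained optimization.
GEO_REF_KEYWORDS = ["GNORM=20", "MOZYME", "PL", "PDBOUT", 'GEO_REF="SELF"']


def build_keyword_line(system_keywords):
    """Join system-specific keywords with the GEO_REF set, skipping duplicates.

    Keywords are compared by the name before '=', ignoring case.
    """
    seen = set()
    result = []
    for kw in list(system_keywords) + GEO_REF_KEYWORDS:
        name = kw.split("=", 1)[0].upper()
        if name not in seen:
            seen.add(name)
            result.append(kw)
    return " ".join(result)


if __name__ == "__main__":
    # CHAINS=(AB) is an illustrative value, not taken from a real system.
    print(build_keyword_line(["CHAINS=(AB)"]))
```

The resulting line would be placed as the first line of the new data set.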

Start the job running.  Monitor it for the first two or three cycles, and if all is well, let it run unattended.  Geometry optimization of a protein is a time-consuming operation; the job will run for anywhere from a day to three weeks, depending on the size of the system.  The default maximum run time is two days.  If the job runs out of time and stops, edit the data set to add the keywords T=2w and RESTART, then re-run the data set.  This will restart at the geometry where the previous run ended and will re-calculate the SCF, which eliminates any errors in the SCF that accumulated during the previous run.  Then let it run again.  If it still needs more time, examine the output to verify that the system is optimizing properly.  If it isn't, then look for errors in the data set.
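The restart edit can be sketched as a small text transformation.  The example below assumes that all keywords are on the first line of the data set and that no quoted keyword value contains spaces; the function name is hypothetical.  Any existing T= or RESTART keyword is replaced, so that the line is not left with conflicting time limits.

```python
def add_restart_keywords(data_set_text, time_limit="T=2w"):
    """Return the data set text with T=... and RESTART set on the first line.

    Assumes all keywords are on the first line, separated by whitespace.
    """
    first, sep, rest = data_set_text.partition("\n")
    # Drop any existing time limit or RESTART so they are not duplicated.
    kept = [
        kw for kw in first.split()
        if kw.split("=", 1)[0].upper() not in ("T", "RESTART")
    ]
    return " ".join(kept + [time_limit, "RESTART"]) + sep + rest
```

Only the keyword line changes; the title lines and geometry that follow are passed through untouched.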

Once the constrained optimization is complete, compare the starting and final structures, looking for changes.  Verify that the changes are improvements (they should all be improvements).  In particular, look at possible errors in the original PDB structure, and see if they have been corrected.
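One way to find the changes is to rank atoms by how far they moved.  The sketch below assumes that both structures are available as lists of Cartesian coordinates in Å with the atoms in the same order, and that no superposition is needed because the optimization was restrained to the starting structure.  Reading the coordinates from the PDB files is left out, and all function names are hypothetical.

```python
import math


def displacements(start, final):
    """Per-atom distance (Å) between two structures with identical atom order."""
    if len(start) != len(final):
        raise ValueError("structures must have the same number of atoms")
    return [math.dist(a, b) for a, b in zip(start, final)]


def rmsd(start, final):
    """Root-mean-square displacement over all atoms, without superposition."""
    d = displacements(start, final)
    if not d:
        raise ValueError("structures are empty")
    return math.sqrt(sum(x * x for x in d) / len(d))


def largest_moves(start, final, n=5):
    """Return (atom_index, displacement) for the n atoms that moved most."""
    d = displacements(start, final)
    order = sorted(range(len(d)), key=lambda i: d[i], reverse=True)
    return [(i, d[i]) for i in order[:n]]
```

The atoms returned by largest_moves are a natural starting point for checking whether suspected errors in the original PDB structure were corrected.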