Improving the accuracy of calculation of the relative binding-energy of a ligand
non-covalently attached to a protein (Back to "Proteins")


The Problem

Computing the relative binding energy of ligands non-covalently bound to proteins requires achieving an unusually high accuracy.  This requirement is essential when calculating the binding ability of various ligands, in particular candidate therapeutic drugs.  If the modifications described below are made then an estimate of the accuracy in computed binding energies that can be expected is as follows:

  1. Errors in the interaction energies between the ligand and the protein.  Over a range of types of interaction energies the Average Unsigned Error relative to very high-quality calculations is 1.72 kcal mol-1 for PM6-D3H4 and 2.91 kcal mol-1 for PM7. If the set S12L, a severe test of non-covalent intereactions, is removed, the AUE's drop to 1.04 kcal mol-1 for PM6-D3H4 and to 1.34 kcal mol-1 for PM7.

  2. Incomplete geometry optimization would cause errors of about 1 kcal mol-1.

  3. Errors in the ΔHf of the ligand and protein separated in solution might not cancel out.

Using these data an estimate of the average unsigned error in the relative binding energy of the ligand non-covalently bound to a protein is about 2 to 3 kcal mol-1, provided, and this is a big qualification, the errors in 3 are small. Errors in the ΔHf of the protein obviously cancel out, but errors in the ΔHf of the ligand in solution are more problematic.  Specifically, the known error in PM6 that under-estimates steric repulsion between non-covalently bound atoms can compromise the ΔHf of the ligand in solution.

The problems with conventional calculation options are as follows:

The systems are large and multiple minima exist

If an entire protein is used then almost always a severe problem is encountered in the geometry optimization that prevents the necessary high accuracy in energies from being achieved.  This problem is a direct consequence of the very large number of local minima on the potential energy surface (PES) and is common to all complicated systems such as proteins; it has nothing to do with the optimization procedure or with the underlying method.  When a geometry is optimized the procedure used moves the atomic coordinates in a downhill direction on the PES. This process terminates when no further decrease in heat of formation (ΔHf) can be achieved. Minor changes in the starting geometry can give rise to a different path being followed, resulting in a different minima being reached. These minima can differ in ΔHf by several kcal mol-1, and, although these minima might look correct, any conclusions drawn from the various ΔHf would almost certainly be untrustworthy.

Both PM6 and PM7 methods and their modifications have errors

PM6 should never be used on its own because, it severely underestimates dispersion effects and hydrogen bonds and as a result it is essentially useless for predicting non-covalent interactions, Instead always use PM6-D3H4 or PM6-D3H4X.   Having said that, the underlying PM6 method has two known errors, involving spurious sulfur - oxygen and sulfur - nitrogen interactions, see details.

PM7 has one severe error, a spurious chlorine - nitrogen non-covalent interaction. This interaction results in an excess stabilization of about 11 kcal mol-1 in chlorine-containing ligands non-covalently bound to proteins.

Incomplete geometry optimization results in large errors

Although the default geometry optimizer for large systems, the L-BFGS optimizer, is the most efficient optimizer in MOPAC, at the end of an optimization this procedure has difficulty locating the energy minimum, and it often terminates several kcal mol-1 above the lowest point.

Suggested modifications

Multiple minima

This procedure is only recommended for proteins that do not exhibit allosteric behavior.

Two steps are recommended.  In the first step protein-ligand docking software, e.g., GOLD, should be used to generate a number of candidate conformations of a ligand docked into a protein. In the second step all of the protein far from the ligand is deleted from the system being modeled. This reduces the size of the system to only the region of interest, this region will be referred to as a Globule. When a Globule is used most of the minima are automatically excluded. 

Errors in methods

PM6-D3H4 errors: Unless any sulfur atoms are near to the site of interest and when the globule geometry is optimized a severe distortion occurs, this error is unimportant and can be ignored.  If corrective action is needed, then the appropriate interatomic separation should be fixed, i.e., not allowed to optimize. Alternatively, and more easily implemented, use PM7 instead of PM6-D3H4.

PM7 error: If any chlorine atoms are involved in the ligand - protein interaction, don't use PM7; use PM6-D3H4 instead.

Incomplete geometry optimization

This fault can be completely eliminated by using LET(nnn), where nnn is large.  A good value for nnn is 250. This significantly increases the time required for the optimization, but the results are considerably more reliable.


Working with ligands involves large numbers of jobs, typically the number of ligands times the number of poses from the docking program, so to minimize the possibility of making a typographic error as many steps in preparing and running a job as possible should be automated.

Suggestions for running surveys that use a Globule: