ADD-H

Adds hydrogen atoms to a data set to satisfy valence requirements.  All sites are neutralized, but salt bridges can form spontaneously when the positions of hydrogen atoms are optimized.  The results are suitable for use in preparing a MOPAC data set.  This keyword is intended primarily for use with proteins.  It should work for non-proteins as well, but there is an increased probability that it will make mistakes, and for systems that cannot exist in aqueous media, such as LiCH3, there is a high probability of failure.  In a run to add hydrogen atoms, one or more sites can be ionized by using SITE.

If any hydrogen atoms are present, they will be removed before ADD-H runs.  When a PDB file is made, a set of checks is run to detect errors in the structure.  Examine the log file, <name>.log, to look for any errors that were detected, and if any are reported, the input, output, or PDB file created should be examined to work out what has happened.  By default, the sequence of atoms will be put into the standard PDB sequence.  If this is not wanted, add NORESEQ.  NORESEQ is also useful in the unlikely event that the ADD-H run reports an error in the residue recognition process.

Whether specific sites should be ionized or not is hard to answer.  For simplicity, ADD-H produces the completely neutral protein, or, if definite ions such as Ca2+ or K+ are present, the minimally ionized form.

Sometimes, ADD-H makes mistakes.  For example, both guanine and pyrimidine-2-one (see picture) have similar environments for the top left ring nitrogen atoms, but in guanine there is a hydrogen atom attached, while in pyrimidine-2-one that hydrogen atom is missing.  This is a consequence of the positions of the two double bonds in the six-membered ring. That is, it is not a function of the nitrogen atom, nor of the carbon atom adjacent, but depends on the more distant atoms.  ADD-H makes mistakes with complicated structures like these. 

Detecting faults in hydrogenation

A simple way to find obvious faults in hydrogenation is to run a single SCF calculation and look at the forces acting on the hydrogen atoms.  If the force on any hydrogen atom is large, over 100 kcal·mol⁻¹·Å⁻¹, then examine that atom using a GUI (JSmol is ideal here).  Suggested keywords: GEO_DAT="test.arc" NOOPT OPT-H 1SCF GRADIENTS HTML MOZYME EPS=78.4.
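The threshold test described above can be sketched in Python. This is a minimal illustration, not part of MOPAC: it assumes the Cartesian gradients have already been extracted from the output by some other means into a list of tuples, and it treats "force" as the norm of the three gradient components (checking each component separately would be an equally reasonable reading). The function name, the tuple layout, and the sample values are all hypothetical.

```python
import math

# Threshold quoted in the text, in kcal/mol/Angstrom.
LARGE_FORCE = 100.0


def large_hydrogen_forces(atoms, threshold=LARGE_FORCE):
    """Return (label, magnitude) pairs for hydrogen atoms with a large gradient.

    `atoms` is a hypothetical list of (label, element, gx, gy, gz) tuples;
    producing it from a MOPAC output file is outside the scope of this sketch.
    """
    flagged = []
    for label, element, gx, gy, gz in atoms:
        if element.strip().upper() != "H":
            continue  # only hydrogen atoms are of interest here
        magnitude = math.sqrt(gx * gx + gy * gy + gz * gz)
        if magnitude > threshold:
            flagged.append((label, magnitude))
    # Largest gradient first, so the worst offender is inspected first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up values for illustration only.
example = [
    ("HA ALA 12", "H", 3.0, -4.0, 0.0),
    ("H  GLY 40", "H", 120.0, 50.0, 0.0),
    ("CA ALA 12", "C", 300.0, 0.0, 0.0),
]

for label, magnitude in large_hydrogen_forces(example):
    print(f"{label}: {magnitude:.1f} kcal/mol/Angstrom")
```

Atoms flagged this way are candidates for visual inspection, not proof of a fault.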

Most likely a reason will be found for all the large gradients.  A hydrogen-hydrogen distance might be too small, or a hydrogen might be near to, but not at, the position expected.  Unless there is something definitely wrong, ignore faults of this type: they will automatically be corrected by the geometry optimization.  Possibly one or more simple errors will be found where there are more or fewer hydrogen atoms attached to a heavy atom than expected.  Correct these by using the SITE keyword and re-running the hydrogenation, as in "ADD-H SITE=(...)".

Also look at the charged sites printed in the output of this run.  Confirm that all the ionized sites are reasonable.

Recommended usage

Start with an unmodified PDB file, e.g., 1A1A.pdb.  Create a MOPAC data-set file named "1A1A ADD-H.mop".  Use the keywords "GEO_DAT="1A1A.pdb" SITE=(SALT) ADD-H NOOPT OPT-H HTML 1SCF GRADIENTS MOZYME EPS=78.4" and a title line describing the system, e.g., "PDB file 1A1A with hydrogen atoms added, but positions not optimized".  If you know that sites in addition to salt bridges are ionized, add these ionizations to the SITE command.
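For anyone preparing many structures, the data set described in this step can be generated with a short Python script. This is a sketch under stated assumptions: the keyword string and title are copied from the text, the file is laid out as a keyword line, a title line, and one blank line (the geometry being read from the PDB file through GEO_DAT), and the function names are hypothetical helpers, not part of MOPAC.

```python
from pathlib import Path, PurePath


def dataset_filename(pdb_name):
    """Name the data set after the PDB file, e.g. '1A1A.pdb' -> '1A1A ADD-H.mop'."""
    return f"{PurePath(pdb_name).stem} ADD-H.mop"


def build_addh_dataset(pdb_name, site="SALT"):
    """Return the text of a data set that hydrogenates `pdb_name`.

    `site` is inserted verbatim inside SITE=(...); extend it if other
    ionized sites are known.
    """
    stem = PurePath(pdb_name).stem
    keywords = (
        f'GEO_DAT="{pdb_name}" SITE=({site}) ADD-H NOOPT OPT-H '
        "HTML 1SCF GRADIENTS MOZYME EPS=78.4"
    )
    title = f"PDB file {stem} with hydrogen atoms added, but positions not optimized"
    # Assumed layout: keyword line, title line, then one blank line.
    return f"{keywords}\n{title}\n\n"


def write_addh_dataset(pdb_name, site="SALT", directory="."):
    """Write the data set next to the PDB file and return its path."""
    path = Path(directory) / dataset_filename(pdb_name)
    path.write_text(build_addh_dataset(pdb_name, site))
    return path
```

Calling `write_addh_dataset("1A1A.pdb")` would create "1A1A ADD-H.mop" in the current directory, ready to be run.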

Run the data-set. Look at the output and the HTML file generated.  If any errors are found, edit the original data-set and re-run it.

Rename the ARC file as, e.g., "1A1A 1SCF.mop" and run that using MOPAC.  Examine the output to see if the forces on any atoms are very large.  If they are, look at the environment of the atoms concerned.  If a hydrogen atom needs to be moved, edit the SITE command and start over.

When everything looks good, edit the ARC file to make "1A1A H-OPT.mop" by deleting the keyword 1SCF, then run "1A1A H-OPT.mop".  If the system is well-behaved (it is the system you want to run, and there are no unusual features), an alternative would be to edit the ARC file to use the keywords "OPT HTML MOZYME CHARGE=n EPS=78.4", where n is the net charge of the system, to make "1A1A Opt.mop".  This would allow the global geometry optimization to be started.
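Deleting a keyword by hand is easy, but when scripting it a naive split on spaces would break quoted values such as GEO_DAT="my file.pdb". The following Python sketch removes one keyword from a keyword line while keeping quoted values intact. It operates on the keyword line only; locating that line inside an ARC file is left to the reader, and the function name and the sample keyword line are illustrative assumptions.

```python
import re

# A token is a run of non-space characters in which quoted sections,
# e.g. GEO_DAT="my file.pdb", may contain spaces.
_TOKEN = re.compile(r'(?:[^\s"]+|"[^"]*")+')


def drop_keyword(keyword_line, keyword):
    """Return `keyword_line` without `keyword` (compared case-insensitively)."""
    tokens = _TOKEN.findall(keyword_line)
    kept = [token for token in tokens if token.upper() != keyword.upper()]
    return " ".join(kept)


# Illustrative keyword line; the line in an actual ARC file may differ.
line = "NOOPT OPT-H HTML 1SCF GRADIENTS MOZYME EPS=78.4"
print(drop_keyword(line, "1SCF"))
```

Only whole tokens are removed, so a keyword that merely contains the text "1SCF" inside a longer token would be left alone.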

See also Modeling Proteins, SITE, PDBOUT, and NORESEQ.