NEWPDB

Generates the modern PDB format for hydrogen atoms; by default the old format is generated.  With one exception, only hydrogen atoms are affected: in arginine, and only if necessary, the nitrogen atoms of the guanidine group are renumbered.  The format for the rest of the atoms must either come from the original PDB label or be generated by MOPAC.  If MOPAC generates the format, much of the fine detail is lost, so, for example, in phenylalanine the labels for Cδ1 and Cδ2 might be swapped round.

NEWPDB was added because some programs, e.g., Molprobity, either will not accept hydrogen atoms labeled with the old format or will require existing hydrogen atoms to be removed and new ones added using the new format.  In the latter case all information about the original hydrogen atoms, including their existence, is lost.  The rules used by NEWPDB are taken from http://publications.iupac.org/pac/pdf/1998/pdf/7001x0117.pdf, Fig. 1 on page 122, "For example, if tetrahedral carbon C has four substituents X, Y, Z, and Z' (with priority X > Y > Z = Z'; i.e., Z and Z' are diastereotopic substituents designated provisionally as unprimed and primed), their numbering is derived as follows: if one sights down the X-C axis (with the X atom toward the viewer), the equivalent atoms, Z and Z', are designated Z2 and Z3, such that Y, Z2, and Z3 follow a clockwise orientation." and related directives.
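The quoted rule can be illustrated with a short Python sketch.  This is not MOPAC's code; the function name assign_z2_z3 and the helper functions are hypothetical, and the choice of CA as X and CG as Y for the CB atom of arginine is an assumption made only for the illustration.  With the vector n pointing from C toward X (that is, toward the viewer), a step from Y to Z2 is clockwise when the triple product ((Y-C) x (Z2-C)) . n is negative.

```python
def sub(a, b):
    """Component-wise difference a - b of two 3-vectors."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def assign_z2_z3(c, x, y, za, zb):
    """Return (z2, z3) chosen from the two equivalent substituents za, zb.

    Sighting down the X-C axis with X toward the viewer, Y, Z2, Z3 must
    run clockwise.  n points from C to the viewer, so a clockwise step
    from Y to Z2 gives a negative triple product.
    """
    n = sub(x, c)
    turn = dot(cross(sub(y, c), sub(za, c)), n)
    return (za, zb) if turn < 0 else (zb, za)


# Coordinates copied from the arginine example on this page.
# Assumption: for CB, X is CA and Y is CG.
cb = (-1.773, 5.462, -10.276)
ca = (-2.167, 5.062, -11.709)
cg = (-2.933, 6.093, -9.489)
h_first = (-1.399, 4.559, -9.748)    # 1HB in the old format
h_second = (-0.914, 6.164, -10.292)  # 2HB in the old format

hb2, hb3 = assign_z2_z3(cb, ca, cg, h_first, h_second)
```

Under these assumptions the first hydrogen comes out as HB2 and the second as HB3, and the result does not depend on the order in which the two hydrogen atoms are supplied.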

Molprobity

Molprobity is widely used in validating proteins, particularly those reported in the Protein Data Bank.  There are two important quantities used in determining the quality of a calculated protein structure.  One is the Root-Mean-Square error between the calculated and observed (X-ray or NMR) structures.  Calculation of this quantity is straightforward and can be done in MOPAC using COMPARE.  The other is a measure of the close contacts between pairs of atoms that would be expected to have little or no interaction.  Within Molprobity these contacts are called "clashes."  Its reports of clashes are accurate, detailed, and of course very useful.  However, because Molprobity is optimized for analyzing experimental rather than theoretically predicted structures, an unexpected complication arises when calculated geometries are examined.
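As an illustration of the first quantity, the following Python sketch computes a Root-Mean-Square difference between two sets of Cartesian coordinates.  It assumes that both structures list the same atoms in the same order and have already been superimposed; it is not a description of what COMPARE does internally, and the function name rms_difference is hypothetical.

```python
import math


def rms_difference(calc, obs):
    """RMS distance between matching atoms of two structures.

    calc and obs are sequences of (x, y, z) tuples.  Assumption: same
    atoms, same order, and the structures are already superimposed.
    """
    if len(calc) != len(obs) or not calc:
        raise ValueError("need two non-empty structures of equal size")
    total = 0.0
    for p, q in zip(calc, obs):
        # Squared distance between one pair of matching atoms.
        total += sum((a - b) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / len(calc))


# Two atoms; the second is displaced by 2 units along z.
example = rms_difference([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                         [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)])
```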

By default, Molprobity will add hydrogen atoms to a PDB geometry.  This is, of course, very useful when experimental geometries, which often don't include hydrogen atoms, are being studied.  If a PDB geometry does include hydrogen atoms, these are deleted, and the hydrogenation performed by Molprobity is used.  At this point, two issues arise when calculated geometries are used: Molprobity does not add hydrogen atoms to water molecules, and it does not protonate carboxylate groups.  Modeling protein geometries requires as realistic a model as possible, so in addition to the protein itself being hydrogenated, all water molecules must also be hydrogenated, and those carboxylate groups that should be neutral must have a proton on one or the other of the oxygen atoms.  If these hydrogen atoms are deleted, any complications that might be caused by their presence would go unreported.

Although the option exists in Molprobity to use the hydrogenation from the PDB file, this option only works well if the hydrogen atoms are in the modern PDB format.  This requirement was the motivation for making the keyword NEWPDB.  If the old PDB format is used, Molprobity might report severe errors that are, in fact, non-existent.

Example of the old and new format for an arginine residue

The eleventh datum on each line in the old format is used in the HTML file generated by keyword HTML to indicate the atomic partial charge.  Because this datum is not used by MOPAC, it is not read in, and therefore if NEWPDB is present this datum is set to zero by default.  That is, NEWPDB is treated in the same way as SITE or ADD-H, or any other keyword that modifies the data set.  Of course, if a subsequent calculation that generates partial atomic charges, e.g., 1SCF, is present, then the eleventh datum will reflect that.

Old format

ATOM      1  N   ARG A   1      -0.939   4.862 -12.505  1.0  -6.04      PROT N 
ATOM      2  CA  ARG A   1      -2.167   5.062 -11.709  1.0  -1.52      PROT C 
ATOM      3  C   ARG A   1      -3.087   6.117 -12.344  1.0   2.90      PROT C 
ATOM      4  O   ARG A   1      -4.303   6.068 -12.241  1.0  -7.51      PROT O 
ATOM      5  CB  ARG A   1      -1.773   5.462 -10.276  1.0  -2.95      PROT C 
ATOM      6  CG  ARG A   1      -2.933   6.093  -9.489  1.0  -2.92      PROT C 
ATOM      7  CD  ARG A   1      -4.077   5.096  -9.225  1.0  -0.82      PROT C 
ATOM      8  NE  ARG A   1      -3.590   4.032  -8.311  1.0  -4.78      PROT N 
ATOM      9  CZ  ARG A   1      -4.014   2.740  -8.454  1.0   5.79      PROT C 
ATOM     10  NH1 ARG A   1      -5.155   2.420  -9.143  1.0  -5.54      PROT N 
ATOM     11  NH2 ARG A   1      -3.275   1.759  -7.861  1.0  -6.14      PROT N 
ATOM     12  H   ARG A   1      -0.057   5.086 -12.078  1.0   2.16      PROT H 
ATOM     13  HA  ARG A   1      -2.740   4.099 -11.722  1.0   1.24      PROT H 
ATOM     14 1HB  ARG A   1      -1.399   4.559  -9.748  1.0   0.90      PROT H 
ATOM     15 2HB  ARG A   1      -0.914   6.164 -10.292  1.0   1.54      PROT H 
ATOM     16 1HG  ARG A   1      -2.556   6.493  -8.524  1.0   1.27      PROT H 
ATOM     17 2HG  ARG A   1      -3.338   6.979 -10.016  1.0   2.06      PROT H 
ATOM     18 1HD  ARG A   1      -4.919   5.593  -8.696  1.0   1.52      PROT H 
ATOM     19 2HD  ARG A   1      -4.456   4.680 -10.177  1.0   1.98      PROT H 
ATOM     20  HE  ARG A   1      -2.658   4.194  -7.857  1.0   3.40      PROT H 
ATOM     21 1HH1 ARG A   1      -5.848   3.138  -9.310  1.0   3.45      PROT H 
ATOM     22 2HH1 ARG A   1      -5.485   1.476  -9.138  1.0   3.23      PROT H 
ATOM     23 1HH2 ARG A   1      -2.300   1.938  -7.583  1.0   3.48      PROT H 
ATOM     24 2HH2 ARG A   1      -3.670   0.865  -7.695  1.0   3.28      PROT H 

Modern format

ATOM      1  N   ARG A   1      -0.939   4.862 -12.505  1.0   0.00      PROT N 
ATOM      2  CA  ARG A   1      -2.167   5.062 -11.709  1.0   0.00      PROT C 
ATOM      3  C   ARG A   1      -3.087   6.117 -12.344  1.0   0.00      PROT C 
ATOM      4  O   ARG A   1      -4.303   6.068 -12.241  1.0   0.00      PROT O 
ATOM      5  CB  ARG A   1      -1.773   5.462 -10.276  1.0   0.00      PROT C 
ATOM      6  CG  ARG A   1      -2.933   6.093  -9.489  1.0   0.00      PROT C 
ATOM      7  CD  ARG A   1      -4.077   5.096  -9.225  1.0   0.00      PROT C 
ATOM      8  NE  ARG A   1      -3.590   4.032  -8.311  1.0   0.00      PROT N 
ATOM      9  CZ  ARG A   1      -4.014   2.740  -8.454  1.0   0.00      PROT C 
ATOM     10  NH1 ARG A   1      -5.155   2.420  -9.143  1.0   0.00      PROT N 
ATOM     11  NH2 ARG A   1      -3.275   1.759  -7.861  1.0   0.00      PROT N 
ATOM     12  H   ARG A   1      -0.057   5.086 -12.078  1.0   0.00      PROT H 
ATOM     13  HA  ARG A   1      -2.740   4.099 -11.722  1.0   0.00      PROT H 
ATOM     14  HB2 ARG A   1      -1.399   4.559  -9.748  1.0   0.00      PROT H 
ATOM     15  HB3 ARG A   1      -0.914   6.164 -10.292  1.0   0.00      PROT H 
ATOM     16  HG2 ARG A   1      -2.556   6.493  -8.524  1.0   0.00      PROT H 
ATOM     17  HG3 ARG A   1      -3.338   6.979 -10.016  1.0   0.00      PROT H 
ATOM     18  HD2 ARG A   1      -4.919   5.593  -8.696  1.0   0.00      PROT H 
ATOM     19  HD3 ARG A   1      -4.456   4.680 -10.177  1.0   0.00      PROT H 
ATOM     20  HE  ARG A   1      -2.658   4.194  -7.857  1.0   0.00      PROT H 
ATOM     21 HH11 ARG A   1      -5.848   3.138  -9.310  1.0   0.00      PROT H 
ATOM     22 HH12 ARG A   1      -5.485   1.476  -9.138  1.0   0.00      PROT H 
ATOM     23 HH21 ARG A   1      -2.300   1.938  -7.583  1.0   0.00      PROT H 
ATOM     24 HH22 ARG A   1      -3.670   0.865  -7.695  1.0   0.00      PROT H
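
The change in the atom-name field between the two listings can be sketched in Python as a simple lookup.  This is only an illustration: the table ARG_H_RENAMES and the function rename_arg_hydrogen are hypothetical, the table contains only the names that appear in the arginine example, and a pure renaming like this ignores the geometry-based rule that NEWPDB applies.  The atom name is taken from columns 13 to 16 of each line and the residue name from columns 18 to 20, as in the listings.

```python
# Old -> modern hydrogen names for arginine, copied from the listings.
# Each key and value is the full four-character atom-name field.
ARG_H_RENAMES = {
    "1HB ": " HB2", "2HB ": " HB3",
    "1HG ": " HG2", "2HG ": " HG3",
    "1HD ": " HD2", "2HD ": " HD3",
    "1HH1": "HH11", "2HH1": "HH12",
    "1HH2": "HH21", "2HH2": "HH22",
}


def rename_arg_hydrogen(line):
    """Return line with an old-format arginine hydrogen name modernized.

    Lines that are not ATOM/HETATM records, that belong to another
    residue, or whose name is not in the table are returned unchanged.
    """
    if not line.startswith(("ATOM", "HETATM")) or line[17:20] != "ARG":
        return line
    name = line[12:16]
    return line[:12] + ARG_H_RENAMES.get(name, name) + line[16:]


old_line = ("ATOM     14 1HB  ARG A   1      "
            "-1.399   4.559  -9.748  1.0   0.90      PROT H")
new_line = rename_arg_hydrogen(old_line)
print(new_line)
```

Only the name field is altered; the coordinates and the eleventh datum are passed through untouched, so zeroing the partial charge, as NEWPDB does, would need a separate step.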