LOCATE-TS, LOCATE-TS(C:[n.nn[,n.nn[,n.nn...]]][;Set:m])

Description

The objective of LOCATE-TS is to locate and refine a transition state joining two stationary points in an enzyme-catalyzed reaction.  These points are more commonly referred to as reactants and products, but for convenience in this description these geometries will be referred to as A and B.  A and B are used because during the calculation the geometries will be modified as they move towards each other, and would therefore no longer be stationary points.  In LOCATE-TS, the geometry optimization is performed on both A and B simultaneously.

The function being optimized is:

 $$\Delta H_f' = \Delta H_f^A + \Delta H_f^B + c \sum_i (X_i^A - X_i^B)^2$$

where $\Delta H_f^A$ and $\Delta H_f^B$ are the calculated heats of formation of A and B, respectively, "c" is a constant in kcal/mol/Å², and $X_i^A$ and $X_i^B$ are the coordinates of atom "i" in A and B, respectively.
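The function above can be sketched numerically as follows.  This is only an illustration, not MOPAC code: the function names, the two-atom geometries, and the heats of formation are invented, and the squared coordinate difference is assumed to mean the squared Cartesian distance between matching atoms.

```python
def penalty_sum(coords_a, coords_b):
    """Sum over atoms of the squared distance between matching atoms (Å²)."""
    return sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )


def biased_heat(hof_a, hof_b, coords_a, coords_b, c):
    """Heat of formation of A plus that of B plus c times the penalty sum (kcal/mol)."""
    return hof_a + hof_b + c * penalty_sum(coords_a, coords_b)


# Hypothetical two-atom geometries (Å) and heats of formation (kcal/mol).
geom_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
geom_b = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
value = biased_heat(-10.0, -12.0, geom_a, geom_b, c=3.0)
print(value)
```

With these made-up numbers only the second atom differs between A and B, so the whole penalty comes from that one atom.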

Advice on using LOCATE-TS

Both geometries used must be as good as possible: They must be stationary points on the Potential Energy Surface (PES); they should include PDB data, either by using a PDB file or, more commonly, by using the normal MOPAC data-set format, and having the atoms labeled with PDB data.

Have all the files involved in the same folder.  This will make the job of defining GEO_DAT and GEO_REF easier.

Because of the large probability of introducing errors into the data sets, instead of preparing specific data-sets it is easier to define the geometries to be used using GEO_DAT and GEO_REF, and having these keywords point to ARC files.  This results in a very small data set.  For example, if the reactant geometry is in Step_1.arc and the product geometry is in Step_2.arc, the data set would be as follows:

Example of a complete data set for LOCATE-TS

LOCATE-TS GEO_DAT="Step_1.arc" GEO_REF="Step_2.arc" EPS=78.4

EPS is specified here because using implicit solvation gives a more realistic model.  Keyword LOCATE-TS causes the MOZYME function to be used, so there is no need to add keyword MOZYME; if it is supplied, it will be ignored.

Possible problem

If the job does not produce any output, then check that the keyword GEO_DAT is present.  If it's not present, the data-set will look like it's too short to be a real MOPAC data-set.

Worked exercise in locating a transition state.

A complete enzyme-catalyzed mechanism for the hydrolysis of a peptide bond is given in Chymotrypsin Mechanism.  The worked exercise involves determining the transition state for the first reaction step.  Except for Step 3 => Step 4, the other steps are similar.

Download Step_1.arc and Step_2.arc and store them in a new folder.  In the same folder, create a text file, Step_1_2_Transition_State.mop.  Edit this file to add the text in the above example.  Run the job using MOPAC.  It should run for about one day.  If an ARC file is generated, the test was successful.  If it was not generated, please contact me at MrMOPAC@ATT.net, and send me the data set and output files. 
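For readers who prefer to script the preparation of the data set, a minimal Python sketch is given here.  The helper name write_locate_ts_dataset is hypothetical; it writes only the single keyword line shown in the example above, and it does not run MOPAC.

```python
from pathlib import Path


def write_locate_ts_dataset(folder, job_name, geo_dat, geo_ref, eps=78.4):
    """Write <job_name>.mop containing one LOCATE-TS keyword line.

    The two ARC files are expected to be in the same folder as the new file.
    """
    folder = Path(folder)
    missing = [name for name in (geo_dat, geo_ref) if not (folder / name).exists()]
    if missing:
        raise FileNotFoundError(f"not found in {folder}: {', '.join(missing)}")
    keywords = f'LOCATE-TS GEO_DAT="{geo_dat}" GEO_REF="{geo_ref}" EPS={eps}'
    path = folder / f"{job_name}.mop"
    path.write_text(keywords + "\n")
    return path


# Example call, assuming Step_1.arc and Step_2.arc are in the current folder:
# write_locate_ts_dataset(".", "Step_1_2_Transition_State", "Step_1.arc", "Step_2.arc")
```

The check for the two ARC files reflects the advice to keep all files in one folder; it is a convenience of this sketch, not something MOPAC requires of the script.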

How LOCATE-TS works

The process that LOCATE-TS uses is as follows: The reference data set geometry, defined using GEO_REF, is rotated and translated to put it into the best overlap with the geometry from the input data set.  At this point, the two geometries are likely to be quite different.  This difference can be expressed as a distance, defined as the square root of the sum in the above equation.  Typical distances in enzyme chemistry are of the order of 50 - 300 Å.

Geometry optimization is then started using a small value for "c"; by default this is 3 kcal/mol/Å².  The first step consists of solving the SCF equations using the MOZYME technique.  (There is no need to specify MOZYME; the presence of LOCATE-TS implies MOZYME.)  All subsequent steps use the wavefunction from this initial SCF calculation and the pull exerted by the "c" term.  This causes the two geometries to move towards each other without imposing a large stress.  During this process, the distance will typically drop by a large amount: the two geometries can be regarded as moving across an almost level plane until they stop near the bottom of the activation barrier.

The value of "c" is then increased; the default for this second stage is 30 kcal/mol/Å².  This large pull then moves A and B up the barrier to near the transition state.  As with the previous stage, the first point involves solving the SCF equations, and all subsequent points use the frozen wavefunction.  This optimization is repeated a few times using the same value of "c" to ensure that the wavefunction is sufficiently relaxed.  When the optimization is finished, the geometries of A and B are almost the same, with one on each side of the transition state.
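The effect of the "c" term can be illustrated with a one-dimensional toy model.  This is not how MOPAC performs the optimization: there is no SCF step or frozen wavefunction, the double-well energy E(x) = (x² - 1)² is invented, all quantities are dimensionless, and the values of "c" are chosen for the toy rather than taken from the defaults.  A and B start in the two wells at x = -1 and x = +1, and the barrier top is at x = 0.

```python
import math


def gradient(a, b, c):
    """Gradient of f(a, b) = E(a) + E(b) + c*(a - b)**2, with E(x) = (x*x - 1)**2."""
    de_a = 4.0 * a * (a * a - 1.0)
    de_b = 4.0 * b * (b * b - 1.0)
    pull = 2.0 * c * (a - b)
    return de_a + pull, de_b - pull


def relax(a, b, c, step=0.02, n_steps=5000):
    """Plain steepest descent on the biased function for a fixed value of c."""
    for _ in range(n_steps):
        ga, gb = gradient(a, b, c)
        a, b = a - step * ga, b - step * gb
    return a, b


a, b = -1.0, 1.0  # A and B start at the two minima of the toy surface
# In this toy, c must stay below 2/3 for the pair to remain a stable minimum.
for c in (0.3, 0.6):
    a, b = relax(a, b, c)
    print(f"c={c}: A={a:.4f}  B={b:.4f}  distance={abs(a - b):.4f}")

estimate = 0.5 * (a + b)  # average of A and B as the transition state estimate
print(f"estimated barrier top: {estimate:.4f}")
```

For this toy the relaxed positions can be worked out by hand as x = -sqrt(1 - c) and x = +sqrt(1 - c), so a larger "c" moves both points further up the barrier, and their average falls on the barrier top by symmetry.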

An estimate of the transition state geometry, C, is obtained by averaging the two geometries.  The next few steps involve a pair of operations, each pair being repeated a small number of times, typically two to five.  The first operation of the pair is a geometry optimization of C while holding fixed all atoms involved in bond making or bond breaking, i.e., optimizing the positions of all atoms except those in the active site.  The second operation is transition state location, i.e., gradient minimization, but this time using only the atoms in the active site.
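The bookkeeping behind this alternation can be sketched as follows.  The sketch only shows which atoms are free to move in each operation; it performs no optimization.  The function names, the operation labels, the three-atom geometries, and the choice of atom 1 as the active site are all hypothetical.

```python
def average_geometry(coords_a, coords_b):
    """Atom-by-atom average of two geometries that list atoms in the same order."""
    return [
        tuple(0.5 * (p + q) for p, q in zip(atom_a, atom_b))
        for atom_a, atom_b in zip(coords_a, coords_b)
    ]


def movable_atoms(n_atoms, active_site, operation):
    """Indices (from 0) of the atoms that are optimized in the given operation."""
    active = set(active_site)
    if operation == "relax_surroundings":
        # Active-site atoms are held fixed; everything else is optimized.
        return [i for i in range(n_atoms) if i not in active]
    if operation == "refine_active_site":
        # Transition state location uses only the active-site atoms.
        return sorted(active)
    raise ValueError(f"unknown operation: {operation}")


# Hypothetical geometries of A and B (Å) after the constrained optimization.
geom_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
geom_b = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.0, 0.2, 0.0)]
geom_c = average_geometry(geom_a, geom_b)

for cycle in range(2):  # the pair of operations is repeated a few times
    for operation in ("relax_surroundings", "refine_active_site"):
        print(cycle, operation, movable_atoms(len(geom_c), [1], operation))
```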

Options for LOCATE-TS

LOCATE-TS

LOCATE-TS can be used on its own, i.e., without any of the terms in the square brackets; if that is done, then the default optimization procedure is used, and output is small.  The default optimization can be reproduced using LOCATE-TS(C:3,30,30,30;SET:1)

About half the time the default LOCATE-TS does not finish correctly.  Almost always the first big step, moving the reactants and products up the reaction barrier, runs correctly; when failures occur, they usually occur in the refinement of the transition state.  Because of this behavior, at the end of the first big step three data sets are generated that can then be used in attempting to refine the transition state.  These data sets have the names <name>_30p0_first.mop, <name>_30p0_second.mop, and <name>_30p0_average.mop.  <name>_30p0_first.mop is the final geometry generated from the original data set (the reactant) or from the file defined by GEO_DAT.  <name>_30p0_second.mop is the final geometry from GEO_REF.  <name>_30p0_first.mop and <name>_30p0_second.mop are thus the geometries on each side of the transition state, near the top of the reaction barrier.  An approximation to the transition state geometry is the average of these two structures; this is given in <name>_30p0_average.mop.  The final value of "c" used is indicated by the text "30p0".  This specific case represents a value of "c" of 30.0 kcal/mol/Å².
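When scripting follow-up runs it can help to build these three file names in code.  The sketch below reproduces the "30p0" pattern for a value of 30.0; how MOPAC formats other values of "c" is not stated here, so the one-decimal rule used in the code is an assumption, as is the helper name intermediate_names.

```python
def intermediate_names(job_name, c):
    """Names of the three data sets written at the end of the first big step.

    Assumption: "c" is written with one decimal place and "." replaced by "p",
    which matches the documented case 30.0 -> "30p0".
    """
    tag = f"{c:.1f}".replace(".", "p")
    return [f"{job_name}_{tag}_{kind}.mop" for kind in ("first", "second", "average")]


for name in intermediate_names("Step_1_2_Transition_State", 30.0):
    print(name)
```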

LOCATE-TS(C:30,30,30,40,40;SET:1)

If LOCATE-TS does not finish correctly, examine the structures on each side of the transition state.  If these look correct, then try to refine them by using this option.  Do not start with a constraint lower than the last value used in the previous run - if the constraint is lowered, the two systems will move away from the transition state.  Using options of the type shown here increases the possibility that the refinement would work correctly because the refinement would be started with a geometry that was nearer to the transition state.

If the structures on each side of the transition state do not look correct, then start over with a more cautious set of constraints.  An example of such a cautious set would be LOCATE-TS(C:1,3,10,20,25,25;SET:1).  This starts the procedure with a small penalty function, 1.0 kcal/mol/Å², followed, in order, by increasing biases, until a penalty of 25.0 kcal/mol/Å² is used.  The transition state would then be refined using Set 1, i.e., all atoms involved in bond making or bond breaking.

When this form of the keyword is used, the output will be larger, and intermediate files are generated; these files can then be used in locating the transition state manually.  The number of constraints used can be decreased to zero or increased up to 20.  Two main options for the size of the active site are provided: SET:1 consists of only those atoms involved in bond making or bond breaking, and SET:2 consists of SET:1 plus nearest neighbors.
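The different forms of the keyword can be assembled with a small helper such as the one sketched below.  The helper locate_ts_keyword is hypothetical and only builds the text of the keyword; the limits it checks (at most 20 constraints, SET equal to 1 or 2) are the ones stated above.

```python
def locate_ts_keyword(constraints=None, active_set=None):
    """Build the text of a LOCATE-TS keyword.

    constraints: list of "c" values, or None to omit the C: term entirely.
    active_set:  1 or 2, or None to omit the SET: term.
    """
    if constraints is None and active_set is None:
        return "LOCATE-TS"
    terms = []
    if constraints is not None:
        if len(constraints) > 20:
            raise ValueError("at most 20 constraints can be given")
        terms.append("C:" + ",".join(format(c, "g") for c in constraints))
    if active_set is not None:
        if active_set not in (1, 2):
            raise ValueError("SET must be 1 or 2")
        terms.append(f"SET:{active_set}")
    return "LOCATE-TS(" + ";".join(terms) + ")"


print(locate_ts_keyword([3, 30, 30, 30], 1))
print(locate_ts_keyword([1, 3, 10, 20, 25, 25], 1))
print(locate_ts_keyword([], 1))
print(locate_ts_keyword(None, 1))
```

An empty list gives the form with "C:" and no values, while None leaves the "C:" term out altogether; these are two different options of the keyword.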

LOCATE-TS(C:;SET:1)

The two data sets used by LOCATE-TS are passed directly, i.e., without modification, to the transition state refinement operation.  If the two geometries are near to the transition state, there is an increased probability that the process for recognizing bond-breaking and bond-making operations will not work correctly.  To avoid problems with this operation, use the next option, viz. LOCATE-TS(SET:1).

LOCATE-TS(SET:1)

This option uses only one data-set, usually the one generated by an earlier run, e.g., <name>_30p0_average.mop.  Carefully define the atoms involved in bond-breaking and bond-making by setting their optimization flags to "1", all other optimization flags being set to zero.  When this option is used, the option "SET:2" is meaningless.
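Setting the flags by hand is error-prone in a large system, so a small script can help.  The sketch below assumes that each geometry line ends with six fields (x, flag, y, flag, z, flag) in Cartesian format, with the atom label and any PDB text in front of them; the function name, the example lines, and their PDB labels are invented for illustration.

```python
def set_opt_flags(geometry_lines, active_atoms):
    """Set flags to 1 for atoms in active_atoms (numbered from 1), 0 otherwise.

    Assumption: the last six whitespace-separated fields of every line are
    x, flag, y, flag, z, flag; everything before them is the atom label.
    """
    active = set(active_atoms)
    edited = []
    for number, line in enumerate(geometry_lines, start=1):
        label, x, _, y, _, z, _ = line.strip().rsplit(None, 6)
        flag = "1" if number in active else "0"
        edited.append(f"{label} {x} {flag} {y} {flag} {z} {flag}")
    return edited


# Hypothetical geometry lines with PDB-style labels.
lines = [
    "C(ATOM 1 CA SER A 195)  1.000 +1  2.000 +1  3.000 +1",
    "O(ATOM 2 OG SER A 195)  1.500 +1  2.500 +1  3.500 +1",
]
for edited_line in set_opt_flags(lines, active_atoms=[2]):
    print(edited_line)
```

Splitting from the right keeps the label intact even when the PDB text inside the parentheses contains spaces.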

   LOCATE-TS was developed and optimized for use with enzymes.  It can be used for other species, but the probability of success is lower.

Complete worked example of LOCATE-TS

LOCATE-TS was used to locate transition states in the chymotrypsin mechanism.  To reproduce this mechanism, download the ARC files for the intermediates (Step 1 to Step 6), and use the LOCATE-TS keyword in the transition state stationary point arc files.