Modeling Proteins and Enzyme-Catalyzed Reactions

Introduction

For investigating protein behavior, it is useful to regard the program MOPAC as a complete laboratory, with all the utilities and tools needed for modeling protein behavior. As with tools in any laboratory, a certain level of skill is needed; a potential user who just wants to look at an enzyme-catalyzed reaction and is not willing to prepare a suitable model will inevitably be disappointed. Potential users must be willing to do two things: first, to acquire the skills necessary for manipulating complicated models, and second, to construct and use such models. These are difficult tasks, and a lot of patience is required. To help with them, MOPAC provides a set of utilities for building models and for detecting errors that occur during this process.

Any researcher who wants to use MOPAC for modeling proteins, and is willing to invest the effort to learn how to do so, has a right to expect the program to work as described in this Manual. Obviously there are faults in MOPAC, but a serious effort, taking more than a year, has been made to check that the program works correctly and to simplify identifying and correcting faults in protein data sets. The current on-line manual and MOPAC program are the culmination of that effort.

Why should an experimentalist use computational chemistry modeling tools? Probably the most important reason is that they provide an alternative to the other two approaches: experiment and understanding. To give a simple example, consider an X-ray structure of a protein. This structure is physics, not chemistry, and almost all (well over 90%) of the structures examined thus far have had mild to severe errors when looked at from a chemical point of view. Examples include unrealistic single-bond lengths, e.g., C-C bonds of length ~1.1 Å instead of the expected 1.5 Å; unrealistic distances between atoms that are not covalently bonded; and simple misassignments, such as a C-NH₂ being mistaken for a C=O and vice versa. These errors can be detected using MOPAC and, more importantly, they can be corrected. The result is a structure that is much more chemically realistic than the starting PDB structure. Put another way, simply building a model of a protein yields a structure that is more realistic, i.e., nearer to the structure of the biochemical that occurs in nature, than anything available from any other source.

To reiterate, for a model to be useful it must be realistic. When an enzyme catalyzes a reaction, the energy changes involved are often very small, frequently less than the energy involved in making a single hydrogen bond. If a fault in the model put it in a high-energy state, and that fault corrected itself later in the simulation, the resulting energy change would more than likely invalidate all the work that had gone before. It is important, therefore, to make sure that the starting model is as error-free as possible. Unfortunately, and without exception, all data sources for protein structures (including the most important one, the Protein Data Bank) have limitations. These range from minor faults that were previously regarded as unimportant, such as missing hydrogen atoms, to quite severe geometric errors, such as carbon-carbon single bonds with a bond length of about 1.1 Å, well below the expected 1.5 Å. Therefore, before any work can be done, a realistic model of the protein must be constructed. This operation involves several steps, and great care must be exercised to ensure that the resulting model is as good as possible.
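The kind of geometric fault described above can be illustrated with a short stand-alone script. This is only a minimal sketch and is not how MOPAC performs its own checks: it reads the fixed-column ATOM/HETATM records of a PDB file and reports every pair of carbon atoms closer than a cutoff. The function names, the `pdb_line` helper, the tiny made-up alanine-like fragment, and the 1.2 Å cutoff are all assumptions chosen for illustration; the cutoff is simply a value between the ~1.1 Å faulty bonds and the expected 1.5 Å mentioned in the text.

```python
import math
from itertools import combinations


def pdb_line(serial, name, res, chain, resseq, x, y, z, element):
    """Hypothetical helper: format one fixed-column PDB ATOM record."""
    return (f"ATOM  {serial:5d} {name:<4s} {res:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2s}")


def parse_pdb_atoms(pdb_text):
    """Return a list of (label, element, (x, y, z)) from ATOM/HETATM records."""
    atoms = []
    for line in pdb_text.splitlines():
        if line[:6] not in ("ATOM  ", "HETATM"):
            continue
        name = line[12:16].strip()       # columns 13-16: atom name
        res = line[17:20].strip()        # columns 18-20: residue name
        chain = line[21:22]              # column 22: chain identifier
        resseq = line[22:26].strip()     # columns 23-26: residue number
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        # Columns 77-78 hold the element symbol. The fallback to the first
        # letter of the atom name is an assumption and can misread metals
        # (e.g. a calcium ion named "CA").
        element = line[76:78].strip().upper() or name[:1]
        atoms.append((f"{res}{resseq}{chain}:{name}", element, xyz))
    return atoms


def short_cc_contacts(atoms, threshold=1.2):
    """Return carbon-carbon pairs closer than `threshold` (in Angstroms)."""
    carbons = [a for a in atoms if a[1] == "C"]
    hits = []
    # All-pairs search: simple but quadratic, so slow for very large systems.
    for (label1, _, p1), (label2, _, p2) in combinations(carbons, 2):
        d = math.dist(p1, p2)
        if d < threshold:
            hits.append((label1, label2, round(d, 3)))
    return hits


# Made-up fragment: CA-CB is deliberately too short (1.10 A),
# while CA-C has a normal single-bond length (1.53 A).
demo = "\n".join([
    pdb_line(1, "N", "ALA", "A", 1, 0.000, 0.000, 1.470, "N"),
    pdb_line(2, "CA", "ALA", "A", 1, 0.000, 0.000, 0.000, "C"),
    pdb_line(3, "CB", "ALA", "A", 1, 1.100, 0.000, 0.000, "C"),
    pdb_line(4, "C", "ALA", "A", 1, 0.000, 1.530, 0.000, "C"),
])

for a, b, d in short_cc_contacts(parse_pdb_atoms(demo)):
    print(f"{a} - {b}: {d:.3f} A")
```

For the made-up fragment, only the CA-CB pair is reported. Applied to a real PDB entry, a list like this would only indicate where to look; deciding whether a flagged contact is a genuine fault, and correcting it, is the job of the model-building steps listed below.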

MOPAC contains many Tools for use with Proteins.

Steps and processes involved in modeling Proteins

Installing MOPAC2016: How to get the program, install, and activate it
Recommended folder names: Suggestions to simplify navigating through a project
Running a simple job: Constructing a data set for formaldehyde, running it, analyzing the results. Some nomenclature (jobs, calculations, etc.)
Getting a starting protein structure: The PDB, what to look for and what to watch out for
Graphical User Interfaces: What to look for
Preparing a starting data set: Adding hydrogen atoms
Solvation: Use solvation; it makes the model more realistic
Resequencing: What to watch out for
Determining Ionized Sites and Salt bridges: Correct placement of hydrogen atoms
Running a 1SCF calculation: Issues involved

Check that the Starting Model is valid: Examine the active site

Correcting errors in the X-ray structure: Improving on the X-ray structure

Unconstrained optimization: Generating the starting point for modeling protein chemistry

Compare the geometries of two systems

Editing the Starting Model to make small changes: Changing -COOH to -CONH2 or -CH2OH to -CH3

Choosing a format: MOPAC or PDB: Issues and considerations

Make a backup copy of the starting model: Avoid the risk of losing a lot of hard work

Improving the accuracy of relative ligand-protein binding energies: Steps to improve the accuracy of the relative binding energy of ligands non-covalently bound to a protein

Worked example: Chymotrypsin: Complete catalytic cycle

Constructing a Chymotrypsin Starting Model: The minimum energy structure is best

Making reactants and products: Techniques for generating intermediates

Locating and Refining Transition States in Proteins: Generating transition state geometries

Verifying transition states: Show that there is exactly one imaginary force constant

Intrinsic Reaction Coordinates: Show that the imaginary mode connects reactants and products

The Globule model: A method of modeling enzymes that runs faster