This description is based on materials provided by Prof Gerd B. Rocha, Universidade Federal da Paraíba, João Pessoa, Brazil.
The modifications in the semiempirical quantum chemistry MOPAC2016 code that accelerate single-point energy calculations (1SCF) of medium-size (up to 4000 atoms) molecular systems using multithreaded shared-memory CPUs are now presented.
Computational aspects
The modifications consisted of using a combination of highly optimized linear algebra libraries (LAPACK and BLAS from Intel MKL (Math Kernel Library)) to reduce the computation time for parts of MOPAC such as the pseudodiagonalization, full diagonalization, and density matrix assembling.
The SCF procedure in MOPAC is driven by the ITER subroutine which is called exclusively by the COMPFG subroutine. The most time-consuming steps in conventional semi-empirical calculations are the Fock matrix diagonalization and density matrix assembly, since these are O(N³) procedures, N being the number of basis functions. The rest of the procedures carried out in the SCF have N² complexity, such as the assembly of Fock Matrix. So, most of our efforts were to replace these subroutine calls for more-optimized similar versions from MKL.
The pseudodiagonalization in MOPAC is carried out in subroutine DIAG, and its algorithm starts by forming the occupied-virtual block of the Fock matrix, Fo−v, in the molecular orbital basis of the previous iteration:
Fo-v = CoTFCv
where Co is the occupied orbitals sub-matrix and Cv is the same for the virtual ones. Two matrix multiplications are necessary for the assembly of Fo−v in this first part.
Once Fo−v is built, the second part of DIAG performs a series of 2x2 vector rotations (unitary transformations) which approximately eliminates the elements of Fo−v. Let i be indexes for the occupied orbitals and α be indexes for the virtual ones. The expressions to calculate the new eigenvectors (Cinew and Canew) from the old ones (Ci and Cα) by applying successive rotations of 2x2 vectors, and these are given by the following formulae:Cinew = cCi - sCa and Canew = sCi + cCa
where
s = Fia/(
εi - εa ) and c = (1 - s2)½ In these equations, εi and εα are the molecular orbital energies for the last conventional SCF iteration (using a full diagonalization procedure). This approach significantly reduces the SCF procedure, making it faster.To hasten it, we have replaced both matrix multiplications for more-optimized MKL procedures (DGEMM), as well as the Jacobi rotations step. Also, we have replaced the full diagonalization procedure subroutine (using subroutines from EISPACK 3 library) with Intel MKL DSYEVD. At the density matrix assembly step, we have used DSYRK instead of DGEMM, since the product matrix is symmetric.
Benchmarks
Below, we present times, in seconds, for a single-point calculation run on a 2 x 2.93 GHz 6-Core Intel Xeon with 16Gb of 1333 DDR3 memory chips Mac Pro computer, using 12 of the 24 threads
Species |
#atoms |
Current MOPAC2016 |
MOPAC with MKL |
MOPAC2016 with MKL and multi-threading |
Naphthalene |
18 |
0.02 |
0.15 |
0.02 |
Crambin | 642 | 468 | 114 | 12 |
(H 2O)573 |
1719 |
3,082 |
584 |
79 |
1G6X |
1455 |
8,612 |
1,240 |
142 |
1EZG (Thermal Hysteresis Protein) | 2064 | 22,959 | 2,118 | 300 |
1RNB (Barnase) |
2066 |
34,372 |
4,108 |
411 |
Bacteriorhodopsin (1BRX) | 3352 | 141,773 | 11,192 | 1,394 |
Additional details
Multi-threading does not speed up MOZYME calculations because they use sparse matrices.
In MOPAC, proteins can be modeled using the MOZYME technique, however, that technique is limited to closed shell RHF calculations. This means that proteins with free radical sites, excited state proteins, and proteins containing iron, chromium or other transition metal atoms should not be modeled using MOZYME.
Such modifications reduce the computation time so that conventional MOPAC methods can now be used for many proteins. This means that it is now practical to use conventional MOPAC methods - RHF-CI and UHF - for modeling most of the smaller proteins, and by implication that it will be possible to model many systems that currently cannot or should not be modeled using MOZYME.
All technical details of the modifications to the MOPAC2016 code, as well as some results, benchmarks and an application in biomolecules, can be found in the publication:
Maia, J. D. C.; Urquiza Carvalho, G. A.; Mangueira, C. P.; Santana, S. R.; Cabral, L. A. F.; Rocha, G. B., "GPU Linear Algebra Libraries and GPGPU Programming for Accelerating MOPAC Semiempirical Quantum Chemistry Calculations" Journal of Chemical Theory and Computation 2012, 8, 30723081. See: http://pubs.acs.org/doi/abs/10.1021/ct3004645
In this publication you also can find all about the GPU-powered MOPAC that was released in January 2014. For details of the effect of using a GPU, see GPU.