Use of MKL and Multi-Threading to reduce computation time

This description is based on materials provided by Prof Gerd B. Rocha, Universidade Federal da Paraíba, João Pessoa, Brazil.

The modifications in the semiempirical quantum chemistry MOPAC2016 code that accelerate single-point energy calculations (1SCF) of medium-size (up to 4000 atoms) molecular systems using multithreaded shared-memory CPUs are now presented.

Computational aspects

The modifications consisted of using a combination of highly optimized linear algebra libraries (LAPACK and BLAS from Intel MKL (Math Kernel Library)) to reduce the computation time for parts of MOPAC such as the pseudodiagonalization, full diagonalization, and density matrix assembling.

The SCF procedure in MOPAC is driven by the ITER subroutine which is called exclusively by the COMPFG subroutine. The most time-consuming steps in conventional semi-empirical calculations are the Fock matrix diagonalization and density matrix assembly, since these are O(N³) procedures, N being the number of basis functions. The rest of the procedures carried out in the SCF have N² complexity, such as the assembly of Fock Matrix. So, most of our efforts were to replace these subroutine calls for more-optimized similar versions from MKL.

The pseudodiagonalization in MOPAC is carried out in subroutine DIAG, and its algorithm starts by forming the occupied-virtual block of the Fock matrix, F_o−v, in the molecular orbital basis of the previous iteration:

F_o-v = C_o^TFC_v

where C_o is the occupied orbitals sub-matrix and C_v is the same for the virtual ones. Two matrix multiplications are necessary for the assembly of F_o−v in this first part.

Once F_o−v is built, the second part of DIAG performs a series of 2x2 vector rotations (unitary transformations) which approximately eliminates the elements of F_o−v. Let i be indexes for the occupied orbitals and α be indexes for the virtual ones. The expressions to calculate the new eigenvectors (C_i^new and C_a^new) from the old ones (C_i and C_α) by applying successive rotations of 2x2 vectors, and these are given by the following formulae:

C_i^new = cC_i- sC_aand C_a^new = sC_i+ cC_a

where

s = F_ia/(

ε_i - ε_a ) and c = (1 - s²)^½

In these equations, ε_i and ε_α are the molecular orbital energies for the last conventional SCF iteration (using a full diagonalization procedure). This approach significantly reduces the SCF procedure, making it faster.

To hasten it, we have replaced both matrix multiplications for more-optimized MKL procedures (DGEMM), as well as the Jacobi rotations step. Also, we have replaced the full diagonalization procedure subroutine (using subroutines from EISPACK 3 library) with Intel MKL DSYEVD. At the density matrix assembly step, we have used DSYRK instead of DGEMM, since the product matrix is symmetric.

Benchmarks

Below, we present times, in seconds, for a single-point calculation run on a 2 x 2.93 GHz 6-Core Intel Xeon with 16Gb of 1333 DDR3 memory chips Mac Pro computer, using 12 of the 24 threads

Species	#atoms	Current MOPAC2016	MOPAC with MKL	MOPAC2016 with MKL and multi-threading
Naphthalene	18	0.02	0.15	0.02
Crambin	642	468	114	12
(H ₂O)₅₇₃	1719	3,082	584	79
1G6X	1455	8,612	1,240	142
1EZG (Thermal Hysteresis Protein)	2064	22,959	2,118	300
1RNB (Barnase)	2066	34,372	4,108	411
Bacteriorhodopsin (1BRX)	3352	141,773	11,192	1,394

Additional details

Multi-threading does not speed up MOZYME calculations because they use sparse matrices.

In MOPAC, proteins can be modeled using the MOZYME technique, however, that technique is limited to closed shell RHF calculations. This means that proteins with free radical sites, excited state proteins, and proteins containing iron, chromium or other transition metal atoms should not be modeled using MOZYME.

Such modifications reduce the computation time so that conventional MOPAC methods can now be used for many proteins. This means that it is now practical to use conventional MOPAC methods - RHF-CI and UHF - for modeling most of the smaller proteins, and by implication that it will be possible to model many systems that currently cannot or should not be modeled using MOZYME.

All technical details of the modifications to the MOPAC2016 code, as well as some results, benchmarks and an application in biomolecules, can be found in the publication:

Maia, J. D. C.; Urquiza Carvalho, G. A.; Mangueira, C. P.; Santana, S. R.; Cabral, L. A. F.; Rocha, G. B., "GPU Linear Algebra Libraries and GPGPU Programming for Accelerating MOPAC Semiempirical Quantum Chemistry Calculations" Journal of Chemical Theory and Computation 2012, 8, 3072Â–3081. See: http://pubs.acs.org/doi/abs/10.1021/ct3004645

In this publication you also can find all about the GPU-powered MOPAC that was released in January 2014. For details of the effect of using a GPU, see GPU.