ABSTRACT
Electronic structure theory (especially quantum chemistry) has thrived and has become increasingly relevant to a broad spectrum of scientific endeavors as the sophistication of both computer architectures and software engineering has advanced. This article provides a brief history of advances in both hardware and software, from the early days of IBM mainframes to the current emphasis on accelerators and modern programming practices.
ABSTRACT
A discussion of many of the recently implemented features of GAMESS (General Atomic and Molecular Electronic Structure System) and LibCChem (the C++ CPU/GPU library associated with GAMESS) is presented. These features include fragmentation methods such as the fragment molecular orbital, effective fragment potential and effective fragment molecular orbital methods, hybrid MPI/OpenMP approaches to Hartree-Fock, and resolution of the identity second-order perturbation theory. Many new coupled cluster theory methods have been implemented in GAMESS, as have multiple levels of density functional/tight binding theory. The role of accelerators, especially graphical processing units, is discussed in the context of the new features of LibCChem, as is the associated problem of power consumption as the power of computers increases dramatically. The process by which a complex program suite such as GAMESS is maintained and developed is considered. Future developments are briefly summarized.
ABSTRACT
Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner, exploiting both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of the integral calculation using X10's work-stealing runtime, and report performance results for long-range HF energy calculations on large molecules with high-quality basis sets, running on up to 1024 cores of a high-performance cluster.
Subjects
Programming Languages
ABSTRACT
The use of gallium for cleaning hydrogen-contaminated Al2O3 surfaces is explored by performing first principles density functional calculations of gallium adsorption on a hydrogen-contaminated Al-terminated α-Al2O3(0001) surface. Both physisorbed and chemisorbed H-contaminated α-Al2O3(0001) surfaces with one monolayer (ML) gallium coverage are investigated. The thermodynamics of gallium cleaning are considered for a variety of different asymptotic products, and are found to be favorable in all cases. Physisorbed H atoms have very weak interactions with the Al2O3 surface and can be removed easily by the Ga ML. Chemisorbed H atoms form stronger interactions with the surface Al atoms. Bonding energy analysis and departure simulations indicate, however, that chemisorbed H atoms can be effectively removed by the Ga ML.
ABSTRACT
The primary focus of GAMESS over the last 5 years has been the development of new high-performance codes that are able to take effective and efficient advantage of the most advanced computer architectures, both CPU and accelerators. These efforts include employing density fitting and fragmentation methods to reduce the high scaling of well-correlated (e.g., coupled-cluster) methods as well as developing novel codes that can take optimal advantage of graphical processing units and other modern accelerators. Because accurate wave functions can be very complex, an important new functionality in GAMESS is the quasi-atomic orbital analysis, an unbiased approach to the understanding of covalent bonds embedded in the wave function. Best practices for the maintenance and distribution of GAMESS are also discussed.
ABSTRACT
The simulation of nonlinear ultrasound propagation through tissue-realistic media has a wide range of practical applications. However, this is a computationally difficult problem due to the large size of the computational domain compared to the acoustic wavelength. Here, the k-space pseudospectral method is used to reduce the number of grid points required per wavelength for accurate simulations. The model is based on coupled first-order acoustic equations valid for nonlinear wave propagation in heterogeneous media with power law absorption. These are derived from the equations of fluid mechanics and include a pressure-density relation that incorporates the effects of nonlinearity, power law absorption, and medium heterogeneities. The additional terms accounting for convective nonlinearity and power law absorption are expressed as spatial gradients, making them efficient to encode numerically. The governing equations are then discretized using a k-space pseudospectral technique in which the spatial gradients are computed using the Fourier-collocation method. This increases the accuracy of the gradient calculation and thus relaxes the requirement for dense computational grids compared to conventional finite difference methods. The accuracy and utility of the developed model are demonstrated via several numerical experiments, including the 3D simulation of the beam pattern from a clinical ultrasound probe.
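The accuracy gain from Fourier-collocation gradients can be illustrated with a minimal one-dimensional sketch. It assumes a periodic, uniformly sampled field and covers only the spectral gradient itself; the nonlinear and absorption terms and the k-space time-stepping of the model described above are not included. The function names and the test signal are illustrative choices, not taken from the paper.

```python
import numpy as np

def fourier_gradient(f, dx):
    """Spectral derivative of a periodic 1D field sampled on a uniform grid."""
    k = 2.0 * np.pi * np.fft.fftfreq(f.size, d=dx)  # angular wavenumbers
    return np.real(np.fft.ifft(1j * k * np.fft.fft(f)))

def central_difference(f, dx):
    """Second-order periodic central difference, for comparison only."""
    return (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * dx)

# Illustrative test signal: sin(3x) on [0, 2*pi) with 64 grid points.
n = 64
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dx = x[1] - x[0]
f = np.sin(3.0 * x)
exact = 3.0 * np.cos(3.0 * x)

spectral_err = np.max(np.abs(fourier_gradient(f, dx) - exact))
fd_err = np.max(np.abs(central_difference(f, dx) - exact))
print(f"spectral error: {spectral_err:.2e}, finite-difference error: {fd_err:.2e}")
```

On this grid the spectral derivative is accurate to roughly machine precision, while the central difference carries a visible truncation error. This is the sense in which fewer grid points per wavelength are needed.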
ABSTRACT
A novel implementation of the self-consistent field (SCF) procedure specifically designed for high-performance execution on multiple graphics processing units (GPUs) is presented. The algorithm offloads to GPUs the three major computational stages of the SCF, namely, the calculation of one-electron integrals, the calculation and digestion of electron repulsion integrals, and the diagonalization of the Fock matrix, including SCF acceleration via DIIS. Performance results for a variety of test molecules and basis sets show remarkable speedups with respect to the state-of-the-art parallel GAMESS CPU code and relative to other widely used GPU codes for both single and multi-GPU execution. The new code outperforms all existing multi-GPU implementations when using eight V100 GPUs, with speedups relative to TeraChem ranging from 1.2× to 3.3× and speedups of up to 28× over QUICK on one GPU and 15× using eight GPUs. Strong scaling calculations show nearly ideal scalability up to 8 GPUs while retaining high parallel efficiency for up to 18 GPUs.
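For readers unfamiliar with DIIS, the extrapolation step can be sketched as follows. This is the textbook Pulay scheme written with NumPy on the CPU, not the GPU implementation described in the abstract. The function name `diis_extrapolate` and the toy matrices are hypothetical.

```python
import numpy as np

def diis_extrapolate(focks, errors):
    """Textbook Pulay DIIS: combine stored Fock matrices so that the
    corresponding combination of error vectors has minimal norm,
    subject to the coefficients summing to one."""
    n = len(focks)
    b = np.zeros((n + 1, n + 1))
    for i in range(n):
        for j in range(n):
            b[i, j] = np.vdot(errors[i], errors[j])  # overlap of error vectors
    b[:n, n] = -1.0  # Lagrange-multiplier column
    b[n, :n] = -1.0  # Lagrange-multiplier row
    rhs = np.zeros(n + 1)
    rhs[n] = -1.0    # enforces sum(coeffs) == 1
    coeffs = np.linalg.solve(b, rhs)[:n]
    fock = sum(c * f for c, f in zip(coeffs, focks))
    return coeffs, fock

# Toy data (hypothetical): two iterates whose errors cancel exactly.
f1 = np.array([[1.0, 0.2], [0.2, 2.0]])
f2 = np.array([[3.0, 0.4], [0.4, 4.0]])
e1 = np.array([[1.0, 0.0], [0.0, 0.0]])
e2 = -e1
coeffs, f_new = diis_extrapolate([f1, f2], [e1, e2])
print(coeffs)
```

In a production code the error vector is typically built from the commutator of the Fock and density matrices, and the history is truncated to a handful of iterations. Both details are omitted here.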
ABSTRACT
We present a high-performance, GPU (graphics processing unit)-accelerated algorithm for building the Fock matrix. The algorithm is designed for efficient calculations on large molecular systems and uses a novel dynamic load balancing scheme that maximizes the GPU throughput and avoids thread divergence that could occur due to integral screening. Additionally, the code adopts a novel ERI digestion algorithm that exploits all forms of permutational symmetry, efficiently combines the evaluation of Coulomb and exchange terms, and eliminates explicit thread synchronization requirements. Performance results obtained using a number of large molecules reveal remarkable speedups of up to 24.4× with respect to the QUICK GPU code and up to 237× with respect to the GAMESS CPU parallel code.
ABSTRACT
The adsorption of Ga atoms in low coverage on the Al-terminated α-Al2O3(0001) surface has been studied theoretically by using first principles periodic boundary condition (PBC) calculations within the framework of the generalized gradient approximation (GGA). Eight possible adsorption sites are investigated, but only two are found to correspond to stationary points. Both of these locations are characterized as hollow sites, with three surrounding surface O atoms and an Al atom in the center located deeper within the Al2O3 slab. The slight difference in the stability of these two sites is due to a difference in the chemical interactions between the Ga atom and the surface O atoms. Strong interactions between the highest occupied molecular orbital (HOMO) of the Ga atom and the surface state of the Al2O3 surface are observed. This interaction promotes charge transfer from the Ga to the surface Al atoms, which in turn causes the surface Al atoms to move up from the surface.
ABSTRACT
The computational efficiency and energy-to-solution of several applications using the GAMESS quantum chemistry suite of codes are evaluated for 32-bit and 64-bit ARM-based computers, and compared to an x86 machine. The x86 system completes all benchmark computations more quickly than either ARM system and is the best choice to minimize time to solution. The ARM64 and ARM32 computational performances are similar to each other for Hartree-Fock and density functional theory energy calculations. However, for memory-intensive second-order perturbation theory energy and gradient computations, the lower ARM32 read/write memory bandwidth results in computation times as much as 86% longer than on the ARM64 system. The ARM32 system is more energy efficient than the x86 and ARM64 CPUs for all benchmarked methods, while the ARM64 CPU is more energy efficient than the x86 CPU for some core counts and molecular sizes.
ABSTRACT
Increasingly, modern computer systems comprise a multicore general-purpose processor augmented with a number of special purpose devices or accelerators connected via an external interface such as a PCI bus. The NVIDIA Kepler Graphical Processing Unit (GPU) and the Intel Phi are two examples of such accelerators. Accelerators offer peak performances that can be well above those of the host processor. How to exploit this heterogeneous environment for legacy application codes is not, however, straightforward. This paper considers how matrix operations in typical quantum chemical calculations can be migrated to the GPU and Phi systems. Double precision general matrix multiply operations are endemic in electronic structure calculations, especially methods that include electron correlation, such as density functional theory, second-order perturbation theory, and coupled cluster theory. The use of approaches that automatically determine whether to use the host or an accelerator, based on problem size, is explored, with computations occurring on the accelerator and/or the host. For data transfers over PCI-e, the GPU provides the best overall performance for data sizes up to 4096 MB, with consistent upload and download rates between 5-5.6 GB/s and 5.4-6.3 GB/s, respectively. The GPU outperforms the Phi for both square and nonsquare matrix multiplications.
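The idea of choosing the host or an accelerator from the problem size can be sketched as a thin wrapper around a matrix multiply. Everything below is an illustrative assumption rather than the approach used in the paper: the FLOP-count threshold is arbitrary, and `accelerator_gemm` is a stand-in that simply runs on the host, where a real code would call a device BLAS library.

```python
import numpy as np

# Hypothetical crossover point: below this FLOP count the transfer cost over
# the bus is assumed to outweigh the accelerator's higher peak performance.
FLOP_THRESHOLD = 2 * 512**3

def host_gemm(a, b):
    """Matrix multiply on the host CPU."""
    return a @ b

def accelerator_gemm(a, b):
    """Placeholder for an offloaded multiply (runs on the host in this sketch)."""
    return a @ b

def choose_backend(m, n, k, threshold=FLOP_THRESHOLD):
    """Pick a backend from the approximate FLOP count 2*m*n*k of C = A @ B."""
    return "accelerator" if 2 * m * n * k >= threshold else "host"

def gemm_auto(a, b, threshold=FLOP_THRESHOLD):
    """Dispatch C = A @ B to the host or the accelerator stub by problem size."""
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError("inner dimensions do not match")
    backend = choose_backend(m, n, k, threshold)
    gemm = accelerator_gemm if backend == "accelerator" else host_gemm
    return gemm(a, b), backend

rng = np.random.default_rng(0)
small_a, small_b = rng.random((64, 32)), rng.random((32, 16))
c_small, backend_small = gemm_auto(small_a, small_b)
print(backend_small, choose_backend(2048, 2048, 2048))
```

In practice the threshold would be calibrated from measured transfer rates and multiply timings on the target machine, and nonsquare shapes may need a criterion that also accounts for the volume of data moved.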
ABSTRACT
Use of the resolution of the Ewald operator method for computing long-range Coulomb and exchange interactions is presented. We show that the accuracy of this method can be controlled by a single parameter in a manner similar to that used by conventional algorithms that compute two-electron integrals. Significant performance advantages over conventional algorithms are observed, particularly for high-quality basis sets and globular systems. The approach is directly applicable to hybrid density functional theory.
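As background, the Ewald partition splits the Coulomb operator into a short-range and a long-range part. The symbols below are assumptions for illustration: $\omega$ denotes a range-separation parameter and $\phi_k$ a set of one-particle functions, neither of which is named in the abstract. A resolution, in the general sense, then approximates the smooth long-range part by a truncated sum of products of one-particle functions.

```latex
\frac{1}{r_{12}}
  = \underbrace{\frac{\operatorname{erfc}(\omega r_{12})}{r_{12}}}_{\text{short range}}
  + \underbrace{\frac{\operatorname{erf}(\omega r_{12})}{r_{12}}}_{\text{long range (Ewald operator)}},
\qquad
\frac{\operatorname{erf}(\omega r_{12})}{r_{12}}
  \approx \sum_{k=1}^{K} \phi_k(\mathbf{r}_1)\,\phi_k(\mathbf{r}_2)
```

The first identity is exact because $\operatorname{erf}(x) + \operatorname{erfc}(x) = 1$. The second expression is only a schematic form of a resolution and does not reproduce the specific expansion used by the authors.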
ABSTRACT
For intermediate-sized chemical systems, the use of an auxiliary basis set (ABS) to fit the charge density provides a useful means of accelerating the performance of various quantum chemical methods. As a consequence, much effort has been devoted to the design of various ABSs. This paper explores a fundamentally new approach where the ABS is created dynamically based on the specific orbital basis set (OBS) being used. The new approach includes a parameter that is used to coalesce candidate fitting functions together but which can also be used to provide some coarse-grained control over the number of functions in the ABS. The accuracy of the new automatically generated ABS (auto-ABS) is systematically studied for a variety of small systems containing the elements H-Kr. Errors in the Coulomb energy computed using auto-ABS and with a variety of OBSs are shown to be small compared to errors in the Hartree-Fock energy due to incompleteness in the OBS. In contrast to fixed size ABSs, the use of auto-ABS is shown to lead to smaller errors as the size (quality) of the OBS is expanded. The performance of auto-ABS is also compared with the use of the recently proposed universal fitting sets [Weigend, Phys. Chem. Chem. Phys. 8, 1057 (2006)] for 180 compounds containing atoms from H to Kr.
ABSTRACT
QM/MM methods have been developed as a computationally feasible solution to QM simulation of chemical processes, such as enzyme-catalyzed reactions, within a more approximate MM representation of the condensed-phase environment. However, there has been no independent method for checking the quality of this representation, especially for highly nonisotropic protein environments such as those surrounding enzyme active sites. Hence, the validity of QM/MM methods is largely untested. Here we use the possibility of performing all-QM calculations at the semiempirical PM3 level with a linear-scaling method (MOZYME) to assess the performance of a QM/MM method (PM3/AMBER94 force field). Using two model pathways for the hydride-ion transfer reaction of the enzyme dihydrofolate reductase studied previously (Titmuss et al., Chem Phys Lett 2000, 320, 169-176), we have analyzed the reaction energy contributions (QM, QM/MM, and MM) from the QM/MM results and compared them with analogous-region components calculated via an energy partitioning scheme implemented into MOZYME. This analysis further divided the MOZYME components into Coulomb, resonance and exchange energy terms. For the model in which the MM coordinates are kept fixed during the reaction, we find that the MOZYME and QM/MM total energy profiles agree very well, but that there are significant differences in the energy components. Most significantly, there is a large change (approximately 16 kcal/mol) in the MOZYME MM component due to polarization of the MM region surrounding the active site; this polarization arises mostly from MM atoms close to (<10 Å) the active-site QM region and is not modelled explicitly by our QM/MM method. However, for the model where the MM coordinates are allowed to vary during the reaction, we find large differences in the MOZYME and QM/MM total energy profiles, with a discrepancy of 52 kcal/mol between the relative reaction (product-reactant) energies.
This is largely due to a difference in the MM energies of 58 kcal/mol, of which we can attribute approximately 40 kcal/mol to geometry effects in the MM region and the remainder, as before, to MM region polarization. Contrary to the fixed-geometry model, there is no correlation of the MM energy changes with distance from the QM region, nor are they contributed by only a few residues. Overall, the results suggest that merely extending the size of the QM region in the QM/MM calculation is not a universal solution to the MOZYME- and QM/MM-method differences. They also suggest that attaching physical significance to MOZYME Coulomb, resonance and exchange components is problematic. Although we conclude that it would be possible to reparameterize the QM/MM force field to reproduce MOZYME energies, a better way to account for both the effects of the protein environment and known deficiencies in semiempirical methods would be to parameterize the force field based on data from DFT or ab initio QM linear-scaling calculations. Such a force field could be used efficiently in MD simulations to calculate free energies.