Search | BVS España Search Portal

1.

Porting fragmentation methods to GPUs using an OpenMP API: Offloading the resolution-of-the-identity second-order Møller-Plesset perturbation method.

Pham, Buu Q; Carrington, Laura; Tiwari, Ananta; Leang, Sarom S; Alkan, Melisa; Bertoni, Colleen; Datta, Dipayan; Sattasathuchana, Tosaporn; Xu, Peng; Gordon, Mark S.

J Chem Phys ; 158(16)2023 Apr 28.

Article in English | MEDLINE | ID: mdl-37114705

ABSTRACT

Using an OpenMP Application Programming Interface, the resolution-of-the-identity second-order Møller-Plesset perturbation (RI-MP2) method has been off-loaded onto graphical processing units (GPUs), both as a standalone method in the GAMESS electronic structure program and as an electron correlation energy component in the effective fragment molecular orbital (EFMO) framework. First, a new scheme has been proposed to maximize data digestion on GPUs that subsequently linearizes data transfer from central processing units (CPUs) to GPUs. Second, the GAMESS Fortran code has been interfaced with GPU numerical libraries (e.g., NVIDIA cuBLAS and cuSOLVER) for efficient matrix operations (e.g., matrix multiplication, matrix decomposition, and matrix inversion). The standalone GPU RI-MP2 code shows an increasing speedup of up to 7.5× using one NVIDIA V100 GPU with one IBM 42-core P9 CPU for calculations on fullerenes of increasing size from 40 to 260 carbon atoms using the 6-31G(d)/cc-pVDZ-RI basis sets. A single Summit node with six V100s can compute the RI-MP2 correlation energy of a cluster of 175 water molecules using the correlation consistent basis sets cc-pVDZ/cc-pVDZ-RI containing 4375 atomic orbitals and 14 700 auxiliary basis functions in ∼0.85 h. In the EFMO framework, the GPU RI-MP2 component shows near linear scaling for a large number of V100s when computing the energy of an 1800-atom mesoporous silica nanoparticle in a bath of 4000 water molecules. The parallel efficiencies of the GPU RI-MP2 component with 2304 and 4608 V100s are 98.0% and 96.1%, respectively.
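The abstract's emphasis on matrix multiplication can be illustrated with a minimal sketch of the textbook closed-shell RI-MP2 correlation energy. This is not the GAMESS implementation and performs no GPU offloading: it is plain NumPy on the CPU, and the names `ri_mp2_energy`, `B`, `eps_occ`, and `eps_virt` are illustrative assumptions. It only shows that, once the three-index tensor is available, each occupied pair reduces to one dense matrix product.

```python
import numpy as np

def ri_mp2_energy(B, eps_occ, eps_virt):
    """Closed-shell RI-MP2 correlation energy (textbook formula, CPU only).

    B has shape (naux, nocc, nvirt); the integral (ia|jb) is approximated
    by sum_P B[P, i, a] * B[P, j, b]. All names here are hypothetical.
    """
    naux, nocc, nvirt = B.shape
    energy = 0.0
    for i in range(nocc):
        for j in range(nocc):
            # One matrix product per occupied pair: iajb[a, b] = (ia|jb)
            iajb = B[:, i, :].T @ B[:, j, :]
            # Orbital-energy denominator e_i + e_j - e_a - e_b
            denom = (eps_occ[i] + eps_occ[j]
                     - eps_virt[:, None] - eps_virt[None, :])
            # iajb.T[a, b] equals the exchange-type integral (ib|ja)
            energy += np.sum(iajb * (2.0 * iajb - iajb.T) / denom)
    return energy

# Illustrative random input, not real molecular data
rng = np.random.default_rng(0)
B_demo = rng.standard_normal((10, 3, 5))
e_occ = np.array([-1.2, -0.9, -0.6])
e_virt = np.array([0.2, 0.4, 0.7, 1.0, 1.5])
print(ri_mp2_energy(B_demo, e_occ, e_virt))
```

In a GPU port along the lines the abstract describes, the inner product would presumably be the operation handed to a library such as cuBLAS; the sketch leaves that step out.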

2.

Novel Computer Architectures and Quantum Chemistry.

Gordon, Mark S; Barca, Giuseppe; Leang, Sarom S; Poole, David; Rendell, Alistair P; Galvez Vallejo, Jorge L; Westheimer, Bryce.

J Phys Chem A ; 124(23): 4557-4582, 2020 Jun 11.

Article in English | MEDLINE | ID: mdl-32379450

ABSTRACT

Electronic structure theory (especially quantum chemistry) has thrived and has become increasingly relevant to a broad spectrum of scientific endeavors as the sophistication of both computer architectures and software engineering has advanced. This article provides a brief history of advances in both hardware and software, from the early days of IBM mainframes to the current emphasis on accelerators and modern programming practices.

3.

Recent developments in the general atomic and molecular electronic structure system.

Barca, Giuseppe M J; Bertoni, Colleen; Carrington, Laura; Datta, Dipayan; De Silva, Nuwan; Deustua, J Emiliano; Fedorov, Dmitri G; Gour, Jeffrey R; Gunina, Anastasia O; Guidez, Emilie; Harville, Taylor; Irle, Stephan; Ivanic, Joe; Kowalski, Karol; Leang, Sarom S; Li, Hui; Li, Wei; Lutz, Jesse J; Magoulas, Ilias; Mato, Joani; Mironov, Vladimir; Nakata, Hiroya; Pham, Buu Q; Piecuch, Piotr; Poole, David; Pruitt, Spencer R; Rendell, Alistair P; Roskop, Luke B; Ruedenberg, Klaus; Sattasathuchana, Tosaporn; Schmidt, Michael W; Shen, Jun; Slipchenko, Lyudmila; Sosonkina, Masha; Sundriyal, Vaibhav; Tiwari, Ananta; Galvez Vallejo, Jorge L; Westheimer, Bryce; Wloch, Marta; Xu, Peng; Zahariev, Federico; Gordon, Mark S.

J Chem Phys ; 152(15): 154102, 2020 Apr 21.

Article in English | MEDLINE | ID: mdl-32321259

ABSTRACT

A discussion of many of the recently implemented features of GAMESS (General Atomic and Molecular Electronic Structure System) and LibCChem (the C++ CPU/GPU library associated with GAMESS) is presented. These features include fragmentation methods such as the fragment molecular orbital, effective fragment potential and effective fragment molecular orbital methods, hybrid MPI/OpenMP approaches to Hartree-Fock, and resolution of the identity second order perturbation theory. Many new coupled cluster theory methods have been implemented in GAMESS, as have multiple levels of density functional/tight binding theory. The role of accelerators, especially graphical processing units, is discussed in the context of the new features of LibCChem, as is the associated problem of power consumption as the power of computers increases dramatically. The process by which a complex program suite such as GAMESS is maintained and developed is considered. Future developments are briefly summarized.

4.

The Effective Fragment Molecular Orbital Method: Achieving High Scalability and Accuracy for Large Systems.

Sattasathuchana, Tosaporn; Xu, Peng; Bertoni, Colleen; Kim, Yu Lim; Leang, Sarom S; Pham, Buu Q; Gordon, Mark S.

J Chem Theory Comput ; 20(6): 2445-2461, 2024 Mar 26.

Article in English | MEDLINE | ID: mdl-38450638

ABSTRACT

The effective fragment molecular orbital (EFMO) method has been developed to predict the total energy of a very large molecular system accurately (with respect to the underlying quantum mechanical method) and efficiently by taking advantage of the locality of strong chemical interactions and employing a two-level hierarchical parallelism. The accuracy of the EFMO method is partly attributed to the accurate and robust intermolecular interaction prediction between distant fragments, in particular, the many-body polarization and dispersion effects, which require the generation of static and dynamic polarizability tensors by solving the coupled perturbed Hartree-Fock (CPHF) and time-dependent HF (TDHF) equations, respectively. Solving the CPHF and TDHF equations is the main EFMO computational bottleneck due to the inefficient (serial) and I/O-intensive implementation of the CPHF and TDHF solvers. In this work, the efficiency and scalability of the EFMO method are significantly improved with a new CPU memory-based implementation for solving the CPHF and TDHF equations that are parallelized by either message passing interface (MPI) or hybrid MPI/OpenMP. The accuracy of the EFMO method is demonstrated for both covalently bonded systems and noncovalently bound molecular clusters by systematically examining the effects of basis sets and a key distance-related cutoff parameter, Rcut. Rcut determines whether a fragment pair (dimer) is treated by the chosen ab initio method or calculated using the effective fragment potential (EFP) method (separated dimers). Decreasing the value of Rcut increases the number of separated (EFP) dimers, thereby decreasing the computational effort. It is demonstrated that excellent accuracy (<1 kcal/mol error per fragment) can be achieved when using a sufficiently large basis set with diffuse functions coupled with a small Rcut value. 
With the new parallel implementation, the total EFMO wall time is substantially reduced, especially with a high number of MPI ranks. Given a sufficient workload, nearly ideal strong scaling is achieved for the CPHF and TDHF parts of the calculation. For the first time, EFMO calculations with the inclusion of long-range polarization and dispersion interactions on a hydrated mesoporous silica nanoparticle with explicit water solvent molecules (more than 15k atoms) are achieved on a massively parallel supercomputer using nearly 1000 physical nodes. In addition, EFMO calculations on the carbinolamine formation step of an amine-catalyzed aldol reaction at the nanoscale with explicit solvent effects are presented.
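The role of the cutoff described in this abstract can be sketched as a simple pair classification. This is a minimal illustration, not the EFMO code: it assumes a plain Euclidean distance between hypothetical fragment centers as a stand-in for whatever distance measure EFMO actually applies, and `classify_dimers`, `centers`, and `r_cut` are illustrative names.

```python
from itertools import combinations
import math

def classify_dimers(centers, r_cut):
    """Split fragment pairs into ab initio dimers and separated (EFP) dimers.

    Assumption: pairs are compared by the plain distance between fragment
    centers, which only stands in for the real EFMO distance criterion.
    """
    qm_dimers, efp_dimers = [], []
    for i, j in combinations(range(len(centers)), 2):
        distance = math.dist(centers[i], centers[j])
        if distance <= r_cut:
            qm_dimers.append((i, j))   # treated by the chosen ab initio method
        else:
            efp_dimers.append((i, j))  # treated as a separated (EFP) dimer
    return qm_dimers, efp_dimers

# Hypothetical fragment centers on a line (arbitrary units)
centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
for r_cut in (2.5, 1.5):
    qm, efp = classify_dimers(centers, r_cut)
    print(r_cut, len(qm), len(efp))
```

Lowering `r_cut` moves pairs from the first list to the second, which is the effect the abstract credits with reducing the computational effort.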

5.

The General Atomic and Molecular Electronic Structure System (GAMESS): Novel Methods on Novel Architectures.

Zahariev, Federico; Xu, Peng; Westheimer, Bryce M; Webb, Simon; Galvez Vallejo, Jorge; Tiwari, Ananta; Sundriyal, Vaibhav; Sosonkina, Masha; Shen, Jun; Schoendorff, George; Schlinsog, Megan; Sattasathuchana, Tosaporn; Ruedenberg, Klaus; Roskop, Luke B; Rendell, Alistair P; Poole, David; Piecuch, Piotr; Pham, Buu Q; Mironov, Vladimir; Mato, Joani; Leonard, Sam; Leang, Sarom S; Ivanic, Joe; Hayes, Jackson; Harville, Taylor; Gururangan, Karthik; Guidez, Emilie; Gerasimov, Igor S; Friedl, Christian; Ferreras, Katherine N; Elliott, George; Datta, Dipayan; Cruz, Daniel Del Angel; Carrington, Laura; Bertoni, Colleen; Barca, Giuseppe M J; Alkan, Melisa; Gordon, Mark S.

J Chem Theory Comput ; 19(20): 7031-7055, 2023 Oct 24.

Article in English | MEDLINE | ID: mdl-37793073

ABSTRACT

The primary focus of GAMESS over the last 5 years has been the development of new high-performance codes that are able to take effective and efficient advantage of the most advanced computer architectures, both CPU and accelerators. These efforts include employing density fitting and fragmentation methods to reduce the high scaling of well-correlated (e.g., coupled-cluster) methods as well as developing novel codes that can take optimal advantage of graphical processing units and other modern accelerators. Because accurate wave functions can be very complex, an important new functionality in GAMESS is the quasi-atomic orbital analysis, an unbiased approach to the understanding of covalent bonds embedded in the wave function. Best practices for the maintenance and distribution of GAMESS are also discussed.

6.

Benchmarking the performance of time-dependent density functional methods.

Leang, Sarom S; Zahariev, Federico; Gordon, Mark S.

J Chem Phys ; 136(10): 104101, 2012 Mar 14.

Article in English | MEDLINE | ID: mdl-22423822

ABSTRACT

The performance of 24 density functionals, including 14 meta-generalized gradient approximation (mGGA) functionals, is assessed for the calculation of vertical excitation energies against an experimental benchmark set comprising 14 small- to medium-sized compounds with 101 total excited states. The experimental benchmark set consists of singlet, triplet, valence, and Rydberg excited states. The global-hybrid (GH) version of the Perdew-Burke-Ernzerhof GGA density functional (PBE0) is found to offer the best overall performance with a mean absolute error (MAE) of 0.28 eV. The GH-mGGA Minnesota 2006 density functional with 54% Hartree-Fock exchange (M06-2X) gives a lower MAE of 0.26 eV, but this functional encounters some convergence problems in the ground state. The local density approximation functional consisting of the Slater exchange and Vosko-Wilk-Nusair correlation functional (SVWN) outperformed all non-GH GGAs tested. The best pure density functional performance is obtained with the local version of the Minnesota 2006 mGGA density functional (M06-L) with an MAE of 0.41 eV.
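The mean absolute error used to rank the functionals above is the average of the absolute differences between computed and reference excitation energies. A minimal sketch follows; the function name and the numbers are hypothetical placeholders, not data from the paper.

```python
def mean_absolute_error(computed, reference):
    """Average absolute deviation between computed and reference values."""
    if len(computed) != len(reference) or not computed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(c - r) for c, r in zip(computed, reference)) / len(computed)

# Placeholder excitation energies in eV (illustrative only)
computed_ev = [4.9, 6.3, 7.1]
reference_ev = [5.0, 6.0, 7.5]
print(round(mean_absolute_error(computed_ev, reference_ev), 3))
```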

7.

Energy-Efficient Computational Chemistry: Comparison of x86 and ARM Systems.

Keipert, Kristopher; Mitra, Gaurav; Sundriyal, Vaibhav; Leang, Sarom S; Sosonkina, Masha; Rendell, Alistair P; Gordon, Mark S.

J Chem Theory Comput ; 11(11): 5055-61, 2015 Nov 10.

Article in English | MEDLINE | ID: mdl-26574303

ABSTRACT

The computational efficiency and energy-to-solution of several applications using the GAMESS quantum chemistry suite of codes are evaluated for 32-bit and 64-bit ARM-based computers, and compared to an x86 machine. The x86 system completes all benchmark computations more quickly than either ARM system and is the best choice to minimize time to solution. The ARM64 and ARM32 computational performances are similar to each other for Hartree-Fock and density functional theory energy calculations. However, for memory-intensive second-order perturbation theory energy and gradient computations, the lower ARM32 read/write memory bandwidth results in computation times as much as 86% longer than on the ARM64 system. The ARM32 system is more energy efficient than the x86 and ARM64 CPUs for all benchmarked methods, while the ARM64 CPU is more energy efficient than the x86 CPU for some core counts and molecular sizes.

8.

Quantum Chemical Calculations Using Accelerators: Migrating Matrix Operations to the NVIDIA Kepler GPU and the Intel Xeon Phi.

Leang, Sarom S; Rendell, Alistair P; Gordon, Mark S.

J Chem Theory Comput ; 10(3): 908-12, 2014 Mar 11.

Article in English | MEDLINE | ID: mdl-26580169

ABSTRACT

Increasingly, modern computer systems comprise a multicore general-purpose processor augmented with a number of special purpose devices or accelerators connected via an external interface such as a PCI bus. The NVIDIA Kepler Graphical Processing Unit (GPU) and the Intel Phi are two examples of such accelerators. Accelerators offer peak performances that can be well above those of the host processor. How to exploit this heterogeneous environment for legacy application codes is not, however, straightforward. This paper considers how matrix operations in typical quantum chemical calculations can be migrated to the GPU and Phi systems. Double precision general matrix multiply operations are endemic in electronic structure calculations, especially methods that include electron correlation, such as density functional theory, second order perturbation theory, and coupled cluster theory. The use of approaches that automatically determine whether to use the host or an accelerator, based on problem size, is explored, with computations occurring on the accelerator and/or the host. For data transfers over PCI-e, the GPU provides the best overall performance for data sizes up to 4096 MB, with consistent upload and download rates of 5-5.6 GB/s and 5.4-6.3 GB/s, respectively. The GPU outperforms the Phi for both square and nonsquare matrix multiplications.
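The idea of choosing between host and accelerator based on problem size can be sketched as follows. This is only an illustration under stated assumptions, not the approach used in the paper: the operation-count threshold is an arbitrary placeholder, `choose_backend` and `matmul_dispatch` are hypothetical names, and the accelerator routine is passed in as a callable so the sketch runs without any GPU library.

```python
import numpy as np

def choose_backend(m, n, k, flop_threshold=2.0e9):
    """Pick 'accelerator' for large products and 'host' otherwise.

    The threshold is a hypothetical tuning parameter; in practice it would
    be measured, since transfer cost can outweigh the speedup for small sizes.
    """
    flops = 2.0 * m * n * k  # multiply-add count of an (m x k) @ (k x n) product
    return "accelerator" if flops >= flop_threshold else "host"

def matmul_dispatch(a, b, accelerator_matmul=None, flop_threshold=2.0e9):
    """Multiply a @ b on the backend selected by problem size."""
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError("inner dimensions do not match")
    backend = choose_backend(m, n, k, flop_threshold)
    if backend == "accelerator" and accelerator_matmul is not None:
        return accelerator_matmul(a, b), backend
    # Fall back to the host when the problem is small or no accelerator is given
    return a @ b, "host"

a = np.ones((4, 3))
b = np.ones((3, 2))
result, used = matmul_dispatch(a, b)
print(used, result.shape)
```

Passing the accelerator routine as an argument keeps the size-based decision separate from any particular device library.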
