ABSTRACT
Many-Body eXpansion (MBX) is a C++ library that implements many-body potential energy functions (PEFs) within the "many-body energy" (MB-nrg) formalism. MB-nrg PEFs integrate an underlying polarizable model with explicit machine-learned representations of many-body interactions to achieve chemical accuracy from the gas to the condensed phases. MBX can be employed either as a stand-alone package or as an energy/force engine that can be integrated with generic software for molecular dynamics and Monte Carlo simulations. MBX is parallelized internally using Open Multi-Processing and can utilize Message Passing Interface when available in interfaced molecular simulation software. MBX enables classical and quantum molecular simulations with MB-nrg PEFs, as well as hybrid simulations that combine conventional force fields and MB-nrg PEFs, for diverse systems ranging from small gas-phase clusters to aqueous solutions and molecular fluids to biomolecular systems and metal-organic frameworks.
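The many-body expansion underlying this formalism can be illustrated with a small sketch. It is not MBX code and does not use an MB-nrg PEF: `toy_energy` is a hypothetical stand-in energy function, and the n-body terms are obtained by the usual inclusion-exclusion over sub-clusters.

```python
from itertools import combinations
from math import dist, exp

def toy_energy(coords):
    """Hypothetical total energy of a cluster of point-like monomers (not an MB-nrg PEF)."""
    e = -1.0 * len(coords)  # one-body energy of each monomer
    for a, b in combinations(coords, 2):
        e += -exp(-dist(a, b))  # pairwise attraction
    for a, b, c in combinations(coords, 3):
        e += 0.1 * exp(-(dist(a, b) + dist(b, c) + dist(a, c)))  # three-body term
    return e

def many_body_terms(coords, max_order):
    """Return {n: sum of all n-body contributions} for n = 1..max_order."""
    eps = {}     # n-body contribution of each sub-cluster, keyed by index tuple
    totals = {}
    for n in range(1, max_order + 1):
        totals[n] = 0.0
        for subset in combinations(range(len(coords)), n):
            e = toy_energy([coords[i] for i in subset])
            # remove every lower-order contribution contained in this sub-cluster
            for m in range(1, n):
                for sub in combinations(subset, m):
                    e -= eps[sub]
            eps[subset] = e
            totals[n] += e
    return totals
```

Because the toy energy contains nothing beyond three-body terms, the four-body sum vanishes and the expansion truncated at third order reproduces the total energy. In a real PEF the higher-order terms are small rather than exactly zero.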
ABSTRACT
We report the implementation of a symmetry-adapted perturbation theory algorithm based on a density functional theory [SAPT(DFT)] description of monomers. The implementation adopts a density-fitting treatment of hybrid exchange-correlation kernels to enable the description of monomers with hybrid functionals, as in the algorithm by Bukowski, Podeszwa, and Szalewicz [Chem. Phys. Lett. 414, 111 (2005)]. We have improved the algorithm by increasing numerical stability with QR factorization and optimized the computation of the exchange-correlation kernel with its 2-index density-fitted representation. The algorithm scales as O(N^5) formally and is usable for systems with up to ∼3000 basis functions, as demonstrated for the C60-buckycatcher complex with the aug-cc-pVDZ basis set. The hybrid-kernel-based SAPT(DFT) algorithm is shown to be as accurate as SAPT(DFT) implementations based on local effective exact exchange potentials obtained from the local Hartree-Fock (LHF) method while avoiding the lower-scaling [O(N^4)] but iterative and sometimes hard-to-converge LHF process. The hybrid-kernel algorithm outperforms Hartree-Fock-based SAPT (SAPT0) for the S66 test set, and its accuracy is comparable to the many-body perturbation theory based SAPT2+ approach, which scales as O(N^7), although SAPT2+ exhibits a narrower distribution of errors.
ABSTRACT
Symmetry-adapted perturbation theory (SAPT) has become an invaluable tool for studying the fundamental nature of non-covalent interactions by directly computing the electrostatics, exchange (steric) repulsion, induction (polarization), and London dispersion contributions to the interaction energy using quantum mechanics. Further application of SAPT is primarily limited by its computational expense, where even its most affordable variant (SAPT0) scales as the fifth power of system size [O(N^5)] due to the dispersion terms. The algorithmic scaling of SAPT0 is reduced from O(N^5) to O(N^4) by replacing these terms with the empirical D3 dispersion correction of Grimme and co-workers, forming a method that may be termed SAPT0-D3. Here, we optimize the damping parameters for the -D3 terms in SAPT0-D3 using a much larger training set than has previously been considered, namely, 8299 interaction energies computed at the complete-basis-set limit of coupled cluster through perturbative triples [CCSD(T)/CBS]. Perhaps surprisingly, with only three fitted parameters, SAPT0-D3 improves on the accuracy of SAPT0, reducing mean absolute errors from 0.61 to 0.49 kcal mol^-1 over the full set of complexes. Additionally, SAPT0-D3 exhibits a nearly 2.5× speedup over conventional SAPT0 for systems with ∼300 atoms and is applied here to systems with up to 459 atoms. Finally, we have also implemented a functional group partitioning of the approach (F-SAPT0-D3) and applied it to determine important contacts in the binding of salbutamol to G-protein coupled β1-adrenergic receptor in both active and inactive forms. SAPT0-D3 capabilities have been added to the open-source Psi4 software.
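To make the role of the damping parameters concrete, the sketch below evaluates a single atom-pair D3 dispersion term. It assumes the Becke-Johnson rational damping form with parameters `s8`, `a1`, and `a2` (and `s6` fixed to 1); the abstract does not state which damping function or which three parameters were fitted, so this choice and all numerical inputs are illustrative only.

```python
from math import sqrt

def d3bj_pair_energy(r, c6, c8, s8, a1, a2, s6=1.0):
    """Two-body D3 dispersion energy for one atom pair with Becke-Johnson damping.

    r      : interatomic distance
    c6, c8 : pair dispersion coefficients (consistent units assumed)
    s8, a1, a2 : damping parameters (the kind of quantities that get fitted)
    """
    r0 = sqrt(c8 / c6)   # pair-specific cutoff radius
    f = a1 * r0 + a2     # damping length
    return -(s6 * c6 / (r**6 + f**6) + s8 * c8 / (r**8 + f**8))
```

The damping length `f` keeps the term finite as `r` goes to zero, while at long range the expression reduces to the familiar `-c6 / r**6` behavior. The total correction would be a sum of such terms over intermolecular atom pairs.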
Subjects
Quantum Theory, Algorithms, Static Electricity
ABSTRACT
We present OrbNet Denali, a machine learning model for electronic structure that is designed as a drop-in replacement for ground-state density functional theory (DFT) energy calculations. The model is a message-passing graph neural network that uses symmetry-adapted atomic orbital features from a low-cost quantum calculation to predict the energy of a molecule. OrbNet Denali is trained on a vast dataset of 2.3 × 10^6 DFT calculations on molecules and geometries. This dataset covers the most common elements in biochemistry and organic chemistry (H, Li, B, C, N, O, F, Na, Mg, Si, P, S, Cl, K, Ca, Br, and I) and charged molecules. OrbNet Denali is demonstrated on several well-established benchmark datasets, and we find that it provides accuracy that is on par with modern DFT methods while offering a speedup of up to three orders of magnitude. For the GMTKN55 benchmark set, OrbNet Denali achieves WTMAD-1 and WTMAD-2 scores of 7.19 and 9.84, on par with modern DFT functionals. For several GMTKN55 subsets, which contain chemical problems that are not present in the training set, OrbNet Denali produces a mean absolute error comparable to those of DFT methods. For the Hutchison conformer benchmark set, OrbNet Denali has a median correlation coefficient of R^2 = 0.90 compared to the reference DLPNO-CCSD(T) calculation and R^2 = 0.97 compared to the method used to generate the training data (ωB97X-D3/def2-TZVP), exceeding the performance of any other method with a similar cost. Similarly, the model reaches chemical accuracy for non-covalent interactions in the S66x10 dataset. For torsional profiles, OrbNet Denali reproduces the torsion profiles of ωB97X-D3/def2-TZVP with an average mean absolute error of 0.12 kcal/mol for the potential energy surfaces of the diverse fragments in the TorsionNet500 dataset.
ABSTRACT
Community efforts in the computational molecular sciences (CMS) are evolving toward modular, open, and interoperable interfaces that work with existing community codes to provide more functionality and composability than could be achieved with a single program. The Quantum Chemistry Common Driver and Databases (QCDB) project provides such capability through an application programming interface (API) that facilitates interoperability across multiple quantum chemistry software packages. In tandem with the Molecular Sciences Software Institute and their Quantum Chemistry Archive ecosystem, the unique functionalities of several CMS programs are integrated, including CFOUR, GAMESS, NWChem, OpenMM, Psi4, Qcore, TeraChem, and Turbomole, to provide common computational functions, i.e., energy, gradient, and Hessian computations as well as molecular properties such as atomic charges and vibrational frequency analysis. Both standard users and power users benefit from adopting these APIs as they lower the language barrier of input styles and enable a standard layout of variables and data. These designs allow end-to-end interoperable programming of complex computations and provide best practices options by default.
ABSTRACT
The parameterization of torsional/dihedral angle potential energy terms is a crucial part of developing molecular mechanics force fields. Quantum mechanical (QM) methods are often used to provide samples of the potential energy surface (PES) for fitting the empirical parameters in these force field terms. To ensure that the sampled molecular configurations are thermodynamically feasible, constrained QM geometry optimizations are typically carried out, which relax the orthogonal degrees of freedom while fixing the target torsion angle(s) on a grid of values. However, the quality of results and computational cost are affected by various factors on a non-trivial PES, such as dependence on the chosen scan direction and the lack of efficient approaches to integrate results started from multiple initial guesses. In this paper, we propose a systematic and versatile workflow called TorsionDrive to generate energy-minimized structures on a grid of torsion constraints by means of a recursive wavefront propagation algorithm, which resolves the deficiencies of conventional scanning approaches and generates higher quality QM data for force field development. The capabilities of our method are presented for multi-dimensional scans and multiple initial guess structures, and an integration with the MolSSI QCArchive distributed computing ecosystem is described. The method is implemented in an open-source software package that is compatible with many QM software packages and energy minimization codes.
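The wavefront idea can be sketched on a one-dimensional toy problem. This is not the TorsionDrive implementation: `constrained_optimize` is a hypothetical stand-in for a constrained QM optimization, and the "structure" is reduced to a basin label so that the result depends on the starting guess, as it does on a non-trivial PES.

```python
from collections import deque
from math import cos, radians

GRID = list(range(-180, 180, 30))  # torsion grid in degrees (periodic)

def constrained_optimize(angle, basin):
    """Toy stand-in for a constrained QM optimization.

    The 'structure' is just a basin label (0 or 1); the optimizer stays in
    the basin it starts in, mimicking a PES with competing local minima.
    """
    c = cos(radians(angle))
    energy = 1.0 - c if basin == 0 else 0.5 + 0.5 * (1.0 + c)
    return energy, basin

def neighbors(angle, step=30):
    """Adjacent grid points, wrapping around at +/-180 degrees."""
    return [((angle + d + 180) % 360) - 180 for d in (-step, step)]

def wavefront_scan(seeds, tol=1e-9):
    """Propagate optimized structures across the grid until nothing improves."""
    best = {}             # angle -> (energy, structure)
    queue = deque(seeds)  # pending jobs: (target angle, starting structure)
    while queue:
        angle, start = queue.popleft()
        energy, final = constrained_optimize(angle, start)
        if angle not in best or energy < best[angle][0] - tol:
            best[angle] = (energy, final)
            # a new lowest-energy structure seeds both neighboring grid points
            queue.extend((nb, final) for nb in neighbors(angle))
    return best
```

Calling `wavefront_scan([(0, 0), (-180, 1)])` merges results from two initial guesses, so each grid point ends up with the lower of the two basins, independent of scan direction. A single seed stays trapped in its own basin everywhere.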
ABSTRACT
The focal-point approach, combining several quantum chemistry computations to estimate a more accurate computation at a lower expense, is effective and commonly used for energies. However, it has not yet been widely adopted for properties such as geometries. Here, we examine several focal-point methods combining Møller-Plesset perturbation theory (MP2 and MP2.5) with coupled-cluster theory through perturbative triples [CCSD(T)] for their effectiveness in geometry optimizations using a new driver for the Psi4 electronic structure program that efficiently automates the computation of composite-energy gradients. The test set consists of 94 closed-shell molecules containing first- and/or second-row elements. The focal-point methods utilized combinations of correlation-consistent basis sets cc-pV(X+d)Z and heavy-aug-cc-pV(X+d)Z (X = D, T, Q, 5, 6). Focal-point geometries were compared to those from conventional CCSD(T) using basis sets up to heavy-aug-cc-pV5Z and to geometries from explicitly correlated CCSD(T)-F12 using the cc-pVXZ-F12 (X = D, T) basis sets. All results were compared to reference geometries reported by Karton et al. [J. Chem. Phys. 145, 104101 (2016)] at the CCSD(T)/heavy-aug-cc-pV6Z level of theory. In general, focal-point methods based on an estimate of the MP2 complete-basis-set limit, with a coupled-cluster correction evaluated in a (heavy-aug-)cc-pVXZ basis, are of superior quality to conventional CCSD(T)/(heavy-aug-)cc-pV(X+1)Z and sometimes approach the errors of CCSD(T)/(heavy-aug-)cc-pV(X+2)Z. However, the focal-point methods are much faster computationally. For the benzene molecule, the gradient of such a focal-point approach requires only 4.5% of the computation time of a conventional CCSD(T)/cc-pVTZ gradient and only 0.4% of the time of a CCSD(T)/cc-pVQZ gradient.
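The composite arithmetic can be written down in a few lines. The sketch below assumes a two-point X^-3 extrapolation of the MP2 correlation energy, which is a common choice but is not stated in the abstract. The function names are hypothetical, and this is not the Psi4 driver.

```python
def cbs_two_point(e_small, e_large, x_large):
    """Two-point X**-3 extrapolation of a correlation energy.

    e_small, e_large : correlation energies in basis sets with cardinal
                       numbers x_large - 1 and x_large
    """
    x_small = x_large - 1
    return (x_large**3 * e_large - x_small**3 * e_small) / (x_large**3 - x_small**3)

def focal_point(mp2_cbs, ccsdt_small, mp2_small):
    """MP2/CBS estimate plus a coupled-cluster correction from a smaller basis."""
    return mp2_cbs + (ccsdt_small - mp2_small)
```

Both functions are linear in their energy arguments, so the same combinations can be applied component by component to nuclear gradients. This linearity is what allows a composite-energy gradient to be assembled from the gradients of the individual calculations.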
ABSTRACT
First-principles electronic structure calculations are now accessible to a very large community of users across many disciplines, thanks to many successful software packages, some of which are described in this special issue. The traditional coding paradigm for such packages is monolithic, i.e., regardless of how modular its internal structure may be, the code is built independently from others, essentially from the compiler up, possibly with the exception of linear-algebra and message-passing libraries. This model has endured and been quite successful for decades. The successful evolution of the electronic structure methodology itself, however, has resulted in an increasing complexity and an ever longer list of features expected within all software packages, which implies a growing amount of replication between different packages, not only in the initial coding but, more importantly, every time a code needs to be re-engineered to adapt to the evolution of computer hardware architecture. The Electronic Structure Library (ESL) was initiated by CECAM (the European Centre for Atomic and Molecular Calculations) to catalyze a paradigm shift away from the monolithic model and promote modularization, with the ambition to extract common tasks from electronic structure codes and redesign them as open-source libraries available to everybody. Such libraries include "heavy-duty" ones that have the potential for a high degree of parallelization and adaptation to novel hardware within them, thereby separating the sophisticated computer science aspects of performance optimization and re-engineering from the computational science done by, e.g., physicists and chemists when implementing new ideas. 
We envisage that this modular paradigm will improve overall coding efficiency and enable specialists (whether they be computer scientists or computational scientists) to use their skills more effectively and will lead to a more dynamic evolution of software in the community as well as lower barriers to entry for new developers. The model comes with new challenges, though. The building and compilation of a code based on many interdependent libraries (and their versions) is a much more complex task than that of a code delivered in a single self-contained package. Here, we describe the state of the ESL, the different libraries it now contains, the short- and mid-term plans for further libraries, and the way the new challenges are faced. The ESL is a community initiative into which several pre-existing codes and their developers have contributed with their software and efforts, from which several codes are already benefiting, and which remains open to the community.
ABSTRACT
PSI4 is a free and open-source ab initio electronic structure program providing implementations of Hartree-Fock, density functional theory, many-body perturbation theory, configuration interaction, density cumulant theory, symmetry-adapted perturbation theory, and coupled-cluster theory. Most of the methods are quite efficient, thanks to density fitting and multi-core parallelism. The program is a hybrid of C++ and Python, and calculations may be run with very simple text files or using the Python API, facilitating post-processing and complex workflows; method developers also have access to most of PSI4's core functionalities via Python. Job specification may be passed using The Molecular Sciences Software Institute (MolSSI) QCSCHEMA data format, facilitating interoperability. A rewrite of our top-level computation driver, and concomitant adoption of the MolSSI QCARCHIVE INFRASTRUCTURE project, makes the latest version of PSI4 well suited to distributed computation of large numbers of independent tasks. The project has fostered the development of independent software components that may be reused in other quantum chemistry programs.
ABSTRACT
Given the need for modern researchers to produce open, reproducible scientific output, the lack of standards and best practices for sharing data and workflows used to produce and analyze molecular dynamics (MD) simulations has become an important issue in the field. There are now multiple well-established packages to perform molecular dynamics simulations, often highly tuned for exploiting specific classes of hardware, each with strong communities surrounding them, but with very limited interoperability/transferability options. Thus, the choice of the software package often dictates the workflow for both simulation production and analysis. The level of detail in documenting the workflows and analysis code varies greatly in published work, hindering reproducibility of the reported results and the ability for other researchers to build on these studies. An increasing number of researchers are motivated to make their data available, but many challenges remain in order to effectively share and reuse simulation data. To discuss these and other issues related to best practices in the field in general, we organized a workshop in November 2018 ( https://bioexcel.eu/events/workshop-on-sharing-data-from-molecular-simulations/ ). Here, we present a brief overview of this workshop and topics discussed. We hope this effort will spark further conversation in the MD community to pave the way toward more open, interoperable, and reproducible outputs coming from research studies using MD simulations.
Subjects
Information Dissemination, Chemical Models, Molecular Dynamics Simulation, Reproducibility of Results, Software, Workflow
ABSTRACT
We develop a stochastic resolution of identity representation to the second-order Matsubara Green's function (sRI-GF2) theory. Using a stochastic resolution of the Coulomb integrals, the second order Born self-energy in GF2 is decoupled and reduced to matrix products/contractions, which reduces the computational cost from O(N^5) to O(N^3) (with N being the number of atomic orbitals). The current approach can be viewed as an extension to our previous work on stochastic resolution of identity second order Møller-Plesset perturbation theory [T. Y. Takeshita et al., J. Chem. Theory Comput. 13, 4605 (2017)] and offers an alternative to previous stochastic GF2 formulations [D. Neuhauser et al., J. Chem. Theory Comput. 13, 5396 (2017)]. We show that sRI-GF2 recovers the deterministic GF2 results for small systems, is computationally faster than deterministic GF2 for N > 80, and is a practical approach to describe weak correlations in systems with 10^3 electrons and more.
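The decoupling mechanism can be illustrated in miniature, under the assumption that the stochastic identity is built from random vectors with entries ±1, whose outer product averages to the identity matrix. The sketch below inserts that identity between two small matrices. It is a generic illustration in pure Python, not the sRI-GF2 algorithm, and `stochastic_product` is a hypothetical name.

```python
import random

def stochastic_product(a, b, n_samples, seed=0):
    """Estimate C = A B by inserting a stochastic resolution of the identity.

    For each sample, draw theta with random +/-1 entries; the average of
    theta theta^T over samples tends to the identity matrix, so the average
    of (A theta)(theta^T B) tends to A B.
    """
    rng = random.Random(seed)
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0.0] * m for _ in range(n)]
    for _ in range(n_samples):
        theta = [rng.choice((-1.0, 1.0)) for _ in range(k)]
        left = [sum(a[i][p] * theta[p] for p in range(k)) for i in range(n)]   # A theta
        right = [sum(theta[p] * b[p][j] for p in range(k)) for j in range(m)]  # theta^T B
        for i in range(n):
            for j in range(m):
                acc[i][j] += left[i] * right[j]  # contracted index is no longer shared
    return [[v / n_samples for v in row] for row in acc]
```

The two factors are contracted with `theta` independently, which is what decouples the indices. The price is a statistical error that decays as the inverse square root of the number of samples.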
ABSTRACT
We present a symmetry-adapted perturbation theory (SAPT) for the interaction of two high-spin open-shell molecules (described by their restricted open-shell Hartree-Fock determinants) resulting in low-spin states of the complex. The previously available SAPT formalisms, except for some system-specific studies for few-electron complexes, were restricted to the high-spin state of the interacting system. Thus, the new approach provides, for the first time, a SAPT-based estimate of the splittings between different spin states of the complex. We have derived and implemented the lowest-order SAPT term responsible for these splittings, that is, the first-order exchange energy. We show that within the so-called S^2 approximation commonly used in SAPT (neglecting effects that vanish as fourth or higher powers of intermolecular overlap integrals), the first-order exchange energies for all multiplets are linear combinations of two matrix elements: a diagonal exchange term that determines the spin-averaged effect and a spin-flip term responsible for the splittings between the states. The numerical factors in this linear combination are determined solely by the Clebsch-Gordan coefficients: accordingly, the S^2 approximation implies a Heisenberg Hamiltonian picture with a single coupling strength parameter determining all the splittings. The new approach is cast into both molecular-orbital and atomic-orbital expressions: the latter enable an efficient density-fitted implementation. We test the newly developed formalism on several open-shell complexes ranging from diatomic systems (such as Li⋯H and Mn⋯Mn) to the phenalenyl dimer.
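The Heisenberg picture mentioned above fixes all multiplet splittings once a single coupling constant is known. The sketch below assumes the convention H = -2J S_A·S_B (sign and factor-of-two conventions vary, and the abstract does not specify one) and uses exact fractions so that half-integer spins are handled cleanly. The function names are hypothetical.

```python
from fractions import Fraction

def allowed_total_spins(s_a, s_b):
    """Total spins S = |s_a - s_b|, ..., s_a + s_b from angular momentum coupling."""
    spins = []
    s = abs(s_a - s_b)
    while s <= s_a + s_b:
        spins.append(s)
        s += 1
    return spins

def heisenberg_energy(s, s_a, s_b, j):
    """Energy of the multiplet with total spin s for H = -2 J S_A.S_B.

    Uses S_A.S_B = [S(S+1) - S_A(S_A+1) - S_B(S_B+1)] / 2.
    """
    return -j * (s * (s + 1) - s_a * (s_a + 1) - s_b * (s_b + 1))
```

For two doublets this gives a singlet-triplet gap of 2J, and in general adjacent multiplets obey the interval rule E(S) - E(S-1) = -2JS. Two monomers with spin 5/2 each, for example, give six multiplets with S = 0 through 5.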
ABSTRACT
The field of computational molecular sciences (CMSs) has made innumerable contributions to the understanding of the molecular phenomena that underlie and control chemical processes, which is manifested in a large number of community software projects and codes. The CMS community is now poised to take the next transformative steps of better training in modern software design and engineering methods and tools, increasing interoperability through more systematic adoption of agreed upon standards and accepted best-practices, overcoming unnecessary redundancy in software effort along with greater reproducibility, and increasing the deployment of new software onto hardware platforms from in-house clusters to mid-range computing systems through to modern supercomputers. This in turn will have future impact on the software that will be created to address grand challenge science that we illustrate here: the formulation of diverse catalysts, descriptions of long-range charge and excitation transfer, and development of structural ensembles for intrinsically disordered proteins.
ABSTRACT
Accurate potential energy models are necessary for reliable atomistic simulations of chemical phenomena. In the realm of biomolecular modeling, large systems like proteins comprise very many noncovalent interactions (NCIs) that can contribute to the protein's stability and structure. This work presents two high-quality chemical databases of common fragment interactions in biomolecular systems as extracted from high-resolution Protein DataBank crystal structures: 3380 sidechain-sidechain interactions and 100 backbone-backbone interactions that inaugurate the BioFragment Database (BFDb). Absolute interaction energies are generated with a computationally tractable explicitly correlated coupled cluster with perturbative triples [CCSD(T)-F12] "silver standard" (0.05 kcal/mol average error) for NCI that demands only a fraction of the cost of the conventional "gold standard," CCSD(T) at the complete basis set limit. By sampling extensively from biological environments, BFDb spans the natural diversity of protein NCI motifs and orientations. In addition to supplying a thorough assessment for lower scaling force-field (2), semi-empirical (3), density functional (244), and wavefunction (45) methods (comprising >1M interaction energies), BFDb provides interactive tools for running and manipulating the resulting large datasets and offers a valuable resource for potential energy model development and validation.
ABSTRACT
We assessed the performance of a large variety of modern density functional theory approaches for the adsorption of carbon dioxide on molecular models of pyridinic N-doped graphene. Specifically, we selected eight polyheterocyclic aromatic compounds ranging from pyridine and pyrazine to 1,6-diazacoronene and investigated their complexes with CO2 for a large range of intermolecular distances and including both in-plane and stacked orientations. The benchmark interaction energies were computed at the complete-basis-set limit MP2 level plus a CCSD(T) coupled-cluster correction in a moderate but carefully selected basis set. Using a set of 96 benchmark CCSD(T)-level interaction energies as a reference, we investigated the accuracy of DFT-based approaches as a function of the density functional, the dispersion correction, the basis set, and the counterpoise correction or lack thereof. While virtually all DFT variants exhibit some deterioration of accuracy for distances slightly shorter than the van der Waals minima, we were able to identify several schemes such as B2PLYP-D3 and M05-2X-D3 whose average errors on the entire benchmark data set are in the 5-10% range. The top DFT performers were subsequently used to investigate the energy profile for a carbon dioxide transition through model N-doped graphene pores. All investigated methods confirmed that the largest, N4H4 pore allows for a barrierless CO2 transition to the other side of a graphene sheet.
ABSTRACT
An accurate 2D ab initio potential energy surface of the He-C3 collisional system is calculated using the supermolecular coupled-cluster method with up to perturbative quadruple excitations, CCSDT(Q). This interaction potential is then incorporated in full close-coupling calculations of rotational excitation/de-excitation cross sections in He + C3 collisions for rotational levels j = 0, 2, ..., 10 and collision energies up to 1000 cm^-1. Corresponding rate coefficients are reported for temperatures between 1 and 100 K. Results are found to be in excellent agreement with available theoretical data that were restricted to the temperature range of 5-15 K. Implications of the computed rate coefficients for astrophysical models of C3 and carbon clusters in interstellar and circumstellar environments are discussed.
ABSTRACT
A new highly accurate interaction potential is constructed for the He-H2 van der Waals complex. This potential is fitted to 1900 ab initio energies computed at the very large-basis coupled-cluster level and augmented by corrections for higher-order excitations (up to full configuration interaction level) and the diagonal Born-Oppenheimer correction. At the vibrationally averaged H-H bond length of 1.448736 bohrs, the well depth of our potential, 15.870 ± 0.065 K, is nearly 1 K larger than the most accurate previous studies have indicated. In addition to constructing our own three-dimensional potential in the van der Waals region, we present a reparameterization of the Boothroyd-Martin-Peterson potential surface [A. I. Boothroyd, P. G. Martin, and M. R. Peterson, J. Chem. Phys. 119, 3187 (2003)] that is suitable for all configurations of the triatomic system. Finally, we use the newly developed potentials to compute the properties of the lone bound states of ^4He-H2 and ^3He-H2 and the interaction second virial coefficient of the hydrogen-helium mixture.
ABSTRACT
We present a methodology for defining and optimizing a general force field for classical molecular simulations, and we describe its use to derive the Open Force Field 1.0.0 small-molecule force field, codenamed Parsley. Rather than using traditional atom typing, our approach is built on the SMIRKS-native Open Force Field (SMIRNOFF) parameter assignment formalism, which handles increases in the diversity and specificity of the force field definition without needlessly increasing the complexity of the specification. Parameters are optimized with the ForceBalance tool, based on reference quantum chemical data that include torsion potential energy profiles, optimized gas-phase structures, and vibrational frequencies. These quantum reference data are computed and are maintained with QCArchive, an open-source and freely available distributed computing and database software ecosystem. In this initial application of the method, we present essentially a full optimization of all valence parameters and report tests of the resulting force field against compounds and data types outside the training set. These tests show improvements in optimized geometries and conformational energetics and demonstrate that Parsley's accuracy for liquid properties is similar to that of other general force fields, as is accuracy on binding free energies. We find that this initial Parsley force field affords accuracy similar to that of other general force fields when used to calculate relative binding free energies spanning 199 protein-ligand systems. Additionally, the resulting infrastructure allows us to rapidly optimize an entirely new force field with minimal human intervention.
Subjects
Benchmarking, Petroselinum, Ecosystem, Humans, Ligands, Molecular Conformation
ABSTRACT
We seek to completely revise current models of airborne transmission of respiratory viruses by providing never-before-seen atomic-level views of the SARS-CoV-2 virus within a respiratory aerosol. Our work dramatically extends the capabilities of multiscale computational microscopy to address the significant gaps that exist in current experimental methods, which are limited in their ability to interrogate aerosols at the atomic/molecular level and thus obscure our understanding of airborne transmission. We demonstrate how our integrated data-driven platform provides a new way of exploring the composition, structure, and dynamics of aerosols and aerosolized viruses, while driving simulation method development along several important axes. We present a series of initial scientific discoveries for the SARS-CoV-2 Delta variant, noting that the full scientific impact of this work has yet to be realized.
ACM REFERENCE FORMAT: Abigail Dommer, Lorenzo Casalino, Fiona Kearns, Mia Rosenfeld, Nicholas Wauer, Surl-Hee Ahn, John Russo, Sofia Oliveira, Clare Morris, Anthony Bogetti, Anda Trifan, Alexander Brace, Terra Sztain, Austin Clyde, Heng Ma, Chakra Chennubhotla, Hyungro Lee, Matteo Turilli, Syma Khalid, Teresa Tamayo-Mendoza, Matthew Welborn, Anders Christensen, Daniel G. A. Smith, Zhuoran Qiao, Sai Krishna Sirumalla, Michael O'Connor, Frederick Manby, Anima Anandkumar, David Hardy, James Phillips, Abraham Stern, Josh Romero, David Clark, Mitchell Dorrell, Tom Maiden, Lei Huang, John McCalpin, Christopher Woods, Alan Gray, Matt Williams, Bryan Barker, Harinda Rajapaksha, Richard Pitts, Tom Gibbs, John Stone, Daniel Zuckerman, Adrian Mulholland, Thomas Miller III, Shantenu Jha, Arvind Ramanathan, Lillian Chong, Rommie Amaro. 2021. #COVIDisAirborne: AI-Enabled Multiscale Computational Microscopy of Delta SARS-CoV-2 in a Respiratory Aerosol. In Supercomputing '21: International Conference for High Performance Computing, Networking, Storage, and Analysis. ACM, New York, NY, USA, 14 pages. https://doi.org/finalDOI.
ABSTRACT
We introduce a free and open-source software package (PES-Learn) which largely automates the process of producing high-quality machine learning models of molecular potential energy surfaces (PESs). PES-Learn incorporates a generalized framework for producing grid points across a PES that is compatible with most electronic structure theory software. The newly generated or externally supplied PES data can then be used to train and optimize neural network or Gaussian process models in a completely automated fashion. Robust hyperparameter optimization schemes designed specifically for molecular PES applications are implemented to ensure that the best possible model for the data set is fit with high quality. The performance of PES-Learn toward fitting a few semiglobal PESs from the literature is evaluated. We also demonstrate the use of PES-Learn machine learning models in carrying out high-level vibrational configuration interaction computations on water and formaldehyde.