ABSTRACT
Predicting electronic energies, densities, and related chemical properties can facilitate the discovery of novel catalysts, medicines, and battery materials. However, existing machine learning techniques are challenged by the scarcity of training data when exploring unknown chemical spaces. We overcome this barrier by systematically incorporating knowledge of molecular electronic structure into deep learning. By developing a physics-inspired equivariant neural network, we introduce a method to learn molecular representations based on the electronic interactions among atomic orbitals. Our method, OrbNet-Equi, leverages efficient tight-binding simulations and learned mappings to recover high-fidelity physical quantities. OrbNet-Equi accurately models a wide spectrum of target properties while being several orders of magnitude faster than density functional theory. Despite only using training samples collected from readily available small-molecule libraries, OrbNet-Equi outperforms traditional semiempirical and machine learning-based methods on comprehensive downstream benchmarks that encompass diverse main-group chemical processes. Our method also describes interactions in challenging charge-transfer complexes and open-shell systems. We anticipate that the strategy presented here will help to expand opportunities for studies in chemistry and materials science, where the acquisition of experimental or reference training data is costly.
Subjects
Deep Learning, Electronics, Machine Learning, Computer Neural Networks, Small Molecule Libraries
ABSTRACT
We seek to completely revise current models of airborne transmission of respiratory viruses by providing never-before-seen atomic-level views of the SARS-CoV-2 virus within a respiratory aerosol. Our work dramatically extends the capabilities of multiscale computational microscopy to address the significant gaps that exist in current experimental methods, which are limited in their ability to interrogate aerosols at the atomic/molecular level and thus obscure our understanding of airborne transmission. We demonstrate how our integrated data-driven platform provides a new way of exploring the composition, structure, and dynamics of aerosols and aerosolized viruses, while driving simulation method development along several important axes. We present a series of initial scientific discoveries for the SARS-CoV-2 Delta variant, noting that the full scientific impact of this work has yet to be realized.
ABSTRACT
We present OrbNet Denali, a machine learning model for electronic structure that is designed as a drop-in replacement for ground-state density functional theory (DFT) energy calculations. The model is a message-passing graph neural network that uses symmetry-adapted atomic orbital features from a low-cost quantum calculation to predict the energy of a molecule. OrbNet Denali is trained on a vast dataset of 2.3 × 10⁶ DFT calculations on molecules and geometries. This dataset covers the most common elements in biochemistry and organic chemistry (H, Li, B, C, N, O, F, Na, Mg, Si, P, S, Cl, K, Ca, Br, and I) and charged molecules. OrbNet Denali is demonstrated on several well-established benchmark datasets, and we find that it provides accuracy that is on par with modern DFT methods while offering a speedup of up to three orders of magnitude. For the GMTKN55 benchmark set, OrbNet Denali achieves WTMAD-1 and WTMAD-2 scores of 7.19 and 9.84, on par with modern DFT functionals. For several GMTKN55 subsets, which contain chemical problems that are not present in the training set, OrbNet Denali produces a mean absolute error comparable to those of DFT methods. For the Hutchison conformer benchmark set, OrbNet Denali has a median correlation coefficient of R² = 0.90 compared to the reference DLPNO-CCSD(T) calculation and R² = 0.97 compared to the method used to generate the training data (ωB97X-D3/def2-TZVP), exceeding the performance of any other method with a similar cost. Similarly, the model reaches chemical accuracy for non-covalent interactions in the S66x10 dataset. For torsional profiles, OrbNet Denali reproduces the torsion profiles of ωB97X-D3/def2-TZVP with an average mean absolute error of 0.12 kcal/mol for the potential energy surfaces of the diverse fragments in the TorsionNet500 dataset.
ABSTRACT
Community efforts in the computational molecular sciences (CMS) are evolving toward modular, open, and interoperable interfaces that work with existing community codes to provide more functionality and composability than could be achieved with a single program. The Quantum Chemistry Common Driver and Databases (QCDB) project provides such capability through an application programming interface (API) that facilitates interoperability across multiple quantum chemistry software packages. In tandem with the Molecular Sciences Software Institute and their Quantum Chemistry Archive ecosystem, the unique functionalities of several CMS programs are integrated, including CFOUR, GAMESS, NWChem, OpenMM, Psi4, Qcore, TeraChem, and Turbomole, to provide common computational functions, i.e., energy, gradient, and Hessian computations as well as molecular properties such as atomic charges and vibrational frequency analysis. Both standard users and power users benefit from adopting these APIs as they lower the language barrier of input styles and enable a standard layout of variables and data. These designs allow end-to-end interoperable programming of complex computations and provide best practices options by default.
ABSTRACT
Complex chemical systems present challenges to electronic structure theory stemming from large system sizes, subtle interactions, coupled dynamical time scales, and electronically nonadiabatic effects. New methods are needed to perform reliable, rigorous, and affordable electronic structure calculations for simulating the properties and dynamics of such systems. This Account reviews projection-based quantum embedding for electronic structure, which provides a formally exact method for density functional theory (DFT) embedding. The method also provides a rigorous and accurate approach for describing a small part of a chemical system at the level of a correlated wavefunction (WF) method while the remainder of the system is described at the level of DFT. A key advantage of projection-based embedding is that it can be formulated in terms of an extremely simple level-shift projection operator, which eliminates the need for any optimized effective potential calculation or kinetic energy functional approximation while simultaneously ensuring that no extra programming is needed to perform WF-in-DFT embedding with an arbitrary WF method. The current work presents the theoretical underpinnings of projection-based embedding, describes use of the method for combining wavefunction and density functional theories, and discusses technical refinements that have improved the applicability and robustness of the method. Applications of projection-based WF-in-DFT embedding are also reviewed, with particular focus on recent work on transition-metal catalysis, enzyme reactivity, and battery electrolyte decomposition. In particular, we review the application of projection-based embedding for the prediction of electrochemical potentials and reaction pathways in a Co-centered hydrogen evolution catalyst. 
Projection-based WF-in-DFT calculations are shown to provide quantitative accuracy while greatly reducing the computational cost compared with a reference coupled cluster calculation on the full system. Additionally, projection-based WF-in-DFT embedding is used to study the mechanism of citrate synthase; it is shown that projection-based WF-in-DFT largely eliminates the sensitivity of the potential energy landscape to the employed DFT exchange-correlation functional. Finally, we demonstrate the use of projection-based WF-in-DFT to study electron transfer reactions associated with battery electrolyte decomposition. Projection-based WF-in-DFT embedding is used to calculate the oxidation potentials of neat ethylene carbonate (EC), neat dimethyl carbonate (DMC), and 1:1 mixtures of EC and DMC in order to overcome qualitative inaccuracies in the electron densities and ionization energies obtained from conventional DFT methods. By further embedding the WF-in-DFT description in a molecular mechanics point-charge environment, this work enables an explicit description of the solvent and ensemble averaging of the solvent configurations. Looking forward, we anticipate continued refinement of the projection-based embedding methodology as well as its increasingly widespread application in diverse areas of chemistry, biology, and materials science.
ABSTRACT
The design of effective electrocatalysts for carbon dioxide reduction requires understanding the mechanistic underpinnings governing the binding, reduction, and protonation of CO2. A critical aspect of understanding and tuning these factors for optimal catalysis is controlling the electronic environments of the primary and secondary coordination spheres. Herein we report a series of para-substituted cobalt aminopyridine macrocyclic catalysts 2-4 capable of carrying out the electrochemical reduction of CO2 to CO. Under catalytic conditions, complexes 2-4, as well as the unsubstituted cobalt aminopyridine complex 1, exhibit icat/ip values ranging from 144 to 781. Complexes 2 and 4 exhibit a pronounced precatalytic wave suggestive of an ECEC mechanism. A Hammett analysis reveals that ligand modifications with electron-donating groups enhance catalysis (ρ < 0), indicative of positive charge buildup in the transition state. This trend also extends to the Co(I/0) potential, where complexes possessing more negative E(Co(I/0)) reductions exhibit greater icat/ip values. The reported modifications offer a synthetic lever to tune catalytic activity, orthogonal to our previous study of the role of pendant hydrogen bond donors.
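The Hammett analysis mentioned in this abstract can be illustrated with a minimal sketch. It assumes the textbook Hammett relation, log10(k_X/k_H) = ρ·σ_X, and fits ρ as the slope of a least-squares line. The substituent constants and log ratios below are synthetic placeholders invented for illustration, not values from the paper, and the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic data (NOT from the paper): substituent constants and the
# log10 of a rate-like quantity relative to the unsubstituted complex.
sigma = [-0.3, -0.1, 0.0, 0.2]
log_ratio = [0.45, 0.15, 0.0, -0.3]

rho, offset = fit_line(sigma, log_ratio)
# A negative rho means electron-donating groups (sigma < 0) raise the rate.
print(f"rho = {rho:.2f}")
```

A negative slope in such a fit corresponds to the abstract's statement that electron-donating groups enhance catalysis.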
ABSTRACT
We introduce a machine learning method in which energy solutions from the Schrödinger equation are predicted using symmetry adapted atomic orbital features and a graph neural-network architecture. OrbNet is shown to outperform existing methods in terms of learning efficiency and transferability for the prediction of density functional theory results while employing low-cost features that are obtained from semi-empirical electronic structure calculations. For applications to datasets of drug-like molecules, including QM7b-T, QM9, GDB-13-T, DrugBank, and the conformer benchmark dataset of Folmsbee and Hutchison [Int. J. Quantum Chem. (published online) (2020)], OrbNet predicts energies within chemical accuracy of density functional theory at a computational cost that is 1000-fold or more reduced.
ABSTRACT
We address the degree to which machine learning (ML) can be used to accurately and transferably predict post-Hartree-Fock correlation energies. Refined strategies for feature design and selection are presented, and the molecular-orbital-based machine learning (MOB-ML) method is applied to several test systems. Strikingly, for the second-order Møller-Plesset perturbation theory, coupled cluster with singles and doubles (CCSD), and CCSD with perturbative triples levels of theory, it is shown that the thermally accessible (350 K) potential energy surface for a single water molecule can be described to within 1 mhartree using a model that is trained from only a single reference calculation at a randomized geometry. To explore the breadth of chemical diversity that can be described, MOB-ML is also applied to a new dataset of thermalized (350 K) geometries of 7211 organic molecules with up to seven heavy atoms. In comparison with the previously reported Δ-ML method, MOB-ML is shown to reach chemical accuracy with threefold fewer training geometries. Finally, a transferability test in which models trained for seven-heavy-atom systems are used to predict energies for thirteen-heavy-atom systems reveals that MOB-ML reaches chemical accuracy with 36-fold fewer training calculations than Δ-ML (140 vs 5000 training calculations).
ABSTRACT
The idea of using fragment embedding to circumvent the high computational scaling of accurate electronic structure methods while retaining high accuracy has been a long-standing goal for quantum chemists. Traditional fragment embedding methods mainly focus on systems composed of weakly correlated parts and are insufficient when division across chemical bonds is unavoidable. Recently, density matrix embedding theory and other methods based on the Schmidt decomposition have emerged as a fresh approach to this problem. Despite their success on model systems, these methods can prove difficult for realistic systems because they rely on either a rigid, non-overlapping partition of the system or a specification of some special sites (i.e., "edge" and "center" sites), neither of which is well-defined in general for real molecules. In this work, we present a new Schmidt decomposition-based embedding scheme called incremental embedding that allows the combination of arbitrary overlapping fragments without the knowledge of edge sites. This method forms a convergent hierarchy in the sense that higher accuracy can be obtained by using fragments involving more sites. The computational scaling for the first few levels is lower than that of most correlated wave function methods. We present results for several small molecules in atom-centered Gaussian basis sets and demonstrate that incremental embedding converges quickly with fragment size and recovers most static correlation in small basis sets even when truncated at the second lowest level.
ABSTRACT
Projection-based embedding offers a simple framework for embedding correlated wavefunction methods in density functional theory. Partitioning between the correlated wavefunction and density functional subsystems is performed in the space of localized molecular orbitals. However, during a large geometry change, such as a chemical reaction, the nature of these localized molecular orbitals, as well as their partitioning into the two subsystems, can change dramatically. This can lead to unphysical cusps and even discontinuities in the potential energy surface. In this work, we present an even-handed framework for localized orbital partitioning that ensures consistent subsystems across a set of molecular geometries. We illustrate this problem and the even-handed solution with a simple example of an SN2 reaction. Applications to a nitrogen umbrella flip in a cobalt-based CO2 reduction catalyst and to the binding of CO to Cu clusters are presented. In both cases, we find that even-handed partitioning enables chemically accurate embedding with modestly sized embedded regions for systems in which previous partitioning strategies are problematic.
ABSTRACT
The mean-field solutions of electronic excited states are much less accessible than ground state (e.g., Hartree-Fock) solutions. Energy-based optimization methods for excited states, like Δ-SCF (self-consistent field), tend to fall into the lowest solution consistent with a given symmetry, a problem known as "variational collapse." In this work, we combine the ideas of direct energy-targeting and variance-based optimization in order to describe excited states at the mean-field level. The resulting method, σ-SCF, has several advantages. First, it allows one to target any desired excited state by specifying a single parameter: a guess of the energy of that state. It can therefore, in principle, find all excited states. Second, it avoids variational collapse by using a variance-based, unconstrained local minimization. As a consequence, all states, ground or excited, are treated on an equal footing. Third, it provides an alternate approach to locate Δ-SCF solutions that are otherwise hardly accessible by the usual non-aufbau configuration initial guess. We present results for this new method for small atoms (He, Be) and molecules (H2, HF). We find that σ-SCF is very effective at locating excited states, including individual, high energy excitations within a dense manifold of excited states. Like all single determinant methods, σ-SCF shows prominent spin-symmetry breaking for open shell states and our results suggest that this method could be further improved with spin projection.
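To make the combination of energy targeting and variance minimization concrete, one natural functional can be written down as a sketch. The symbols here are assumptions introduced for illustration: Φ is a normalized single determinant, ω is the user-supplied energy guess, and W_ω is a label for the targeting functional. Whether this matches the paper's exact working equations is not confirmed by the abstract.

```latex
\begin{align}
E[\Phi] &= \langle \Phi | \hat{H} | \Phi \rangle, \qquad
\sigma^{2}[\Phi] = \langle \Phi | \hat{H}^{2} | \Phi \rangle - E[\Phi]^{2}, \\
W_{\omega}[\Phi] &= \langle \Phi | (\omega - \hat{H})^{2} | \Phi \rangle
  = \omega^{2} - 2\,\omega\, E[\Phi] + \langle \Phi | \hat{H}^{2} | \Phi \rangle \\
 &= \sigma^{2}[\Phi] + \bigl(\omega - E[\Phi]\bigr)^{2}.
\end{align}
```

Both terms in the last line are non-negative, so a functional of this form is bounded below for every ω. Its minimizer is pulled toward states with small variance and energy near ω, which is consistent with the abstract's claim that ground and excited states are treated on an equal footing and that variational collapse is avoided.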
ABSTRACT
Strong correlation poses a difficult problem for electronic structure theory, with computational cost scaling quickly with system size. Fragment embedding is an attractive approach to this problem. By dividing a large complicated system into smaller manageable fragments "embedded" in an approximate description of the rest of the system, we can hope to ameliorate the steep cost of correlated calculations. While appealing, these methods often converge slowly with fragment size because of small errors at the boundary between fragment and bath. We describe a new electronic embedding method, dubbed "Bootstrap Embedding," a self-consistent wavefunction-in-wavefunction embedding theory that uses overlapping fragments to improve the description of fragment edges. We apply this method to the one-dimensional Hubbard model and a translationally asymmetric variant, and find that it performs very well for energies and populations. We find Bootstrap Embedding converges rapidly with embedded fragment size, overcoming the surface-area-to-volume-ratio error typical of many embedding methods. We anticipate that this method may lead to a low-scaling, high-accuracy treatment of electron correlation in large molecular systems.
ABSTRACT
Water is an extremely important liquid for chemistry and the search for more accurate force fields for liquid water continues unabated. Neglect of diatomic differential overlap (NDDO) molecular orbital methods provide an intriguing generalization of classical force fields in this regard because they can account both for bond breaking and electronic polarization of molecules. However, we show that most standard NDDO methods fail for water because they give an incorrect description of hydrogen bonding, water's key structural feature. Using force matching, we design a reparameterized NDDO model and find that it qualitatively reproduces the experimental radial distribution function of water, as well as various monomer, dimer, and bulk properties that PM6 does not. This suggests that the apparent limitations of NDDO models are primarily due to poor parameterization and not to the NDDO approximations themselves. Finally, we identify the physical parameters that most influence the condensed phase properties. These results help to elucidate the chemistry that a semiempirical molecular orbital picture of water must capture. We conclude that properly parameterized NDDO models could be useful for simulations that require electronically detailed explicit solvent, including the calculation of redox potentials and simulation of charge transfer and photochemistry.
ABSTRACT
Triplet excitons are ubiquitous in organic optoelectronics, but they are often an undesirable energy sink because they are spin-forbidden from emitting light and their high binding energy hinders the generation of free electron-hole pairs. Harvesting their energy is consequently an important technological challenge. Here, we demonstrate direct excitonic energy transfer from 'dark' triplets in the organic semiconductor tetracene to colloidal PbS nanocrystals, thereby successfully harnessing molecular triplet excitons in the near infrared. Steady-state excitation spectra, supported by transient photoluminescence studies, demonstrate that the transfer efficiency is at least (90 ± 13)%. The mechanism is a Dexter hopping process consisting of the simultaneous exchange of two electrons. Triplet exciton transfer to nanocrystals is expected to be broadly applicable in solar and near-infrared light-emitting applications, where effective molecular phosphors are lacking at present. In particular, this route to 'brighten' low-energy molecular triplet excitons may permit singlet exciton fission sensitization of conventional silicon solar cells.
ABSTRACT
Density matrix embedding theory (DMET) has emerged as a powerful tool for performing wave function-in-wave function embedding for strongly correlated systems. In traditional DMET, an accurate calculation is performed on a small impurity embedded in a mean field bath. Here, we extend the original DMET equations to account for correlation in the bath via an antisymmetrized geminal power (AGP) wave function. The resulting formalism has a number of advantages. First, it allows one to properly treat the weak correlation limit of independent pairs, which DMET is unable to do with a mean-field bath. Second, it associates a size extensive correlation energy with a given density matrix (for the models tested), which AGP by itself is incapable of providing. Third, it provides a reasonable description of charge redistribution in strongly correlated but non-periodic systems. Thus, AGP-DMET appears to be a good starting point for describing electron correlation in molecules, which are aperiodic and possess both strong and weak electron correlation.
ABSTRACT
BACKGROUND: Intramuscular fat (IMF) and intramuscular connective tissue (IMC) are often seen in human myopathies and are central to beef quality. The mechanisms regulating their accumulation remain poorly understood. Here, we explored the possibility of using beef cattle as a novel model for mechanistic studies of intramuscular adipogenesis and fibrogenesis. METHODS: Skeletal muscle single-cell RNAseq was performed on three cattle breeds, including Wagyu (high IMF), Brahman (abundant IMC but scarce IMF), and Wagyu/Brahman cross. Sophisticated bioinformatics analyses, including clustering analysis, gene set enrichment analyses, gene regulatory network construction, RNA velocity, pseudotime analysis, and cell-cell communication analysis, were performed to elucidate heterogeneities and differentiation processes of individual cell types and differences between cattle breeds. Experiments were conducted to validate the function and specificity of identified key regulatory and marker genes. Integrated analysis with multiple published human and non-human primate datasets was performed to identify common mechanisms. RESULTS: A total of 32 708 cells and 21 clusters were identified, including fibro/adipogenic progenitor (FAP) and other resident and infiltrating cell types. We identified an endomysial adipogenic FAP subpopulation enriched for COL4A1 and CFD (log2FC = 3.19 and 1.92, respectively; P < 0.0001) and a perimysial fibrogenic FAP subpopulation enriched for COL1A1 and POSTN (log2FC = 1.83 and 0.87, respectively; P < 0.0001), both of which were likely derived from an unspecified subpopulation. Further analysis revealed more progressed adipogenic programming of Wagyu FAPs and more advanced fibrogenic programming of Brahman FAPs. Mechanistically, NAB2 drives CFD expression, which in turn promotes adipogenesis. CFD expression in FAPs of young cattle before the onset of intramuscular adipogenesis was predictive of IMF contents in adulthood (R² = 0.885, P < 0.01). 
Similar adipogenic and fibrogenic FAPs were identified in humans and monkeys. In aged humans with metabolic syndrome and progressed Duchenne muscular dystrophy (DMD) patients, increased CFD expression was observed (P < 0.05 and P < 0.0001, respectively), which was positively correlated with adipogenic marker expression, including ADIPOQ (R² = 0.303, P < 0.01; and R² = 0.348, P < 0.01, respectively). The specificity of Postn/POSTN as a fibrogenic FAP marker was validated using a lineage-tracing mouse line. POSTN expression was elevated in Brahman FAPs (P < 0.0001) and DMD patients (P < 0.01) but not in aged humans. Strong interactions between vascular cells and FAPs were also identified. CONCLUSIONS: Our study demonstrates the feasibility of beef cattle as a model for studying IMF and IMC. We illustrate the FAP programming during intramuscular adipogenesis and fibrogenesis and reveal the reliability of CFD as a predictor and biomarker of IMF accumulation in cattle and humans.
Subjects
Adipogenesis, Duchenne Muscular Dystrophy, Cattle, Humans, Animals, Mice, Aged, Adipogenesis/physiology, Reproducibility of Results, Skeletal Muscle/metabolism, Cell Differentiation
ABSTRACT
Theoretical studies of localization, anomalous diffusion and ergodicity breaking require solving the electronic structure of disordered systems. We use free probability to approximate the ensemble-averaged density of states without exact diagonalization. We present an error analysis that quantifies the accuracy using a generalized moment expansion, allowing us to distinguish between different approximations. We identify an approximation that is accurate to the eighth moment across all noise strengths, and contrast this with perturbation theory and isotropic entanglement theory.
ABSTRACT
Kinetic Monte Carlo is a method used to model the state-to-state kinetics of atomic systems when all reaction mechanisms and rates are known a priori. Adaptive versions of this algorithm use saddle searches from each visited state so that unexpected and complex reaction mechanisms can also be included. Here, we describe how calculated reaction mechanisms can be stored concisely in a kinetic database and subsequently reused to reduce the computational cost of such simulations. As all accessible reaction mechanisms available in a system are contained in the database, the cost of the adaptive algorithm is reduced towards that of standard kinetic Monte Carlo.
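The idea of reusing stored reaction mechanisms can be sketched with a toy adaptive kinetic Monte Carlo loop. This is a simplified illustration, not the authors' implementation: the "database" here is a plain dictionary keyed by state, `find_events` is a hypothetical stand-in for an expensive saddle search, and the three-state system and its rates are invented. The event selection is the standard rejection-free scheme (choose an event with probability proportional to its rate, then draw an exponential waiting time).

```python
import math
import random


def kmc_step(rates, rng):
    """Pick one event with probability proportional to its rate and
    draw the exponentially distributed waiting time."""
    total = sum(rates.values())
    target = rng.random() * total
    running = 0.0
    chosen = None
    for event, rate in rates.items():
        running += rate
        if target < running:
            chosen = event
            break
    if chosen is None:  # guard against floating-point round-off
        chosen = event
    dt = -math.log(1.0 - rng.random()) / total
    return chosen, dt


def run_akmc(start, find_events, n_steps, seed=0):
    """Toy adaptive KMC: search for events only in states not yet stored."""
    rng = random.Random(seed)
    database = {}   # state -> {destination state: rate}
    searches = 0    # number of (expensive) searches actually performed
    state, time = start, 0.0
    for _ in range(n_steps):
        if state not in database:
            database[state] = find_events(state)
            searches += 1
        state, dt = kmc_step(database[state], rng)
        time += dt
    return state, time, searches


def toy_events(state):
    """Hypothetical stand-in for a saddle search on a three-state ring."""
    return {(state + 1) % 3: 1.0, (state - 1) % 3: 0.5}


final_state, elapsed, n_searches = run_akmc(0, toy_events, n_steps=1000)
print(n_searches, "searches for 1000 steps")
```

Once every reachable state is stored, each further step costs only a lookup and a random draw, which mirrors the abstract's point that the cost approaches that of standard kinetic Monte Carlo.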
ABSTRACT
The structure of 1.7 nm Pt nanoparticles is investigated using x-ray diffraction (XRD) measurements and density functional theory (DFT) calculations. Two types of particles are compared: those made by solution chemistry, which are capped by either thiol or amine ligands, and dendrimer-encapsulated particles (DENs), which do not have capping ligands. All particles were dried before analyzing their structure. Pair distribution function (PDF) data from XRD measurements show that the ligand-capped particles are more disordered than the DENs. To determine the structure of the particles and the nature of the ligand-induced disorder, we use a hybrid reverse Monte Carlo approach. A weighted average of the calculated binding energy of the particles and a goodness-of-fit parameter to the PDF data is taken as the objective function, which is minimized to determine the optimal structure. A scan over different weights gives the set of Pareto-optimal structures, which show how well simultaneous agreement can be reached with both experiment and theory. Using an embedded atom potential to sample configuration space and DFT to refine the optimal structures, we show that the DEN structure is most consistent with a face-centered cubic lattice of truncated octahedral shape. The disorder induced by the capping ligands is consistent with surface relaxation of the particle rather than disorder of the crystal structure.
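The weighted objective and the scan over weights can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: a short list of candidate structures replaces the Monte Carlo sampling of configuration space, the energy and misfit values are invented and assumed to be normalized so that lower is better for both, and the function names are hypothetical.

```python
def weighted_objective(energy, misfit, weight):
    """Weighted average of an energy term and a goodness-of-fit term."""
    return weight * energy + (1.0 - weight) * misfit


def scan_weights(candidates, weights):
    """Return the names of candidates that minimize the objective
    for at least one weight in the scan."""
    winners = []
    for w in weights:
        best = min(candidates,
                   key=lambda c: weighted_objective(c[1], c[2], w))
        if best[0] not in winners:
            winners.append(best[0])
    return winners


# Invented, normalized toy values: (name, energy, misfit to the PDF data).
candidates = [
    ("A", 0.0, 1.0),  # best energy, worst fit
    ("B", 0.4, 0.3),  # compromise
    ("C", 1.0, 0.0),  # best fit, worst energy
    ("D", 0.8, 0.9),  # worse than B on both criteria
]

pareto = scan_weights(candidates, [0.0, 0.25, 0.5, 0.75, 1.0])
print(pareto)
```

Structure D never wins for any weight because B beats it on both criteria. Note that a weighted-sum scan of this kind recovers only the Pareto-optimal points on the convex part of the front.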
ABSTRACT
We seek to completely revise current models of airborne transmission of respiratory viruses by providing never-before-seen atomic-level views of the SARS-CoV-2 virus within a respiratory aerosol. Our work dramatically extends the capabilities of multiscale computational microscopy to address the significant gaps that exist in current experimental methods, which are limited in their ability to interrogate aerosols at the atomic/molecular level and thus obscure our understanding of airborne transmission. We demonstrate how our integrated data-driven platform provides a new way of exploring the composition, structure, and dynamics of aerosols and aerosolized viruses, while driving simulation method development along several important axes. We present a series of initial scientific discoveries for the SARS-CoV-2 Delta variant, noting that the full scientific impact of this work has yet to be realized.
ACM REFERENCE FORMAT: Abigail Dommer 1, Lorenzo Casalino 1, Fiona Kearns 1, Mia Rosenfeld 1, Nicholas Wauer 1, Surl-Hee Ahn 1, John Russo 2, Sofia Oliveira 3, Clare Morris 1, Anthony Bogetti 4, Anda Trifan 5,6, Alexander Brace 5,7, Terra Sztain 1,8, Austin Clyde 5,7, Heng Ma 5, Chakra Chennubhotla 4, Hyungro Lee 9, Matteo Turilli 9, Syma Khalid 10, Teresa Tamayo-Mendoza 11, Matthew Welborn 11, Anders Christensen 11, Daniel G. A. Smith 11, Zhuoran Qiao 12, Sai Krishna Sirumalla 11, Michael O'Connor 11, Frederick Manby 11, Anima Anandkumar 12,13, David Hardy 6, James Phillips 6, Abraham Stern 13, Josh Romero 13, David Clark 13, Mitchell Dorrell 14, Tom Maiden 14, Lei Huang 15, John McCalpin 15, Christopher Woods 3, Alan Gray 13, Matt Williams 3, Bryan Barker 16, Harinda Rajapaksha 16, Richard Pitts 16, Tom Gibbs 13, John Stone 6, Daniel Zuckerman 2*, Adrian Mulholland 3*, Thomas Miller III 11,12*, Shantenu Jha 9*, Arvind Ramanathan 5*, Lillian Chong 4*, Rommie Amaro 1*. 2021. #COVIDisAirborne: AI-Enabled Multiscale Computational Microscopy of Delta SARS-CoV-2 in a Respiratory Aerosol. In Supercomputing '21: International Conference for High Performance Computing, Networking, Storage, and Analysis. ACM, New York, NY, USA, 14 pages. https://doi.org/finalDOI.