ABSTRACT
Catalyzed by enormous success in the industrial sector, many research programs have been exploring data-driven, machine learning approaches. Performance can be poor when the model is extrapolated to new regions of chemical space, e.g., new bonding types or new many-body interactions. Another important limitation is the spatial locality assumption in model architecture, and this limitation cannot be overcome with larger or more diverse datasets. The outlined challenges are primarily associated with the lack of electronic structure information in surrogate models such as interatomic potentials. Given the fast development of machine learning and computational chemistry methods, we expect some limitations of surrogate models to be addressed in the near future; nevertheless, the spatial locality assumption will likely remain a limiting factor for their transferability. Here, we suggest focusing on an equally important effort: the design of physics-informed models that leverage domain knowledge and employ machine learning only as a corrective tool. In the context of materials science, we will focus on semiempirical quantum mechanics, using machine learning to predict corrections to the reduced-order Hamiltonian model parameters. The resulting models are broadly applicable, retain the speed of semiempirical chemistry, and frequently achieve accuracy on par with much more expensive ab initio calculations. These early results indicate that future work, in which machine learning and quantum chemistry methods are developed jointly, may provide the best of all worlds for chemistry applications that demand both high accuracy and high numerical efficiency.
ABSTRACT
A high level of physical detail in a molecular model improves its ability to perform high accuracy simulations but can also significantly affect its complexity and computational cost. In some situations, it is worthwhile to add complexity to a model to capture properties of interest; in others, additional complexity is unnecessary and can make simulations computationally infeasible. In this work, we demonstrate the use of Bayesian inference for molecular model selection, using Monte Carlo sampling techniques accelerated with surrogate modeling to evaluate the Bayes factor evidence for different levels of complexity in the two-centered Lennard-Jones + quadrupole (2CLJQ) fluid model. Examining three nested levels of model complexity, we demonstrate that the use of variable quadrupole and bond length parameters in this model framework is justified only for some chemistries. Through this process, we also get detailed information about the distributions and correlation of parameter values, enabling improved parametrization and parameter analysis. We also show how the choice of parameter priors, which encode previous model knowledge, can have substantial effects on the selection of models, penalizing careless introduction of additional complexity. We detail the computational techniques used in this analysis, providing a roadmap for future applications of molecular model selection via Bayesian inference and surrogate modeling.
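As a toy illustration of how a Bayes factor penalizes extra parameters, the sketch below compares a model with a fixed parameter against one that lets the parameter vary under a Gaussian prior. It is a minimal example under stated assumptions, not the 2CLJQ analysis: the unit-variance Gaussian likelihood, the data values, the prior widths, and all function names are hypothetical. The Monte Carlo estimate averages the likelihood ratio over draws from the prior and can be checked against the closed form available for this simple case.

```python
import math
import random


def analytic_bayes_factor(data, tau):
    """Bayes factor of M1 (mu ~ N(0, tau^2)) over M0 (mu = 0).

    Assumes independent Gaussian data with unit variance (toy assumption).
    """
    n = len(data)
    s = sum(data)
    shrink = 1.0 + n * tau * tau
    return math.exp(s * s * tau * tau / (2.0 * shrink)) / math.sqrt(shrink)


def monte_carlo_bayes_factor(data, tau, n_samples=100_000, seed=0):
    """Estimate the same Bayes factor by sampling mu from the prior of M1."""
    rng = random.Random(seed)
    n = len(data)
    s = sum(data)
    total = 0.0
    for _ in range(n_samples):
        mu = rng.gauss(0.0, tau)
        # Likelihood ratio p(data | mu) / p(data | mu = 0) for unit variance.
        total += math.exp(s * mu - 0.5 * n * mu * mu)
    return total / n_samples


# Hypothetical data that are only weakly informative about mu.
toy_data = [0.3, -0.1, 0.4, 0.2, 0.1]
bf_mc = monte_carlo_bayes_factor(toy_data, tau=1.0)
bf_exact = analytic_bayes_factor(toy_data, tau=1.0)
bf_wide_prior = analytic_bayes_factor(toy_data, tau=10.0)
```

In this toy case both Bayes factors fall below one, so the simpler model is favored, and widening the prior from `tau=1.0` to `tau=10.0` lowers the Bayes factor further. This mirrors the point that prior choice can penalize added complexity.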
Subject(s)
Bayes Theorem, Computer Simulation, Monte Carlo Method
ABSTRACT
Sooting tendencies of a series of nitrogen-containing hydrocarbons (NHCs) have been recently characterized experimentally using the yield sooting index (YSI) methodology. This work aims to identify soot-relevant reaction pathways for three selected C6H15N amines, namely, dipropylamine (DPA), diisopropylamine (DIPA), and 3,3-dimethylbutylamine (DMBA) using ReaxFF molecular dynamics (MD) simulations and quantum mechanical (QM) calculations and to interpret the experimentally observed trends. ReaxFF MD simulations are performed to determine the important intermediate species and radicals involved in the fuel decomposition and soot formation processes. QM calculations are employed to extensively search for chemical reactions involving these species and radicals based on the ReaxFF MD results and also to quantitatively characterize the potential energy surfaces. Specifically, ReaxFF simulations are carried out in the NVT ensemble at 1400, 1600, and 1800 K, where soot has been identified to form in the YSI experiment. These simulations account for the interactions among test fuel molecules and pre-existing radicals and intermediate species generated from rich methane combustion, using a recently proposed simulation framework. ReaxFF simulations predict that the reactivity of the amines decreases in the order DIPA > DPA > DMBA, independent of temperature. Both QM calculations and ReaxFF simulations predict that C2H4, C3H6, and C4H8 are the main nonaromatic soot precursors formed during the decomposition of DPA, DIPA, and DMBA, respectively, and the associated reaction pathways are identified for each amine. Both theoretical methods predict that sooting tendency increases in the order DPA, DIPA, and DMBA, consistent with the experimentally measured trend in YSI. This work demonstrates that sooting tendencies and soot-relevant reaction pathways of fuels with unknown chemical kinetics can be identified efficiently through combined ReaxFF and QM simulations. Overall, predictions from ReaxFF simulations and QM calculations are consistent, in terms of fuel reactivity, major intermediates, and major nonaromatic soot precursors.
ABSTRACT
We compute the vapor-liquid critical coordinates of a model of helium in which nuclear quantum effects are absent. We employ highly accurate ab initio pair and three-body potentials and calculate the critical parameters rigorously in two ways. First, we calculate the virial coefficients up to the seventh and find the point where an isotherm satisfies the critical conditions. Second, we use Gibbs Ensemble Monte Carlo (GEMC) to calculate the vapor-liquid equilibrium, and extrapolate the phase envelope to the critical point. Both methods yield results that are consistent within their uncertainties. The critical temperature of "classical helium" is 13.0 K (compared to 5.2 K for real helium), the critical pressure is 0.93 MPa, and the critical density is 28.4 mol·L⁻¹, with expanded uncertainties (corresponding to a 95% confidence interval) on the order of 0.1 K, 0.02 MPa, and 0.5 mol·L⁻¹, respectively. The effect of three-body interactions on the location of the critical point is small (lowering the critical temperature by roughly 0.1 K), suggesting that we are justified in ignoring four-body and higher interactions in our calculations. This work is motivated by the use of corresponding-states models for mixtures containing helium (such as some natural gases) at higher temperatures where quantum effects are expected to be negligible; in these situations, the distortion of the critical properties by quantum effects causes problems for the corresponding-states treatment.
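The virial route described above can be written out as a short worked equation. This is a sketch of the standard critical conditions applied to a virial series truncated at the seventh coefficient; the notation (molar density ρ, coefficients B_k, gas constant R) is an assumption, since the abstract does not give the working equations.

```latex
% Virial equation of state truncated at the seventh coefficient
P(\rho, T) = R T \left( \rho + \sum_{k=2}^{7} B_k(T)\, \rho^{k} \right)

% Critical conditions on the isotherm T = T_c, evaluated at \rho = \rho_c
\left( \frac{\partial P}{\partial \rho} \right)_{T}
  = R T \left( 1 + \sum_{k=2}^{7} k\, B_k(T)\, \rho^{k-1} \right) = 0

\left( \frac{\partial^{2} P}{\partial \rho^{2}} \right)_{T}
  = R T \sum_{k=2}^{7} k (k-1)\, B_k(T)\, \rho^{k-2} = 0
```

Solving the two conditions simultaneously gives the critical temperature and density, and substituting them back into the first expression gives the critical pressure.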
ABSTRACT
Molecular simulation results at extreme temperatures and pressures can supplement experimental data when developing fundamental equations of state. Since most force fields are optimized to agree with vapor-liquid equilibria (VLE) properties, however, the reliability of the molecular simulation results depends on the validity/transferability of the force field at higher temperatures and pressures. As demonstrated in this study, although state-of-the-art united-atom Mie λ-6 potentials for normal and branched alkanes provide accurate estimates for VLE, they tend to over-predict pressures for dense supercritical fluids and compressed liquids. The physical explanation for this observation is that the repulsive barrier is too steep for the "optimal" united-atom Mie λ-6 potential parameterized with VLE properties. Bayesian inference confirms that no feasible combination of non-bonded parameters (ϵ, σ, and λ) is capable of simultaneously predicting saturated vapor pressures, saturated liquid densities, and pressures at high temperatures and densities. This conclusion has both practical and theoretical ramifications, as more realistic non-bonded potentials may be required for accurate extrapolation to high pressures of industrial interest.
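To make the role of the repulsive exponent concrete, the following sketch evaluates the Mie λ-6 potential in its usual textbook form. The function names and the reduced units (ϵ = σ = 1) are illustrative assumptions, and the exponents compared here are arbitrary examples rather than the fitted values from the study.

```python
def mie_prefactor(lam, m=6.0):
    """Prefactor that makes the well depth equal to epsilon."""
    return (lam / (lam - m)) * (lam / m) ** (m / (lam - m))


def mie_potential(r, epsilon, sigma, lam, m=6.0):
    """Mie lambda-m pair energy at separation r (same length unit as sigma)."""
    ratio = sigma / r
    return mie_prefactor(lam, m) * epsilon * (ratio ** lam - ratio ** m)


# Compare repulsive walls at a compressed separation, in reduced units.
u_12 = mie_potential(0.9, epsilon=1.0, sigma=1.0, lam=12.0)
u_16 = mie_potential(0.9, epsilon=1.0, sigma=1.0, lam=16.0)
```

With λ = 12 the prefactor reduces to 4 and the expression becomes the Lennard-Jones 12-6 potential. At the compressed separation the λ = 16 energy is larger, which is the sense in which a larger exponent gives a steeper repulsive wall.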
ABSTRACT
Molecular simulation has the ability to predict various physical properties that are difficult to obtain experimentally. For example, we implement molecular simulation to predict the critical constants (i.e., critical temperature, critical density, critical pressure, and critical compressibility factor) for large n-alkanes that thermally decompose experimentally (as large as C48). Historically, molecular simulation has been viewed as a tool that is limited to providing qualitative insight. One key reason for this perceived weakness in molecular simulation is the difficulty of quantifying the uncertainty in the results. This is because molecular simulations have many sources of uncertainty that propagate and are difficult to quantify. We investigate one of the most important sources of uncertainty, namely, the intermolecular force field parameters. Specifically, we quantify the uncertainty in the Lennard-Jones (LJ) 12-6 parameters for the CH4, CH3, and CH2 united-atom interaction sites. We then demonstrate how the uncertainties in the parameters lead to uncertainties in the saturated liquid density and critical constant values obtained from Gibbs Ensemble Monte Carlo simulation. Our results suggest that the uncertainties attributed to the LJ 12-6 parameters are small enough that quantitatively useful estimates of the saturated liquid density and the critical constants can be obtained from molecular simulation.
ABSTRACT
A rigorous statistical analysis is presented for Gibbs ensemble Monte Carlo simulations. This analysis reduces the uncertainty in the critical point estimate when compared with traditional methods found in the literature. Two different improvements are recommended due to the following results. First, the traditional propagation of error approach for estimating the standard deviations used in regression improperly weighs the terms in the objective function due to the inherent interdependence of the vapor and liquid densities. For this reason, an error model is developed to predict the standard deviations. Second, and most importantly, a rigorous algorithm for nonlinear regression is compared to the traditional approach of linearizing the equations and propagating the error in the slope and the intercept. The traditional regression approach can yield nonphysical confidence intervals for the critical constants. By contrast, the rigorous algorithm restricts the confidence regions to values that are physically sensible. To demonstrate the effect of these conclusions, a case study is performed to enhance the reliability of molecular simulations to resolve the n-alkane family trend for the critical temperature and critical density.
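For context, the traditional linearized route that the abstract argues against can be sketched as follows. It is a minimal illustration under assumptions the abstract does not state: the density scaling law with a fixed exponent β = 0.326, the law of rectilinear diameters, unweighted least squares, and synthetic noise-free coexistence data in reduced units. It shows only the point estimate and does not reproduce the error model or the rigorous nonlinear regression.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def critical_point_linearized(temps, rho_liq, rho_vap, beta=0.326):
    """Estimate (Tc, rho_c) from coexistence densities by two linear fits."""
    # Scaling law: (rho_l - rho_v)^(1/beta) = B^(1/beta) * (Tc - T), linear in T.
    widths = [(rl - rv) ** (1.0 / beta) for rl, rv in zip(rho_liq, rho_vap)]
    slope, intercept = linear_fit(temps, widths)
    t_c = -intercept / slope
    # Rectilinear diameters: (rho_l + rho_v) / 2 = rho_c + A * (Tc - T).
    mids = [(rl + rv) / 2.0 for rl, rv in zip(rho_liq, rho_vap)]
    slope_d, intercept_d = linear_fit(temps, mids)
    rho_c = intercept_d + slope_d * t_c
    return t_c, rho_c


# Synthetic, noise-free data generated from known (hypothetical) constants.
TRUE_TC, TRUE_RHOC, A_COEF, B_COEF = 1.3, 0.31, 0.2, 1.0
temps = [1.00, 1.05, 1.10, 1.15, 1.20]
mid_line = [TRUE_RHOC + A_COEF * (TRUE_TC - t) for t in temps]
half_width = [0.5 * B_COEF * (TRUE_TC - t) ** 0.326 for t in temps]
rho_liq = [m + h for m, h in zip(mid_line, half_width)]
rho_vap = [m - h for m, h in zip(mid_line, half_width)]
t_c_est, rho_c_est = critical_point_linearized(temps, rho_liq, rho_vap)
```

With noise-free input the two fits recover the generating constants. With real simulation data the propagated uncertainty in the slope and intercept is what can produce the nonphysical confidence intervals mentioned above.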
ABSTRACT
Methodologies for training machine learning potentials (MLPs) with quantum-mechanical simulation data have recently seen tremendous progress. Experimental data have a very different character than simulated data, and most MLP training procedures cannot be easily adapted to incorporate both types of data into the training process. We investigate a training procedure based on iterative Boltzmann inversion that produces a pair potential correction to an existing MLP using equilibrium radial distribution function data. By applying these corrections to an MLP for pure aluminum based on density functional theory, we observe that the resulting model largely addresses previous overstructuring in the melt phase. Interestingly, the corrected MLP also exhibits improved performance in predicting experimental diffusion constants, which are not included in the training procedure. The presented method does not require autodifferentiating through a molecular dynamics solver and does not make assumptions about the MLP architecture. Our results suggest a practical framework for incorporating experimental data into machine learning models to improve the accuracy of molecular dynamics simulations.
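The core update behind iterative Boltzmann inversion can be sketched in a few lines. This is the generic textbook update applied to a tabulated pair correction, not the authors' implementation: the function name, the `mixing` damping factor, the handling of empty bins, and the sample radial distribution values are all assumptions.

```python
import math


def ibi_update(correction, g_sim, g_target, kT, mixing=1.0):
    """Apply one iterative Boltzmann inversion step to a tabulated correction.

    correction, g_sim, and g_target are lists over the same radial grid.
    Bins where either distribution is zero are left unchanged.
    """
    updated = []
    for du, gs, gt in zip(correction, g_sim, g_target):
        if gs > 0.0 and gt > 0.0:
            # Raise the energy where the simulation is overstructured (gs > gt).
            du = du + mixing * kT * math.log(gs / gt)
        updated.append(du)
    return updated


# Hypothetical values: the simulated first peak is too high, the tail too low.
g_simulated = [0.0, 2.0, 1.0, 0.8]
g_experiment = [0.0, 1.5, 1.0, 1.0]
new_correction = ibi_update([0.0, 0.0, 0.0, 0.0], g_simulated, g_experiment, kT=1.0)
```

In practice the update would be repeated, rerunning the simulation with the corrected potential each time until the simulated and target distributions agree. Because only the pair correction changes, the underlying MLP is treated as a black box.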
ABSTRACT
Atomistic simulation has a broad range of applications from drug design to materials discovery. Machine learning interatomic potentials (MLIPs) have become an efficient alternative to computationally expensive ab initio simulations. For this reason, chemistry and materials science would greatly benefit from a general reactive MLIP, that is, an MLIP that is applicable to a broad range of reactive chemistry without the need for refitting. Here we develop a general reactive MLIP (ANI-1xnr) through automated sampling of condensed-phase reactions. ANI-1xnr is then applied to study five distinct systems: carbon solid-phase nucleation, graphene ring formation from acetylene, biofuel additives, combustion of methane and the spontaneous formation of glycine from early Earth small molecules. In all studies, ANI-1xnr closely matches experiment (when available) and/or previous studies using traditional model chemistry methods. As such, ANI-1xnr proves to be a highly general reactive MLIP for C, H, N and O elements in the condensed phase, enabling high-throughput in silico reactive chemistry experimentation.
ABSTRACT
Machine learning (ML) models, when trained on data sets of high-fidelity quantum simulations, produce accurate and efficient interatomic potentials. Active learning (AL) is a powerful tool to iteratively generate diverse data sets. In this approach, the ML model provides an uncertainty estimate along with its prediction for each new atomic configuration. If the uncertainty estimate exceeds a certain threshold, then the configuration is included in the data set. Here we develop a strategy to more rapidly discover configurations that meaningfully augment the training data set. The approach, uncertainty-driven dynamics for active learning (UDD-AL), modifies the potential energy surface used in molecular dynamics simulations to favor regions of configuration space for which there is large model uncertainty. The performance of UDD-AL is demonstrated for two AL tasks: sampling the conformational space of glycine and sampling the promotion of proton transfer in acetylacetone. The method is shown to efficiently explore the chemically relevant configuration space, which may be inaccessible using regular dynamical sampling at target temperature conditions.
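The threshold-based selection step of active learning can be illustrated with a toy ensemble. The abstract does not say how the uncertainty is computed, so the ensemble standard deviation used here is an assumption, as are the one-dimensional stand-in "models", the candidate values, and the threshold. The sketch covers only the selection rule, not the biased dynamics of UDD-AL.

```python
import statistics


def ensemble_uncertainty(models, configuration):
    """Spread of the ensemble predictions, used as an uncertainty proxy."""
    predictions = [model(configuration) for model in models]
    return statistics.pstdev(predictions)


def select_for_labeling(models, configurations, threshold):
    """Keep configurations whose uncertainty exceeds the threshold."""
    return [c for c in configurations
            if ensemble_uncertainty(models, c) > threshold]


# Hypothetical 1D "potentials" that agree near x = 0 and diverge far from it.
toy_models = [
    lambda x: x * x,
    lambda x: x * x + 0.1 * x ** 3,
    lambda x: x * x - 0.1 * x ** 3,
]
candidates = [0.1, 0.5, 1.0, 2.0]
selected = select_for_labeling(toy_models, candidates, threshold=0.05)
```

Only the candidates far from the region where the toy models agree are selected. In a real workflow these would be sent for reference quantum calculations and added to the training set.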
Subject(s)
Fabaceae, Uncertainty, Glycine, Machine Learning, Molecular Dynamics Simulation
ABSTRACT
We present NEXMD version 2.0, the second release of the NEXMD (Nonadiabatic EXcited-state Molecular Dynamics) software package. Across a variety of new features, NEXMD v2.0 incorporates new implementations of two hybrid quantum-classical dynamics methods, namely, Ehrenfest dynamics (EHR) and the Ab-Initio Multiple Cloning sampling technique for Multiconfigurational Ehrenfest quantum dynamics (MCE-AIMC or simply AIMC), which are alternative options to the previously implemented trajectory surface hopping (TSH) method. To illustrate these methodologies, we outline a direct comparison of these three hybrid quantum-classical dynamics methods as implemented in the same NEXMD framework, discussing their weaknesses and strengths, using the modeled photodynamics of a polyphenylene ethylene dendrimer building block as a representative example. We also describe the expanded normal-mode analysis and constraints for both the ground and excited states, newly implemented in the NEXMD v2.0 framework, which allow for a deeper analysis of the main vibrational motions involved in vibronic dynamics. Overall, NEXMD v2.0 expands the range of applications of NEXMD to a larger variety of multichromophore organic molecules and photophysical processes involving quantum coherences and persistent couplings between electronic excited states and nuclear velocity.
ABSTRACT
Machine learning (ML) is becoming a method of choice for modelling complex chemical processes and materials. ML provides a surrogate model trained on a reference dataset that can be used to establish a relationship between a molecular structure and its chemical properties. This Review highlights developments in the use of ML to evaluate chemical properties such as partial atomic charges, dipole moments, spin and electron densities, and chemical bonding, as well as to obtain a reduced quantum-mechanical description. We overview several modern neural network architectures, their predictive capabilities, generality and transferability, and illustrate their applicability to various chemical properties. We emphasize that learned molecular representations resemble quantum-mechanical analogues, demonstrating the ability of the models to capture the underlying physics. We also discuss how ML models can describe non-local quantum effects. Finally, we conclude by compiling a list of available ML toolboxes, summarizing the unresolved challenges and presenting an outlook for future development. The observed trends demonstrate that this field is evolving towards physics-based models augmented by ML, which is accompanied by the development of new methods and the rapid growth of user-friendly ML frameworks for chemistry.
ABSTRACT
Phosphorescence is commonly utilized for applications including light-emitting diodes and photovoltaics. Machine learning (ML) approaches trained on ab initio datasets of singlet-triplet energy gaps may expedite the discovery of phosphorescent compounds with the desired emission energies. However, we show that standard ML approaches for modeling potential energy surfaces inaccurately predict singlet-triplet energy gaps due to the failure to account for spatial localities of spin transitions. To solve this, we introduce localization layers in a neural network model that weight atomic contributions to the energy gap, thereby allowing the model to isolate the most determinative chemical environments. Trained on the singlet-triplet energy gaps of organic molecules, we apply our method to an out-of-sample test set of large phosphorescent compounds and demonstrate the substantial improvement that localization layers have on predicting their phosphorescence energies. Remarkably, the inferred localization weights have a strong relationship with the ab initio spin density of the singlet-triplet transition, and thus infer localities of the molecule that determine the spin transition, despite the fact that no direct electronic information was provided during training. The use of localization layers is expected to improve the modeling of many localized, non-extensive phenomena and could be implemented in any atom-centered neural network model.
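One way to picture a localization layer is as a set of normalized per-atom weights applied to atomic contributions before they are combined. The sketch below is a guess at the general idea only: the softmax normalization, the weighted sum, the function names, and the numbers are all assumptions, and the network architecture in the work may differ.

```python
import math


def softmax(scores):
    """Normalize raw scores into positive weights that sum to one."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def localized_gap(atomic_contributions, localization_scores):
    """Combine per-atom contributions using localization weights."""
    weights = softmax(localization_scores)
    gap = sum(w * c for w, c in zip(weights, atomic_contributions))
    return gap, weights


# Hypothetical three-atom example where the second atom dominates.
gap_localized, w_localized = localized_gap([1.0, 3.0, 2.0], [0.0, 10.0, 0.0])
gap_uniform, w_uniform = localized_gap([1.0, 3.0, 2.0], [0.0, 0.0, 0.0])
```

With equal scores the result is the plain average of the atomic contributions, while a strongly peaked score makes the prediction follow a single chemical environment. Unlike a simple sum over atoms, a weighted combination of this kind does not grow with molecule size, which suits a non-extensive quantity such as an energy gap.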
ABSTRACT
Rosenfeld proposed two different scaling approaches for modeling the transport properties of fluids, separated by 22 years: one valid in the dilute gas and another in the liquid phase. In this work, we demonstrate that these two limiting cases can be connected through the use of a novel approach to scaling transport properties and a bridging function. This approach, which is empirical and not derived from theory, is used to generate reference correlations for the viscosity, thermal conductivity, and self-diffusion of the Lennard-Jones 12-6 fluid. This approach, with a very simple functional form, allows for the reproduction of the most accurate simulation data to within nearly their statistical uncertainty. The correlations are used to confirm that for the Lennard-Jones fluid the appropriately scaled transport properties are nearly monovariate functions of the excess entropy from low-density gases into the supercooled phase and up to extreme temperatures. This study represents the most comprehensive metastudy of the transport properties of the Lennard-Jones fluid to date.
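For orientation, the liquid-phase scaling usually attributed to Rosenfeld is commonly written in terms of macroscopically reduced transport properties, as sketched below. The abstract does not state these expressions, so the notation (number density ρ_N, particle mass m, excess entropy per particle s^ex, fitted constants a_X and b_X) is an assumption taken from the common statement of the liquid-phase limit. It is not the bridged correlation developed in this work.

```latex
% Macroscopically reduced self-diffusion, viscosity, and thermal conductivity
D^{*} = D \, \rho_N^{1/3} \left( \frac{k_B T}{m} \right)^{-1/2}, \qquad
\eta^{*} = \eta \, \rho_N^{-2/3} \left( m k_B T \right)^{-1/2}, \qquad
\lambda^{*} = \frac{\lambda \, \rho_N^{-2/3}}{k_B} \left( \frac{k_B T}{m} \right)^{-1/2}

% Approximate dense-fluid dependence on the excess entropy, for X in {D, eta, lambda}
X^{*} \approx a_X \exp\!\left( b_X \, \frac{s^{\mathrm{ex}}}{k_B} \right)
```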
ABSTRACT
In this study, we present an approach for rapid force field parameterization and uncertainty quantification of the non-bonded interaction parameters for classical force fields. The accuracy of most thermophysical properties, and especially vapor-liquid equilibria (VLE), obtained from molecular simulation depends strongly on the non-bonded interactions. Traditionally, non-bonded interactions are parameterized to agree with macroscopic properties by performing large amounts of direct molecular simulation. Due to the computational cost of molecular simulation, surrogate models (i.e., efficient models that approximate direct molecular simulation results) are an essential tool for high-dimensional parameterization and uncertainty quantification of non-bonded interactions. The present study compares two different configuration-sampling-based surrogate models, namely, Multistate Bennett Acceptance Ratio (MBAR) and Pair Correlation Function Rescaling (PCFR). MBAR and PCFR are coupled with the Isothermal Isochoric (ITIC) thermodynamic integration method for estimating vapor-liquid saturation properties. We find that MBAR and PCFR are complementary in their roles. Specifically, PCFR is preferred when exploring distant regions of the parameter space while MBAR is better in the local domain.