ABSTRACT
The physicochemical properties of molecular crystals, such as solubility, stability, compactability, melting behaviour and bioavailability, depend on their crystal form [1]. In silico crystal form selection has recently come much closer to realization because of the development of accurate and affordable free-energy calculations [2-4]. Here we redefine the state of the art, primarily by improving the accuracy of free-energy calculations, constructing a reliable experimental benchmark for solid-solid free-energy differences, quantifying statistical errors for the computed free energies and placing both hydrate crystal structures of different stoichiometries and anhydrate crystal structures on the same energy landscape, with defined error bars, as a function of temperature and relative humidity. The calculated free energies have standard errors of 1-2 kJ mol⁻¹ for industrially relevant compounds, and the method to place crystal structures with different hydrate stoichiometries on the same energy landscape can be extended to other multi-component systems, including solvates. These contributions reduce the gap between the needs of the experimentalist and the capabilities of modern computational tools, transforming crystal structure prediction into a more reliable and actionable procedure that can be used in combination with experimental evidence to direct crystal form selection and establish control [5].
ABSTRACT
We present computational results of many-body dispersion (MBD) interactions for 40 pairs of molecular and atomic species: hydrocarbons, silanes, corresponding fluorinated derivatives, pairs which have multiple H---H contacts between the molecules, as well as pairs having π-π interactions, and pairs of noble gases. The calculations reveal that the MBD stabilization energy (E_DISP,MBD) obeys a global relationship, which is gravitational-like. It is proportional to the product of the masses of the two molecules (M1M2) and inversely proportional to the corresponding distances between the molecular centers-of-mass (R_COM-COM) or the H---H distances of the atoms mediating the interactions of the two molecules (R_H-H). This relationship reflects the interactions of instantaneous dipoles, which are formed by the ensemble of bonds/atoms in the interacting molecules. Using the D4-corrected dispersion energy (E_DISP,D4), which accounts for three-body interactions, we find that the E_DISP,MBD and E_DISP,D4 data sets are strongly correlated. Based on valence-bond (VB) modeling, the dispersion interactions occur primarily due to the increased contributions of the oscillating-ionic VB structures which maintain favorable electrostatic interactions: the [Sub–C⁺ :H⁻ ⁺H :C⁻–Sub] and [Sub–C:⁻ ⁺H ⁻H: C⁺–Sub] structures, where Sub symbolizes general residues. This augmented contribution is complemented by simultaneously diminished weights of the destabilizing pair of structures, [Sub–C⁺ :H⁻ ⁻H: C⁺–Sub] and [Sub–:C⁻ H⁺ ⁺H :C⁻–Sub]. The local charges are propagated to the entire ensemble of bonds/atoms by partially charging the Sub residues, thus bringing about the "gravitational-like" dependence of dispersion.
ABSTRACT
Telluronium salts [Ar2MeTe]X were synthesized, and their Lewis acidic properties towards a number of Lewis bases were addressed in solution by physical and theoretical means. Structural X-ray diffraction analysis of 21 different salts revealed the electrophilicity of the Te centers in their interactions with anions. The propensity of telluroniums to form Lewis pairs was investigated with OPPh3. Diffusion-ordered NMR spectroscopy suggested that telluroniums can bind up to three OPPh3 molecules. Isothermal titration calorimetry showed that the related heats of association in 1,2-dichloroethane depend on the electronic properties of the substituents of the aryl moiety and on the nature of the counterion. The enthalpies of first association of OPPh3 span -0.5 to -5 kcal mol⁻¹. Study of the affinity of telluroniums for OPPh3 by state-of-the-art DFT and ab initio methods revealed the dominant Coulombic and dispersion interactions as well as an entropic effect favoring association in solution. Intermolecular orbital interactions between [Ar2MeTe]+ cations and OPPh3 are deemed insufficient on their own to ensure the cohesion of [Ar2MeTe·Bn]+ complexes in solution (B = Lewis base). Comparison of Grimme's and Tkatchenko's DFT-D4/MBD-vdW thermodynamics of formation of higher [Ar2MeTe·Bn]+ complexes revealed significant molecular size-dependent divergence of the two methodologies, with MBD yielding better agreement with experiment.
ABSTRACT
The quantum Drude oscillator (QDO) model has been widely used as an efficient surrogate to describe the electric response properties of matter as well as long-range interactions in molecules and materials. Most commonly, QDOs are coupled within the dipole approximation so that the Hamiltonian can be exactly diagonalized, which forms the basis for the many-body dispersion method [Phys. Rev. Lett. 108, 236402 (2012)]. The dipole coupling is efficient and allows us to study non-covalent many-body effects in systems with thousands of atoms. However, there are two limitations: (i) the need to regularize the interaction at short distances with empirical damping functions and (ii) the lack of multipolar effects in the coupling potential. In this work, we convincingly address both limitations of the dipole-coupled QDO model by presenting a numerically exact solution of the Coulomb-coupled QDO model by means of quantum Monte Carlo methods. We calculate the potential-energy surfaces of homogeneous QDO dimers, analyzing their properties as a function of the three tunable parameters: frequency, reduced mass, and charge. We study the coupled-QDO model behavior at short distances and show how to parameterize this model to enable an effective description of chemical bonds, such as the covalent bond in the H2 molecule.
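The exact diagonalization of dipole-coupled QDOs mentioned in this abstract can be illustrated with a minimal sketch. It assumes atomic units, point-like oscillators, and the bare (undamped) dipole-dipole tensor, so it is not the production many-body dispersion method, which adds short-range damping. The function name `mbd_energy` and the parameter values are hypothetical and chosen only for illustration. For two identical oscillators at large separation, the result should approach the London pairwise limit, -(3/4)·ω·α²/R⁶.

```python
import numpy as np

def mbd_energy(positions, alpha0, omega):
    """Interaction energy of dipole-coupled QDOs (atomic units, no damping)."""
    positions = np.asarray(positions, dtype=float)
    alpha0 = np.asarray(alpha0, dtype=float)
    omega = np.asarray(omega, dtype=float)
    n = len(positions)
    # 3N x 3N coupling matrix; diagonal blocks hold omega_i^2
    coupling = np.zeros((3 * n, 3 * n))
    for i in range(n):
        coupling[3 * i:3 * i + 3, 3 * i:3 * i + 3] = omega[i] ** 2 * np.eye(3)
        for j in range(i + 1, n):
            r = positions[j] - positions[i]
            d = np.linalg.norm(r)
            # bare dipole-dipole tensor T = (I - 3 r r^T / d^2) / d^3
            tensor = (np.eye(3) - 3.0 * np.outer(r, r) / d**2) / d**3
            block = omega[i] * omega[j] * np.sqrt(alpha0[i] * alpha0[j]) * tensor
            coupling[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            coupling[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
    eigvals = np.linalg.eigvalsh(coupling)
    # zero-point energy of the coupled modes minus that of the free oscillators
    return 0.5 * np.sum(np.sqrt(eigvals)) - 1.5 * np.sum(omega)

# Illustrative (hypothetical) parameters: polarizability, frequency, separation
alpha, omega, R = 11.0, 0.7, 10.0
e_mbd = mbd_energy([[0.0, 0.0, 0.0], [0.0, 0.0, R]], [alpha, alpha], [omega, omega])
e_london = -0.75 * omega * alpha**2 / R**6  # leading-order pairwise limit
print(e_mbd, e_london)
```

At short separations the bare tensor makes the coupling matrix indefinite, which is one way to see why the dipole-coupled model needs the damping functions listed as limitation (i).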
ABSTRACT
Quantum electrodynamic fields possess fluctuations corresponding to transient particle-antiparticle dipoles, which can be characterized by a nonvanishing polarizability density. Here, we extend a recently proposed quantum scaling law to describe the volumetric and radial polarizability density of a quantum field corresponding to electrons and positrons and derive the Casimir self-interaction energy (SIE) density of the field, Ē_SIE, in terms of the fine-structure constant. The proposed model obeys the cosmological equation of state w = -1, and the magnitude of the calculated Ē_SIE lies between the two recent measurements of the cosmological constant Λ obtained by the Planck Mission and the Hubble Space Telescope.
ABSTRACT
We develop a quantum embedding method that enables accurate and efficient treatment of interactions between molecules and an environment, while explicitly including many-body correlations. The molecule is composed of classical nuclei and quantum electrons, whereas the environment is modeled via charged quantum harmonic oscillators. We construct a general Hamiltonian and introduce a variational Ansatz for the correlated ground state of the fully interacting molecule-environment system. This wave function is optimized via the variational Monte Carlo method and the ground state energy is subsequently estimated through the diffusion Monte Carlo method. The proposed scheme allows an explicit many-body treatment of electrostatic, polarization, and dispersion interactions between the molecule and the environment. We study solvation energies and excitation energies of benzene derivatives, obtaining excellent agreement with explicit ab initio calculations and experiments.
ABSTRACT
Predictive modeling of toxicity is a crucial step in the drug discovery pipeline. It can help filter out molecules with a high probability of failing in the early stages of de novo drug design. Thus, several machine learning (ML) models have been developed to predict the toxicity of molecules by combining classical ML techniques or deep neural networks with well-known molecular representations such as fingerprints or 2D graphs. However, a more natural and accurate representation of molecules is expected to be defined in physical 3D space, as in ab initio methods. Recent studies successfully used equivariant graph neural networks (EGNNs) for representation learning based on 3D structures to predict quantum-mechanical properties of molecules. Inspired by this, we investigated the performance of EGNNs to construct reliable ML models for toxicity prediction. We used the equivariant transformer (ET) model in TorchMD-NET for this. Eleven toxicity data sets taken from MoleculeNet, TDCommons, and ToxBenchmark have been considered to evaluate the capability of ET for toxicity prediction. Our results show that ET adequately learns 3D representations of molecules that can successfully correlate with toxicity activity, achieving good accuracies on most data sets comparable to state-of-the-art models. We also test a physicochemical property, namely, the total energy of a molecule, to inform the toxicity prediction with a physical prior. However, our work suggests that these two properties cannot be related. We also provide an attention weight analysis to help understand the toxicity prediction in 3D space and thus increase the explainability of the ML model. In summary, our findings offer promising insights into using 3D geometry information via EGNNs and provide a straightforward way to integrate molecular conformers into ML-based pipelines for predicting and investigating toxicity in physical space. We expect that in the future, especially for larger, more diverse data sets, EGNNs will be an essential tool in this domain.
ABSTRACT
Machine learning models are poised to make a transformative impact on chemical sciences by dramatically accelerating computational algorithms and amplifying insights available from computational chemistry methods. However, achieving this requires a confluence and coaction of expertise in computer science and physical sciences. This Review is written for new and experienced researchers working at the intersection of both fields. We first provide concise tutorials of computational chemistry and machine learning methods, showing how insights involving both can be achieved. We follow with a critical review of noteworthy applications that demonstrate how computational chemistry and machine learning can be used together to provide insightful (and useful) predictions in molecular and materials modeling, retrosyntheses, catalysis, and drug design.
ABSTRACT
In recent years, the use of machine learning (ML) in computational chemistry has enabled numerous advances previously out of reach due to the computational complexity of traditional electronic-structure methods. One of the most promising applications is the construction of ML-based force fields (FFs), with the aim to narrow the gap between the accuracy of ab initio methods and the efficiency of classical FFs. The key idea is to learn the statistical relation between chemical structure and potential energy without relying on a preconceived notion of fixed chemical bonds or knowledge about the relevant interactions. Such universal ML approximations are in principle only limited by the quality and quantity of the reference data used to train them. This review gives an overview of applications of ML-FFs and the chemical insights that can be obtained from them. The core concepts underlying ML-FFs are described in detail, and a step-by-step guide for constructing and testing them from scratch is given. The text concludes with a discussion of the challenges that remain to be overcome by the next generation of ML-FFs.
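The key idea stated in this abstract, learning the relation between structure and potential energy from reference data alone, can be sketched on a toy problem. The example below is an illustration under stated assumptions, not a method from the review: the "reference" energies come from a one-dimensional Morse potential, the model is Gaussian kernel ridge regression, and all function names and hyperparameters (`sigma`, `lam`, the bond-length range) are hypothetical. It also shows the claim that accuracy is limited mainly by the quantity of reference data, since the error drops as training points are added.

```python
import numpy as np

def morse(r, d_e=1.0, a=1.0, r_e=1.5):
    """Toy reference potential standing in for ab initio energies."""
    return d_e * (1.0 - np.exp(-a * (r - r_e))) ** 2

def gaussian_kernel(x, y, sigma):
    """Gaussian similarity between two sets of 1D geometries."""
    return np.exp(-(x[:, None] - y[None, :]) ** 2 / (2.0 * sigma**2))

def fit_krr(r_train, e_train, sigma=0.5, lam=1e-8):
    """Solve (K + lam * I) w = E for the regression weights."""
    kernel = gaussian_kernel(r_train, r_train, sigma)
    return np.linalg.solve(kernel + lam * np.eye(len(r_train)), e_train)

def predict_krr(r_query, r_train, weights, sigma=0.5):
    """Predict energies as a weighted sum of kernels centred on training points."""
    return gaussian_kernel(r_query, r_train, sigma) @ weights

def max_abs_error(n_train):
    """Train on n_train evenly spaced geometries and test on unseen ones."""
    r_train = np.linspace(1.0, 4.0, n_train)
    weights = fit_krr(r_train, morse(r_train))
    r_test = np.linspace(1.1, 3.9, 200)
    return np.max(np.abs(predict_krr(r_test, r_train, weights) - morse(r_test)))

err_sparse = max_abs_error(5)
err_dense = max_abs_error(20)
print(err_sparse, err_dense)
```

Real ML force fields replace the scalar bond length with descriptors that respect translational, rotational, and permutational symmetry, and usually train on forces as well as energies.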
ABSTRACT
Understanding correlations - or lack thereof - between molecular properties is crucial for enabling fast and accurate molecular design strategies. In this contribution, we explore the relation between two key quantities describing the electronic structure and chemical properties of molecular systems: the energy gap between the frontier orbitals and the dipole polarizability. Based on the recently introduced QM7-X dataset, augmented with accurate molecular polarizability calculations as well as analysis of functional group compositions, we show that polarizability and HOMO-LUMO gap are uncorrelated when considering sufficiently extended subsets of the chemical compound space. The relation between these two properties is further analyzed on specific examples of molecules with similar composition as well as homooligomers. Remarkably, the freedom brought by the lack of correlation between molecular polarizability and HOMO-LUMO gap enables the design of novel materials, as we demonstrate on the example of organic photodetector candidates.
ABSTRACT
An anisotropic interlayer force field that describes the interlayer interactions in homogeneous and heterogeneous interfaces of group-VI transition metal dichalcogenides (MX2, where M = Mo, W, and X = S, Se) is presented. The force field is benchmarked against density functional theory calculations for bilayer systems within the Heyd-Scuseria-Ernzerhof hybrid density functional approximation, augmented by a nonlocal many-body dispersion treatment of long-range correlation. The parametrization yields good agreement with the reference calculations of binding energy curves and sliding potential energy surfaces. It is found to be transferable to transition metal dichalcogenide (TMD) junctions outside of the training set that contain the same atom types. Calculated bulk moduli agree with most previous dispersion-corrected density functional theory predictions, which underestimate the available experimental values. Calculated phonon spectra of the various junctions under consideration demonstrate the importance of appropriately treating the anisotropic nature of the layered interfaces. Considering our previous parametrization for MoS2, the anisotropic interlayer potential enables accurate and efficient large-scale simulations of the dynamical, tribological, and thermal transport properties of a large set of homogeneous and heterogeneous TMD interfaces.
ABSTRACT
Many-body dispersion (MBD) is a powerful framework to treat van der Waals (vdW) dispersion interactions in density-functional theory and related atomistic modeling methods. Several independent implementations of MBD with varying degree of functionality exist across a number of electronic structure codes, which both limits the current users of those codes and complicates dissemination of new variants of MBD. Here, we develop and document libMBD, a library implementation of MBD that is functionally complete, efficient, easy to integrate with any electronic structure code, and already integrated in FHI-aims, DFTB+, VASP, Q-Chem, CASTEP, and Quantum ESPRESSO. libMBD is written in modern Fortran with bindings to C and Python, uses MPI/ScaLAPACK for parallelization, and implements MBD for both finite and periodic systems, with analytical gradients with respect to all input parameters. The computational cost has asymptotic cubic scaling with system size, and evaluation of gradients only changes the prefactor of the scaling law, with libMBD exhibiting strong scaling up to 256 processor cores. Other MBD properties beyond energy and gradients can be calculated with libMBD, such as the charge-density polarization, first-order Coulomb correction, the dielectric function, or the order-by-order expansion of the energy in the dipole interaction. Calculations on supramolecular complexes with MBD-corrected electronic structure methods and a meta-review of previous applications of MBD demonstrate the broad applicability of the libMBD package to treat vdW interactions.
ABSTRACT
Understanding complex materials at different length scales requires reliably accounting for van der Waals (vdW) interactions, which stem from long-range electronic correlations. While the important role of many-body vdW interactions has been extensively documented for the stability of materials, much less is known about the coupling between vdW interactions and atomic forces. Here we analyze the Hessian force response matrix for a single and two vdW-coupled atomic chains to show that a many-body description of vdW interactions yields atomic force response magnitudes that exceed the expected pairwise decay by 3-5 orders of magnitude for a wide range of separations between perturbed and observed atoms. Similar findings are confirmed for carbon nanotubes, graphene, and delamination of graphene from a silicon substrate previously studied experimentally. This colossal force enhancement suggests implications for phonon spectra, free energies, interfacial adhesion, and collective dynamics in materials with many interacting atoms.
ABSTRACT
Polarizability is a key response property of physical and chemical systems, which has an impact on intermolecular interactions, spectroscopic observables, and vacuum polarization. The calculation of polarizability for quantum systems involves an infinite sum over all excited (bound and continuum) states, concealing the physical interpretation of polarization mechanisms and complicating the derivation of efficient response models. Approximate expressions for the dipole polarizability, α, rely on different scaling laws α ∝ R³, R⁴, or R⁷, for various definitions of the system radius R. Here, we consider a range of single-particle quantum systems of varying spatial dimensionality and having qualitatively different spectra, demonstrating that their polarizability follows a universal four-dimensional scaling law α = C(4μq²/ℏ²)L⁴, where μ and q are the (effective) particle mass and charge, C is a dimensionless excitation-energy ratio, and the characteristic length L is defined via the L² norm of the position operator. This unified formula is also applicable to many-particle systems, as shown by accurately predicting the dipole polarizability of 36 atoms, 1641 small organic molecules, and Bloch electrons in periodic systems.
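The scaling law quoted in this abstract can be checked numerically for a simple case. The sketch below makes several assumptions that are not stated in the abstract: units with μ = q = ℏ = 1, a one-dimensional particle on a finite-difference grid, and the identification L² = ⟨0|x²|0⟩ for the characteristic length. The function name `scaling_constant` and the grid settings are hypothetical. For a harmonic oscillator, where a single excitation carries all the dipole strength, the constant C should come out close to 1.

```python
import numpy as np

def scaling_constant(potential, x_max=10.0, n_grid=400):
    """Return (alpha, L^2, C) for a 1D particle with mu = q = hbar = 1."""
    x = np.linspace(-x_max, x_max, n_grid)
    dx = x[1] - x[0]
    # finite-difference Hamiltonian H = -1/2 d^2/dx^2 + V(x)
    kinetic = (np.diag(np.full(n_grid, 1.0 / dx**2))
               - np.diag(np.full(n_grid - 1, 0.5 / dx**2), 1)
               - np.diag(np.full(n_grid - 1, 0.5 / dx**2), -1))
    energies, states = np.linalg.eigh(kinetic + np.diag(potential(x)))
    ground = states[:, 0]
    # dipole matrix elements <n|x|0> and the sum-over-states polarizability
    x0n = states[:, 1:].T @ (x * ground)
    alpha = 2.0 * np.sum(x0n**2 / (energies[1:] - energies[0]))
    # assumed definition of the characteristic length: L^2 = <0|x^2|0>
    l2 = np.sum(ground**2 * x**2)
    # invert alpha = C * 4 * L^4 (with mu = q = hbar = 1) for C
    return alpha, l2, alpha / (4.0 * l2**2)

alpha, l2, c = scaling_constant(lambda x: 0.5 * x**2)
print(alpha, l2, c)
```

Passing a different confining potential, for example `lambda x: x**4`, gives a different C, which reflects its role as a ratio of effective excitation energies.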
ABSTRACT
In order to improve the accuracy of molecular dynamics simulations, classical forcefields are supplemented with a kernel-based machine learning method trained on quantum-mechanical fragment energies. As an example application, a potential-energy surface is generalized for a small DNA duplex, taking into account explicit solvation and long-range electron exchange-correlation effects. A long-standing problem in molecular science is that experimental studies of the structural and thermodynamic behavior of DNA under tension are not well confirmed by simulation; study of the potential energy vs extension taking into account a novel correction shows that leading classical DNA models have excessive stiffness with respect to stretching. This discrepancy is found to be common across multiple forcefields. The quantum correction is in qualitative agreement with the experimental thermodynamics for larger DNA double helices, providing a candidate explanation for the general and long-standing discrepancy between single molecule stretching experiments and classical calculations of DNA stretching. The new dataset of quantum calculations should facilitate multiple types of nucleic acid simulation, and the associated Kernel Modified Molecular Dynamics method (KMMD) is applicable to biomolecular simulations in general. KMMD is made available as part of the AMBER22 simulation software.
Subjects
DNA, Molecular Dynamics Simulation, Base Pairing, DNA/chemistry, Machine Learning, Solvents/chemistry
ABSTRACT
Supervised machine learning (ML) and unsupervised ML have been performed on descriptors generated from nonadiabatic (NA) molecular dynamics (MD) trajectories representing non-radiative charge recombination in CsPbI3, a promising solar cell and optoelectronic material. Descriptors generated from every third atom of the iodine sublattice alone are sufficient for a satisfactory prediction of the bandgap and NA coupling for the use in the NA-MD simulation of nonradiative charge recombination, which has a strong influence on material performance. Surprisingly, descriptors based on the cesium sublattice perform better than those of the lead sublattice, even though Cs does not contribute to the relevant wavefunctions, while Pb forms the conduction band and contributes to the valence band. Simplification of the ML models of the NA-MD Hamiltonian achieved by the present analysis helps to overcome the high computational cost of NA-MD through ML and increase the applicability of NA-MD simulations.
ABSTRACT
Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) has caused substantially more infections, deaths, and economic disruptions than the 2002-2003 SARS-CoV. The key to understanding SARS-CoV-2's higher infectivity lies partly in its host receptor recognition mechanism. Experiments show that the human angiotensin converting enzyme 2 (ACE2) protein, which serves as the primary receptor for both CoVs, binds to the receptor binding domain (RBD) of CoV-2's spike protein more strongly than to SARS-CoV's spike RBD. The molecular basis for this difference in binding affinity, however, remains unexplained from X-ray structures. To go beyond insights gained from X-ray structures and investigate the role of thermal fluctuations in structure, we employ all-atom molecular dynamics simulations. Microseconds-long simulations reveal that while CoV and CoV-2 spike-ACE2 interfaces have similar conformational binding modes, CoV-2 spike interacts with ACE2 via a larger combinatorics of polar contacts and, on average, makes 45% more polar contacts. Correlation analysis and thermodynamic calculations indicate that these differences in the density and dynamics of polar contacts arise from differences in spatial arrangements of interfacial residues and from dynamical coupling between interfacial and non-interfacial residues. These results suggest that ongoing efforts to design spike-ACE2 peptide blockers will benefit from incorporating dynamical information as well as allosteric coupling effects.
Subjects
Angiotensin-Converting Enzyme 2/chemistry, Angiotensin-Converting Enzyme 2/metabolism, Molecular Dynamics Simulation, SARS-CoV-2/chemistry, SARS-CoV-2/metabolism, Coronavirus Spike Glycoprotein/chemistry, Coronavirus Spike Glycoprotein/metabolism, Allosteric Regulation, Humans, Mutation, Protein Binding, Viral Receptors/chemistry, Viral Receptors/metabolism, Thermodynamics
ABSTRACT
Lysine methylation can modify noncovalent interactions by altering lysine's hydrophobicity as well as its electronic structure. Although the ramifications of the former are documented, the effects of the latter remain largely unknown. Understanding the electronic structure is important for determining how biological methylation modulates protein-protein binding, and the impact of artificial methylation experiments in which methylated lysines are used as spectroscopic probes and protein crystallization facilitators. The benchmarked first-principles calculations undertaken here reveal that methyl-induced polarization weakens the electrostatic attraction of amines with protein functional groups: salt bridges, hydrogen bonds and cation-π interactions weaken by as much as 10.3, 7.9 and 3.5 kT, respectively. Multipole analysis shows that the weakened electrostatics is due to altered inductive effects, which overcome increased attraction from methyl-enhanced polarizability and dispersion. Due to their fundamental nature, these effects are expected to be present in many cases. A survey of methylated lysines in protein structures reveals several cases in which methyl-induced polarization is the primary driver of altered noncovalent interactions; in these cases, destabilizations are found to be in the 0.6-4.7 kT range. The clearest case in which methyl-induced polarization plays a dominant role in regulating biological function is that of the PHD1-PHD2 domain, which recognizes lysine-methylated states on histones. These results broaden our understanding of how methylation modulates noncovalent interactions.
Subjects
Lysine, Proteins, Hydrogen Bonding, Lysine/metabolism, Protein Binding, Proteins/metabolism, Static Electricity
ABSTRACT
Machine learning (ML) is transforming all areas of science. The complex and time-consuming calculations in molecular simulations are particularly suitable for an ML revolution and have already been profoundly affected by the application of existing ML methods. Here we review recent ML methods for molecular simulation, with particular focus on (deep) neural networks for the prediction of quantum-mechanical energies and forces, on coarse-grained molecular dynamics, on the extraction of free energy surfaces and kinetics, and on generative network approaches to sample molecular equilibrium structures and compute thermodynamics. To explain these methods and illustrate open methodological problems, we review some important principles of molecular physics and describe how they can be incorporated into ML structures. Finally, we identify and describe a list of open challenges for the interface between ML and molecular simulation.
ABSTRACT
Dynamics of flexible molecules are often determined by an interplay between local chemical bond fluctuations and conformational changes driven by long-range electrostatics and van der Waals interactions. This interplay between interactions yields complex potential-energy surfaces (PESs) with multiple minima and transition paths between them. In this work, we assess the performance of state-of-the-art machine learning (ML) models, namely, sGDML, SchNet, Gaussian Approximation Potentials/Smooth Overlap of Atomic Positions (GAPs/SOAPs), and Behler-Parrinello neural networks, for reproducing such PESs, while using limited amounts of reference data. As a benchmark, we use the cis to trans thermal relaxation in an azobenzene molecule, where at least three different transition mechanisms should be considered. Although GAP/SOAP, SchNet, and sGDML models can globally achieve a chemical accuracy of 1 kcal mol⁻¹ with fewer than 1000 training points, predictions greatly depend on the ML method used and on the local region of the PES being sampled. Within a given ML method, large differences can be found between predictions of close-to-equilibrium and transition regions, as well as for different transition mechanisms. We identify key challenges that the ML models face mainly due to the intrinsic limitations of commonly used atom-based descriptors. All in all, our results suggest switching from learning the entire PES within a single model to using multiple local models with optimized descriptors, training sets, and architectures for different parts of the complex PES.