Results 1 - 20 of 134
Nature ; 589(7840): 59-64, 2021 01.
Article in English | MEDLINE | ID: mdl-33408379


Structurally disordered materials pose fundamental questions [1-4], including how different disordered phases ('polyamorphs') can coexist and transform from one phase to another [5-9]. Amorphous silicon has been extensively studied; it forms a fourfold-coordinated, covalent network at ambient conditions and much-higher-coordinated, metallic phases under pressure [10-12]. However, a detailed mechanistic understanding of the structural transitions in disordered silicon has been lacking, owing to the intrinsic limitations of even the most advanced experimental and computational techniques, for example, in terms of the system sizes accessible via simulation. Here we show how atomistic machine learning models trained on accurate quantum mechanical computations can help to describe liquid-amorphous and amorphous-amorphous transitions for a system of 100,000 atoms (ten-nanometre length scale), predicting structure, stability and electronic properties. Our simulations reveal a three-step transformation sequence for amorphous silicon under increasing external pressure. First, polyamorphic low- and high-density amorphous regions are found to coexist, rather than appearing sequentially. Then, we observe a structural collapse into a distinct very-high-density amorphous (VHDA) phase. Finally, our simulations indicate the transient nature of this VHDA phase: it rapidly nucleates crystallites, ultimately leading to the formation of a polycrystalline structure, consistent with experiments [13-15] but not seen in earlier simulations [11,16-18]. A machine learning model for the electronic density of states confirms the onset of metallicity during VHDA formation and the subsequent crystallization. These results shed light on the liquid and amorphous states of silicon, and, in a wider context, they exemplify a machine learning-driven approach to predictive materials modelling.

Nature ; 585(7824): 217-220, 2020 09.
Article in English | MEDLINE | ID: mdl-32908269


Hydrogen, the simplest and most abundant element in the Universe, develops a remarkably complex behaviour upon compression [1]. Since Wigner predicted the dissociation and metallization of solid hydrogen at megabar pressures almost a century ago [2], several efforts have been made to explain the many unusual properties of dense hydrogen, including a rich and poorly understood solid polymorphism [1,3-5], an anomalous melting line [6] and the possible transition to a superconducting state [7]. Experiments at such extreme conditions are challenging and often lead to hard-to-interpret and controversial observations, whereas theoretical investigations are constrained by the huge computational cost of sufficiently accurate quantum mechanical calculations. Here we present a theoretical study of the phase diagram of dense hydrogen that uses machine learning to 'learn' potential-energy surfaces and interatomic forces from reference calculations and then predict them at low computational cost, overcoming length- and timescale limitations. We reproduce both the re-entrant melting behaviour and the polymorphism of the solid phase. Simulations using our machine-learning-based potentials provide evidence for a continuous molecular-to-atomic transition in the liquid, with no first-order transition observed above the melting line. This suggests a smooth transition between insulating and metallic layers in giant gas planets, and reconciles existing discrepancies between experiments as a manifestation of supercritical behaviour.

Faraday Discuss ; 2024 Sep 25.
Article in English | MEDLINE | ID: mdl-39319702


The widespread application of machine learning (ML) to the chemical sciences is making it very important to understand how the ML models learn to correlate chemical structures with their properties, and what can be done to improve the training efficiency whilst guaranteeing interpretability and transferability. In this work, we demonstrate the wide utility of prediction rigidities, a family of metrics derived from the loss function, in understanding the robustness of ML model predictions. We show that the prediction rigidities allow the assessment of the model not only at the global level, but also on the local or the component-wise level at which the intermediate (e.g. atomic, body-ordered, or range-separated) predictions are made. We leverage these metrics to understand the learning behavior of different ML models, and to guide efficient dataset construction for model training. We finally implement the formalism for an ML model targeting a coarse-grained system to demonstrate the applicability of the prediction rigidities to an even broader class of atomistic modeling problems.

J Chem Phys ; 161(4)2024 Jul 28.
Article in English | MEDLINE | ID: mdl-39056390


Machine-learning models based on a point-cloud representation of a physical object are ubiquitous in scientific applications and particularly well-suited to the atomic-scale description of molecules and materials. Among the many different approaches that have been pursued, the description of local atomic environments in terms of their discretized neighbor densities has been used widely and very successfully. We propose a novel density-based method, which involves computing "Wigner kernels." These are fully equivariant and body-ordered kernels that can be computed iteratively at a cost that is independent of the basis used to discretize the density and grows only linearly with the maximum body-order considered. Wigner kernels represent the infinite-width limit of feature-space models, whose dimensionality and computational cost instead scale exponentially with the increasing order of correlations. We present several examples of the accuracy of models based on Wigner kernels in chemical applications, for both scalar and tensorial targets, reaching an accuracy that is competitive with state-of-the-art deep-learning architectures. We discuss the broader relevance of these findings to equivariant geometric machine-learning.

J Chem Phys ; 161(6)2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39140447


Atomic-scale simulations have progressed tremendously over the past decade, largely thanks to the availability of machine-learning interatomic potentials. These potentials combine the accuracy of electronic structure calculations with the ability to reach extensive length and time scales. The i-PI package facilitates integrating the latest developments in this field with advanced modeling techniques thanks to a modular software architecture based on inter-process communication through a socket interface. The choice of Python for implementation facilitates rapid prototyping but can add computational overhead. In this new release, we carefully benchmarked and optimized i-PI for several common simulation scenarios, making such overhead negligible when i-PI is used to model systems up to tens of thousands of atoms using widely adopted machine learning interatomic potentials, such as Behler-Parrinello, DeePMD, and MACE neural networks. We also present the implementation of several new features, including an efficient algorithm to model bosonic and fermionic exchange, a framework for uncertainty quantification to be used in conjunction with machine-learning potentials, a communication infrastructure that allows for deeper integration with electronic-driven simulations, and an approach to simulate coupled photon-nuclear dynamics in optical or plasmonic cavities.

Chem Rev ; 121(16): 9759-9815, 2021 08 25.
Article in English | MEDLINE | ID: mdl-34310133


The first step in the construction of a regression model or a data-driven analysis, aiming to predict or elucidate the relationship between the atomic-scale structure of matter and its properties, involves transforming the Cartesian coordinates of the atoms into a suitable representation. The development of atomic-scale representations has played, and continues to play, a central role in the success of machine-learning methods for chemistry and materials science. This review summarizes the current understanding of the nature and characteristics of the most commonly used structural and chemical descriptions of atomistic structures, highlighting the deep underlying connections between different frameworks and the ideas that lead to computationally efficient and universally applicable models. It emphasizes the link between properties, structures, their physical chemistry, and their mathematical description, provides examples of recent applications to a diverse set of chemical and materials science problems, and outlines the open questions and the most promising research directions in the field.
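
As a toy illustration of what "transforming the Cartesian coordinates into a suitable representation" can mean, the sketch below maps a structure to its sorted list of interatomic distances, which is unchanged by translations, rotations, and atom relabeling. This descriptor is an assumption chosen for brevity, not one of the representations surveyed in the review; it ignores chemical species, and the function names and coordinates are hypothetical.

```python
import math


def sorted_distances(coords):
    """Return the sorted list of all pairwise interatomic distances."""
    dists = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            dists.append(math.dist(coords[i], coords[j]))
    return sorted(dists)


def rotate_z(coords, angle):
    """Rotate every atom about the z axis by `angle` (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


# Hypothetical three-atom structure (illustrative coordinates only).
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

# Same structure after a rotation, a translation, and a relabeling of atoms.
moved = [(x + 1.5, y - 2.0, z + 0.3) for x, y, z in rotate_z(water, 0.7)]
moved.reverse()

# The raw coordinates differ, but the descriptor does not.
same = all(
    abs(a - b) < 1e-12
    for a, b in zip(sorted_distances(water), sorted_distances(moved))
)
```

A regression model fed with such a descriptor automatically gives the same prediction for symmetry-equivalent copies of a structure, which is the basic motivation for symmetry-aware representations.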

Chem Rev ; 121(16): 10073-10141, 2021 08 25.
Article in English | MEDLINE | ID: mdl-34398616


We provide an introduction to Gaussian process regression (GPR) machine-learning methods in computational materials science and chemistry. The focus of the present review is on the regression of atomistic properties: in particular, on the construction of interatomic potentials, or force fields, in the Gaussian Approximation Potential (GAP) framework; beyond this, we also discuss the fitting of arbitrary scalar, vectorial, and tensorial quantities. Methodological aspects of reference data generation, representation, and regression, as well as the question of how a data-driven model may be validated, are reviewed and critically discussed. A survey of applications to a variety of research questions in chemistry and materials science illustrates the rapid growth in the field. A vision is outlined for the development of the methodology in the years to come.
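
The regression step at the core of GPR can be sketched in a few lines. The example below is a minimal one-dimensional version with a squared-exponential kernel, written in pure Python; it is not the GAP implementation, and the kernel choice, length scale, noise value, and function names are all assumptions made for illustration.

```python
import math


def sq_exp_kernel(a, b, length_scale=1.0):
    """Squared-exponential (Gaussian) kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def gpr_fit(x_train, y_train, noise=1e-8, length_scale=1.0):
    """Return weights w that solve (K + noise * I) w = y."""
    n = len(x_train)
    k = [
        [
            sq_exp_kernel(x_train[i], x_train[j], length_scale)
            + (noise if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    return solve(k, y_train)


def gpr_predict(x_new, x_train, weights, length_scale=1.0):
    """Posterior mean at x_new: sum_i w_i * k(x_new, x_i)."""
    return sum(
        w * sq_exp_kernel(x_new, xi, length_scale)
        for w, xi in zip(weights, x_train)
    )


# Fit sin(x) on a coarse grid and predict between two training points.
x_train = [0.5 * i for i in range(7)]
y_train = [math.sin(x) for x in x_train]
weights = gpr_fit(x_train, y_train, length_scale=0.5)
prediction = gpr_predict(1.25, x_train, weights, length_scale=0.5)
```

In an interatomic potential the scalar inputs would be replaced by descriptors of atomic environments and the kernel by a similarity measure between them, but the linear solve for the weights has the same form.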

J Chem Phys ; 159(6)2023 Aug 14.
Article in English | MEDLINE | ID: mdl-37551818


Spherical harmonics provide a smooth, orthogonal, and symmetry-adapted basis to expand functions on a sphere, and they are used routinely in physical and theoretical chemistry as well as in different fields of science and technology, from geology and atmospheric sciences to signal processing and computer graphics. More recently, they have become a key component of rotationally equivariant models in geometric machine learning, including applications to atomic-scale modeling of molecules and materials. We present an elegant and efficient algorithm for the evaluation of the real-valued spherical harmonics. Our construction features many of the desirable properties of existing schemes and allows us to compute Cartesian derivatives in a numerically stable and computationally efficient manner. To facilitate usage, we implement this algorithm in sphericart, a fast C++ library that also provides C bindings, a Python API, and a PyTorch implementation that includes a GPU kernel.
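
For orientation, the lowest-order real spherical harmonics have simple closed forms in Cartesian coordinates. The sketch below hard-codes l = 0 and l = 1 in pure Python; it is not the algorithm presented in the paper and does not use the sphericart API. The function name is hypothetical, and ordering and sign conventions vary between codes.

```python
import math


def real_sph_harm_l01(x, y, z):
    """Real spherical harmonics for l = 0 and l = 1 along direction (x, y, z).

    Returns [Y_0,0, Y_1,-1, Y_1,0, Y_1,1], assuming the common ordering in
    which the three l = 1 components are proportional to y, z, and x.
    """
    r = math.sqrt(x * x + y * y + z * z)
    c0 = 0.5 / math.sqrt(math.pi)            # Y_0,0 is a constant
    c1 = math.sqrt(3.0 / (4.0 * math.pi))    # prefactor for l = 1
    return [c0, c1 * y / r, c1 * z / r, c1 * x / r]


values = real_sph_harm_l01(1.0, 2.0, -0.5)

# Sum of squares over m at fixed l is independent of direction
# and equals (2l + 1) / (4 pi); for l = 1 that is 3 / (4 pi).
l1_norm = sum(v * v for v in values[1:])
```

Because the l = 1 components are linear in x/r, y/r, and z/r, their Cartesian derivatives are straightforward to write down; at higher l, recurrence-based schemes become necessary for efficiency.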

Nat Mater ; 20(3): 362-369, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33020610


The synthesis of molecular-sieving zeolitic membranes by the assembly of building blocks, avoiding the hydrothermal treatment, is highly desired to improve reproducibility and scalability. Here we report exfoliation of the sodalite precursor RUB-15 into crystalline 0.8-nm-thick nanosheets that host hydrogen-sieving six-membered rings (6-MRs) of SiO4 tetrahedra. Thin films, fabricated by the filtration of a suspension of exfoliated nanosheets, possess two transport pathways: 6-MR apertures and intersheet gaps. The latter were found to dominate the gas transport and yielded a molecular cutoff of 3.6 Å with a H2/N2 selectivity above 20. The gaps were successfully removed by the condensation of the terminal silanol groups of RUB-15 to yield H2/CO2 selectivities up to 100. The high selectivity arose exclusively from transport across the 6-MRs, which was confirmed by good agreement between the experimentally determined apparent activation energy of H2 and that computed by ab initio calculations. The scalable fabrication and the attractive sieving performance at 250-300 °C make these membranes promising for precombustion carbon capture.

J Chem Phys ; 156(1): 014115, 2022 Jan 07.
Article in English | MEDLINE | ID: mdl-34998321


Symmetry considerations are at the core of the major frameworks used to provide an effective mathematical representation of atomic configurations that is then used in machine-learning models to predict the properties associated with each structure. In most cases, the models rely on a description of atom-centered environments and are suitable to learn atomic properties or global observables that can be decomposed into atomic contributions. Many quantities that are relevant for quantum mechanical calculations, however, are not associated with a single center but with two (or more) atoms in the structure; the most notable example is the single-particle Hamiltonian matrix when written in an atomic orbital basis. We discuss a family of structural descriptors that generalize the very successful atom-centered density correlation features to the N-center case and show, in particular, how this construction can be applied to efficiently learn the matrix elements of the (effective) single-particle Hamiltonian written in an atom-centered orbital basis. These N-center features are fully equivariant, not only in terms of translations and rotations but also in terms of permutations of the indices associated with the atoms, and are suitable to construct symmetry-adapted machine-learning models of new classes of properties of molecules and materials.

J Chem Phys ; 156(20): 204115, 2022 May 28.
Article in English | MEDLINE | ID: mdl-35649823


Data-driven schemes that associate molecular and crystal structures with their microscopic properties share the need for a concise, effective description of the arrangement of their atomic constituents. Many types of models rely on descriptions of atom-centered environments, which are associated with an atomic property or with an atomic contribution to an extensive macroscopic quantity. Frameworks in this class can be understood in terms of atom-centered density correlations (ACDC), which are used as a basis for a body-ordered, symmetry-adapted expansion of the targets. Several other schemes that gather information on the relationship between neighboring atoms using "message-passing" ideas cannot be directly mapped to correlations centered around a single atom. We generalize the ACDC framework to include multi-centered information, generating representations that provide a complete linear basis to regress symmetric functions of atomic coordinates, and provide a coherent foundation to systematize our understanding of machine-learning schemes, whether atom-centered or message-passing, invariant or equivariant.

J Chem Phys ; 157(23): 234101, 2022 Dec 21.
Article in English | MEDLINE | ID: mdl-36550032


Machine learning frameworks based on correlations of interatomic positions begin with a discretized description of the density of other atoms in the neighborhood of each atom in the system. Symmetry considerations support the use of spherical harmonics to expand the angular dependence of this density, but there is, as of yet, no clear rationale to choose one radial basis over another. Here, we investigate the basis that results from the solution of the Laplacian eigenvalue problem within a sphere around the atom of interest. We show that this generates a basis of controllable smoothness within the sphere (in the same sense as plane waves provide a basis with controllable smoothness for a problem with periodic boundaries) and that a tensor product of Laplacian eigenstates also provides a smooth basis for expanding any higher-order correlation of the atomic density within the appropriate hypersphere. We consider several unsupervised metrics of the quality of a basis for a given dataset and show that the Laplacian eigenstate basis has a performance that is much better than some widely used basis sets and competitive with data-driven bases that numerically optimize each metric. Finally, we investigate the role of the basis in building models of the potential energy. In these tests, we find that a combination of the Laplacian eigenstate basis and target-oriented heuristics leads to equal or improved regression performance when compared to both heuristic and data-driven bases in the literature. We conclude that the smoothness of the basis functions is a key aspect of successful atomic density representations.
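
To make the idea of a Laplacian eigenstate radial basis concrete, the sketch below evaluates the l = 0 eigenfunctions inside a sphere and checks their orthogonality numerically. It assumes the eigenfunctions vanish at the sphere boundary (a Dirichlet condition, which the abstract does not state explicitly), in which case they are spherical Bessel functions j_0(n*pi*r/cutoff). The function names, the cutoff value, and the quadrature are hypothetical choices for illustration.

```python
import math


def radial_l0(n, r, cutoff):
    """n-th (n = 1, 2, ...) l = 0 eigenfunction: j_0(n * pi * r / cutoff)."""
    k = n * math.pi / cutoff
    if r == 0.0:
        return 1.0  # limit of sin(x) / x as x -> 0
    return math.sin(k * r) / (k * r)


def overlap(n, m, cutoff, steps=20000):
    """Midpoint-rule estimate of the integral of R_n(r) R_m(r) r^2 over [0, cutoff]."""
    h = cutoff / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        total += radial_l0(n, r, cutoff) * radial_l0(m, r, cutoff) * r * r
    return total * h


cutoff = 5.0
off_diagonal = overlap(1, 2, cutoff)   # expected to vanish
diagonal = overlap(1, 1, cutoff)       # analytic value: cutoff**3 / (2 * pi**2)
```

Increasing n adds one more radial node each time, so truncating the basis at a given n plays the same role as a plane-wave cutoff: it bounds how rapidly the represented density can vary.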

J Chem Phys ; 157(17): 177101, 2022 Nov 07.
Article in English | MEDLINE | ID: mdl-36347686


The "quasi-constant" smooth overlap of atomic position and atom-centered symmetry function fingerprint manifolds recently discovered by Parsaeifard and Goedecker [J. Chem. Phys. 156, 034302 (2022)] are closely related to the degenerate pairs of configurations, which are known shortcomings of all low-body-order atom-density correlation representations of molecular structures. Configurations that are rigorously singular (which we demonstrate can only occur in finite, discrete sets and not as a continuous manifold) determine the complete failure of machine-learning models built on this class of descriptors. The "quasi-constant" manifolds, on the other hand, exhibit low but non-zero sensitivity to atomic displacements. As a consequence, for any such manifold, it is possible to optimize model parameters and the training set to mitigate their impact on learning, even though this is often impractical, and it is preferable to use descriptors that avoid both exact singularities and the associated numerical instability.

Proc Natl Acad Sci U S A ; 116(4): 1110-1115, 2019 01 22.
Article in English | MEDLINE | ID: mdl-30610171


Thermodynamic properties of liquid water as well as hexagonal (Ih) and cubic (Ic) ice are predicted based on density functional theory at the hybrid-functional level, rigorously taking into account quantum nuclear motion, anharmonic fluctuations, and proton disorder. This is made possible by combining advanced free-energy methods and state-of-the-art machine-learning techniques. The ab initio description leads to structural properties in excellent agreement with experiments and reliable estimates of the melting points of light and heavy water. We observe that nuclear-quantum effects contribute a crucial [Formula: see text] to the stability of ice Ih, making it more stable than ice Ic. Our computational approach is general and transferable, providing a comprehensive framework for quantitative predictions of ab initio thermodynamic properties using machine-learning potentials as an intermediate step.

Proc Natl Acad Sci U S A ; 116(9): 3401-3406, 2019 02 26.
Article in English | MEDLINE | ID: mdl-30733292


The molecular dipole polarizability describes the tendency of a molecule to change its dipole moment in response to an applied electric field. This quantity governs key intra- and intermolecular interactions, such as induction and dispersion; plays a vital role in determining the spectroscopic signatures of molecules; and is an essential ingredient in polarizable force fields. Compared with other ground-state properties, an accurate prediction of the molecular polarizability is considerably more difficult, as this response quantity is quite sensitive to the underlying electronic structure description. In this work, we present highly accurate quantum mechanical calculations of the static dipole polarizability tensors of 7,211 small organic molecules computed using linear response coupled cluster singles and doubles theory (LR-CCSD). Using a symmetry-adapted machine-learning approach, we demonstrate that it is possible to predict the LR-CCSD molecular polarizabilities of these small molecules with an error that is an order of magnitude smaller than that of hybrid density functional theory (DFT) at a negligible computational cost. The resultant model is robust and transferable, yielding molecular polarizabilities for a diverse set of 52 larger molecules (including challenging conjugated systems, carbohydrates, small drugs, amino acids, nucleobases, and hydrocarbon isomers) at an accuracy that exceeds that of hybrid DFT. The atom-centered decomposition implicit in our machine-learning approach offers some insight into the shortcomings of DFT in the prediction of this fundamental quantity of interest.
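
The definition in the first sentence corresponds to the linear-response relation mu_induced = alpha . E. The sketch below applies it to a made-up 3x3 tensor and also computes the isotropic polarizability, trace(alpha) / 3. The tensor values, the field, and the function names are hypothetical and are not taken from the dataset described above.

```python
def induced_dipole(alpha, field):
    """Induced dipole mu_i = sum_j alpha_ij * E_j (linear response)."""
    return [sum(alpha[i][j] * field[j] for j in range(3)) for i in range(3)]


def isotropic_polarizability(alpha):
    """Rotationally invariant part of the tensor: trace(alpha) / 3."""
    return (alpha[0][0] + alpha[1][1] + alpha[2][2]) / 3.0


# Illustrative symmetric polarizability tensor (arbitrary units).
alpha = [
    [10.0, 0.0, 0.0],
    [0.0, 8.0, 1.0],
    [0.0, 1.0, 12.0],
]
field = [0.0, 0.0, 0.01]  # weak field along z

mu = induced_dipole(alpha, field)          # has a y component from alpha_yz
alpha_iso = isotropic_polarizability(alpha)
```

The off-diagonal element shows why the full tensor matters: a field along z also induces a dipole component along y, something a scalar model of the polarizability cannot capture.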

Proc Natl Acad Sci U S A ; 116(51): 25516-25523, 2019 12 17.
Article in English | MEDLINE | ID: mdl-31792179


The interface between water and folded proteins is very complex. Proteins have "patchy" solvent-accessible areas composed of domains of varying hydrophobicity. The textbook understanding is that these domains contribute additively to interfacial properties (Cassie's equation, CE). An ever-growing number of modeling papers question the validity of CE at molecular length scales, but there is no conclusive experiment to support this and no proposed new theoretical framework. Here, we study the wetting of model compounds with patchy surfaces differing solely in patchiness but not in composition. Were CE to be correct, these materials would have had the same solid-liquid work of adhesion (W_SL) and time-averaged structure of interfacial water. We find considerable differences in W_SL, and sum-frequency generation measurements of the interfacial water structure show distinctively different spectral features. Molecular-dynamics simulations of water on patchy surfaces capture the observed behaviors and point toward significant nonadditivity in water density and average orientation. They show that a description of the molecular arrangement on the surface is needed to predict its wetting properties. We propose a predictive model that considers, for every molecule, the contributions of its first-nearest neighbors as a descriptor to determine the wetting properties of the surface. The model is validated by measurements of W_SL in multiple solvents, where large differences are observed for solvents whose effective diameter is smaller than ∼6 Å. The experiments and theoretical model proposed here provide a starting point to develop a comprehensive understanding of complex biological interfaces as well as for the engineering of synthetic ones.
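
The additive picture that the study tests can be written down directly. In its usual form, Cassie's equation gives the contact angle of a composite surface as cos(theta) = sum_i f_i cos(theta_i), where f_i are area fractions; the equivalent statement for the work of adhesion is an area-weighted average. The sketch below encodes both. The function names and numerical values are hypothetical, and the predictive nearest-neighbor model proposed in the paper is not reproduced here.

```python
import math


def cassie_contact_angle(fractions, angles_deg):
    """Contact angle (degrees) of a patchy surface under Cassie's equation."""
    cos_theta = sum(
        f * math.cos(math.radians(a)) for f, a in zip(fractions, angles_deg)
    )
    return math.degrees(math.acos(cos_theta))


def additive_work_of_adhesion(fractions, works):
    """Area-weighted average of the work of adhesion of each domain type."""
    return sum(f * w for f, w in zip(fractions, works))


# Two surfaces with the same composition (50 % hydrophilic, 50 % hydrophobic)
# get the same prediction, no matter how the patches are arranged.
theta = cassie_contact_angle([0.5, 0.5], [60.0, 120.0])
work = additive_work_of_adhesion([0.25, 0.75], [100.0, 60.0])  # e.g. mJ/m^2
```

Because only the area fractions enter, this model cannot distinguish surfaces that differ solely in patchiness, which is exactly the property the experiments above probe.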

J Chem Phys ; 154(16): 160401, 2021 Apr 28.
Article in English | MEDLINE | ID: mdl-33940847


Over recent years, the use of statistical learning techniques applied to chemical problems has gained substantial momentum. This is particularly apparent in the realm of physical chemistry, where the balance between empiricism and physics-based theory has traditionally been rather in favor of the latter. In this guest Editorial for the special topic issue on "Machine Learning Meets Chemical Physics," a brief rationale is provided, followed by an overview of the topics covered. We conclude by making some general remarks.

J Chem Phys ; 155(10): 104106, 2021 Sep 14.
Article in English | MEDLINE | ID: mdl-34525832


The input of almost every machine learning algorithm targeting the properties of matter at the atomic scale involves a transformation of the list of Cartesian atomic coordinates into a more symmetric representation. Many of the most popular representations can be seen as an expansion of the symmetrized correlations of the atom density and differ mainly by the choice of basis. Considerable effort has been dedicated to the optimization of the basis set, typically driven by heuristic considerations on the behavior of the regression target. Here, we take a different, unsupervised viewpoint, aiming to determine the basis that encodes in the most compact way possible the structural information that is relevant for the dataset at hand. For each training dataset and number of basis functions, one can build a unique basis that is optimal in this sense and can be computed at no additional cost with respect to the primitive basis by approximating it with splines. We demonstrate that this construction yields representations that are accurate and computationally efficient, particularly when working with representations that correspond to high-body order correlations. We present examples that involve both molecular and condensed-phase machine-learning models.

J Chem Phys ; 154(7): 074102, 2021 Feb 21.
Article in English | MEDLINE | ID: mdl-33607885


Machine-learning models have emerged as a very effective strategy to sidestep time-consuming electronic-structure calculations, enabling accurate simulations of greater size, time scale, and complexity. Given the interpolative nature of these models, the reliability of predictions depends on the position in phase space, and it is crucial to obtain an estimate of the error that derives from the finite number of reference structures included during model training. When using a machine-learning potential to sample a finite-temperature ensemble, the uncertainty on individual configurations translates into an error on thermodynamic averages and leads to a loss of accuracy when the simulation enters a previously unexplored region. Here, we discuss how uncertainty quantification can be used, together with a baseline energy model, or a more robust but less accurate interatomic potential, to obtain more resilient simulations and to support active-learning strategies. Furthermore, we introduce an on-the-fly reweighting scheme that makes it possible to estimate the uncertainty in thermodynamic averages extracted from long trajectories. We present examples covering different types of structural and thermodynamic properties and systems as diverse as water and liquid gallium.
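
One way to picture how an uncertainty on individual configurations propagates to a thermodynamic average is sketched below. It assumes the uncertainty comes from a committee (ensemble) of models, which the abstract does not specify, and uses standard Boltzmann reweighting of frames sampled with a reference potential. All function names and numbers are hypothetical, and this is not the scheme as implemented by the authors.

```python
import math


def committee_mean_std(predictions):
    """Mean and sample standard deviation of committee predictions."""
    n = len(predictions)
    mean = sum(predictions) / n
    var = sum((p - mean) ** 2 for p in predictions) / (n - 1)
    return mean, math.sqrt(var)


def reweighted_average(observable, delta_energy, beta):
    """Average of `observable` under a potential V + delta, from frames sampled with V.

    Each frame gets the weight exp(-beta * delta); the minimum delta is
    subtracted first, which avoids overflow and leaves the ratio unchanged.
    """
    shift = min(delta_energy)
    weights = [math.exp(-beta * (d - shift)) for d in delta_energy]
    norm = sum(weights)
    return sum(w * a for w, a in zip(weights, observable)) / norm


# Per-frame energies predicted by three hypothetical committee members.
energy_mean, energy_std = committee_mean_std([1.0, 2.0, 3.0])

# Reweight a two-frame "trajectory" for one member whose energy differs
# from the sampling potential by delta on each frame.
average = reweighted_average([1.0, 3.0], [0.0, 0.0], beta=1.0)
```

Repeating the reweighted average for every committee member, with delta taken as that member's energy minus the energy used for sampling, yields a set of averages whose spread serves as an error estimate on the thermodynamic quantity.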