ABSTRACT
Strain energy is a fundamental measure of the steric and configurational properties of organic molecules. The ability to estimate strain energy through quantum chemical simulations requires at minimum the knowledge of an initial set of nuclear coordinates. In general, such coordinates are not known in advance when screening or generating large numbers of candidate molecules in the context of molecular design. We present a machine learning approach to predict hydrocarbon strain energies using Benson group equivalents. A featurization strategy is crafted by concatenating the molecule group equivalent counts with easily computable molecular fingerprints. The data are obtained from electronic structure calculations we performed on a set of 166 previously synthesized strained hydrocarbons. These data are provided and include gas phase enthalpies of formation and associated optimized atomic coordinates. The strain energy prediction accuracy of several statistical learning methods is evaluated, and their respective merits and limitations are discussed.
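The featurization described above, group equivalent counts concatenated with a fingerprint, can be sketched as follows. This is only an illustration under stated assumptions: the four-entry group vocabulary, the group labels, and the 4-bit fingerprint are hypothetical placeholders, not the feature set used in the work.

```python
from collections import Counter

# Hypothetical group vocabulary; a real Benson-style scheme has many more groups.
GROUPS = ["C-(C)(H)3", "C-(C)2(H)2", "C-(C)3(H)", "C-(C)4"]


def featurize(group_labels, fingerprint_bits):
    """Concatenate per-group counts with a binary fingerprint vector."""
    counts = Counter(group_labels)
    group_vector = [counts.get(group, 0) for group in GROUPS]
    return group_vector + list(fingerprint_bits)


# n-butane has two CH3 and two CH2 groups; the fingerprint bits are made up.
features = featurize(
    ["C-(C)(H)3", "C-(C)2(H)2", "C-(C)2(H)2", "C-(C)(H)3"],
    [1, 0, 0, 1],
)
print(features)  # [2, 2, 0, 0, 1, 0, 0, 1]
```

The resulting fixed-length vector needs no 3D coordinates, which is the point made in the abstract.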
ABSTRACT
Predictive models for the performance of explosives and propellants are important for their design, optimization, and safety. Thermochemical codes can predict some of these properties from fundamental quantities such as density and formation energies that can be obtained from first principles. Models that are simpler to evaluate are desirable for efficient, rapid material screening. In addition, interpretable models can provide insight into the physics and chemistry of these materials that could be useful to direct new synthesis. Current state-of-the-art performance models are based on either the parametrization of physics-based expressions or data-driven approaches with minimal interpretability. We use parsimonious neural networks (PNNs) to discover interpretable models for the specific impulse of propellants and detonation velocity and pressure for explosives using data collected from the open literature. A combination of evolutionary optimization with custom neural networks explores and trains models with objective functions that balance accuracy and complexity. For all three properties of interest, we find interpretable models that are Pareto optimal in the accuracy and simplicity space.
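To make "Pareto optimal in the accuracy and simplicity space" concrete, the sketch below filters a list of candidate models down to those that no other candidate beats on both error and complexity. The model names and scores are invented for illustration, and the abstract does not say how error or complexity were measured.

```python
def pareto_front(models):
    """Return names of models not dominated in (error, complexity).

    Lower is better for both objectives. A model is dominated when another
    model is at least as good on both and strictly better on one.
    """
    front = []
    for name, err, cplx in models:
        dominated = any(
            (e <= err and c <= cplx) and (e < err or c < cplx)
            for _, e, c in models
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical candidates: (name, prediction error, complexity score)
candidates = [("A", 0.10, 12), ("B", 0.12, 5), ("C", 0.15, 8), ("D", 0.30, 2)]
print(pareto_front(candidates))  # ['A', 'B', 'D']
```

Model C drops out because B is both more accurate and simpler; A, B, and D each represent a different accuracy/simplicity trade-off.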
ABSTRACT
For many experimentally measured chemical properties that cannot be directly computed from first-principles, the existing physics-based models do not extrapolate well to out-of-sample molecules, and experimental datasets themselves are too small for traditional machine learning (ML) approaches. To overcome these limitations, we apply a transfer learning approach, whereby we simultaneously train a multi-target regression model on a small number of molecules with experimentally measured values and a large number of molecules with related computed properties. We demonstrate this methodology on predicting the experimentally measured impact sensitivity of energetic crystals, finding that both characteristics of the computed dataset and model architecture are important to prediction accuracy of the small experimental dataset. Our directed-message passing neural network (D-MPNN) ML model using transfer learning outperforms direct-ML and physics-based models on a diverse test set, and the new methods described here are widely applicable to modeling many other structure-property relationships.
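One common way to train a single multi-target model on molecules that carry different subsets of labels is to skip missing targets when computing the loss. The sketch below shows that idea with a masked mean squared error. It is an assumption for illustration, since the abstract does not state which loss the D-MPNN model uses, and the numbers are made up.

```python
def masked_mse(predictions, targets):
    """Mean squared error over the target entries that are present (not None)."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(predictions, targets):
        for pred, target in zip(pred_row, target_row):
            if target is not None:
                total += (pred - target) ** 2
                count += 1
    return total / count if count else 0.0


# Columns: [experimental property, computed property]. Only the second
# molecule has an experimental value; all values are hypothetical.
targets = [[None, 1.0], [2.0, 3.0], [None, 0.5]]
predictions = [[0.3, 1.0], [1.0, 3.0], [0.9, 1.5]]
loss = masked_mse(predictions, targets)
print(loss)  # 0.5
```

With this kind of loss, the many molecules that have only computed properties still shape the shared representation, while the few with experimental values constrain the target of interest.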
Subjects
Machine Learning; Neural Networks, Computer
ABSTRACT
We develop a convolutional neural network capable of directly parsing the 3D electronic structure of a molecule described by spatial point data for charge density and electrostatic potential represented as a 4D tensor. This method effectively bypasses the need to construct complex representations, or descriptors, of a molecule. This is beneficial because the accuracy of a machine learned model depends on the input representation. Ideally, input descriptors encode the essential physics and chemistry that influence the target property. Thousands of molecular descriptors have been proposed, and proper selection of features requires considerable domain expertise or exhaustive and careful statistical downselection. In contrast, deep learning networks are capable of learning rich data representations. This provides a compelling motivation to use deep learning networks to learn molecular structure-property relations from "raw" data. The convolutional neural network model is jointly trained on over 20,000 molecules that are potentially energetic materials (explosives) to predict dipole moment, total electronic energy, Chapman-Jouguet (C-J) detonation velocity, C-J pressure, C-J temperature, crystal density, HOMO-LUMO gap, and solid phase heat of formation. This work demonstrates the first use of complete 3D electronic structure for machine learning of molecular properties.
Subjects
Machine Learning; Neural Networks, Computer; Electronics; Molecular Structure
ABSTRACT
This work presents efforts to augment the performance of data-driven machine learning algorithms for reaction template recommendation used in computer-aided synthesis planning software. Often, machine learning models designed to perform the task of prioritizing reaction templates or molecular transformations are focused on reporting high-accuracy metrics for the one-to-one mapping of product molecules in reaction databases to the template extracted from the recorded reaction. The available templates that get selected for inclusion in these machine learning models have been previously limited to those that appear frequently in the reaction databases and exclude potentially useful transformations. By augmenting open-access data sets of organic reactions with explicitly calculated template applicability and pretraining a template-relevance neural network on this augmented applicability data set, we report an increase in the template applicability recall and an increase in the diversity of predicted precursors. The augmentation and pretraining effectively teach the neural network an increased set of templates that could theoretically lead to successful reactions for a given target. Even on a small data set of well-curated reactions, the data augmentation and pretraining methods resulted in an increase in top-1 accuracy, especially for rare templates, indicating that these strategies can be very useful for small data sets.
Subjects
Neural Networks, Computer; Software; Algorithms; Computers; Machine Learning
ABSTRACT
We describe the development of a density-dependent transferable coarse-grain model of crystalline hexahydro-1,3,5-trinitro-s-triazine (RDX) that can be used with the energy conserving dissipative particle dynamics method. The model is an extension of a recently reported one-site model of RDX that was developed by using a force-matching method. The density-dependent forces in that original model are provided through an interpolation scheme that poorly conserves energy. The development of the new model presented in this work first involved a multi-objective procedure to improve the structural and thermodynamic properties of the previous model, followed by the inclusion of the density dependency via a conservative form of the force field that conserves energy. The new model accurately predicts the density, structure, pressure-volume isotherm, bulk modulus, and elastic constants of the RDX crystal at ambient pressure and exhibits transferability to a liquid phase at melt conditions.
ABSTRACT
Clathrate hydrates are solid crystalline structures most commonly formed from solutions that have nucleated to form a mixed solid composed of water and gas. Understanding the mechanism of clathrate hydrate nucleation is essential to grasp the fundamental chemistry of these complex structures and their applications. Molecular dynamics (MD) simulation is an ideal method to study nucleation at the molecular level because the size of the critical nucleus and formation rate occur on the nano scale. Various methods for analyzing nucleation have been developed through MD. In particular, the mean first-passage time (MFPT) and survival probability (SP) methods have proven to be effective in procuring the nucleation rate and critical nucleus size for monatomic systems. This study assesses the MFPT and SP methods, previously used for monatomic systems, when applied to analyzing clathrate hydrate nucleation. Because clathrate hydrate nucleation is relatively difficult to observe in MD simulations (due to its high free energy barrier), these methods have yet to be applied to clathrate hydrate systems. In this study, we have analyzed the nucleation rate and critical nucleus size of methane hydrate using MFPT and SP methods from data generated by MD simulations at 255 K and 50 MPa. MFPT was modified for clathrate hydrate from the original version by adding the maximum likelihood estimate and growth effect term. The nucleation rates calculated by the MFPT and SP methods are within 5%, and the critical nucleus size estimated by the MFPT method was 50% higher than values obtained through other more rigorous but computationally expensive estimates. These methods can also be extended to the analysis of other clathrate hydrates.
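The basic quantities behind the MFPT and SP analyses can be sketched in a few lines. The example assumes the standard steady-state relation J = 1/(τV) between the nucleation rate J, the mean first-passage time τ, and the system volume V, and it takes the survival probability as the fraction of independent runs that have not yet nucleated at time t. The maximum likelihood estimate and growth effect term mentioned in the abstract are not included, and the times and volume are made-up values.

```python
def nucleation_rate(first_passage_times_ns, volume_nm3):
    """Rate J = 1 / (tau * V), with tau the mean first-passage time."""
    tau = sum(first_passage_times_ns) / len(first_passage_times_ns)
    return 1.0 / (tau * volume_nm3)


def survival_probability(first_passage_times_ns, t_ns):
    """Fraction of runs that have not yet nucleated at time t."""
    survived = sum(1 for tau in first_passage_times_ns if tau > t_ns)
    return survived / len(first_passage_times_ns)


# Hypothetical first-passage times (ns) from four independent MD runs
times = [100.0, 200.0, 300.0, 400.0]
volume = 50.0  # hypothetical simulation box volume in nm^3

rate = nucleation_rate(times, volume)              # 1 / (250 ns * 50 nm^3)
half_time_sp = survival_probability(times, 250.0)  # 2 of 4 runs survive
print(rate, half_time_sp)
```

In practice many more runs are needed for a stable estimate, which is part of why the high free energy barrier makes hydrate systems expensive to analyze this way.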
ABSTRACT
Methane clathrate hydrate nucleation and growth is investigated via analysis of molecular dynamics simulations using a new order parameter. This order parameter (OP), named the Mutually Coordinated Guest (MCG) OP, quantifies the appearance and connectivity of molecular clusters composed of guests separated by water clusters. It is the first two-component OP used for quantifying hydrate nucleation and growth. The algorithm for calculating the MCG OP is described in detail. Its physical motivation and advantages compared to existing methods are discussed.
ABSTRACT
The current knowledge and description of guest molecules within clathrate hydrates only accounts for occupancy within regular polyhedral water cages. Experimental measurements and simulations, examining the tert-butylamine + H2 + H2O hydrate system, now suggest that H2 can also be incorporated within hydrate crystal structures by occupying interstitial sites, that is, locations other than the interior of regular polyhedral water cages. Specifically, H2 is found within the shared heptagonal faces of the large (4³5⁹6²7³) cage and in cavities formed from the disruption of smaller (4⁴5⁴) water cages. The ability of H2 to occupy these interstitial sites and fluctuate position in the crystal lattice demonstrates the dynamic behavior of H2 in solids and reveals new insight into guest-guest and guest-host interactions in clathrate hydrates, with potential implications in increasing overall energy storage properties.
ABSTRACT
A priori knowledge of physicochemical properties such as melting and boiling points could expedite materials discovery. However, theoretical modeling from first principles poses a challenge for efficient virtual screening of potential candidates. As an alternative, the tools of data science are becoming increasingly important for exploring chemical datasets and predicting material properties. Herein, we extend a molecular representation, or set of descriptors, first developed for quantitative structure-property relationship modeling by Yalkowsky and coworkers known as the Unified Physicochemical Property Estimation Relationships (UPPER). This molecular representation has group-constitutive and geometrical descriptors that map to enthalpy and entropy, two thermodynamic quantities that drive thermal phase transitions. We extend the UPPER representation to include additional information about sp2-bonded fragments. Additionally, instead of using the UPPER descriptors in a series of thermodynamically-inspired calculations, as per Yalkowsky, we use the descriptors to construct a vector representation for use with machine learning techniques. The concise and easy-to-compute representation, combined with a gradient-boosting decision tree model, provides an appealing framework for predicting experimental transition temperatures in a diverse chemical space. An application to energetic materials shows that the method is predictive, despite a relatively modest energetics reference dataset. We also report competitive results on diverse public datasets of melting points (i.e., OCHEM, Enamine, Bradley, and Bergström) comprising over 47k structures. Open source software is available at https://github.com/USArmyResearchLab/ARL-UPPER.
Subjects
Machine Learning; Quantitative Structure-Activity Relationship; Software; Thermodynamics; Transition Temperature
ABSTRACT
Deep learning has shown great potential for generating molecules with desired properties. But the cost and time required to obtain relevant property data have limited study to only a few classes of materials for which extensive data have already been collected. We develop a deep learning method that combines a generative model with a property prediction model to fuse small data of one class of molecules with larger data in another class. Common low-level physicochemical properties are jointly embedded into a latent space that can be used to design molecules in the smaller class. The chemical space around the molecules in the training set is explored through local gradient ascent optimization. Based on nine molecules from the original training set, nine new molecules are found to have improved properties while remaining structurally similar to the training molecules thereby easing requirements for entirely new synthesis routes. Validation is performed using an equilibrium thermochemistry code to verify the molecules and target properties. A specific example targeting the Chapman-Jouguet velocity and small data for nitrogen-rich molecules is shown. Despite the relative lack of nitrogen-rich molecule data, the results demonstrate that fusing and joint embedding with plentiful low nitrogen molecular data can produce higher generative performance than using the scarce data alone.
Subjects
Drug Design; Humans; Nitrogen
ABSTRACT
Nucleation from solution is a ubiquitous phenomenon with relevance to myriad scientific disciplines, including pharmaceuticals, biomineralization, and disease. One prominent example is the nucleation of clathrate hydrates, multicomponent crystalline inclusion compounds relevant to the energy industry where they block pipelines and also constitute a potential vast energy resource. Despite their importance, the molecular mechanism of incipient hydrate formation remains unknown. Herein, we employ advanced molecular simulation tools (pB histogram, equilibrium path sampling) to provide a statistical-mechanical basis for extracting physical insight into the molecular steps by which clathrates form. Through testing the Mutually Coordinated Guest (MCG) order parameter, we demonstrate that both guest (methane) and host (water) structuring are crucial to accurately describe the nucleation of hydrates and determine a critical nucleus size of MCG-1 = 16 at 255 K and 500 bar. Equipped with a validated (and novel) reaction coordinate, subsequent equilibrium path sampling simulations yield the free energy barrier and nucleation rate. The resulting quantitative nucleation process is described by the MCG clustering mechanism. This constitutes a significant advance in the field of hydrates research, as the fitness of a molecular descriptor has never been statistically verified. More broadly, this work has significance to a wide range of multicomponent nucleation contexts wherein the formation mechanism depends on contributions from both solute and solvent.
Subjects
Methane/chemistry; Hydrophobic and Hydrophilic Interactions; Molecular Dynamics Simulation; Pressure; Temperature; Thermodynamics; Water/chemistry
ABSTRACT
To better understand the self-assembly of small molecules and nanoparticles adsorbed at interfaces, we have performed extensive Monte Carlo simulations of a simple lattice model based on the seven hard "tetrominoes", connected shapes that occupy four lattice sites. The equations of state of the pure fluids and all of the binary mixtures are determined over a wide range of density, and a large selection of multicomponent mixtures are also studied at selected conditions. Calculations are performed in the grand canonical ensemble and are analogous to real systems in which molecules or nanoparticles reversibly adsorb to a surface or interface from a bulk reservoir. The model studied is athermal; objects in these simulations avoid overlap but otherwise do not interact. As a result, all of the behavior observed is entropically driven. The one-component fluids all exhibit marked self-ordering tendencies at higher densities, with quite complex structures formed in some cases. Significant clustering of objects with the same rotational state (orientation) is also observed in some of the pure fluids. In all of the binary mixtures, the two species are fully miscible at large scales, but exhibit strong species-specific clustering (segregation) at small scales. This behavior persists in multicomponent mixtures; even in seven-component mixtures of all the shapes there is significant association between objects of the same shape. To better understand these phenomena, we calculate the second virial coefficients of the tetrominoes and related quantities, extract thermodynamic volume of mixing data from the simulations of binary mixtures, and determine Henry's law solubilities for each shape in a variety of solvents. 
The overall picture obtained is one in which complementarity of both the shapes of individual objects and the characteristic structures of different fluids are important in determining the overall behavior of a fluid of a given composition, with sometimes counterintuitive results. Finally, we note that no sharp phase transitions are observed but that this appears to be due to the small size of the objects considered. It is likely that complex phase behavior may be found in systems of larger polyominoes.
ABSTRACT
The use of evolutionary strategy optimizations in fitting empirical potentials against first-principles data is considered. Empirical potentials can involve a large number of interdependent quantities, the number varying with the complexity of the potential, and the optimization of these presents a challenging numerical problem. Evolutionary strategies are a general class of optimization methods that mimic natural selection by stochastically evolving a population of trial solutions according to rules that select for high values of some fitness function. In this work we apply a variety of evolutionary optimization methods to a representative "parametrization problem" in order to determine which such methods are well-suited to such applications. Prior work on the design of evolutionary strategies has generally focused on finding the extrema of relatively simple mathematical functions, and the findings of such studies may not be transferable to chemical applications of very high dimensionality. The test problem consists of parametrization of the Feuston-Garofalini all-atom potential developed for simulation of silicic acid oligomerization in aqueous solution (Feuston, B. P.; Garofalini, S. H. J. Phys. Chem. 1990, 94, 5351). "Meta-optimization" of the evolutionary method is first considered by fitting this potential against itself, using a wide variety of population sizes, recombination algorithms, mutation-size control methods, and selection methods. Simulated annealing is also considered as an alternative approach. Optimal choices of population size, recombination operator, mutation size control approach, and selection method are discussed, as well as the quantity of data required for the parametrization. It is clear from comparisons of multiple independent optimizations that, even when fitting this potential against itself, there are a considerable number of local extrema in the fitness function. 
Evolutionary methods are found to be competitive with simulated annealing and are more easily parallelized. Finally, the potential is reparametrized against reference data taken from a Car-Parrinello Molecular Dynamics trajectory of several relevant silicate species in aqueous solution, again using several variant algorithms.
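The general shape of an evolutionary strategy for parameter fitting can be sketched as below. This is a minimal illustration, not the variant algorithms compared in the study: it assumes truncation selection with elitism, discrete recombination, and fixed-size Gaussian mutation, and it minimizes an error rather than maximizing a fitness. The three "reference" parameters stand in for the idea of fitting a potential against itself and are invented.

```python
import random


def evolve(error, n_params, pop_size=20, n_parents=5, sigma=0.3,
           generations=100, seed=0):
    """Minimize `error` over parameter vectors with a simple elitist ES."""
    rng = random.Random(seed)
    population = [[rng.uniform(-2, 2) for _ in range(n_params)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=error)
        parents = population[:n_parents]      # truncation selection
        history.append(error(parents[0]))     # best error this generation
        children = []
        while len(children) < pop_size - n_parents:
            a, b = rng.sample(parents, 2)
            # Discrete recombination: each parameter comes from one parent,
            # followed by Gaussian mutation of fixed size sigma.
            child = [rng.choice(pair) + rng.gauss(0, sigma)
                     for pair in zip(a, b)]
            children.append(child)
        population = parents + children       # parents survive (elitism)
    population.sort(key=error)
    history.append(error(population[0]))
    return population[0], history


# Hypothetical reference parameters; the error is the squared deviation.
reference = [0.5, -1.0, 1.5]


def fit_error(params):
    return sum((p - r) ** 2 for p, r in zip(params, reference))


best, history = evolve(fit_error, n_params=3)
print(best, history[-1])
```

Because the parents are carried over unchanged, the best error never increases from one generation to the next. Each child can be evaluated independently, which is what makes this family of methods easy to parallelize.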
ABSTRACT
BACKGROUND: Multiple trials have been performed to evaluate second-line clinical chemotherapy in patients with advanced nonsmall cell lung carcinoma (NSCLC). However, no single agent or combination has demonstrated superior activity. METHODS: Patients with advanced NSCLC who had already received one chemotherapeutic regimen were treated with topotecan (0.75 mg/m² over 30 minutes, Days 1-5) and gemcitabine (400 mg/m² over 30 minutes, Days 1 and 5) every 21 days. RESULTS: Of 35 patients who were treated, 4 (11%) achieved partial responses and 8 (23%) had stable disease for at least four courses of treatment. The response rate for patients with refractory disease (progressing during frontline chemotherapy) was 18% (3 of 17) with 18% having stable disease for at least four courses of treatment. The median survival of the entire group was 7 months (range, 1.5-44 months) and 20% (7 of 35) of patients were alive 1 year from the initiation of topotecan and gemcitabine treatment. Patients with refractory disease had a median survival of 4.5 months, with 6-month and 1-year survival rates of 47% and 18%, respectively. During Course 1, five patients (14%) developed Grade IV neutropenia and three patients (9%) developed Grade IV thrombocytopenia. Nonhematologic toxicity was relatively mild, with one patient developing Grade III side effects (fatigue) and eight patients (23%) developing Grade II nonhematologic side effects. CONCLUSIONS: The combination of topotecan and gemcitabine demonstrated antitumor activity with a modest side effect profile in patients with advanced, previously treated NSCLC.