ABSTRACT
Accurate thermochemistry estimation of polycyclic molecules is crucial for kinetic modeling of chemical processes that use renewable and alternative feedstocks. In kinetic model generators, molecular properties are estimated rapidly with group additivity, but this method is known to have limitations for polycyclic structures. We address this issue by combining a geometry-based molecular representation with a deep neural network trained on ab initio data. Each molecule is transformed into a probabilistic vector from its interatomic distances, bond angles, and dihedral angles. The model is tested on three datasets: a small experimental dataset (200 molecules) from the literature; a new medium-sized set (4000 molecules) with both open-shell and closed-shell species, calculated at the CBS-QB3 level with empirical corrections; and a large G4MP2-level QM9-based dataset (40 000 molecules). For the medium set, heat capacities between 298.15 and 2500 K are predicted with an average deviation of about 1.5 J mol⁻¹ K⁻¹, and the standard entropy at 298.15 K is predicted with an average error below 4 J mol⁻¹ K⁻¹. The standard enthalpy of formation at 298.15 K has an average out-of-sample error below 4 kJ mol⁻¹ for a QM9 training set of around 15 000 molecules. By fitting NASA polynomials, the enthalpy of formation at higher temperatures can be calculated with the same accuracy as the standard enthalpy of formation. Uncertainty quantification by means of the ensemble standard deviation is included to indicate when the model is evaluated on molecules at the edge of or outside its application range.
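As a rough illustration of the NASA-polynomial step, the sketch below evaluates heat capacity, enthalpy, and entropy from a set of coefficients. It assumes the common 7-coefficient NASA form, which the abstract does not specify, and the coefficient values are hypothetical placeholders rather than fitted results.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def nasa7_cp(a, T):
    """Heat capacity Cp(T) in J mol^-1 K^-1 from 7-coefficient NASA form."""
    return R * (a[0] + a[1] * T + a[2] * T**2 + a[3] * T**3 + a[4] * T**4)


def nasa7_h(a, T):
    """Enthalpy H(T) in J mol^-1; a[5] sets the enthalpy level."""
    return R * T * (
        a[0]
        + a[1] * T / 2
        + a[2] * T**2 / 3
        + a[3] * T**3 / 4
        + a[4] * T**4 / 5
        + a[5] / T
    )


def nasa7_s(a, T):
    """Entropy S(T) in J mol^-1 K^-1; a[6] sets the entropy level."""
    return R * (
        a[0] * math.log(T)
        + a[1] * T
        + a[2] * T**2 / 2
        + a[3] * T**3 / 3
        + a[4] * T**4 / 4
        + a[6]
    )


# Hypothetical coefficients (a1..a7), chosen only to keep the arithmetic simple.
coeffs = [3.5, 1.0e-3, 0.0, 0.0, 0.0, -1.0e3, 4.0]

for T in (298.15, 1000.0, 2500.0):
    print(T, nasa7_cp(coeffs, T), nasa7_h(coeffs, T), nasa7_s(coeffs, T))
```

Because the enthalpy expression is the temperature integral of the heat capacity expression, one fitted coefficient set yields mutually consistent values of both properties over the whole temperature range.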
ABSTRACT
The challenge of devising pathways for organic synthesis remains a central issue in medicinal chemistry. Over six decades, computer-aided synthesis planning has given rise to a plethora of potent tools for formulating synthetic routes. Nevertheless, a significant expert task remains: determining the appropriate solvent, catalyst, and reagents, given a set of reactants, to achieve and optimize the desired product for a specific synthesis step. Typically, chemists identify key functional groups and rings that exert crucial influences at the reaction center, classify reactions into categories, and may assign them names. This research introduces Rxn-INSIGHT, an open-source algorithm based on the bond-electron matrix approach, to automate this task. Rxn-INSIGHT not only streamlines the process but also facilitates extensive querying of reaction databases, effectively replicating the thought processes of an organic chemist. The core functions of the algorithm are the classification and naming of reactions and the extraction of functional groups, rings, and scaffolds from the chemical species involved. Recommending reaction conditions based on the similarity and prevalence of reactions arises as a side application. The performance of our rule-based model has been assessed against a carefully curated benchmark dataset, with an accuracy exceeding 90% in reaction classification and 95% in reaction naming. Notably, a pivotal factor in selecting analogous reactions is the analysis of the ring structures participating in the reactions. An examination of ring structures within the USPTO chemical reaction database reveals that just 35 unique rings account for 75% of all rings found in nearly 1 million products. Furthermore, Rxn-INSIGHT can suggest appropriate solvents, catalysts, and reagents for entirely novel reactions within a second on an everyday laptop.
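The ring-coverage statistic can be illustrated with a short frequency count. The sketch below is not Rxn-INSIGHT code: the function name and the toy ring list are hypothetical, and rings are assumed to be already extracted and represented as canonical strings.

```python
from collections import Counter


def rings_needed_for_coverage(ring_occurrences, target=0.75):
    """Return (number of unique rings, coverage reached) for a target coverage.

    Rings are ranked by frequency, and the most common ones are added
    until their cumulative share of all ring occurrences reaches `target`.
    """
    counts = Counter(ring_occurrences)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no ring occurrences supplied")
    covered = 0
    for n_unique, (_ring, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / total >= target:
            return n_unique, covered / total
    # Only reached when target > 1, which no set of rings can satisfy.
    return len(counts), 1.0


# Toy data: one entry per ring occurrence found in a product (hypothetical).
toy_rings = (
    ["c1ccccc1"] * 6    # benzene
    + ["c1ccncc1"] * 2  # pyridine
    + ["C1CCCCC1"]      # cyclohexane
    + ["C1CC1"]         # cyclopropane
)

print(rings_needed_for_coverage(toy_rings, target=0.75))
```

In this toy set, the two most frequent rings already cover 8 of the 10 occurrences. The same counting applied to a full product database would give the kind of figure quoted above.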
ABSTRACT
Thermo-catalytic conversion of CO2 into more valuable compounds, such as methane, is an attractive strategy for energy storage in chemical bonds and creating a carbon-based circular economy. However, designing heterogeneous catalysts remains a challenging, time- and resource-consuming task. Herein, we present an interpretable, human-in-the-loop active machine learning framework to efficiently plan catalytic experiments, execute them in an automated set-up, and estimate the effect of experimental variables on the catalytic activity. A dataset with 48 catalytic activity tests was compiled from a design space of Ni-Co/Al2O3 catalysts with over 50 million potential combinations in only eight iterations. This small dataset was found sufficient to predict CO2 conversion, methane selectivity, and methane space-time yield with remarkable accuracy (R² > 0.9) for untested catalysts and reaction conditions. New experiments and catalysts were selected with this methodology, leading to experimental conditions that improved the methane space-time yield by nearly 50% in comparison to the previously obtained maximum in the dataset. Interpretation of the model predictions unveiled the effect of each catalyst descriptor and reaction condition on the outcome. Particularly, the strong predicted inverse trend between the calcination temperature and the catalytic activity was validated experimentally, and characterization implied an underlying structure-performance relationship. Finally, it is demonstrated that the deployed active learning model is excellently suited to predict and fit kinetic trends with a minimal amount of data. This data-driven framework is a first step to faster, model-based, and interpretable design of catalysts and holds promise for broader applications across catalytic processes.
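For readers unfamiliar with the accuracy metric quoted above, the sketch below computes the coefficient of determination for held-out predictions. It assumes the usual definition (one minus the ratio of residual to total sum of squares), which the abstract does not state explicitly, and the conversion values are invented for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted CO2 conversions (%), not real data.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(r_squared(measured, predicted))
```

A value near 1 means the model explains most of the variance in the untested points. A value at or below 0 means it does no better than predicting the mean.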
ABSTRACT
Chemical engineers heavily rely on precise knowledge of physicochemical properties to model chemical processes. Despite the growing popularity of deep learning, it is only rarely applied for property prediction due to data scarcity and limited accuracy for compounds in industrially-relevant areas of the chemical space. Herein, we present a geometric deep learning framework for predicting gas- and liquid-phase properties based on novel quantum chemical datasets comprising 124,000 molecules. Our findings reveal that the necessity for quantum-chemical information in deep learning models varies significantly depending on the modeled physicochemical property. Specifically, our top-performing geometric model meets the most stringent criteria for "chemically accurate" thermochemistry predictions. We also show that by carefully selecting the appropriate model featurization and evaluating prediction uncertainties, the reliability of the predictions can be strongly enhanced. These insights represent a crucial step towards establishing deep learning as the standard property prediction workflow in both industry and academia.
Scientific contribution
We propose a flexible property prediction tool that can handle two-dimensional and three-dimensional molecular information. A thermochemistry prediction methodology that achieves high-level quantum chemistry accuracy for a broad application range is presented. Trained deep learning models and large novel molecular databases of real-world molecules are provided to offer a directly usable and fast property prediction solution to practitioners.
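One common way to evaluate prediction uncertainties is to use the spread of an ensemble of independently trained models. The sketch below illustrates that idea only. The abstract does not say which uncertainty estimator was used, so the use of a sample standard deviation, the threshold value, and all numbers here are assumptions.

```python
from statistics import mean, stdev


def ensemble_summary(predictions, threshold):
    """Summarise one molecule's predictions from several models.

    Returns (ensemble mean, sample standard deviation, is_reliable), where
    is_reliable is True when the spread does not exceed `threshold`.
    """
    mu = mean(predictions)
    sigma = stdev(predictions)
    return mu, sigma, sigma <= threshold


# Hypothetical enthalpy predictions (kJ/mol) from five ensemble members.
in_range = [-100.0, -101.0, -99.0, -100.5, -99.5]
out_of_range = [-80.0, -95.0, -70.0, -88.0, -77.0]

for name, preds in (("in_range", in_range), ("out_of_range", out_of_range)):
    mu, sigma, ok = ensemble_summary(preds, threshold=2.0)
    print(name, round(mu, 2), round(sigma, 2), "reliable" if ok else "flagged")
```

Predictions with a large spread can then be discarded or routed to a more expensive method. How well that filtering works depends on how closely the ensemble spread tracks the true error.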