Results 1 - 20 of 35
1.
J Chem Inf Model ; 64(1): 9-17, 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38147829

ABSTRACT

Deep learning has become a powerful and frequently employed tool for the prediction of molecular properties, thus creating a need for open-source and versatile software solutions that can be operated by nonexperts. Among the current approaches, directed message-passing neural networks (D-MPNNs) have proven to perform well on a variety of property prediction tasks. The software package Chemprop implements the D-MPNN architecture and offers simple, easy, and fast access to machine-learned molecular properties. Compared to its initial version, we present a multitude of new Chemprop functionalities such as the support of multimolecule properties, reactions, atom/bond-level properties, and spectra. Further, we incorporate various uncertainty quantification and calibration methods along with related metrics as well as pretraining and transfer learning workflows, improved hyperparameter optimization, and other customization options concerning loss functions or atom/bond features. We benchmark D-MPNN models trained using Chemprop with the new reaction, atom-level, and spectra functionality on a variety of property prediction data sets, including MoleculeNet and SAMPL, and observe state-of-the-art performance on the prediction of water-octanol partition coefficients, reaction barrier heights, atomic partial charges, and absorption spectra. Chemprop enables out-of-the-box training of D-MPNN models for a variety of problem settings in fast, user-friendly, and open-source software.


Subjects
Machine Learning , Software , Neural Networks, Computer , Chemical Phenomena , Water
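
The directed message-passing idea named in the abstract above can be illustrated with a toy sketch. It assumes scalar atom features, a single shared scalar weight, and a ReLU update; the function name `dmpnn_messages` and all parameters are hypothetical and are not Chemprop's API. The point it shows is that hidden states live on directed bonds, and that a message along bond v->w excludes the state of the reverse bond w->v.

```python
from collections import defaultdict


def dmpnn_messages(atom_feats, bonds, weight=0.5, steps=2):
    """Toy directed message passing with scalar features (illustrative only)."""
    # Every undirected bond contributes two directed edges.
    edges = [(a, b) for a, b in bonds] + [(b, a) for a, b in bonds]

    # Map each atom to the directed edges that end at it.
    incoming = defaultdict(list)
    for src, dst in edges:
        incoming[dst].append((src, dst))

    # Initial hidden state of edge v->w: the feature of its source atom v.
    h0 = {(v, w): atom_feats[v] for v, w in edges}
    h = dict(h0)

    for _ in range(steps):
        new_h = {}
        for v, w in edges:
            # Sum the states of edges k->v, skipping the reverse edge w->v.
            msg = sum(h[(k, dst)] for k, dst in incoming[v] if k != w)
            # Residual-style update followed by ReLU.
            new_h[(v, w)] = max(0.0, h0[(v, w)] + weight * msg)
        h = new_h

    # Atom readout: own feature plus the states of all incoming edges.
    return {a: atom_feats[a] + sum(h[e] for e in incoming[a]) for a in atom_feats}


# Three-atom chain 0-1-2 with one message-passing step.
print(dmpnn_messages({0: 1.0, 1: 2.0, 2: 3.0}, [(0, 1), (1, 2)], steps=1))
```

A trained model would replace the scalar weight with learned matrices and use concatenated atom and bond feature vectors; the sketch keeps only the edge-centred bookkeeping.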
2.
Chem Sci ; 14(48): 14229-14242, 2023 Dec 13.
Article in English | MEDLINE | ID: mdl-38098707

ABSTRACT

Enzymatic reactions are an ecofriendly, selective, and versatile addition, and sometimes even an alternative, to organic reactions for the synthesis of chemical compounds such as pharmaceuticals or fine chemicals. To identify suitable reactions, computational models to predict the activity of enzymes on non-native substrates, to perform retrosynthetic pathway searches, or to predict the outcomes of reactions including regio- and stereoselectivity are becoming increasingly important. However, current approaches are substantially hindered by the limited amount of available data, especially if balanced and atom-mapped reactions are needed and if the models feature machine learning components. We therefore constructed a high-quality dataset (EnzymeMap) by developing a large set of correction and validation algorithms for recorded reactions in the literature and showcase its significant positive impact on machine learning models of retrosynthesis, forward prediction, and regioselectivity prediction, outperforming previous approaches by a large margin. Our dataset allows for deep learning models of enzymatic reactions with unprecedented accuracy, and is freely available online.

3.
J Chem Inf Model ; 63(13): 4012-4029, 2023 Jul 10.
Article in English | MEDLINE | ID: mdl-37338239

ABSTRACT

Characterizing uncertainty in machine learning models has recently gained interest in the context of machine learning reliability, robustness, safety, and active learning. Here, we separate the total uncertainty into contributions from noise in the data (aleatoric) and shortcomings of the model (epistemic), further dividing epistemic uncertainty into model bias and variance contributions. We systematically address the influence of noise, model bias, and model variance in the context of chemical property predictions, where the diverse nature of target properties and the vast chemical space give rise to many distinct sources of prediction error. We demonstrate that different sources of error can each be significant in different contexts and must be individually addressed during model development. Through controlled experiments on data sets of molecular properties, we show important trends in model performance associated with the level of noise in the data set, size of the data set, model architecture, molecule representation, ensemble size, and data set splitting. In particular, we show that 1) noise in the test set can limit a model's observed performance when the actual performance is much better, 2) using size-extensive model aggregation structures is crucial for extensive property prediction, and 3) ensembling is a reliable tool for uncertainty quantification and improvement specifically for the contribution of model variance. We develop general guidelines on how to improve an underperforming model when falling into different uncertainty contexts.


Subjects
Machine Learning , Uncertainty , Reproducibility of Results
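
The split between noise and model variance described in the abstract above can be sketched for an ensemble whose members each return a predicted mean and a predicted noise variance. This is a minimal sketch under that assumption; the function name `decompose_uncertainty` and the dictionary keys are hypothetical, and the model-bias contribution discussed in the abstract is not captured by this estimate.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(means, variances):
    """Split ensemble uncertainty for one sample into two variance terms.

    means     -- predicted mean from each ensemble member
    variances -- predicted noise variance from each ensemble member
    """
    # Aleatoric part: average of the noise variances the members predict.
    aleatoric = fmean(variances)
    # Model-variance part: spread of the member means around their average.
    model_variance = pvariance(means)
    return {
        "prediction": fmean(means),
        "aleatoric": aleatoric,
        "model_variance": model_variance,
        "total": aleatoric + model_variance,
    }


print(decompose_uncertainty([1.0, 3.0], [0.5, 1.5]))
```

`pvariance` treats the members as the whole population; `statistics.variance` would give the sample estimate instead, which matters for small ensembles.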
4.
J Chem Phys ; 158(20), 2023 May 28.
Article in English | MEDLINE | ID: mdl-37212411

ABSTRACT

A reliable uncertainty estimator is a key ingredient in the successful use of machine-learning force fields for predictive calculations. Important considerations are correlation with error, overhead during training and inference, and efficient workflows to systematically improve the force field. However, in the case of neural-network force fields, simple committees are often the only option considered due to their easy implementation. Here, we present a generalization of the deep-ensemble design based on multiheaded neural networks and a heteroscedastic loss. It can efficiently deal with uncertainties in both energy and forces and take sources of aleatoric uncertainty affecting the training data into account. We compare uncertainty metrics based on deep ensembles, committees, and bootstrap-aggregation ensembles using data for an ionic liquid and a perovskite surface. We demonstrate an adversarial approach to active learning to efficiently and progressively refine the force fields. That active learning workflow is realistically possible thanks to exceptionally fast training enabled by residual learning and a nonlinear learned optimizer.
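
A heteroscedastic loss of the kind mentioned in the abstract above is commonly written as a Gaussian negative log-likelihood, in which the model predicts both a mean and a variance for each target. The sketch below shows that standard form for a single scalar target; it is an assumption about the general technique, not the multiheaded energy-and-force loss of the paper, and the function name `gaussian_nll` is hypothetical.

```python
import math


def gaussian_nll(y_true, mean, variance):
    """Negative log-likelihood of y_true under N(mean, variance)."""
    if variance <= 0.0:
        raise ValueError("variance must be positive")
    # 0.5 * [log(2*pi*sigma^2) + (y - mu)^2 / sigma^2]
    return 0.5 * (math.log(2.0 * math.pi * variance)
                  + (y_true - mean) ** 2 / variance)


# A large error is penalised less when the model also predicts a large variance.
print(gaussian_nll(3.0, 0.0, 1.0), gaussian_nll(3.0, 0.0, 9.0))
```

The logarithmic term keeps the model from inflating the variance everywhere, since a larger predicted variance raises the loss on samples it already fits well.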

5.
J Am Chem Soc ; 144(49): 22599-22610, 2022 Dec 14.
Article in English | MEDLINE | ID: mdl-36459170

ABSTRACT

The molecular structures synthesizable by organic chemists dictate the molecular functions they can create. The invention and development of chemical reactions are thus critical for chemists to access new and desirable functional molecules in all disciplines of organic chemistry. This work seeks to expedite the exploration of emerging areas of organic chemistry by devising a machine-learning-guided workflow for reaction discovery. Specifically, this study uses machine learning to predict competent electrochemical reactions. To this end, we first develop a molecular representation that enables the production of general models with limited training data. Next, we employ automated experimentation to test a large number of electrochemical reactions. These reactions are categorized as competent or incompetent mixtures, and a classification model is trained to predict reaction competency. This model is used to screen 38,865 potential reactions in silico, and the predictions are used to identify a number of reactions of synthetic or mechanistic interest, 80% of which are found to be competent. Additionally, we provide the predictions for the 38,865-member set in the hope of accelerating the development of this field. We envision that adopting a workflow such as this could enable the rapid development of many fields of chemistry.


Subjects
Chemistry, Organic , Machine Learning , Molecular Structure
6.
Chem Sci ; 13(20): 6039-6053, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35685792

ABSTRACT

Enzymes synthesize complex natural products effortlessly by catalyzing chemo-, regio-, and enantio-selective transformations. Further, biocatalytic processes are increasingly replacing conventional organic synthesis steps because they use mild solvents, avoid the use of metals, and reduce overall non-biodegradable waste. Here, we present a single-step retrosynthesis search algorithm to facilitate enzymatic synthesis of natural product analogs. First, we develop a tool, RDEnzyme, capable of extracting and applying stereochemically consistent enzymatic reaction templates, i.e., subgraph patterns that describe the changes in connectivity between a product molecule and its corresponding reactant(s). Using RDEnzyme, we demonstrate that molecular similarity is an effective metric to propose retrosynthetic disconnections based on analogy to precedent enzymatic reactions in UniProt/RHEA. Using ∼5500 reactions from RHEA as a knowledge base, the recorded reactants to the product are among the top 10 proposed suggestions in 71% of ∼700 test reactions. Second, we trained a statistical model capable of discriminating between reaction pairs belonging to homologous enzymes and evolutionarily distant enzymes using ∼30 000 reaction pairs from SwissProt as a knowledge base. This model is capable of understanding patterns in enzyme promiscuity to evaluate the likelihood of experimental evolution success. By recursively applying the similarity-based single-step retrosynthesis and evolution prediction workflow, we successfully plan the enzymatic synthesis routes for both active pharmaceutical ingredients (e.g. Islatravir, Molnupiravir) and commodity chemicals (e.g. 1,4-butanediol, branched-chain higher alcohols/biofuels), in a retrospective fashion. 
Through the development and demonstration of the single-step enzymatic retrosynthesis strategy using natural transformations, our approach provides a first step towards solving the challenging problem of incorporating both enzyme- and organic-chemistry based transformations into a computer aided synthesis planning workflow.
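
The similarity-based proposal step described in the abstract above can be sketched as a ranking of precedent reactions by Tanimoto similarity to the target product. The sketch assumes fingerprints are given as sets of on-bit indices; the names `tanimoto`, `rank_precedents`, and the precedent labels are hypothetical and do not reflect the RDEnzyme interface.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two sets of fingerprint bits."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_precedents(target_bits, precedents, top_k=10):
    """Return the top_k precedents most similar to the target product.

    precedents -- mapping from a precedent label to its product fingerprint
    """
    scored = [(tanimoto(target_bits, bits), name)
              for name, bits in precedents.items()]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, score) for score, name in scored[:top_k]]


precedents = {"rxn_a": {1, 2, 3}, "rxn_b": {4, 5}, "rxn_c": {1, 2, 3, 4}}
print(rank_precedents({1, 2, 3, 4}, precedents))
```

In a full workflow the reaction template extracted from each top-ranked precedent would then be applied to the target to generate candidate reactants.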

7.
Phys Chem Chem Phys ; 24(26): 15776-15790, 2022 Jul 06.
Article in English | MEDLINE | ID: mdl-35758401

ABSTRACT

We use polarizable molecular dynamics simulations to study the thermal dependence of both structural and dynamic properties of two ionic liquids sharing the same cation (1-ethyl-3-methylimidazolium). The linear temperature trend in the structure is accompanied by an exponential Arrhenius-like behavior of the dynamics. Our parameter-free Voronoi tessellation analysis directly casts doubt on common concepts such as the alternating shells of cations and anions and the ionicity. The latter tries to explain the physico-chemical properties of the ionic liquids based on the association and dissociation of an ion pair. However, cations are in the majority of both ion cages, around cations and around anions. There is no preference of a cation for a single anion. Collectivity is a key factor in the dynamic properties of ionic liquids. Consequently, collective rotation relaxes faster than single-particle rotations, and the activation energies for collective translation and rotation are lower than those of the single molecules.
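
Activation energies for an Arrhenius-like temperature dependence, as discussed in the abstract above, are typically obtained from a linear fit of ln k against 1/T. The sketch below assumes rates (or inverse relaxation times) measured at several temperatures; the function names and the synthetic input values are hypothetical and are not data from the paper.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol K)


def arrhenius_rate(temperature, activation_energy, prefactor):
    """k(T) = A * exp(-Ea / (R * T)), with Ea in J/mol and T in K."""
    return prefactor * math.exp(-activation_energy / (R * temperature))


def arrhenius_fit(temperatures, rates):
    """Least-squares fit of ln k = ln A - Ea / (R T); returns (Ea, A)."""
    xs = [1.0 / t for t in temperatures]
    ys = [math.log(k) for k in rates]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals -Ea / R
    intercept = y_mean - slope * x_mean    # equals ln A
    return -slope * R, math.exp(intercept)


temps = [300.0, 350.0, 400.0]
rates = [arrhenius_rate(t, 20000.0, 1.0e12) for t in temps]
print(arrhenius_fit(temps, rates))
```

Comparing the fitted activation energy of a collective observable with that of the corresponding single-particle observable is one way to quantify the difference the abstract reports.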

8.
J Chem Inf Model ; 62(6): 1388-1398, 2022 Mar 28.
Article in English | MEDLINE | ID: mdl-35271260

ABSTRACT

Multiparameter optimization, the heart of drug design, is still an open challenge. Thus, improved methods for automated compound design with multiple controlled properties are desired. Here, we present a significant extension to our previously described fragment-based reinforcement learning method (DeepFMPO) for the generation of novel molecules with optimal properties. As before, the generative process outputs optimized molecules similar to the input structures, now with the improved feature of replacing parts of these molecules with fragments of similar three-dimensional (3D) shape and electrostatics. We developed and benchmarked a new Python package, ESP-Sim, for the comparison of the electrostatic potential and the molecular shape, allowing the calculation of high-quality partial charges (e.g., RESP with B3LYP/6-31G**) obtained using the quantum chemistry program Psi4. By performing comparisons of 3D fragments, we can simulate 3D properties while overcoming the notoriously difficult step of accurately describing bioactive conformations. The new, improved generative method (DeepFMPO v3D) is demonstrated with a scaffold-hopping exercise identifying CDK2 bioisosteres. The code is open-source and freely available.


Subjects
Drug Design , Static Electricity
9.
J Chem Inf Model ; 62(9): 2101-2110, 2022 May 09.
Article in English | MEDLINE | ID: mdl-34734699

ABSTRACT

The estimation of chemical reaction properties such as activation energies, rates, or yields is a central topic of computational chemistry. In contrast to molecular properties, where machine learning approaches such as graph convolutional neural networks (GCNNs) have excelled for a wide variety of tasks, no general and transferable adaptations of GCNNs for reactions have been developed yet. We therefore combined a popular cheminformatics reaction representation, the so-called condensed graph of reaction (CGR), with a recent GCNN architecture to arrive at a versatile, robust, and compact deep learning model. The CGR is a superposition of the reactant and product graphs of a chemical reaction and thus an ideal input for graph-based machine learning approaches. The model learns to create a data-driven, task-dependent reaction embedding that does not rely on expert knowledge, similar to current molecular GCNNs. Our approach outperforms current state-of-the-art models in accuracy, is applicable even to imbalanced reactions, and possesses excellent predictive capabilities for diverse target properties, such as activation energies, reaction enthalpies, rate constants, yields, or reaction classes. We furthermore curated a large set of atom-mapped reactions along with their target properties, which can serve as benchmark data sets for future work. All data sets and the developed reaction GCNN model are available online, free of charge, and open source.


Subjects
Machine Learning , Neural Networks, Computer , Cheminformatics
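
The condensed graph of reaction described in the abstract above superimposes the reactant and product graphs, so that each bond carries its order before and after the reaction. A minimal sketch, assuming bonds are given as dictionaries keyed by pairs of atom-map numbers with a bond order as value; the function names and the example reaction encoding are hypothetical.

```python
def condensed_graph_of_reaction(reactant_bonds, product_bonds):
    """Superimpose two bond tables into {(i, j): (order_before, order_after)}."""
    def normalize(bonds):
        # Store every bond with its atom-map numbers in ascending order.
        return {tuple(sorted(pair)): order for pair, order in bonds.items()}

    before = normalize(reactant_bonds)
    after = normalize(product_bonds)
    # A bond missing on one side gets order 0 there (formed or broken).
    return {pair: (before.get(pair, 0), after.get(pair, 0))
            for pair in sorted(set(before) | set(after))}


def changed_bonds(cgr):
    """Bonds whose order differs between reactants and products."""
    return {pair: orders for pair, orders in cgr.items()
            if orders[0] != orders[1]}


# Substitution at carbon: C(1)-Br(2) breaks, C(1)-O(3) forms (heavy atoms only).
cgr = condensed_graph_of_reaction({(1, 2): 1}, {(3, 1): 1})
print(cgr)
```

Because the representation needs consistent atom-map numbers on both sides, its quality depends on the atom mapping of the input reactions.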
10.
J Chem Inf Model ; 62(1): 16-26, 2022 Jan 10.
Article in English | MEDLINE | ID: mdl-34939786

ABSTRACT

Heuristic and machine learning models for rank-ordering reaction templates comprise an important basis for computer-aided organic synthesis regarding both product prediction and retrosynthetic pathway planning. Their viability relies heavily on the quality and characteristics of the underlying template database. With the advent of automated reaction and template extraction software and consequently the creation of template databases too large for manual curation, a data-driven approach to assess and improve the quality of template sets is needed. We therefore systematically studied the influence of template generality, canonicalization, and exclusivity on the performance of different template ranking models. We find that duplicate and nonexclusive templates, i.e., templates which describe the same chemical transformation on identical or overlapping sets of molecules, decrease both the accuracy of the ranking algorithm and the applicability of the respective top-ranked templates significantly. To remedy the negative effects of nonexclusivity, we developed a general and computationally efficient framework to deduplicate and hierarchically correct templates. As a result, performance improved considerably for both heuristic and machine learning template ranking models, as well as multistep retrosynthetic planning models. The canonicalization and correction code is made freely available.


Subjects
Algorithms , Software , Computers , Heuristics , Machine Learning
11.
J Chem Inf Model ; 61(10): 4949-4961, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-34587449

ABSTRACT

Data-driven computer-aided synthesis planning utilizing organic or biocatalyzed reactions from large databases has gained increasing interest in the last decade, sparking the development of numerous tools to extract, apply, and score general reaction templates. The generation of reaction rules for enzymatic reactions is especially challenging since substrate promiscuity varies between enzymes, causing the optimal levels of rule specificity and optimal number of included atoms to differ between enzymes. This complicates an automated extraction from databases and has promoted the creation of manually curated reaction rule sets. Here, we present EHreact, a purely data-driven open-source software tool, to extract and score reaction rules from sets of reactions known to be catalyzed by an enzyme at appropriate levels of specificity without expert knowledge. EHreact extracts and groups reaction rules into tree-like structures, Hasse diagrams, based on common substructures in the imaginary transition structures. Each diagram can be utilized to output a single or a set of reaction rules, as well as calculate the probability of a new substrate to be processed by the given enzyme by inferring information about the reactive site of the enzyme from the known reactions and their grouping in the template tree. EHreact heuristically predicts the activity of a given enzyme on a new substrate, outperforming current approaches in accuracy and functionality.


Subjects
Computers , Software , Databases, Factual , Probability
12.
J Chem Phys ; 155(7): 074504, 2021 Aug 21.
Article in English | MEDLINE | ID: mdl-34418918

ABSTRACT

Redox-active molecules are of interest in many fields, such as medicine, catalysis, or energy storage. In particular, in supercapacitor applications, they can be grafted to ionic liquids to form so-called biredox ionic liquids. To completely understand the structural and transport properties of such systems, an insight at the molecular scale is often required, but few force fields are developed ad hoc for these molecules. Moreover, they do not include polarization effects, which can lead to inaccurate solvation and dynamical properties. In this work, we developed polarizable force fields for redox-active species anthraquinone (AQ) and 2,2,6,6-tetramethylpiperidinyl-1-oxyl (TEMPO) in their oxidized and reduced states as well as for acetonitrile. We validate the structural properties of AQ, AQ•⁻, AQ²⁻, TEMPO•, and TEMPO⁺ in acetonitrile against density functional theory-based molecular dynamics simulations and we study the solvation of these redox molecules in acetonitrile. This work is a first step toward the characterization of the role played by AQ and TEMPO in electrochemical and catalytic devices.

13.
Phys Chem Chem Phys ; 23(2): 1616-1626, 2021 Jan 21.
Article in English | MEDLINE | ID: mdl-33410837

ABSTRACT

The Kamlet-Taft dipolarity/polarizability parameters π* for various ionic liquids were determined using 4-tert-butyl-2-(dicyanomethylene)-5-[4-(N,N-diethylamino)benzylidene]-Δ3-thiazoline and 5-(N,N-dimethylamino)-5'-nitro-2,2'-bithiophene as solvatochromic probes. In contrast to the established π*-probe N,N-diethylnitroaniline, the chromophores presented here show excellent agreement with polarity measurement using the chemical shift of ¹²⁹Xe. They do not suffer from additional bathochromic UV/vis shifts caused by hydrogen bonding, which result in overly high π* values for some ionic liquids. In combination with large sets of various ionic liquids, these new chromophores thereby allow for detailed analysis of the physical significance of π* and the comparison to quantum-mechanical methods. We find that π* correlates strongly with the ratio of molar refractivity to molar volume, and thus with the refractive index.
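
The link between the ratio of molar refractivity to molar volume and the refractive index, noted at the end of the abstract above, is given by the Lorentz-Lorenz relation, R_m / V_m = (n² - 1) / (n² + 2). The sketch below only evaluates that textbook relation and its inverse; the function names are hypothetical, and the regression of π* against this ratio reported in the paper is not reproduced.

```python
import math


def refractivity_to_volume_ratio(n):
    """Lorentz-Lorenz ratio R_m / V_m for refractive index n."""
    n2 = n * n
    return (n2 - 1.0) / (n2 + 2.0)


def refractive_index_from_ratio(ratio):
    """Invert the Lorentz-Lorenz relation: n = sqrt((1 + 2r) / (1 - r))."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must lie in [0, 1)")
    return math.sqrt((1.0 + 2.0 * ratio) / (1.0 - ratio))


ratio = refractivity_to_volume_ratio(1.5)
print(ratio, refractive_index_from_ratio(ratio))
```

The ratio increases monotonically with n, which is why a correlation with one quantity implies a correlation with the other.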

14.
Int J Cancer ; 148(9): 2345-2351, 2021 May 01.
Article in English | MEDLINE | ID: mdl-33231291

ABSTRACT

Kaposiform hemangioendothelioma (KHE) is a rare vascular tumor in children, which can be accompanied by life-threatening thrombocytopenia, referred to as Kasabach-Merritt phenomenon (KMP). The mTOR inhibitor sirolimus is emerging as targeted therapy in KHE. As the sirolimus effect on KHE occurs only after several weeks, we aimed to evaluate whether additional transarterial embolization is of benefit for children with KHE and KMP. Seventeen patients with KHE and KMP from 11 hospitals in Germany were retrospectively divided into two cohorts: children treated with adjunct transarterial embolization and systemic sirolimus, and those treated with sirolimus without additional embolization. Bleeding grade as defined by WHO was determined for all patients. Response of the primary tumor at 6 and 12 months assessed by magnetic resonance imaging (MRI), time to response of KMP defined as thrombocyte increase >150 × 10³/µL, as well as rebound rates of both after cessation of sirolimus were compared. N = 8 patients had undergone embolization in addition to systemic sirolimus therapy; sirolimus in this group was started a mean of 6.5 ± 3 days after embolization. N = 9 patients were identified who had received sirolimus without additional embolization. Adjunct embolization induced a more rapid resolution of KMP within a median of 7 days vs 3 months; however, tumor response as well as rebound rates were similar between both groups. Additive embolization may be of value for a more rapid rescue of consumptive coagulopathy in children with KHE and KMP compared to systemic sirolimus only.


Subjects
Embolization, Therapeutic/methods , Hemangioendothelioma/drug therapy , Kasabach-Merritt Syndrome/drug therapy , Sarcoma, Kaposi/drug therapy , Sirolimus/therapeutic use , Female , Humans , Male , Retrospective Studies , Sirolimus/pharmacology
15.
Phys Chem Chem Phys ; 22(33): 18388-18399, 2020 Sep 07.
Article in English | MEDLINE | ID: mdl-32797139

ABSTRACT

Different types of spectroscopy capture different aspects of dynamics and different ranges of intermolecular contributions. In this article, we investigate the dielectric relaxation spectroscopy (DRS) of collective nature and the time-dependent Stokes shift (TDSS) of disputed nature. Our computational study of unconfined and confined water clearly demonstrates that the TDSS reflects local, non-collective dynamics. Surprisingly, we found that the reaction field continuum model (RFCM) used to estimate TDSS curves solely from collective DRS spectra correctly transforms collective dynamics to local ones even in cases when the relaxation time trends are quite different. This correct transformation is possible due to structural information available in the DRS amplitude in a Kivelson-Madden-like context.

16.
J Phys Chem Lett ; 11(6): 2165-2170, 2020 Mar 19.
Article in English | MEDLINE | ID: mdl-32105075

ABSTRACT

Fast-field-cycling relaxometry is a nuclear magnetic resonance method growing in popularity; yet, theoretical interpretation is limited to analytical models of uncertain accuracy. We present the first study calculating fast-field-cycling dipolar coupling directly from a molecular dynamics simulation trajectory. In principle, the frequency-resolved dispersion contains both rotational and translational diffusion information, among others. The present joint experimental/molecular dynamics study demonstrates that nuclear magnetic resonance properties calculated from the latter reproduce measured dispersion curves and temperature trends faithfully. Furthermore, molecular dynamics simulations can verify interpretation model assumptions by providing actual diffusion coefficients and correlation times.

17.
Chem Sci ; 12(6): 2198-2208, 2020 Dec 22.
Article in English | MEDLINE | ID: mdl-34163985

ABSTRACT

Accurate and rapid evaluation of whether substrates can undergo the desired transformation is crucial and challenging for both human knowledge and computer predictions. Despite the potential of machine learning in predicting chemical reactivity such as selectivity, popular feature engineering and learning methods are either time-consuming or data-hungry. We introduce a new method that combines machine-learned reaction representation with selected quantum mechanical descriptors to predict regio-selectivity in general substitution reactions. We construct a reactivity descriptor database based on ab initio calculations of 130k organic molecules, and train a multi-task constrained model to calculate demanded descriptors on-the-fly. The proposed platform enhances the inter/extra-polated performance for regio-selectivity predictions and enables learning from small datasets with just hundreds of examples. Furthermore, the proposed protocol is demonstrated to be generally applicable to a diverse range of chemical spaces. For three general types of substitution reactions (aromatic C-H functionalization, aromatic C-X substitution, and other substitution reactions) curated from a commercial database, the fusion model achieves 89.7%, 96.7%, and 97.2% top-1 accuracy in predicting the major outcome, respectively, each using 5000 training reactions. Using predicted descriptors, the fusion model is end-to-end, and requires approximately only 70 ms per reaction to predict the selectivity from reaction SMILES strings.

18.
J Chem Phys ; 152(9): 094105, 2020 Mar 07.
Article in English | MEDLINE | ID: mdl-33480729

ABSTRACT

Ionic liquids are an interesting class of soft matter with viscosities one or two orders of magnitude higher than that of water. Unfortunately, classical, non-polarizable molecular dynamics (MD) simulations of ionic liquids result in dynamics that are too slow, demonstrating the need for explicit inclusion of polarizability. The inclusion of polarizability, here via the Drude oscillator model, requires amendments to the employed thermostat, where we consider a dual Nosé-Hoover thermostat, as well as a dual Langevin thermostat. We investigate the effects of the choice of a thermostat and the underlying parameters such as the masses and force constants of the Drude particles on static and dynamic properties of ionic liquids. Here, we show that Langevin thermostats are not suitable for investigating the dynamics of ionic liquids. Since polarizable MD simulations are associated with high computational costs, we employed a self-developed graphics processing unit enhanced code within the MD program CHARMM to keep the overall computational effort reasonable.
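
The role of the Drude force constant mentioned in the abstract above follows from the basic relation of the Drude oscillator model: a charge q attached by a harmonic spring of force constant k gives an isotropic polarizability α = q² / k. The sketch below assumes consistent reduced units in which the Coulomb constant is 1 and the spring energy is k d² / 2; the function names are hypothetical, and the unit conversions and conventions of a specific MD package such as CHARMM are not reproduced here.

```python
import math


def drude_charge(polarizability, force_constant):
    """Magnitude of the Drude charge that reproduces a given polarizability."""
    # alpha = q^2 / k  =>  |q| = sqrt(alpha * k)
    return math.sqrt(polarizability * force_constant)


def induced_dipole(charge, force_constant, field):
    """Dipole induced by a static, uniform field along one axis."""
    # Force balance k * d = q * E gives the equilibrium displacement d.
    displacement = charge * field / force_constant
    return charge * displacement


q = drude_charge(2.0, 8.0)
print(q, induced_dipole(q, 8.0, 0.5))  # dipole equals alpha * E
```

For a fixed polarizability, a stiffer spring requires a larger Drude charge and yields a smaller displacement, which is one reason the choice of force constant can affect simulated properties.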

19.
Phys Chem Chem Phys ; 21(32): 17703-17710, 2019 Aug 28.
Article in English | MEDLINE | ID: mdl-31367711

ABSTRACT

The inclusion of explicit polarization in molecular dynamics simulation has gained increasing interest during the last several years. An understudied area is the role of polarizability in computer simulations of solvation dynamics around chromophores, particularly for the large solutes used in experimental studies. In this work, we present fully polarizable ground and excited state force fields for the common fluorophores N-methyl-6-oxyquinolium betaine and Coumarin 153. While analyzing the solvation responses in water, methanol, and the highly viscous ionic liquid 1-ethyl-3-methylimidazolium trifluoromethanesulfonate we found that the inclusion of solute polarizability considerably increases the agreement of the obtained Stokes shift relaxation functions with experimental data. Solute polarizability slows down the inertial solvation response in the femtosecond time regime and enables the chromophore to adapt its dipole moment to the environment. Furthermore, the developed chromophore force field reproduces the solute dipole moments in both the electronic ground and excited state in environments ranging from gas phase to highly polar media correctly. Based on these studies it is anticipated that polarizable models of chromophores will lead to an improved understanding of the relationship of their environment to their spectroscopic properties.

20.
J Chem Phys ; 150(17): 175102, 2019 May 07.
Article in English | MEDLINE | ID: mdl-31067863

ABSTRACT

The bioprotective nature of monosaccharides and disaccharides is often attributed to their ability to slow down the dynamics of adjacent water molecules. Indeed, solvation dynamics close to sugars is indisputably retarded compared to bulk water. However, further research is needed on the qualitative and quantitative differences between the water dynamics around different saccharides. Current studies on this topic disagree on whether the disaccharide trehalose retards water to a larger extent than other isomers. Based on molecular dynamics simulation of the time-dependent Stokes shift of a chromophore close to the saccharides trehalose, sucrose, maltose, and glucose, this study reports a slightly stronger retardation of trehalose compared to other sugars at room temperature and below. Calculation and analysis of the intermolecular nuclear Overhauser effect, nuclear quadrupole relaxation, dielectric relaxation spectroscopy, and first shell residence times at room temperature yield further insights into the hydration dynamics of different sugars and confirm that trehalose slows down water dynamics to a slightly larger extent than other sugars. Since the calculated observables span a wide range of timescales relevant to intermolecular nuclear motion, and correspond to different kinds of motions, this study allows for a comprehensive view on sugar hydration dynamics.
