1.
J Chem Phys ; 153(2): 024112, 2020 Jul 14.
Article in English | MEDLINE | ID: mdl-32668927

ABSTRACT

Discovering novel chemicals and materials can be greatly accelerated by iterative machine learning-informed proposal of candidates, an approach known as active learning. However, standard global error metrics for model quality are not predictive of discovery performance and can be misleading. We introduce the notion of Pareto shell error to help judge the suitability of a model for proposing candidates. Furthermore, through synthetic cases, an experimental thermoelectric dataset and a computational organic molecule dataset, we probe the relation between acquisition function fidelity and active learning performance. Results suggest novel diagnostic tools, as well as new insights for the acquisition function design.

3.
J Chem Phys ; 150(20): 204121, 2019 May 28.
Article in English | MEDLINE | ID: mdl-31153160

ABSTRACT

Instant machine learning predictions of molecular properties are desirable for materials design, but the predictive power of the methodology is mainly tested on well-known benchmark datasets. Here, we investigate the performance of machine learning with kernel ridge regression (KRR) for the prediction of molecular orbital energies on three large datasets: the standard QM9 small organic molecules set, amino acid and dipeptide conformers, and organic crystal-forming molecules extracted from the Cambridge Structural Database. We focus on the prediction of highest occupied molecular orbital (HOMO) energies, computed at the density-functional level of theory. Two different representations that encode the molecular structure are compared: the Coulomb matrix (CM) and the many-body tensor representation (MBTR). We find that KRR performance depends significantly on the chemistry of the underlying dataset and that the MBTR is superior to the CM, predicting HOMO energies with a mean absolute error as low as 0.09 eV. To demonstrate the power of our machine learning method, we apply our model to structures of 10k previously unseen molecules. We gain instant energy predictions that allow us to identify interesting molecules for future applications.
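
A minimal sketch of the two ingredients named in this abstract, the Coulomb matrix and kernel ridge regression, assuming the commonly used Coulomb matrix definition (diagonal 0.5 * Z_i^2.4, off-diagonal Z_i * Z_j / |R_i - R_j|) and a Gaussian kernel. The function names, the kernel choice, the hyperparameters `sigma` and `lam`, and the water-like geometry are illustrative assumptions, not the authors' settings; the MBTR is not sketched here.

```python
import numpy as np

def coulomb_matrix(charges, coords):
    """Coulomb matrix: 0.5 * Z_i**2.4 on the diagonal, Z_i * Z_j / |R_i - R_j| elsewhere."""
    z = np.asarray(charges, dtype=float)
    r = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)              # placeholder, avoids division by zero
    cm = np.outer(z, z) / dist
    np.fill_diagonal(cm, 0.5 * z ** 2.4)
    return cm

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel between the rows of a (n, d) and b (m, d)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def krr_fit(x, y, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression weights alpha."""
    k = gaussian_kernel(x, x, sigma)
    return np.linalg.solve(k + lam * np.eye(len(x)), y)

def krr_predict(x_train, alpha, x_new, sigma):
    """Prediction is a kernel-weighted sum over the training points."""
    return gaussian_kernel(x_new, x_train, sigma) @ alpha

# Hypothetical water-like geometry (O, H, H); coordinates are illustrative only.
cm_water = coulomb_matrix([8, 1, 1], [[0.0, 0.0, 0.0], [0.0, 1.43, 1.11], [0.0, -1.43, 1.11]])
```

To feed such matrices into `krr_fit`, each molecule would need a fixed-length feature vector, for example a zero-padded, flattened matrix; that step is left out of this sketch.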

4.
J Chem Phys ; 148(24): 241401, 2018 Jun 28.
Article in English | MEDLINE | ID: mdl-29960312

ABSTRACT

A survey of the contributions to the Special Topic on Data-enabled Theoretical Chemistry is given, including a glossary of relevant machine learning terms.

5.
PLoS Comput Biol ; 10(1): e1003400, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24453952

ABSTRACT

Machine learning has been used for estimation of potential energy surfaces to speed up molecular dynamics simulations of small systems. We demonstrate that this approach is feasible for significantly larger, structurally complex molecules, taking the natural product Archazolid A, a potent inhibitor of vacuolar-type ATPase, from the myxobacterium Archangium gephyra as an example. Our model estimates energies of new conformations by exploiting information from previous calculations via Gaussian process regression. Predictive variance is used to assess whether a conformation is in the interpolation region, allowing a controlled trade-off between prediction accuracy and computational speed-up. For energies of relaxed conformations at the density functional level of theory (implicit solvent, DFT/BLYP-disp3/def2-TZVP), mean absolute errors of less than 1 kcal/mol were achieved. The study demonstrates that predictive machine learning models can be developed for structurally complex, pharmaceutically relevant compounds, potentially enabling considerable speed-ups in simulations of larger molecular structures.


Subject(s)
Artificial Intelligence , Enzyme Inhibitors/chemistry , Macrolides/chemistry , Thiazoles/chemistry , Adenosine Triphosphatases/chemistry , Algorithms , Pharmaceutical Chemistry , Computational Biology/methods , Magnetic Resonance Spectroscopy , Chemical Models , Molecular Dynamics Simulation , Molecular Structure , Myxococcales/metabolism , Normal Distribution , Principal Component Analysis , Protein Conformation , Software , Stochastic Processes
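
The variance-gated prediction described in this abstract can be sketched as follows. This is a toy illustration under stated assumptions: a one-dimensional input, a unit-variance RBF kernel, and the hypothetical names `gp_posterior`, `predict_or_recompute`, `length`, `noise`, and `var_max`. The study's actual descriptors, kernel, and threshold are not given in the abstract.

```python
import numpy as np

def rbf(a, b, length):
    """Unit-variance RBF kernel between 1D arrays a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)

def gp_posterior(x_train, y_train, x_new, length=0.5, noise=1e-6):
    """Gaussian process predictive mean and variance at x_new."""
    k = rbf(x_train, x_train, length) + noise * np.eye(len(x_train))
    k_s = rbf(x_new, x_train, length)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_s.T)
    var = 1.0 - np.sum(v**2, axis=0)          # prior variance k(x, x) is 1
    return k_s @ alpha, np.maximum(var, 0.0)

def predict_or_recompute(x_train, y_train, x_new, reference, var_max=0.05):
    """Use the GP mean where the variance is low; otherwise call the reference method."""
    mean, var = gp_posterior(x_train, y_train, x_new)
    trusted = var <= var_max
    energies = mean.copy()
    for i in np.flatnonzero(~trusted):        # extrapolation region: recompute
        energies[i] = reference(x_new[i])
    return energies, trusted

# np.sin stands in for an expensive reference calculation on a toy coordinate.
x_train = np.linspace(0.0, 2.5, 6)
energies, trusted = predict_or_recompute(x_train, np.sin(x_train), np.array([1.0, 10.0]), np.sin)
```

Raising `var_max` trades accuracy for fewer reference calculations, which is the controlled trade-off the abstract refers to.
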
6.
PLoS Comput Biol ; 8(2): e1002380, 2012.
Article in English | MEDLINE | ID: mdl-22359493

ABSTRACT

We present a computational method for the reaction-based de novo design of drug-like molecules. The software DOGS (Design of Genuine Structures) features a ligand-based strategy for automated 'in silico' assembly of potentially novel bioactive compounds. The quality of the designed compounds is assessed by a graph kernel method measuring their similarity to known bioactive reference ligands in terms of structural and pharmacophoric features. We implemented a deterministic compound construction procedure that explicitly considers compound synthesizability, based on a compilation of 25,144 readily available synthetic building blocks and 58 established reaction principles. This enables the software to suggest a synthesis route for each designed compound. Two prospective case studies are presented together with details on the algorithm and its implementation. De novo designed ligand candidates for the human histamine H4 receptor and γ-secretase were synthesized as suggested by the software. The computational approach proved to be suitable for scaffold-hopping from known ligands to novel chemotypes, and for generating bioactive molecules with drug-like properties.


Subject(s)
Computational Biology/methods , Drug Design , Algorithms , Amyloid Precursor Protein Secretases/metabolism , Automation , Computers , Humans , Ligands , Chemical Models , Statistical Models , Molecular Structure , G-Protein-Coupled Receptors/chemistry , Histamine Receptors/chemistry , Histamine H4 Receptors , Software , Pharmaceutical Technology
7.
J Chem Phys ; 139(22): 224104, 2013 Dec 14.
Article in English | MEDLINE | ID: mdl-24329053

ABSTRACT

Using a one-dimensional model, we explore the ability of machine learning to approximate the non-interacting kinetic energy density functional of diatomics. This nonlinear interpolation between Kohn-Sham reference calculations can (i) accurately dissociate a diatomic, (ii) be systematically improved with increased reference data and (iii) generate accurate self-consistent densities via a projection method that avoids directions with no data. With relatively few densities, the error due to the interpolation is smaller than typical errors in standard exchange-correlation functionals.


Subject(s)
Artificial Intelligence , Quantum Theory , Algorithms , Computer Simulation
8.
Phys Rev Lett ; 108(5): 058301, 2012 Feb 03.
Article in English | MEDLINE | ID: mdl-22400967

ABSTRACT

We introduce a machine learning model to predict atomization energies of a diverse set of organic molecules, based on nuclear charges and atomic positions only. The problem of solving the molecular Schrödinger equation is mapped onto a nonlinear statistical regression problem of reduced complexity. Regression models are trained on and compared to atomization energies computed with hybrid density-functional theory. Cross validation over more than seven thousand organic molecules yields a mean absolute error of ∼10 kcal/mol. Applicability is demonstrated for the prediction of molecular atomization potential energy curves.

9.
Phys Rev Lett ; 108(25): 253002, 2012 Jun 22.
Article in English | MEDLINE | ID: mdl-23004593

ABSTRACT

Machine learning is used to approximate density functionals. For the model problem of the kinetic energy of noninteracting fermions in 1D, mean absolute errors below 1 kcal/mol on test densities similar to the training set are reached with fewer than 100 training densities. A predictor identifies if a test density is within the interpolation region. Via principal component analysis, a projected functional derivative finds highly accurate self-consistent densities. The challenges for application of our method to real electronic structure problems are discussed.

10.
J Comput Aided Mol Des ; 26(7): 883-95, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22714263

ABSTRACT

Many properties of a compound depend directly on the dissociation constants of its acidic and basic groups. Significant effort has been invested in computational models to predict these constants. For linear regression models, compounds are often divided into chemically motivated classes, with a separate model for each class. However, sometimes too few measurements are available for a class to build a reasonable model, e.g., when investigating a new compound series. If data for related classes are available, we show that multi-task learning can be used to improve predictions by utilizing data from these other classes. We investigate performance of linear Gaussian process regression models (single task, pooling, and multi-task models) in the low sample size regime, using a published data set (n = 698, mostly monoprotic, in aqueous solution) divided beforehand into 15 classes. A multi-task regression model using the intrinsic model of co-regionalization and incomplete Cholesky decomposition performed best in 85% of all experiments. The presented approach can be applied to estimate other molecular properties where few measurements are available.


Subject(s)
Learning , Pharmacokinetics , Statistical Models , Quantitative Structure-Activity Relationship
11.
J Chem Phys ; 136(17): 174101, 2012 May 07.
Article in English | MEDLINE | ID: mdl-22583204

ABSTRACT

We present a method for optimizing transition state theory dividing surfaces with support vector machines. The resulting dividing surfaces require no a priori information or intuition about reaction mechanisms. To generate optimal dividing surfaces, we apply a cycle of machine-learning and refinement of the surface by molecular dynamics sampling. We demonstrate that the machine-learned surfaces contain the relevant low-energy saddle points. The mechanisms of reactions may be extracted from the machine-learned surfaces in order to identify unexpected chemically relevant processes. Furthermore, we show that the machine-learned surfaces significantly increase the transmission coefficient for an adatom exchange involving many coupled degrees of freedom on a (100) surface when compared to a distance-based dividing surface.


Subject(s)
Algorithms , Artificial Intelligence , Computational Biology , Support Vector Machine , Molecular Dynamics Simulation , Software , Surface Properties
12.
J Comput Aided Mol Des ; 25(6): 533-54, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21660515

ABSTRACT

The Online Chemical Modeling Environment is a web-based platform that aims to automate and simplify the typical steps required for QSAR modeling. The platform consists of two major subsystems: the database of experimental measurements and the modeling framework. A user-contributed database contains a set of tools for easy input, search and modification of thousands of records. The OCHEM database is based on the wiki principle and focuses primarily on the quality and verifiability of the data. The database is tightly integrated with the modeling framework, which supports all the steps required to create a predictive model: data search, calculation and selection of a vast variety of molecular descriptors, application of machine learning methods, validation, analysis of the model and assessment of the applicability domain. As compared to other similar systems, OCHEM is not intended to re-implement the existing tools or models but rather to invite the original authors to contribute their results, make them publicly available, share them with other users and to become members of the growing research community. Our intention is to make OCHEM a widely used platform to perform the QSPR/QSAR studies online and share it with other users on the Web. The ultimate goal of OCHEM is collecting all possible chemoinformatics tools within one simple, reliable and user-friendly resource. The OCHEM is free for web users and it is available online at http://www.ochem.eu.


Subject(s)
Factual Databases , Internet , Chemical Models , Information Dissemination , Information Management , Quantitative Structure-Activity Relationship , User-Computer Interface
13.
J Comput Chem ; 31(15): 2810-26, 2010 Nov 30.
Article in English | MEDLINE | ID: mdl-20839306

ABSTRACT

Previously (Hähnke et al., J Comput Chem 2009, 30, 761), we presented the Pharmacophore Alignment Search Tool (PhAST), a ligand-based virtual screening technique representing molecules as strings coding pharmacophoric features and comparing them by global pairwise sequence alignment. To guarantee unambiguity during the reduction of two-dimensional molecular graphs to one-dimensional strings, PhAST employs a graph canonization step. Here, we present the results of the comparison of 11 different algorithms for graph canonization with respect to their impact on virtual screening. Retrospective screenings of a drug-like data set were evaluated using the BEDROC metric, which yielded averaged values between 0.4 and 0.14 for the best-performing and worst-performing canonization techniques. We compared five scoring schemes for the alignments and found preferred combinations of canonization algorithms and scoring functions. Finally, we introduce a performance index that helps prioritize canonization approaches without the need for extensive retrospective evaluation.


Subject(s)
Computational Biology/methods , Factual Databases , Drug Discovery/methods , Sequence Alignment/methods , Small Molecule Libraries , Algorithms , Principal Component Analysis
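
For readers unfamiliar with the BEDROC metric mentioned in this abstract, the sketch below computes it from the ranks of the actives, assuming the usual formulation as a robust initial enhancement (RIE) rescaled between its worst-case and best-case values. The function name and the default `alpha=20.0` are assumptions; the abstract does not state which parameter value was used.

```python
import math

def bedroc(active_ranks, n_total, alpha=20.0):
    """BEDROC from the 1-based ranks of the actives in a ranked list of n_total compounds."""
    n_act = len(active_ranks)
    if not 0 < n_act < n_total:
        raise ValueError("need at least one active and one inactive compound")
    ra = n_act / n_total
    # RIE: exponentially weighted rank sum of the actives relative to a random ranking.
    observed = sum(math.exp(-alpha * r / n_total) for r in active_ranks) / n_act
    expected = (1.0 / n_total) * (1.0 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1.0)
    rie = observed / expected
    # Bounds of RIE when all actives are ranked last (min) or first (max).
    rie_min = (1.0 - math.exp(alpha * ra)) / (ra * (1.0 - math.exp(alpha)))
    rie_max = (1.0 - math.exp(-alpha * ra)) / (ra * (1.0 - math.exp(-alpha)))
    return (rie - rie_min) / (rie_max - rie_min)

# Five actives among 100 compounds, mostly retrieved early in the list.
score = bedroc([1, 3, 4, 10, 40], n_total=100)
```

The rescaling maps a ranking with all actives at the top to 1 and one with all actives at the bottom to 0, so values from different screens share a common range. Larger `alpha` puts more weight on early recognition.
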
14.
Bioorg Med Chem Lett ; 20(9): 2920-3, 2010 May 01.
Article in English | MEDLINE | ID: mdl-20347594

ABSTRACT

In previous studies, we identified a truxillic acid derivative as a selective activator of the peroxisome proliferator-activated receptor gamma, which is a member of the nuclear receptor family and acts as a ligand-activated transcription factor of genes involved in glucose metabolism. Herein we present the structure-activity relationships of 16 truxillic acid derivatives, investigated by a cell-based reporter gene assay guided by molecular docking analysis.


Subject(s)
Cyclobutanes/chemistry , Hypoglycemic Agents/chemistry , PPAR gamma/agonists , Binding Sites , Computer Simulation , Cyclobutanes/chemical synthesis , Cyclobutanes/pharmacology , Glucose/metabolism , Humans , Hypoglycemic Agents/chemical synthesis , Hypoglycemic Agents/pharmacology , PPAR gamma/metabolism , Structure-Activity Relationship
15.
Nat Commun ; 11(1): 4428, 2020 Sep 04.
Article in English | MEDLINE | ID: mdl-32887879

ABSTRACT

Although machine learning (ML) models promise to substantially accelerate the discovery of novel materials, their performance is often still insufficient to draw reliable conclusions. Improved ML models are therefore actively researched, but their design is currently guided mainly by monitoring the average model test error. This can render different models indistinguishable although their performance differs substantially across materials, or it can make a model appear generally insufficient while it actually works well in specific sub-domains. Here, we present a method, based on subgroup discovery, for detecting domains of applicability (DA) of models within a materials class. The utility of this approach is demonstrated by analyzing three state-of-the-art ML models for predicting the formation energy of transparent conducting oxides. We find that, despite having a mutually indistinguishable and unsatisfactory average error, the models have DAs with distinctive features and notably improved performance.

16.
J Comput Chem ; 30(14): 2285-96, 2009 Nov 15.
Article in English | MEDLINE | ID: mdl-19266481

ABSTRACT

Measuring the (dis)similarity of molecules is important for many cheminformatics applications like compound ranking, clustering, and property prediction. In this work, we focus on real-valued vector representations of molecules (as opposed to the binary spaces of fingerprints). We demonstrate the influence which the choice of (dis)similarity measure can have on results, and provide recommendations for such choices. We review the mathematical concepts used to measure (dis)similarity in vector spaces, namely norms, metrics, inner products, and similarity coefficients, as well as the relationships between them, employing (dis)similarity measures commonly used in cheminformatics as examples. We present several phenomena (empty space phenomenon, sphere volume related phenomena, distance concentration) in high-dimensional descriptor spaces which are not encountered in two and three dimensions. These phenomena are theoretically characterized and illustrated on both artificial and real (bioactivity) data.


Subject(s)
Computational Biology , Pharmaceutical Preparations/chemistry , Factual Databases , Molecular Structure
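
Distance concentration, one of the phenomena listed in this abstract, can be illustrated with a short numerical experiment. The sketch assumes uniformly distributed random points and the Euclidean norm; the function name `relative_contrast`, the sample size, and the seed are illustrative choices, not taken from the paper.

```python
import numpy as np

def relative_contrast(dim, n_points=500, seed=0):
    """(d_max - d_min) / d_min for Euclidean distances from a random query point."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))      # uniform in the unit hypercube
    query = rng.random(dim)
    dist = np.linalg.norm(points - query, axis=1)
    return (dist.max() - dist.min()) / dist.min()

# Contrast between nearest and farthest neighbour for increasing dimensionality.
contrasts = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
```

In low dimensions the farthest point is many times more distant than the nearest one. As the dimensionality grows the ratio shrinks toward zero, so nearest-neighbour rankings become less informative.
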
17.
J Chem Theory Comput ; 11(5): 2087-96, 2015 May 12.
Article in English | MEDLINE | ID: mdl-26574412

ABSTRACT

Chemically accurate and comprehensive studies of the virtual space of all possible molecules are severely limited by the computational cost of quantum chemistry. We introduce a composite strategy that adds machine learning corrections to computationally inexpensive approximate legacy quantum methods. After training, highly accurate predictions of enthalpies, free energies, entropies, and electron correlation energies are possible, for significantly larger molecular sets than used for training. For thermochemical properties of up to 16k isomers of C7H10O2 we present numerical evidence that chemical accuracy can be reached. We also predict electron correlation energy in post Hartree-Fock methods, at the computational cost of Hartree-Fock, and we establish a qualitative relationship between molecular entropy and electron correlation. The transferability of our approach is demonstrated, using semiempirical quantum chemistry and machine learning models trained on 1 and 10% of 134k organic molecules, to reproduce enthalpies of all remaining molecules at density functional theory level of accuracy.


Subject(s)
Machine Learning , Quantum Theory , Electrons , Isomerism , Ketones/chemistry , Thermodynamics
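
The composite strategy in this abstract, a cheap baseline plus a learned correction, can be sketched on a toy problem. Everything here is an assumption made for illustration: the one-dimensional input, the functions standing in for the cheap and accurate methods, kernel ridge regression as the correction model, and the names `fit_delta_model` and `predict_corrected`.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel between 1D arrays a and b."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * sigma**2))

def fit_delta_model(x, y_cheap, y_accurate, sigma=0.5, lam=1e-8):
    """Fit kernel ridge regression to the difference between accurate and cheap values."""
    k = gaussian_kernel(x, x, sigma)
    return np.linalg.solve(k + lam * np.eye(len(x)), y_accurate - y_cheap)

def predict_corrected(x_train, alpha, x_new, y_cheap_new, sigma=0.5):
    """Cheap baseline plus the learned correction."""
    return y_cheap_new + gaussian_kernel(x_new, x_train, sigma) @ alpha

def cheap(x):
    return 0.5 * x                  # stand-in for an inexpensive approximate method

def accurate(x):
    return 0.5 * x + np.sin(x)      # stand-in for an expensive reference method

x_train = np.linspace(0.0, 6.0, 13)
alpha = fit_delta_model(x_train, cheap(x_train), accurate(x_train))

x_test = np.linspace(0.0, 6.0, 50)
corrected = predict_corrected(x_train, alpha, x_test, cheap(x_test))
mae_baseline = np.mean(np.abs(accurate(x_test) - cheap(x_test)))
mae_corrected = np.mean(np.abs(accurate(x_test) - corrected))
```

The model only has to learn the difference between the two levels of theory, which is often smoother and smaller than the target property itself. The expensive method is needed only for the training points.
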
18.
Sci Data ; 1: 140022, 2014.
Article in English | MEDLINE | ID: mdl-25977779

ABSTRACT

Computational de novo design of new drugs and materials requires rigorous and unbiased exploration of chemical compound space. However, large uncharted territories persist due to its size scaling combinatorially with molecular size. We report computed geometric, energetic, electronic, and thermodynamic properties for 134k stable small organic molecules made up of CHONF. These molecules correspond to the subset of all 133,885 species with up to nine heavy atoms (CONF) out of the GDB-17 chemical universe of 166 billion organic molecules. We report geometries minimal in energy, corresponding harmonic frequencies, dipole moments, polarizabilities, along with energies, enthalpies, and free energies of atomization. All properties were calculated at the B3LYP/6-31G(2df,p) level of quantum chemistry. Furthermore, for the predominant stoichiometry, C7H10O2, there are 6,095 constitutional isomers among the 134k molecules. We report energies, enthalpies, and free energies of atomization at the more accurate G4MP2 level of theory for all of them. As such, this data set provides quantum chemical properties for a relevant, consistent, and comprehensive chemical space of small organic molecules. This database may serve the benchmarking of existing methods, development of new methods, such as hybrid quantum mechanics/machine learning, and systematic identification of structure-property relationships.


Subject(s)
Chemical Models , Molecular Models , Organic Chemicals , Small Molecule Libraries , Factual Databases , Molecular Structure , Thermodynamics
19.
Mol Inform ; 32(7): 625-46, 2013 Jul.
Article in English | MEDLINE | ID: mdl-27481770

ABSTRACT

Previously, we proposed a ligand-based virtual screening technique (PhAST) based on global alignment of linearized interaction patterns. Here, we applied techniques developed for similarity assessment in local sequence alignments to our method resulting in p-values for chemical similarity. We compared two sampling strategies, a simple sampling strategy and a Markov Chain Monte Carlo (MCMC) method, and investigated the similarity of sampled distributions to Gaussian, Gumbel, modified Gumbel, and Gamma distributions. The Gumbel distribution with a Gaussian correction term was identified as the most similar to the observed empirical distributions. These techniques were applied in retrospective screenings on a drug-like dataset. Obtained p-values were adjusted to the size of the screening library with four different methods. Evaluation of E-value thresholds corroborated the Bonferroni correction as a preferred means to identify significant chemical similarity with PhAST. An online version of PhAST with significance estimation is available at http://modlab-cadd.ethz.ch/.
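
A rough sketch of how a similarity score could be turned into a p-value and then adjusted to the library size. It assumes a plain Gumbel distribution with hypothetical location `mu` and scale `beta` (the Gaussian correction term mentioned in the abstract is omitted), the standard Bonferroni adjustment, and an E-value defined as the p-value times the library size. None of the parameter values come from the paper.

```python
import math

def gumbel_p_value(score, mu, beta):
    """P(S >= score) under a Gumbel distribution with location mu and scale beta."""
    z = (score - mu) / beta
    return -math.expm1(-math.exp(-z))          # 1 - exp(-exp(-z)), numerically stable

def bonferroni(p_value, library_size):
    """Bonferroni-adjusted p-value for library_size comparisons, capped at 1."""
    return min(1.0, p_value * library_size)

def e_value(p_value, library_size):
    """Expected number of library compounds scoring at least this well by chance."""
    return p_value * library_size

# Hypothetical alignment score and distribution parameters.
p = gumbel_p_value(score=42.0, mu=20.0, beta=3.0)
p_adjusted = bonferroni(p, library_size=10_000)
```

With this adjustment, a hit is called significant only if its p-value stays below the chosen threshold after multiplication by the number of screened compounds.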

20.
J Chem Theory Comput ; 9(8): 3404-19, 2013 Aug 13.
Article in English | MEDLINE | ID: mdl-26584096

ABSTRACT

The accurate and reliable prediction of properties of molecules typically requires computationally intensive quantum-chemical calculations. Recently, machine learning techniques applied to ab initio calculations have been proposed as an efficient approach for describing the energies of molecules in their given ground-state structure throughout chemical compound space (Rupp et al. Phys. Rev. Lett. 2012, 108, 058301). In this paper we outline a number of established machine learning techniques and investigate the influence of the molecular representation on the methods' performance. The best methods achieve prediction errors of 3 kcal/mol for the atomization energies of a wide variety of molecules. Rationales for this performance improvement are given together with pitfalls and challenges when applying machine learning approaches to the prediction of quantum-mechanical observables.
