Results 1 - 10 of 10
1.
J Chem Inf Model ; 49(9): 2077-81, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19702240

ABSTRACT

Until now, publicly available data sets for building and evaluating Ames mutagenicity prediction tools have been very limited in terms of size and chemical space covered. In this report we describe a new, unique public Ames mutagenicity data set comprising about 6500 nonconfidential compounds (available as SMILES strings and SDF) together with their biological activity. Three commercial tools (DEREK, MultiCASE, and an off-the-shelf Bayesian machine learner in Pipeline Pilot) are compared with four noncommercial machine learning implementations (Support Vector Machines, Random Forests, k-Nearest Neighbors, and Gaussian Processes) on the new benchmark data set.
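The abstract names k-Nearest Neighbors among the noncommercial learners but gives no implementation details. A minimal sketch of k-NN classification of Ames outcomes, assuming each compound is represented as a set of "on" fingerprint bits and that Tanimoto similarity defines the neighborhood (both are assumptions, not taken from the paper); the toy fingerprints are hypothetical.

```python
from collections import Counter

def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints count as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def knn_predict(query: frozenset, train, k: int = 3) -> int:
    """Majority vote over the k training compounds most similar to `query`.

    `train` is a list of (fingerprint_bits, label) pairs, where label 1 means
    Ames positive (mutagen) and 0 means Ames negative.
    """
    ranked = sorted(train, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy fingerprints, not real Ames data.
train = [
    (frozenset({1, 2, 3, 4}), 1),
    (frozenset({1, 2, 3, 5}), 1),
    (frozenset({1, 2, 4, 5}), 1),
    (frozenset({10, 11, 12}), 0),
    (frozenset({10, 11, 13}), 0),
    (frozenset({10, 12, 13}), 0),
]
print(knn_predict(frozenset({1, 2, 3}), train))         # → 1
print(knn_predict(frozenset({10, 11, 12, 14}), train))  # → 0
```

An odd k avoids tied votes for a two-class problem such as mutagen versus nonmutagen.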


Subject(s)
Benchmarking; Computational Biology; Databases, Factual; Mutagenicity Tests/methods; Artificial Intelligence; Mutagenicity Tests/standards; Mutagens/chemistry; Mutagens/toxicity; Normal Distribution; Salmonella typhimurium/drug effects; Salmonella typhimurium/genetics; Structure-Activity Relationship
2.
Comb Chem High Throughput Screen ; 12(5): 453-68, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19519325

ABSTRACT

A large number of different machine learning methods can potentially be used for ligand-based virtual screening. In our contribution, we focus on three specific nonlinear methods, namely support vector regression, Gaussian process models, and decision trees. For each of these methods, we provide a short and intuitive introduction. In particular, we will also discuss how confidence estimates (error bars) can be obtained from these methods. We continue with important aspects for model building and evaluation, such as methodologies for model selection, evaluation, performance criteria, and how the quality of error bar estimates can be verified. Besides an introduction to the respective methods, we will also point to available implementations, and discuss important issues for the practical application.
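The abstract discusses how confidence estimates (error bars) can be obtained from Gaussian process models. As a rough illustration only, not the authors' implementation, the sketch below computes a zero-mean GP posterior mean and standard deviation in pure Python, assuming scalar inputs, a squared-exponential kernel with fixed, hypothetical hyperparameters (length scale 1, signal variance 1, noise variance 0.01), and no hyperparameter fitting.

```python
import math

def rbf(x, y, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance for scalar inputs (hypothetical settings)."""
    return variance * math.exp(-0.5 * ((x - y) / length_scale) ** 2)

def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def forward(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def backward(L, z):
    """Solve L^T x = z, using the transpose of the lower-triangular L."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(x_train, y_train, x_new, noise=1e-2):
    """Posterior mean and standard deviation of the latent function at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    L = cholesky(K)
    alpha = backward(L, forward(L, y_train))   # alpha = K^{-1} y
    k_star = [rbf(x_new, a) for a in x_train]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward(L, k_star)                     # v = L^{-1} k_*
    var = rbf(x_new, x_new) - sum(vi * vi for vi in v)
    return mean, math.sqrt(max(var, 0.0))

x_train, y_train = [0.0, 1.0, 2.0], [0.0, 1.0, 0.0]
for x in (1.0, 10.0):
    mean, std = gp_predict(x_train, y_train, x)
    print(f"x={x}: {mean:.2f} +/- {std:.2f}")
```

The standard deviation is small near training points and returns to the prior value far from them, which is what allows GP error bars to flag predictions outside the training data. Adding the noise variance would give the error bar for a new measurement rather than for the latent function.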


Subject(s)
Artificial Intelligence; Drug Discovery; Models, Statistical; Algorithms; Computer Simulation; Decision Trees; Ligands; Models, Chemical; Normal Distribution; Quantitative Structure-Activity Relationship
3.
J Chem Inf Model ; 49(6): 1486-96, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19435326

ABSTRACT

In the present work we develop a predictive QSAR model for the blockade of the hERG channel. Additionally, this specific end point is used as a test scenario to develop and evaluate several techniques for fusing predictions from multiple regression models. The hERG inhibition models presented here are based on a combined data set of roughly 550 proprietary and 110 public domain compounds. Models are built using various statistical learning techniques and different sets of molecular descriptors. Single Support Vector Regression, Gaussian Process, or Random Forest models achieve root-mean-square errors of roughly 0.6 log units as determined from leave-group-out cross-validation. An analysis of how the evaluation strategy affects the performance estimates shows that standard leave-group-out cross-validation yields overly optimistic results. As an alternative, a clustered cross-validation scheme is introduced to obtain a more realistic estimate of model performance. The evaluation of several techniques to combine multiple prediction models shows that the root-mean-square error, as determined from clustered cross-validation, can be reduced from 0.73 ± 0.01 to 0.57 ± 0.01 using a local bias correction strategy.
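The abstract does not specify how compounds were clustered or how clusters were distributed over folds. A minimal sketch of the clustered cross-validation idea, assuming cluster labels are already available from some prior clustering step (hypothetical here) and using a greedy largest-cluster-first assignment chosen only for illustration: whole clusters, rather than individual compounds, go into folds, so close analogs of a test compound cannot sit in the training set.

```python
from collections import defaultdict

def clustered_folds(cluster_ids, n_folds):
    """Assign whole clusters to folds; returns one list of compound indices per fold.

    Clusters are placed largest-first into the currently smallest fold, a simple
    greedy heuristic (an assumption) to keep fold sizes roughly balanced.
    """
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)
    folds = [[] for _ in range(n_folds)]
    for cid in sorted(members, key=lambda c: len(members[c]), reverse=True):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[cid])
    return folds

def train_test_splits(folds):
    """Yield (train_indices, test_indices) pairs, one per held-out fold."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical cluster labels for eight compounds.
cluster_ids = ["a", "a", "a", "b", "b", "c", "c", "d"]
folds = clustered_folds(cluster_ids, n_folds=2)
print(folds)  # → [[0, 1, 2, 7], [3, 4, 5, 6]]
```

By contrast, a leave-group-out split that ignores the cluster labels can place members of the same cluster on both sides of the split, which is one way to obtain the overly optimistic estimates the abstract reports.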


Subject(s)
Ether-A-Go-Go Potassium Channels/antagonists & inhibitors; Quantitative Structure-Activity Relationship; Drug Evaluation, Preclinical; Humans; Inhibitory Concentration 50; Neural Networks, Computer; Potassium Channel Blockers/chemistry; Potassium Channel Blockers/pharmacology; Regression Analysis; Reproducibility of Results
4.
J Chem Inf Model ; 48(4): 785-96, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18327900

ABSTRACT

Metabolic stability is an important property of drug molecules that should, optimally, be taken into account early on in the drug design process. Along with numerous medium- or high-throughput assays being implemented in early drug discovery, a prediction tool for this property could be of high value. However, metabolic stability is inherently difficult to predict, and no commercial tools are available for this purpose. In this work, we present a machine learning approach to predicting metabolic stability that is tailored to compounds from the drug development process at Bayer Schering Pharma. For four different in vitro assays, we develop Bayesian classification models to predict the probability of a compound being metabolically stable. The chosen approach implicitly takes the "domain of applicability" into account. The developed models were validated on recent project data at Bayer Schering Pharma, showing that the predictions are highly accurate and the domain of applicability is estimated correctly. Furthermore, we evaluate the modeling method on a set of publicly available data.


Subject(s)
Probability; Algorithms; Bayes Theorem; Drug Design
5.
J Comput Aided Mol Des ; 21(12): 651-64, 2007 Dec.
Article in English | MEDLINE | ID: mdl-18060505

ABSTRACT

We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble-based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in terms of how faithfully the individual error bars represent the actual prediction error.
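The abstract states that the error bars for the Support Vector Machine and Ridge Regression models are based on the Mahalanobis distance to the training data, without giving the exact definition. A sketch of one common variant, assuming the distance is measured from a query descriptor vector to the mean of the training descriptors, using the sample covariance of the training set; distance to the nearest training compounds would be another plausible variant, and the mapping from distance to an error bar is not shown. The descriptor values are hypothetical.

```python
import math

def mean_and_covariance(X):
    """Column means and sample covariance matrix of the training descriptors."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    cov = [[sum((row[a] - mu[a]) * (row[b] - mu[b]) for row in X) / (n - 1)
            for b in range(d)] for a in range(d)]
    return mu, cov

def invert(m):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def mahalanobis(x, mu, cov_inv):
    """sqrt((x - mu)^T S^{-1} (x - mu)); larger values suggest a query outside the DOA."""
    diff = [a - b for a, b in zip(x, mu)]
    d = len(diff)
    return math.sqrt(sum(diff[i] * cov_inv[i][j] * diff[j]
                         for i in range(d) for j in range(d)))

# Hypothetical two-descriptor training set.
X = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
mu, cov = mean_and_covariance(X)
cov_inv = invert(cov)
print(round(mahalanobis([3.0, 1.0], mu, cov_inv), 3))  # → 1.732
```

Unlike the plain Euclidean distance, the Mahalanobis distance rescales each direction by the spread of the training data, so a step along a descriptor that varies little in training counts as a larger departure.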


Subject(s)
Artificial Intelligence; Pharmaceutical Preparations/chemistry; Quantitative Structure-Activity Relationship; Water/chemistry; Algorithms; Drug Design; Solubility
6.
Mol Pharm ; 4(4): 524-38, 2007.
Article in English | MEDLINE | ID: mdl-17637064

ABSTRACT

Unfavorable lipophilicity and water solubility cause many drug failures; therefore, these properties have to be taken into account early on in lead discovery. Commercial tools for predicting lipophilicity have usually been trained on small, neutral molecules and are thus often unable to accurately predict in-house data. Using a modern Bayesian machine learning algorithm, a Gaussian process model, this study constructs a log D7 model based on 14,556 drug discovery compounds of Bayer Schering Pharma. Performance is compared with support vector machines, decision trees, ridge regression, and four commercial tools. In a blind test on 7013 new measurements from the preceding months (including compounds from new projects), 81% were predicted correctly within 1 log unit, compared to only 44% achieved by commercial software. Additional evaluations using public data are presented. We consider error bars for each method (model-based, ensemble-based, and distance-based approaches) and investigate how well they quantify the domain of applicability of each model.
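The headline metric in this abstract is the fraction of compounds predicted within 1 log unit of the measured value. A minimal sketch of that metric, assuming "within" means an absolute error of at most 1 log unit (inclusive); the values in the usage line are made up.

```python
def fraction_within(y_true, y_pred, tolerance=1.0):
    """Fraction of predictions whose absolute error is at most `tolerance` log units."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)

# Hypothetical measured vs. predicted log D values.
print(fraction_within([1.0, 2.0, 3.0, 4.0], [1.5, 3.5, 2.2, 4.0]))  # → 0.75
```

Unlike the root-mean-square error, this metric is insensitive to how far the misses fall outside the tolerance, so the two are best reported together.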


Subject(s)
Artificial Intelligence; Lipids/chemistry; Models, Chemical; Pharmaceutical Preparations/chemistry; Algorithms; Bayes Theorem; Decision Trees; Models, Statistical; Molecular Structure; Reproducibility of Results
7.
J Comput Aided Mol Des ; 21(9): 485-98, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17632688

ABSTRACT

We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble-based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in terms of how faithfully the individual error bars represent the actual prediction error.


Subject(s)
Artificial Intelligence; Models, Chemical; Pharmaceutical Preparations/chemistry; Quantitative Structure-Activity Relationship; Algorithms; Bayes Theorem; Models, Statistical; Molecular Structure; Solubility
9.
J Chem Inf Model ; 47(2): 407-24, 2007.
Article in English | MEDLINE | ID: mdl-17243756

ABSTRACT

Accurate in silico models for predicting aqueous solubility are needed in drug design and discovery and in many other areas of chemical research. We present a statistical model of aqueous solubility based on measured data, using a Gaussian Process nonlinear regression model (GPsol). We compare our results with those of 14 scientific studies and 6 commercial tools. The comparison shows that the developed model achieves much higher accuracy than available commercial tools for predicting the solubility of electrolytes. On top of the high accuracy, the proposed machine learning model also provides error bars for each individual prediction.


Subject(s)
Models, Chemical; Neural Networks, Computer; Computer Simulation; Electrolytes; Molecular Structure; Solubility
10.
J Chem Inf Model ; 45(2): 249-53, 2005.
Article in English | MEDLINE | ID: mdl-15807485

ABSTRACT

In this article we report on a successful application of modern machine learning technology, namely Support Vector Machines, to the problem of assessing the 'drug-likeness' of a chemical from a given set of descriptors of the substance. We were able to drastically improve on the recent result by Byvatov et al. (2003) on this task and achieved an error rate of about 7% on unseen compounds using Support Vector Machines. We see a very high potential for such machine learning techniques in a variety of computational chemistry problems that occur in the drug discovery and drug design process.


Subject(s)
Artificial Intelligence; Drug Design; Computer Simulation; Models, Chemical