Search | BVS Regional Portal

Extreme Gradient Boosting as a Method for Quantitative Structure-Activity Relationships.

Sheridan, Robert P; Wang, Wei Min; Liaw, Andy; Ma, Junshui; Gifford, Eric M.

J Chem Inf Model ; 56(12): 2353-2360, 2016 Dec 27.

Article in English | MEDLINE | ID: mdl-27958738

ABSTRACT

In the pharmaceutical industry it is common to generate many QSAR models from training sets containing a large number of molecules and a large number of descriptors. The best QSAR methods are those that can generate the most accurate predictions but that are not overly expensive computationally. In this paper we compare eXtreme Gradient Boosting (XGBoost) to random forest and single-task deep neural nets on 30 in-house data sets. While XGBoost has many adjustable parameters, we can define a set of standard parameters at which XGBoost makes predictions, on the average, better than those of random forest and almost as good as those of deep neural nets. The biggest strength of XGBoost is its speed. Whereas efficient use of random forest requires generating each tree in parallel on a cluster, and deep neural nets are usually run on GPUs, XGBoost can be run on a single CPU in less than a third of the wall-clock time of either of the other methods.

Subject(s)

Quantitative Structure-Activity Relationship ; Algorithms ; Databases, Pharmaceutical ; Drug Discovery ; Humans ; Models, Biological ; Software

Systems chemical biology and the Semantic Web: what they mean for the future of drug discovery research.

Wild, David J; Ding, Ying; Sheth, Amit P; Harland, Lee; Gifford, Eric M; Lajiness, Michael S.

Drug Discov Today ; 17(9-10): 469-74, 2012 May.

Article in English | MEDLINE | ID: mdl-22222943

ABSTRACT

Systems chemical biology, the integration of chemistry, biology and computation to generate understanding about the way small molecules affect biological systems as a whole, as well as related fields such as chemogenomics, are central to emerging new paradigms of drug discovery such as drug repurposing and personalized medicine. Recent Semantic Web technologies such as RDF and SPARQL are technical enablers of systems chemical biology, facilitating the deployment of advanced algorithms for searching and mining large integrated datasets. In this paper, we aim to demonstrate how these technologies together can change the way that drug discovery is accomplished.

Subject(s)

Drug Discovery ; Systems Biology/methods ; Algorithms ; Humans ; Internet ; Semantics

Comparing bioassay response and similarity ensemble approaches to probing protein pharmacology.

Chen, Bin; McConnell, Kevin J; Wale, Nikil; Wild, David J; Gifford, Eric M.

Bioinformatics ; 27(21): 3044-9, 2011 Nov 01.

Article in English | MEDLINE | ID: mdl-21903625

ABSTRACT

MOTIVATION: Networks to predict protein pharmacology can be created using ligand similarity or using known bioassay response profiles of ligands. Recent publications indicate that similarity methods can be highly accurate, but it has been unclear how similarity methods compare to methods that use bioassay response data directly. RESULTS: We created protein networks based on ligand similarity (Similarity Ensemble Approach or SEA) and ligand bioassay response-data (BARD) using 155 Pfizer internal BioPrint assays. Both SEA and BARD successfully cluster together proteins with known relationships, and predict some non-obvious relationships. Although the approaches assess target relations from different perspectives, their networks overlap considerably (40% overlap of the top 2% of correlated edges). They can thus be considered as comparable methods, with a distinct advantage of the similarity methods that they only require simple computations (similarity of compound) as opposed to extensive experimental data. CONTACTS: djwild@indiana.edu; eric.gifford@pfizer.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
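The "overlap of the top 2% of correlated edges" can be read as taking the highest-scoring edges from each network and measuring how many they share. The sketch below is one plausible reading only: the abstract does not give the exact overlap definition, and the function names, protein labels, and scores are hypothetical. The toy network has six edges, so it uses a fraction of 0.5 rather than 2%.

```python
import math


def top_edges(scores, fraction):
    """Return the highest-scoring edges, keeping ceil(fraction * n) of them."""
    k = math.ceil(fraction * len(scores))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def top_overlap(scores_a, scores_b, fraction):
    """Share of network A's top edges that also appear among network B's top edges."""
    top_a = top_edges(scores_a, fraction)
    top_b = top_edges(scores_b, fraction)
    return len(top_a & top_b) / len(top_a)


# Hypothetical edge scores for four proteins (illustration only, not from the paper).
similarity_net = {
    ("A", "B"): 0.9, ("A", "C"): 0.8, ("A", "D"): 0.7,
    ("B", "C"): 0.3, ("B", "D"): 0.2, ("C", "D"): 0.1,
}
bioassay_net = {
    ("A", "B"): 0.6, ("A", "C"): 0.1, ("A", "D"): 0.8,
    ("B", "C"): 0.7, ("B", "D"): 0.2, ("C", "D"): 0.3,
}

overlap = top_overlap(similarity_net, bioassay_net, fraction=0.5)
print(f"overlap of top edges: {overlap:.2f}")
```

With a real network, `fraction=0.02` would correspond to the 2% cut-off mentioned in the abstract.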

Subject(s)

Drug Design ; Proteins/chemistry ; Proteins/metabolism ; Biological Assay ; Cluster Analysis ; Ligands ; Protein Interaction Maps

Using open source computational tools for predicting human metabolic stability and additional absorption, distribution, metabolism, excretion, and toxicity properties.

Gupta, Rishi R; Gifford, Eric M; Liston, Ted; Waller, Chris L; Hohman, Moses; Bunin, Barry A; Ekins, Sean.

Drug Metab Dispos ; 38(11): 2083-90, 2010 Nov.

Article in English | MEDLINE | ID: mdl-20693417

ABSTRACT

Ligand-based computational models could be more readily shared between researchers and organizations if they were generated with open source molecular descriptors [e.g., Chemistry Development Kit (CDK)] and modeling algorithms, because this would negate the requirement for proprietary commercial software. We initially evaluated open source descriptors and model building algorithms using a training set of approximately 50,000 molecules and a test set of approximately 25,000 molecules with human liver microsomal metabolic stability data. A C5.0 decision tree model demonstrated that CDK descriptors together with a set of SMILES Arbitrary Target Specification (SMARTS) keys had good statistics [κ = 0.43, sensitivity = 0.57, specificity = 0.91, and positive predictive value (PPV) = 0.64], equivalent to those of models built with commercial Molecular Operating Environment 2D (MOE2D) and the same set of SMARTS keys (κ = 0.43, sensitivity = 0.58, specificity = 0.91, and PPV = 0.63). Extending the dataset to ~193,000 molecules and generating a continuous model using Cubist with a combination of CDK and SMARTS keys or MOE2D and SMARTS keys confirmed this observation. When the continuous predictions and actual values were binned to get a categorical score we observed a similar κ statistic (0.42). The same combination of descriptor set and modeling method was applied to passive permeability and P-glycoprotein efflux data with similar model testing statistics. In summary, open source tools demonstrated predictive results comparable to those of commercial software with attendant cost savings. We discuss the advantages and disadvantages of open source descriptors and the opportunity for their use as a tool for organizations to share data precompetitively, avoiding repetition and assisting drug discovery.
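The κ, sensitivity, specificity, and PPV figures quoted above all derive from a 2×2 confusion matrix. As a minimal sketch (the function name `binary_metrics` and the counts are hypothetical, chosen for illustration and not taken from the paper), they could be computed as follows:

```python
def binary_metrics(tp, fp, fn, tn):
    """Cohen's kappa, sensitivity, specificity and PPV for a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    # Observed agreement: fraction of molecules classified correctly.
    observed = (tp + tn) / total
    # Chance agreement: sum over both classes of (predicted share * actual share).
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total**2
    return {
        "kappa": (observed - expected) / (1 - expected),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


# Hypothetical counts (illustration only, not from the paper).
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa corrects the raw agreement for the agreement expected by chance, which is why a model can show high specificity and still have only a moderate κ.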

Subject(s)

Computational Biology/methods ; Drug Discovery/methods ; Models, Biological ; Pharmaceutical Preparations/metabolism ; Software ; Toxicology/methods ; Absorption ; Algorithms ; Computer Simulation ; Drug Stability ; Humans ; Microsomes, Liver/metabolism ; Pharmaceutical Preparations/chemistry ; Predictive Value of Tests ; Solubility ; Tissue Distribution

The development and validation of a computational model to predict rat liver microsomal clearance.

Chang, Cheng; Duignan, David B; Johnson, Kjell D; Lee, Pil H; Cowan, George S; Gifford, Eric M; Stankovic, Charles J; Lepsy, Christopher S; Stoner, Chad L.

J Pharm Sci ; 98(8): 2857-67, 2009 Aug.

Article in English | MEDLINE | ID: mdl-19116953

ABSTRACT

As the cost of discovering and developing new pharmaceutically relevant compounds continues to rise, it is increasingly important to select the right molecules to prosecute very early in drug discovery. The development of high throughput in vitro assays of hepatic metabolic clearance has allowed for vast quantities of data generation; however, these large screens are still costly and remain dependent on animal usage. To further expand the value of these screens and ultimately aid in animal usage reduction, we have developed an in silico model of rat liver microsomal (RLM) clearance. This model combines a large amount of rat clearance data (n = 27,697) generated at multiple Pfizer laboratories to represent the broadest possible chemistry space. The model predicts RLM stability (with 82% accuracy and a kappa value of 0.65 for the test data set) based solely on chemical structural inputs, and provides a clear assessment of confidence in the prediction. The current in silico model should help accelerate the drug discovery process by using confidence-based stability-driven prioritization, and reduce cost by filtering out the most unstable/undesirable molecules. The model can also increase efficiency in the evaluation of chemical series by optimizing iterative testing and promoting rational drug design.

Subject(s)

Computational Biology/methods ; Computational Biology/standards ; Microsomes, Liver/metabolism ; Models, Biological ; Animals ; Metabolic Clearance Rate/drug effects ; Predictive Value of Tests ; Rats

Development of CYP3A4 inhibition models: comparisons of machine-learning techniques and molecular descriptors.

Arimoto, Rieko; Prasad, Madhu-Ashni; Gifford, Eric M.

J Biomol Screen ; 10(3): 197-205, 2005 Apr.

Article in English | MEDLINE | ID: mdl-15809315

ABSTRACT

Computational models of cytochrome P450 3A4 inhibition were developed based on high-throughput screening data for 4470 proprietary compounds. Multiple models differentiating inhibitors (IC50 < 3 µM) and noninhibitors were generated using various machine-learning algorithms (recursive partitioning [RP], Bayesian classifier, logistic regression, k-nearest-neighbor, and support vector machine [SVM]) with structural fingerprints and topological indices. Nineteen models were evaluated by internal 10-fold cross-validation and also by an independent test set. The three most predictive models, Barnard Chemical Information (BCI)-fingerprint/SVM, MDL-keyset/SVM, and topological indices/RP, correctly classified 249, 248, and 236 compounds of 291 noninhibitors and 135, 137, and 147 compounds of 179 inhibitors in the validation set. Their overall accuracies were 82%, 82%, and 81%, respectively. Investigating the applicability of the BCI/SVM model revealed a strong correlation between the predictive performance and the structural similarity to the training set. Using the Tanimoto similarity index as a confidence measure for the predictions, the limit of extrapolation was 0.7 in the case of the BCI/SVM model. Taking the consensus of the 3 best models yielded a further improvement in predictive capability (kappa = 0.65, accuracy = 83%). The consensus model could also be tuned to minimize either false positives or false negatives depending on the emphasis of the screening.
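The overall accuracies follow directly from the per-class counts quoted in the abstract. The sketch below recomputes them and adds a basic Tanimoto similarity. The function names are hypothetical, and representing fingerprints as Python sets of on-bit indices (with an empty pair scoring 0.0) is an assumption made for illustration, not the paper's implementation.

```python
# Validation-set sizes and per-class correct counts, as quoted in the abstract.
N_NONINHIBITORS = 291
N_INHIBITORS = 179
CORRECT = {
    "BCI-fingerprint/SVM": (249, 135),
    "MDL-keyset/SVM": (248, 137),
    "topological indices/RP": (236, 147),
}


def overall_accuracy(correct_noninhibitors, correct_inhibitors):
    """Fraction of all validation compounds that were classified correctly."""
    total = N_NONINHIBITORS + N_INHIBITORS
    return (correct_noninhibitors + correct_inhibitors) / total


def tanimoto(bits_a, bits_b):
    """Tanimoto index of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    # Assumption: two empty fingerprints are scored 0.0 rather than raising.
    return len(bits_a & bits_b) / union if union else 0.0


accuracies = {name: overall_accuracy(*counts) for name, counts in CORRECT.items()}
for name, acc in accuracies.items():
    print(f"{name}: {acc:.1%}")

# Hypothetical fingerprints (illustration only): 2 shared bits out of 5 distinct bits.
similarity = tanimoto({1, 2, 3, 4}, {3, 4, 5})
print(f"Tanimoto similarity: {similarity:.2f}")
```

Rounded to whole percentages, the three results match the 82%, 82%, and 81% reported above.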

Subject(s)

Artificial Intelligence ; Cytochrome P-450 Enzyme Inhibitors ; Drug Evaluation, Preclinical/methods ; Enzyme Inhibitors/chemistry ; Models, Chemical ; Computer Simulation ; Cytochrome P-450 CYP3A ; Enzyme Inhibitors/pharmacology ; Humans ; Models, Molecular ; Molecular Structure
