AMPL: A Data-Driven Modeling Pipeline for Drug Discovery.
Minnich, Amanda J; McLoughlin, Kevin; Tse, Margaret; Deng, Jason; Weber, Andrew; Murad, Neha; Madej, Benjamin D; Ramsundar, Bharath; Rush, Tom; Calad-Thomson, Stacie; Brase, Jim; Allen, Jonathan E.
Affiliation
  • Minnich AJ; Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States.
  • McLoughlin K; Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States.
  • Tse M; GlaxoSmithKline, 5 Crescent Drive, Philadelphia, Pennsylvania 19112, United States.
  • Deng J; GlaxoSmithKline, 5 Crescent Drive, Philadelphia, Pennsylvania 19112, United States.
  • Weber A; GlaxoSmithKline, 5 Crescent Drive, Philadelphia, Pennsylvania 19112, United States.
  • Murad N; GlaxoSmithKline, 5 Crescent Drive, Philadelphia, Pennsylvania 19112, United States.
  • Madej BD; Frederick National Laboratory, 8560 Progress Drive, Frederick, Maryland 21701, United States.
  • Ramsundar B; Computable, San Francisco, California 94111, United States.
  • Rush T; GlaxoSmithKline, 5 Crescent Drive, Philadelphia, Pennsylvania 19112, United States.
  • Calad-Thomson S; GlaxoSmithKline, 5 Crescent Drive, Philadelphia, Pennsylvania 19112, United States.
  • Brase J; Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States.
  • Allen JE; Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States.
J Chem Inf Model ; 60(4): 1955-1968, 2020 04 27.
Article in En | MEDLINE | ID: mdl-32243153
ABSTRACT
One of the key requirements for incorporating machine learning (ML) into the drug discovery process is complete traceability and reproducibility of the model building and evaluation process. With this in mind, we have developed an end-to-end modular and extensible software pipeline for building and sharing ML models that predict key pharma-relevant parameters. The ATOM Modeling PipeLine, or AMPL, extends the functionality of the open source library DeepChem and supports an array of ML and molecular featurization tools. We have benchmarked AMPL on a large collection of pharmaceutical data sets covering a wide range of parameters. Our key findings indicate that traditional molecular fingerprints underperform other feature representation methods. We also find that data set size correlates directly with prediction performance, which points to the need to expand public data sets. Uncertainty quantification can help predict model error, but correlation with error varies considerably between data sets and model types. Our findings point to the need for an extensible pipeline that can be shared to make model building more widely accessible and reproducible. This software is open source and available at https://github.com/ATOMconsortium/AMPL.

Full text: 1; Collection: 01-internacional; Database: MEDLINE; Main subject: Software / Drug Discovery; Type of study: Prognostic_studies; Language: En; Journal: J Chem Inf Model; Journal subject: Medical Informatics / Chemistry; Year: 2020; Document type: Article; Affiliation country: United States

...