Blinded Predictions and Post Hoc Analysis of the Second Solubility Challenge Data: Exploring Training Data and Feature Set Selection for Machine and Deep Learning Models.
Conn, Jonathan G M; Carter, James W; Conn, Justin J A; Subramanian, Vigneshwari; Baxter, Andrew; Engkvist, Ola; Llinas, Antonio; Ratkova, Ekaterina L; Pickett, Stephen D; McDonagh, James L; Palmer, David S.
Affiliation
  • Conn JGM; Department of Pure and Applied Chemistry, University of Strathclyde, Thomas Graham Building, 295 Cathedral Street, Glasgow G1 1XL, U.K.
  • Carter JW; Department of Pure and Applied Chemistry, University of Strathclyde, Thomas Graham Building, 295 Cathedral Street, Glasgow G1 1XL, U.K.
  • Conn JJA; Department of Pure and Applied Chemistry, University of Strathclyde, Thomas Graham Building, 295 Cathedral Street, Glasgow G1 1XL, U.K.
  • Subramanian V; Drug Metabolism and Pharmacokinetics, Research and Early Development, Respiratory & Immunology, BioPharmaceuticals R&D, AstraZeneca, Pepparedsleden 1, SE-431 83 Göteborg, Sweden.
  • Baxter A; GSK Medicines Research Centre, Gunnels Wood Road, Stevenage SG1 2NY, U.K.
  • Engkvist O; Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, SE-431 50 Göteborg, Sweden.
  • Llinas A; Department of Computer Science and Engineering, Chalmers University of Technology, SE-412 96 Göteborg, Sweden.
  • Ratkova EL; Drug Metabolism and Pharmacokinetics, Research and Early Development, Respiratory & Immunology, BioPharmaceuticals R&D, AstraZeneca, Pepparedsleden 1, SE-431 83 Göteborg, Sweden.
  • Pickett SD; Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, SE-431 50 Göteborg, Sweden.
  • McDonagh JL; Computational Sciences, GlaxoSmithKline R&D Pharmaceuticals, Stevenage SG1 2NY, U.K.
  • Palmer DS; IBM Research Europe, Hartree Centre, SciTech Daresbury, Warrington, Cheshire WA4 4AD, U.K.
J Chem Inf Model; 63(4): 1099-1113, 2023 Feb 27.
Article in English | MEDLINE | ID: mdl-36758178
ABSTRACT
Accurate methods to predict solubility from molecular structure are highly sought after in the chemical sciences. To assess the state of the art, the American Chemical Society organized a "Second Solubility Challenge" in 2019, in which competitors were invited to submit blinded predictions of the solubilities of 132 drug-like molecules. In the first part of this article, we describe the development of two models that were submitted to the Blind Challenge in 2019 but which have not previously been reported. These models were based on computationally inexpensive molecular descriptors and traditional machine learning algorithms and were trained on a relatively small data set of 300 molecules. In the second part of the article, to test the hypothesis that predictions would improve with more advanced algorithms and higher volumes of training data, we compare these original predictions with those made after the deadline using deep learning models trained on larger solubility data sets consisting of 2999 and 5697 molecules. The results show that there are several algorithms that are able to obtain near state-of-the-art performance on the solubility challenge data sets, with the best model, a graph convolutional neural network, resulting in an RMSE of 0.86 log units. Critical analysis of the models reveals systematic differences between the performance of models using certain feature sets and training data sets. The results suggest that careful selection of high quality training data from relevant regions of chemical space is critical for prediction accuracy but that other methodological issues remain problematic for machine learning solubility models, such as the difficulty in modeling complex chemical spaces from sparse training data sets.
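The abstract reports model accuracy as a root-mean-square error (RMSE) in log units. As a minimal sketch of how such a figure is typically computed, assuming solubilities are expressed as base-10 logarithms, the function below compares predicted and observed values. The function name `rmse` and the sample numbers are hypothetical illustrations and are not taken from the article.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not predicted:
        raise ValueError("sequences must not be empty")
    # Square each prediction error, average, then take the square root.
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical log10 solubility values, invented for illustration only.
observed = [-3.0, -4.5, -2.0, -5.0]
predicted = [-2.5, -4.0, -3.0, -5.0]

print(f"RMSE = {rmse(predicted, observed):.2f} log units")
```

Because the error is measured on a logarithmic scale, an RMSE of 1 log unit corresponds to predictions that are off by roughly a factor of ten in solubility.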
Subject(s)

Full text: 1 | Database: MEDLINE | Main subject: Deep Learning | Study type: Prognostic_studies / Risk_factors_studies | Language: English | Journal: J Chem Inf Model | Journal subject: MEDICAL INFORMATICS / CHEMISTRY | Year: 2023 | Document type: Article
