Experimental Error, Kurtosis, Activity Cliffs, and Methodology: What Limits the Predictivity of Quantitative Structure-Activity Relationship Models?
Sheridan, Robert P; Karnachi, Prabha; Tudor, Matthew; Xu, Yuting; Liaw, Andy; Shah, Falgun; Cheng, Alan C; Joshi, Elizabeth; Glick, Meir; Alvarez, Juan.
Affiliations
  • Sheridan RP; Computational and Structural Chemistry, Merck & Company Inc., Kenilworth, New Jersey 07033, United States.
  • Karnachi P; Computational and Structural Chemistry, Merck & Company Inc., Kenilworth, New Jersey 07033, United States.
  • Tudor M; Computational and Structural Chemistry, Merck & Company Inc., West Point, Pennsylvania 19486, United States.
  • Xu Y; Biometrics Research, Merck & Company Inc., Rahway, New Jersey 07065, United States.
  • Liaw A; Biometrics Research, Merck & Company Inc., Rahway, New Jersey 07065, United States.
  • Shah F; Computational and Structural Chemistry, Merck & Company Inc., West Point, Pennsylvania 19486, United States.
  • Cheng AC; Computational and Structural Chemistry, Merck & Company Inc., South San Francisco, California 94080, United States.
  • Joshi E; Pharmacokinetics, Pharmacodynamics & Drug Metabolism, Merck & Company Inc., West Point, Pennsylvania 19486, United States.
  • Glick M; Computational and Structural Chemistry, Merck & Company Inc., Boston, Massachusetts 02115, United States.
  • Alvarez J; Computational and Structural Chemistry, Merck & Company Inc., Boston, Massachusetts 02115, United States.
J Chem Inf Model; 60(4): 1969-1982, 2020 Apr 27.
Article in English | MEDLINE | ID: mdl-32207612
ABSTRACT
Given a particular descriptor/method combination, some quantitative structure-activity relationship (QSAR) datasets are very predictive by random-split cross-validation while others are not. Recent literature on modelability suggests that the limiting issue for predictivity lies in the data, not in the QSAR methodology, and that the limits are due to activity cliffs. Here, we investigate, on in-house data, how useful experimental error, the distribution of the activities, and activity cliff metrics are in determining how predictive a dataset is likely to be. We include unmodified in-house datasets, datasets that should be perfectly predictive based only on the chemical structure, datasets where the distribution of activities is manipulated, and datasets that include a known amount of added noise. We find that activity cliff metrics determine predictivity better than the other metrics we investigated, whatever the type of dataset, consistent with the modelability literature. However, such metrics cannot distinguish real activity cliffs from apparent ones caused by large uncertainties in the activities. We also show that a number of modern QSAR methods, and some alternative descriptors, are equally bad at predicting the activities of compounds on activity cliffs, consistent with the assumptions behind "modelability." Finally, we relate time-split predictivity to random-split predictivity and show that differences in the coverage of chemical space are at least as important as uncertainty in activity and/or activity cliffs in limiting predictivity.
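The abstract does not name the specific activity-cliff or distribution metrics used. As a purely illustrative sketch, assuming binary fingerprints stored as sets of on-bit indices, the snippet below computes two quantities of the kind discussed: the Structure-Activity Landscape Index (SALI), a common pairwise activity-cliff score defined as |A_i - A_j| / (1 - sim_ij), and the excess kurtosis of the activity distribution. The function names (`tanimoto`, `sali_pairs`, `excess_kurtosis`) and the `eps` guard are hypothetical choices for illustration, not the authors' implementation.

```python
from itertools import combinations
from statistics import mean


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints stored as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    # Two empty fingerprints are treated as identical (illustrative convention).
    return len(fp_a & fp_b) / union if union else 1.0


def sali_pairs(
    fps: list[set[int]], activities: list[float], eps: float = 1e-6
) -> list[tuple[int, int, float]]:
    """Return (i, j, SALI_ij) for every compound pair, with SALI_ij = |A_i - A_j| / (1 - sim_ij).

    eps bounds the denominator from below so identical fingerprints do not
    divide by zero (a hypothetical choice; implementations differ here).
    """
    scores = []
    for i, j in combinations(range(len(fps)), 2):
        sim = tanimoto(fps[i], fps[j])
        scores.append((i, j, abs(activities[i] - activities[j]) / max(1.0 - sim, eps)))
    return scores


def excess_kurtosis(values: list[float]) -> float:
    """Population (biased, moment-based) excess kurtosis: m4 / m2**2 - 3."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - m) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant activities")
    return m4 / m2 ** 2 - 3.0
```

For example, fingerprints {1, 2, 3, 4} and {1, 2, 3, 5} have Tanimoto similarity 0.6, so activities of 5.0 and 8.0 give SALI = 3.0 / 0.4 = 7.5, whereas the same 3-unit gap between completely dissimilar compounds (similarity 0) scores only 3.0. A large SALI can arise from measurement noise as easily as from a real cliff, which is the ambiguity the abstract points out.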
Subjects
  • Main subject: Quantitative Structure-Activity Relationship / Scientific Experimental Error
  • Study type: Prognostic_studies / Risk_factors_studies
  • Database: MEDLINE
  • Language: English
  • Publication year: 2020
  • Document type: Article