Machine learning classification can reduce false positives in structure-based virtual screening.

Adeshina, Yusuf O; Deeds, Eric J; Karanicolas, John

Affiliation

Adeshina YO; Program in Molecular Therapeutics, Fox Chase Cancer Center, Philadelphia, PA 19111.
Deeds EJ; Center for Computational Biology, University of Kansas, Lawrence, KS 66045.
Karanicolas J; Center for Computational Biology, University of Kansas, Lawrence, KS 66045.

Proc Natl Acad Sci U S A; 117(31): 18477-18488, 2020 Aug 04.

Article in English | MEDLINE | ID: mdl-32669436

ABSTRACT

With the recent explosion in the size of libraries available for screening, virtual screening is positioned to assume a more prominent role in early drug discovery's search for active chemical matter. In typical virtual screens, however, only about 12% of the top-scoring compounds actually show activity when tested in biochemical assays. We argue that most scoring functions used for this task have been developed with insufficient thoughtfulness into the datasets on which they are trained and tested, leading to overly simplistic models and/or overtraining. These problems are compounded in the literature because studies reporting new scoring methods have not validated their models prospectively within the same study. Here, we report a strategy for building a training dataset (D-COID) that aims to generate highly compelling decoy complexes that are individually matched to available active complexes. Using this dataset, we train a general-purpose classifier for virtual screening (vScreenML) that is built on the XGBoost framework. In retrospective benchmarks, our classifier shows outstanding performance relative to other scoring functions. In a prospective context, nearly all candidate inhibitors from a screen against acetylcholinesterase show detectable activity; beyond this, 10 of 23 compounds have IC50 better than 50 µM. Without any medicinal chemistry optimization, the most potent hit has IC50 280 nM, corresponding to Ki of 173 nM. These results support using the D-COID strategy for training classifiers in other computational biology tasks, and for vScreenML in virtual screening campaigns against other protein targets. Both D-COID and vScreenML are freely distributed to facilitate such efforts.

Subjects

Drug Evaluation, Preclinical/methods; Machine Learning; Small Molecule Libraries/pharmacology; Databases, Protein; Drug Discovery; Drug Evaluation, Preclinical/instrumentation; Humans

Keywords

machine learning classifier; protein-ligand complex; structure-based drug design; virtual screening

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Drug Evaluation, Preclinical / Small Molecule Libraries / Machine Learning | Study type: Diagnostic_studies / Evaluation_studies / Prognostic_studies / Screening_studies | Limits: Humans | Language: En | Journal: Proc Natl Acad Sci U S A | Publication year: 2020 | Document type: Article
