DOCKSTRING: Easy Molecular Docking Yields Better Benchmarks for Ligand Design.

García-Ortegón, Miguel; Simm, Gregor N C; Tripp, Austin J; Hernández-Lobato, José Miguel; Bender, Andreas; Bacallado, Sergio

Affiliation

García-Ortegón M; Statistical Laboratory, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Rd., Cambridge CB3 0WB, United Kingdom.
Simm GNC; Department of Engineering, University of Cambridge, Trumpington St., Cambridge CB2 1PZ, United Kingdom.
Tripp AJ; Department of Engineering, University of Cambridge, Trumpington St., Cambridge CB2 1PZ, United Kingdom.
Hernández-Lobato JM; Department of Engineering, University of Cambridge, Trumpington St., Cambridge CB2 1PZ, United Kingdom.
Bender A; Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd., Cambridge CB2 1EW, United Kingdom.
Bacallado S; Statistical Laboratory, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Rd., Cambridge CB3 0WB, United Kingdom.

J Chem Inf Model; 62(15): 3486-3502, 2022 08 08.

Article in English | MEDLINE | ID: mdl-35849793

ABSTRACT

The field of machine learning for drug discovery is witnessing an explosion of novel methods. These methods are often benchmarked on simple physicochemical properties such as solubility or general druglikeness, which can be readily computed. However, these properties are poor representatives of objective functions in drug design, mainly because they do not depend on the candidate compound's interaction with the target. By contrast, molecular docking is a widely applied method in drug discovery to estimate binding affinities. However, docking studies require a significant amount of domain knowledge to set up correctly, which hampers adoption. Here, we present dockstring, a bundle for meaningful and robust comparison of ML models using docking scores. dockstring consists of three components: (1) an open-source Python package for straightforward computation of docking scores, (2) an extensive dataset of docking scores and poses of more than 260,000 molecules for 58 medically relevant targets, and (3) a set of pharmaceutically relevant benchmark tasks such as virtual screening or de novo design of selective kinase inhibitors. The Python package implements a robust ligand and target preparation protocol that allows nonexperts to obtain meaningful docking scores. Our dataset is the first to include docking poses, as well as the first of its size that is a full matrix, thus facilitating experiments in multiobjective optimization and transfer learning. Overall, our results indicate that docking scores are a more realistic evaluation objective than simple physicochemical properties, yielding benchmark tasks that are more challenging and more closely related to real problems in drug discovery.
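
To illustrate why a full matrix of docking scores matters for selectivity tasks, the following is a minimal sketch in plain Python. It does not use the dockstring package or its dataset: the molecule and target names, the score values, and the functions `selectivity_gap` and `rank_by_selectivity` are all hypothetical, and the objective shown is only one simple way to express selectivity, not necessarily the one used in the paper's benchmarks. It assumes the usual docking convention that a more negative score means stronger predicted binding.

```python
from statistics import mean

# Toy "full matrix": every molecule has a score for every target.
# All values are illustrative placeholders, not dockstring data.
scores = {
    "mol_a": {"T1": -9.0, "T2": -6.0, "T3": -7.0},
    "mol_b": {"T1": -8.0, "T2": -8.5, "T3": -7.5},
    "mol_c": {"T1": -7.0, "T2": -5.0, "T3": -5.0},
}


def selectivity_gap(row, on_target):
    """On-target score minus the mean off-target score.

    More negative means the molecule is predicted to bind the
    on-target more strongly than the off-targets on average.
    """
    off_target_scores = [s for t, s in row.items() if t != on_target]
    return row[on_target] - mean(off_target_scores)


def rank_by_selectivity(matrix, on_target):
    """Return molecule names sorted from most to least selective."""
    return sorted(matrix, key=lambda m: selectivity_gap(matrix[m], on_target))


if __name__ == "__main__":
    for name in rank_by_selectivity(scores, "T1"):
        print(name, selectivity_gap(scores[name], "T1"))
```

Because every molecule has a score for every target, the off-target average can be computed without imputing missing entries; with a sparse matrix, an objective of this kind would need extra handling for gaps.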

Subjects

Benchmarking; Proteins; Drug Design; Ligands; Molecular Docking Simulation; Protein Binding; Proteins/chemistry

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Proteins / Benchmarking | Type of study: Prognostic_studies | Language: En | Journal: J Chem Inf Model | Journal subject: MEDICAL INFORMATICS / CHEMISTRY | Year of publication: 2022 | Document type: Article | Country of affiliation: United Kingdom
