Benchmarking Active Learning Protocols for Ligand-Binding Affinity Prediction.

Gorantla, Rohan; Kubincová, Alzbeta; Suutari, Benjamin; Cossins, Benjamin P; Mey, Antonia S J S

Affiliation

Gorantla R; School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, U.K.
Kubincová A; EaStCHEM School of Chemistry, University of Edinburgh, Edinburgh EH9 3FJ, U.K.
Suutari B; Exscientia, Schrödinger Building, Oxford OX4 4GE, U.K.
Cossins BP; Exscientia, Schrödinger Building, Oxford OX4 4GE, U.K.
Mey ASJS; Exscientia, Schrödinger Building, Oxford OX4 4GE, U.K.

J Chem Inf Model; 64(6): 1955-1965, 2024 03 25.

Article in English | MEDLINE | ID: mdl-38446131

ABSTRACT

Active learning (AL) has become a powerful tool in computational drug discovery, enabling the identification of top binders from vast molecular libraries. To design a robust AL protocol, it is important to understand how the AL parameters and the features of the data sets influence the outcomes. We use four affinity data sets for different targets (TYK2, USP7, D2R, Mpro) to systematically evaluate the performance of machine learning models [Gaussian process (GP) model and Chemprop model], sample selection protocols, and the batch size, based on metrics describing the overall predictive power of the model (R², Spearman rank, root-mean-square error) as well as the accurate identification of the top 2%/5% binders (Recall, F1 score). Both models have a comparable Recall of top binders on large data sets, but the GP model surpasses the Chemprop model when training data are sparse. A larger initial batch size, especially on diverse data sets, increased the Recall of both models as well as the overall correlation metrics. For subsequent cycles, however, smaller batch sizes of 20 or 30 compounds proved preferable. Furthermore, when artificial Gaussian noise was added to the data, the model could still identify clusters with top-scoring compounds up to a certain noise threshold. Excessive noise (>1σ), however, did impair the model's predictive and exploitative capabilities.
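The Recall metric and the noise experiment mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy data, and the convention that a higher affinity value means a stronger binder are all assumptions made for illustration. The noise level is expressed as a multiple of the standard deviation of the affinity values, which is one plausible reading of "σ" that the abstract does not spell out.

```python
import random
import statistics


def top_binder_indices(affinities, fraction):
    """Indices of the top `fraction` of compounds (assumes higher = stronger binder)."""
    n_top = max(1, round(len(affinities) * fraction))
    ranked = sorted(range(len(affinities)), key=lambda i: affinities[i], reverse=True)
    return set(ranked[:n_top])


def recall_of_top_binders(affinities, acquired, fraction=0.02):
    """Fraction of the true top binders found among the acquired compounds."""
    top = top_binder_indices(affinities, fraction)
    return len(top & set(acquired)) / len(top)


def add_gaussian_noise(affinities, noise_scale, seed=0):
    """Add zero-mean Gaussian noise with std = noise_scale * (std of the data)."""
    rng = random.Random(seed)
    sigma = statistics.pstdev(affinities)
    return [a + rng.gauss(0.0, noise_scale * sigma) for a in affinities]


# Toy data: 100 hypothetical compounds with affinities 0.0, 0.1, ..., 9.9.
affinities = [i / 10 for i in range(100)]
# Hypothetical set of compound indices picked by an AL run so far.
acquired = [99, 98, 97, 10, 11]

recall_2 = recall_of_top_binders(affinities, acquired, fraction=0.02)
recall_5 = recall_of_top_binders(affinities, acquired, fraction=0.05)
noisy = add_gaussian_noise(affinities, noise_scale=0.5)
```

In this toy case both of the top 2% compounds have been acquired, while only three of the five top 5% compounds have, so Recall drops as the top-binder definition widens. Re-ranking `noisy` instead of `affinities` gives a rough feel for how label noise scrambles which compounds count as top binders.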

Subjects

Benchmarking; Machine Learning; Ligands; Drug Discovery/methods

Full text: 1; Database: MEDLINE; Main subject: Benchmarking / Machine Learning; Language: English; Publication year: 2024; Document type: Article