Large-Scale Pretraining Improves Sample Efficiency of Active Learning-Based Virtual Screening.
Cao, Zhonglin; Sciabola, Simone; Wang, Ye.
Affiliation
  • Cao Z; Medicinal Chemistry, Biogen, Cambridge, Massachusetts 02142, United States.
  • Sciabola S; Medicinal Chemistry, Biogen, Cambridge, Massachusetts 02142, United States.
  • Wang Y; Medicinal Chemistry, Biogen, Cambridge, Massachusetts 02142, United States.
J Chem Inf Model; 64(6): 1882-1891, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38442000
ABSTRACT
Virtual screening of large compound libraries to identify potential hit candidates is one of the earliest steps in drug discovery. As the size of commercially available compound collections grows exponentially to the scale of billions, active learning and Bayesian optimization have recently been proven as effective methods of narrowing down the search space. An essential component of those methods is a surrogate machine learning model that predicts the desired properties of compounds. An accurate model can achieve high sample efficiency by finding hits with only a fraction of the entire library being virtually screened. In this study, we examined the performance of a pretrained transformer-based language model and graph neural network in a Bayesian optimization active learning framework. The best pretrained model identifies 58.97% of the top-50,000 compounds after screening only 0.6% of an ultralarge library containing 99.5 million compounds, improving 8% over the previous state-of-the-art baseline. Through extensive benchmarks, we show that the superior performance of pretrained models persists in both structure-based and ligand-based drug discovery. Pretrained models can serve as a boost to the accuracy and sample efficiency of active learning-based virtual screening.
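The loop described in the abstract (train a surrogate on the compounds scored so far, use it to pick the next batch, score that batch, repeat) can be illustrated with a toy sketch. This is not the authors' implementation: the linear surrogate below stands in for the pretrained transformer or graph neural network, the greedy acquisition rule is only one of several possible choices, the data are synthetic, and every name (`fit_linear`, `active_learning_screen`, `top_k_recall`) is hypothetical. For scale, 0.6% of a 99.5 million compound library is about 597,000 compounds, and 58.97% of the top 50,000 is about 29,485 compounds.

```python
import random


def fit_linear(X, y, epochs=200, lr=0.2):
    """Fit a linear surrogate by gradient descent (stand-in for a pretrained model)."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(model, x):
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def active_learning_screen(features, oracle, init_size, batch_size, n_rounds, seed=0):
    """Return {compound index: score} for every compound the loop chose to screen."""
    rng = random.Random(seed)
    n = len(features)
    # Start from a random batch scored by the expensive oracle (e.g. docking).
    labeled = {i: oracle(i) for i in rng.sample(range(n), init_size)}
    for _ in range(n_rounds):
        idx = list(labeled)
        model = fit_linear([features[i] for i in idx], [labeled[i] for i in idx])
        candidates = [i for i in range(n) if i not in labeled]
        # Greedy acquisition: take the lowest predicted scores
        # (assumption: lower is better, as with docking scores).
        candidates.sort(key=lambda i: predict(model, features[i]))
        for i in candidates[:batch_size]:
            labeled[i] = oracle(i)
    return labeled


def top_k_recall(labeled, true_scores, k):
    """Fraction of the true top-k compounds found among the screened ones."""
    order = sorted(range(len(true_scores)), key=lambda i: true_scores[i])
    return len(set(order[:k]) & set(labeled)) / k


# Synthetic library: 2,000 "compounds" with 4 made-up descriptors each.
rng = random.Random(42)
features = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2000)]
true_w = [1.0, -2.0, 0.5, 1.5]
true_scores = [
    sum(w * x for w, x in zip(true_w, f)) + rng.gauss(0, 0.1) for f in features
]


def oracle(i):
    return true_scores[i]


# Screen 10% of the library: 40 random compounds plus 4 rounds of 40.
labeled = active_learning_screen(features, oracle, init_size=40, batch_size=40, n_rounds=4)
recall = top_k_recall(labeled, true_scores, k=50)
print(f"screened {len(labeled)} of {len(features)}; top-50 recall = {recall:.2f}")
```

Sample efficiency here is the top-k recall reached for a fixed screening budget; a more accurate surrogate should recover more of the top-k compounds from the same number of oracle calls, which is the effect the abstract attributes to pretraining.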
Subjects

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Small Molecule Libraries / Drug Discovery Language: En Publication year: 2024 Document type: Article