FEP Augmentation as a Means to Solve Data Paucity Problems for Machine Learning in Chemical Biology.
Burger, Pieter B; Hu, Xiaohu; Balabin, Ilya; Muller, Morné; Stanley, Megan; Joubert, Fourie; Kaiser, Thomas M.
Affiliation
  • Burger PB; Avicenna Biosciences Inc., 101 W. Chapel Hill Street, Suite 210, Durham, North Carolina 27001, United States.
  • Hu X; Schrödinger, Inc., 120 West 45th Street, New York, New York 10036, United States.
  • Balabin I; Avicenna Biosciences Inc., 101 W. Chapel Hill Street, Suite 210, Durham, North Carolina 27001, United States.
  • Muller M; Avicenna Biosciences Inc., 101 W. Chapel Hill Street, Suite 210, Durham, North Carolina 27001, United States.
  • Stanley M; Microsoft Research AI4Science, 21 Station Road, Cambridge CB1 2FB, U.K.
  • Joubert F; Centre for Bioinformatics and Computational Biology, Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria 0001, South Africa.
  • Kaiser TM; Avicenna Biosciences Inc., 101 W. Chapel Hill Street, Suite 210, Durham, North Carolina 27001, United States.
J Chem Inf Model; 64(9): 3812-3825, 2024 May 13.
Article in English | MEDLINE | ID: mdl-38651738
ABSTRACT
In medicinal chemistry, the primary objective is to rapidly optimize multiple chemical properties of a set of compounds to yield a candidate ready for clinical trials. In recent years, two computational techniques, machine learning (ML) and physics-based methods, have evolved substantially and are now frequently incorporated into the medicinal chemist's toolbox to make both hit optimization and candidate design more efficient. Each method has its own limitations, and the two are often used independently of each other. ML can screen extensive compound libraries quickly, but it relies on quality data, which can be scarce, especially during early-stage optimization. Conversely, physics-based approaches such as free energy perturbation (FEP) are comparatively low in throughput and high in cost, but they are capable of making highly accurate binding affinity predictions. In this study, we harnessed the strength of FEP to overcome data paucity in ML by generating virtual activity data sets that then inform the training of algorithms. Here, we show that ML algorithms trained on an FEP-augmented data set can achieve predictive accuracy comparable to that of algorithms trained on experimental data from biological assays. Throughout the paper, we emphasize key mechanistic considerations that must be taken into account when augmenting data sets, and we lay the groundwork for successful implementation. Ultimately, the study advocates combining physics-based methods and ML to expedite the lead optimization process. We believe that physics-based augmentation of ML will significantly benefit drug discovery as these techniques continue to evolve.
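The augmentation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that FEP results are available as binding free energies in kcal/mol, that activity labels are expressed as pKd, and that measured values should take precedence over predicted ones. The function names `dg_to_pkd` and `augment`, and the dictionary inputs, are hypothetical. The only physics used is the standard relation ΔG = RT ln(Kd).

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 0.0019872041


def dg_to_pkd(dg_kcal_per_mol, temperature_k=298.15):
    """Convert a binding free energy (kcal/mol) to pKd using dG = RT ln(Kd)."""
    return -dg_kcal_per_mol / (R_KCAL * temperature_k * math.log(10))


def augment(experimental, fep_predictions):
    """Merge measured and FEP-derived labels, tagging each row by its source.

    experimental:    dict of compound id -> measured pKd (assumed label type)
    fep_predictions: dict of compound id -> predicted dG in kcal/mol
    A measured value takes precedence when a compound appears in both inputs.
    """
    rows = [(cid, pkd, "experimental") for cid, pkd in experimental.items()]
    for cid, dg in fep_predictions.items():
        if cid not in experimental:
            rows.append((cid, dg_to_pkd(dg), "fep"))
    return rows
```

Keeping a source tag on every row makes it possible to weight or filter the virtual labels later, for example when comparing a model trained on the augmented set against one trained on measured data alone.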

Full text: 1 | Database: MEDLINE | Main subject: Machine Learning | Language: En | Publication year: 2024 | Document type: Article
