SCORCH: Improving structure-based virtual screening with machine learning classifiers, data augmentation, and uncertainty estimation.
McGibbon, Miles; Money-Kyrle, Sam; Blay, Vincent; Houston, Douglas R.
Affiliation
  • McGibbon M; Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK.
  • Money-Kyrle S; Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK.
  • Blay V; Department of Microbiology and Environmental Toxicology, University of California at Santa Cruz, Santa Cruz, CA 95064, USA; Institute for Integrative Systems Biology (I(2)SysBio), Universitat de València and Spanish Research Council (CSIC), 46980 Valencia, Spain. Electronic address: vroger@ucsc.edu.
  • Houston DR; Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK. Electronic address: DouglasR.Houston@ed.ac.uk.
J Adv Res; 46: 135-147, April 2023.
Article in En | MEDLINE | ID: mdl-35901959
ABSTRACT

INTRODUCTION:

The discovery of a new drug is a costly and lengthy endeavour. Computational prediction of which small molecules can bind to a protein target can accelerate this process, provided the predictions are fast and accurate enough. Recent machine-learning scoring functions re-evaluate the output of molecular docking to achieve more accurate predictions. However, previous scoring functions were trained on crystallised protein-ligand complexes and datasets of decoys. The limited availability of crystal structures and biases in the decoy datasets can lower the performance of scoring functions.

OBJECTIVES:

To address key limitations of previous scoring functions and thus improve the predictive performance of structure-based virtual screening.

METHODS:

A novel machine-learning scoring function, named SCORCH (Scoring COnsensus for RMSD-based Classification of Hits), was created. To develop SCORCH, the training data were augmented by considering multiple poses per ligand and labelling each pose according to its RMSD from the native pose. Decoy bias was addressed by generating property-matched decoys for each ligand and by using the same methodology to prepare and dock both decoys and ligands. A consensus of three different machine-learning approaches was also used to improve performance.
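
The RMSD-based labelling described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the 2.0 Å cutoff, and the plain coordinate-list representation are all assumptions made for illustration, and the abstract does not state the threshold actually used. RMSD is computed in the receptor frame without superposition or symmetry correction, which is a simplification.

```python
import math

# Hypothetical cutoff in angstroms; the abstract does not give the value used.
RMSD_CUTOFF = 2.0


def rmsd(coords_a, coords_b):
    """RMSD between two poses given as equally ordered lists of (x, y, z) atoms.

    No superposition and no symmetry correction are applied (a simplification).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must have the same, non-zero number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def label_poses(native, docked_poses, cutoff=RMSD_CUTOFF):
    """Label each docked pose 1 if it lies within `cutoff` of the native pose, else 0."""
    return [1 if rmsd(native, pose) <= cutoff else 0 for pose in docked_poses]


native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
near = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]  # shifted by 0.5 along x
far = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]   # shifted by 3.0 along x
labels = label_poses(native, [near, far])
```

Each ligand then contributes several labelled training examples rather than a single crystal pose, which is the sense in which the data are augmented.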

RESULTS:

We find that multi-pose augmentation in SCORCH improves its docking power and screening power on independent benchmark datasets. SCORCH outperforms an equivalent scoring function trained on single poses, with a 1% enrichment factor (EF) of 13.78 vs. 10.86 on 18 DEKOIS 2.0 targets and a mean native pose rank of 5.9 vs. 30.4 on CSAR 2014. Additionally, SCORCH outperforms widely used scoring functions in virtual screening and pose prediction on independent benchmark datasets.
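
For readers unfamiliar with the metric, the enrichment factor at a given fraction is commonly defined as the hit rate among the top-ranked compounds divided by the hit rate in the whole library. The sketch below uses that common definition; the function name and the rounding of the top fraction are assumptions, and the paper's exact evaluation protocol may differ.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at `fraction`: top-fraction hit rate / overall hit rate.

    `scores` are predicted scores (higher means more likely active) and
    `labels` are 1 for actives, 0 for decoys.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equally long")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")
    # Size of the top slice, rounded to the nearest compound, at least one.
    n_top = max(1, round(fraction * n_total))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(label for _, label in ranked[:n_top])
    return (hits_in_top / n_top) / (n_actives / n_total)


# Toy library: 200 compounds, 10 actives, two of which are ranked first.
toy_scores = list(range(200, 0, -1))
toy_labels = [1, 1] + [0] * 190 + [1] * 8
toy_ef = enrichment_factor(toy_scores, toy_labels, fraction=0.01)
```

In this toy library the top 1% holds two compounds, both active, against an overall active rate of 5%, so the enrichment factor is 20. A value of 1 corresponds to random ranking.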

CONCLUSION:

By rationally addressing key limitations of previous scoring functions, SCORCH improves the performance of virtual screening. SCORCH also provides an estimate of its uncertainty, which can help reduce the cost and time required for drug discovery.
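
One simple way a consensus of several models can yield both a score and an uncertainty estimate is to average the per-model predictions and report their spread. The abstract does not say how SCORCH computes its uncertainty, so the sketch below is an assumption for illustration only; the function name and the use of the population standard deviation are hypothetical choices.

```python
from statistics import mean, pstdev


def consensus_with_uncertainty(model_probabilities):
    """Return (consensus score, spread) for one compound.

    `model_probabilities` holds one predicted binding probability per model.
    The spread across models is used here as a stand-in for uncertainty;
    this is an illustrative assumption, not the published method.
    """
    if len(model_probabilities) < 2:
        raise ValueError("need predictions from at least two models")
    return mean(model_probabilities), pstdev(model_probabilities)


# Three hypothetical model outputs for the same compound.
agree_score, agree_spread = consensus_with_uncertainty([0.8, 0.8, 0.8])
split_score, split_spread = consensus_with_uncertainty([0.9, 0.8, 0.7])
```

Under this scheme, compounds on which the models disagree receive a larger spread and could be deprioritised or inspected manually before purchase or synthesis.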

Full text: 1 Database: MEDLINE Main subject: Proteins / Machine Learning Language: En Publication year: 2023 Document type: Article Affiliation country: United Kingdom