Insights into performance evaluation of compound-protein interaction prediction methods.

Yaseen, Adiba; Amin, Imran; Akhter, Naeem; Ben-Hur, Asa; Minhas, Fayyaz

Affiliation

Yaseen A; Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad 45650, Pakistan.
Amin I; National Institute for Biotechnology and Genetic Engineering, Faisalabad 38000, Pakistan.
Akhter N; Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad 45650, Pakistan.
Ben-Hur A; Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA.
Minhas F; Department of Computer Science, University of Warwick, Coventry CV4 7AL, UK.

Bioinformatics; 38(Suppl_2): ii75-ii81, 2022 09 16.

Article in En | MEDLINE | ID: mdl-36124806

ABSTRACT

MOTIVATION: Machine-learning-based prediction of compound-protein interactions (CPIs) is important for drug design, screening and repurposing. Despite numerous recent publications of increasing methodological sophistication that claim consistent improvements in predictive accuracy, we have observed a number of fundamental issues in experiment design that produce overoptimistic estimates of model performance.

RESULTS: We systematically analyze the impact of several factors that affect the generalization performance of CPI predictors and are overlooked in existing work: (i) similarity between training and test examples in cross-validation; (ii) synthesis of negative examples in the absence of experimentally verified negatives; and (iii) alignment of the evaluation protocol and performance metrics with the real-world use of CPI predictors in screening large compound libraries. Using both state-of-the-art approaches from other researchers and a simple kernel-based baseline, we find that effective assessment of the generalization performance of CPI predictors requires careful control over the similarity between training and test examples. We show that, under stringent performance assessment protocols, a simple kernel-based approach can exceed the predictive performance of existing state-of-the-art methods. We also show that random pairing to generate synthetic negative examples for training and performance evaluation yields models that generalize better than those built with the more sophisticated strategies used in existing studies. Our analyses indicate that the proposed experiment design strategies can offer significant improvements in CPI prediction, leading to effective target compound screening for drug repurposing and the discovery of putative chemical ligands of the SARS-CoV-2 Spike and human ACE2 proteins.

AVAILABILITY AND IMPLEMENTATION: Code and supplementary material are available at https://github.com/adibayaseen/HKRCPI.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
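Two of the ideas named in the results, random pairing for synthetic negatives and keeping similar examples out of both training and test folds, can be illustrated with a minimal Python sketch. This is not the authors' implementation (the abstract does not give it); the function names, the toy data and the `protein_cluster` mapping (assumed to come from some prior similarity clustering) are all hypothetical.

```python
import random


def random_negative_pairs(positives, n_negatives, seed=0):
    """Sample (compound, protein) pairs that are not known positives.

    Assumption: any pairing absent from the positive set is treated as a
    negative, which is what "random pairing" is taken to mean here.
    """
    rng = random.Random(seed)
    positive_set = set(positives)
    compounds = sorted({c for c, _ in positive_set})
    proteins = sorted({p for _, p in positive_set})
    available = len(compounds) * len(proteins) - len(positive_set)
    if n_negatives > available:
        raise ValueError("not enough unobserved pairs to sample from")
    negatives = set()
    while len(negatives) < n_negatives:
        pair = (rng.choice(compounds), rng.choice(proteins))
        if pair not in positive_set:
            negatives.add(pair)
    return sorted(negatives)


def grouped_folds(pairs, protein_cluster, n_folds):
    """Split pairs into folds so that no protein cluster spans two folds.

    protein_cluster is a hypothetical mapping from protein ID to a cluster
    label, e.g. produced by clustering proteins on sequence similarity.
    """
    clusters = sorted({protein_cluster[p] for _, p in pairs})
    fold_of_cluster = {c: i % n_folds for i, c in enumerate(clusters)}
    folds = [[] for _ in range(n_folds)]
    for compound, protein in pairs:
        folds[fold_of_cluster[protein_cluster[protein]]].append((compound, protein))
    return folds


# Toy data, invented purely for illustration.
positives = [("c1", "P1"), ("c2", "P1"), ("c3", "P2"), ("c4", "P3")]
protein_cluster = {"P1": "A", "P2": "A", "P3": "B"}

negatives = random_negative_pairs(positives, n_negatives=4)
folds = grouped_folds(positives + negatives, protein_cluster, n_folds=2)
```

Holding out whole clusters is only one way to control train-test similarity; an analogous grouping on compounds could be added in the same manner.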

Subjects

Angiotensin-Converting Enzyme 2; Machine Learning; Humans; Ligands; SARS-CoV-2

Full text: 1 Databases: MEDLINE Main subject: Machine Learning / Angiotensin-Converting Enzyme 2 Type of study: Prognostic_studies / Risk_factors_studies / Systematic_reviews Limit: Humans Language: En Journal: Bioinformatics Journal subject: MEDICAL INFORMATICS Year of publication: 2022 Document type: Article Country of affiliation: Pakistan