PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework.

Song, Jiangning; Li, Fuyi; Takemoto, Kazuhiro; Haffari, Gholamreza; Akutsu, Tatsuya; Chou, Kuo-Chen; Webb, Geoffrey I

Song, Jiangning; Li, Fuyi; Takemoto, Kazuhiro; Haffari, Gholamreza; Akutsu, Tatsuya; Chou, Kuo-Chen; Webb, Geoffrey I.

Afiliação

Song J; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia; Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia. Electronic address: Jiangning.Song@monash.edu
Li F; Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia.
Takemoto K; Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan.
Haffari G; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia.
Akutsu T; Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan.
Chou KC; Gordon Life Science Institute, Boston, MA 02478, USA; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia. Electronic addres
Webb GI; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia. Electronic address: Geoff.Webb@monash.edu.

J Theor Biol ; 443: 125-137, 2018 04 14.

Article em En | MEDLINE | ID: mdl-29408627

RESUMO

Determining the catalytic residues in an enzyme is critical to our understanding the relationship between protein sequence, structure, function, and enhancing our ability to design novel enzymes and their inhibitors. Although many enzymes have been sequenced, and their primary and tertiary structures determined, experimental methods for enzyme functional characterization lag behind. Because experimental methods used for identifying catalytic residues are resource- and labor-intensive, computational approaches have considerable value and are highly desirable for their ability to complement experimental studies in identifying catalytic residues and helping to bridge the sequence-structure-function gap. In this study, we describe a new computational method called PREvaIL for predicting enzyme catalytic residues. This method was developed by leveraging a comprehensive set of informative features extracted from multiple levels, including sequence, structure, and residue-contact network, in a random forest machine-learning framework. Extensive benchmarking experiments on eight different datasets based on 10-fold cross-validation and independent tests, as well as side-by-side performance comparisons with seven modern sequence- and structure-based methods, showed that PREvaIL achieved competitive predictive performance, with an area under the receiver operating characteristic curve and area under the precision-recall curve ranging from 0.896 to 0.973 and from 0.294 to 0.523, respectively. We demonstrated that this method was able to capture useful signals arising from different levels, leveraging such differential but useful types of features and allowing us to significantly improve the performance of catalytic residue prediction. We believe that this new method can be utilized as a valuable tool for both understanding the complex sequence-structure-function relationships of proteins and facilitating the characterization of novel enzymes lacking functional annotations.

Assuntos

Bases de Dados de Proteínas; Aprendizado de Máquina; Análise de Sequência de Proteína/métodos; Software; Catálise; Conformação Proteica

Palavras-chave

Bioinformatics; Enzyme catalytic residues; Functional annotation; Machine learning; Pattern recognition; Sequence analysis; Sequencestructurefunction relationship

Texto completo

Imprimir

XML

PubMed Links

Buscar no Google

Texto completo: 1 Base de dados: MEDLINE Assunto principal: Software / Análise de Sequência de Proteína / Bases de Dados de Proteínas / Aprendizado de Máquina Tipo de estudo: Prognostic_studies Idioma: En Revista: J Theor Biol Ano de publicação: 2018 Tipo de documento: Article

Texto completo

Imprimir

XML

PubMed Links

Buscar no Google