Exploration and Evaluation of Machine Learning-Based Models for Predicting Enzymatic Reactions.

Watanabe, Naoki; Murata, Masahiro; Ogawa, Teppei; Vavricka, Christopher J; Kondo, Akihiko; Ogino, Chiaki; Araki, Michihiro

Affiliation

Watanabe N; Department of Chemical Science and Engineering, Graduate School of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada, Kobe, Hyogo 657-8501, Japan.
Murata M; Graduate School of Medicine, Kyoto University, 54 Kawahara-cho, Shogoin Sakyo-ku, Kyoto 606-8507, Japan.
Ogawa T; Mitsui Knowledge Industry Co., Ltd. (MKI), 2-3-33 Nakanoshima, Kita-ku, Osaka 530-0005, Japan.
Vavricka CJ; Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan.
Kondo A; Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan.
Ogino C; Department of Chemical Science and Engineering, Graduate School of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada, Kobe, Hyogo 657-8501, Japan.
Araki M; Graduate School of Medicine, Kyoto University, 54 Kawahara-cho, Shogoin Sakyo-ku, Kyoto 606-8507, Japan.

J Chem Inf Model; 60(3): 1833-1843, 2020-03-23.

Article in English | MEDLINE | ID: mdl-32053362

ABSTRACT

Unannotated gene sequences in databases are increasing due to sequencing advances. Therefore, computational methods to predict functions of unannotated genes are needed. Moreover, novel enzyme discovery for metabolic engineering applications further encourages annotation of sequences. Here, enzyme functions are predicted using two general approaches, each including several machine learning algorithms. First, Enzyme-models (E-models) predict Enzyme Commission (EC) numbers from amino acid sequence information. Second, Substrate-Enzyme models (SE-models) are built to predict substrates of enzymatic reactions together with EC numbers, and Substrate-Enzyme-Product models (SEP-models) are built to predict substrates, products, and EC numbers. While accuracy of E-models is not optimal, SE-models and SEP-models predict EC numbers and reactions with high accuracy using all tested machine learning-based methods. For example, a single Random Forests-based SEP-model predicts EC first digits with an Average AUC score of over 0.94. Various metrics indicate that the current strategy of combining sequence and chemical structure information is effective at improving enzyme reaction prediction.
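
The idea of combining sequence and chemical structure information, and of scoring a multi-class EC predictor by an average AUC, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the descriptors used, so amino acid composition and a hypothetical substrate bit vector stand in for them, and "Average AUC" is taken here to mean a macro-averaged one-vs-rest AUC. The function names are hypothetical and no classifier is trained; the scores would come from whichever model is being evaluated.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def sequence_composition(seq):
    """Fraction of each standard amino acid in seq (illustrative descriptor only)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def combine_features(seq, substrate_bits):
    """Concatenate sequence features with a substrate fingerprint.

    substrate_bits is a hypothetical list of 0/1 values standing in for
    whatever chemical structure descriptor is actually used.
    """
    return sequence_composition(seq) + [float(b) for b in substrate_bits]


def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_auc(true_classes, class_scores):
    """Macro-average of one-vs-rest AUCs.

    class_scores maps each class label (e.g. an EC first digit) to a list of
    per-sample scores for that class, in the same order as true_classes.
    """
    aucs = []
    for cls, scores in class_scores.items():
        labels = [1 if t == cls else 0 for t in true_classes]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)
```

A feature vector built by `combine_features` could be passed to any of the usual classifiers; `average_auc` then summarizes per-class ranking quality in a single number, which is one plausible reading of the metric quoted above.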

Subjects

Computational Biology; Machine Learning; Algorithms; Amino Acid Sequence; Databases, Factual

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Computational Biology / Machine Learning | Type of study: Prognostic_studies / Risk_factors_studies | Language: En | Journal: J Chem Inf Model | Journal subject: MEDICAL INFORMATICS / CHEMISTRY | Year of publication: 2020 | Document type: Article
