Exploration and Evaluation of Machine Learning-Based Models for Predicting Enzymatic Reactions.
J Chem Inf Model; 60(3): 1833-1843, 2020 03 23.
Article in En | MEDLINE | ID: mdl-32053362
Unannotated gene sequences in databases are increasing due to sequencing advances. Therefore, computational methods to predict functions of unannotated genes are needed. Moreover, novel enzyme discovery for metabolic engineering applications further encourages annotation of sequences. Here, enzyme functions are predicted using two general approaches, each including several machine learning algorithms. First, Enzyme-models (E-models) predict Enzyme Commission (EC) numbers from amino acid sequence information. Second, Substrate-Enzyme models (SE-models) are built to predict substrates of enzymatic reactions together with EC numbers, and Substrate-Enzyme-Product models (SEP-models) are built to predict substrates, products, and EC numbers. While accuracy of E-models is not optimal, SE-models and SEP-models predict EC numbers and reactions with high accuracy using all tested machine learning-based methods. For example, a single Random Forests-based SEP-model predicts EC first digits with an Average AUC score of over 0.94. Various metrics indicate that the current strategy of combining sequence and chemical structure information is effective at improving enzyme reaction prediction.
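As a rough illustration of two ideas in the abstract, the sketch below joins sequence-derived features with a substrate representation (the kind of input an SE-model would take) and computes an averaged AUC over EC first-digit classes. Everything specific here is an assumption, not taken from the paper: the amino acid composition encoding, the bit-vector substrate fingerprint, the function names, the toy scores, and the reading of "Average AUC" as an unweighted mean of one-vs-rest AUCs.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def sequence_features(seq):
    """Amino acid composition: fraction of each standard residue (assumed encoding)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def combined_features(seq, substrate_bits):
    """Concatenate sequence features with a hypothetical substrate fingerprint."""
    return sequence_features(seq) + [float(b) for b in substrate_bits]


def binary_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_auc(true_classes, class_scores):
    """Unweighted mean of one-vs-rest AUCs (assumed averaging scheme).

    class_scores[i] maps each class label to the model's score for sample i.
    """
    aucs = []
    for c in sorted(set(true_classes)):
        labels = [1 if t == c else 0 for t in true_classes]
        scores = [row[c] for row in class_scores]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Toy input: a short made-up sequence plus a 4-bit made-up substrate fingerprint.
features = combined_features("MKTAYIAK", [1, 0, 1, 1])

# Toy evaluation: true EC first digits and made-up per-class scores for six samples.
true_ec = [1, 1, 2, 2, 3, 3]
scores = [
    {1: 0.80, 2: 0.10, 3: 0.10},
    {1: 0.15, 2: 0.75, 3: 0.10},
    {1: 0.20, 2: 0.70, 3: 0.10},
    {1: 0.10, 2: 0.60, 3: 0.30},
    {1: 0.10, 2: 0.20, 3: 0.70},
    {1: 0.30, 2: 0.30, 3: 0.40},
]
avg = average_auc(true_ec, scores)
```

In practice the per-class scores would come from a trained classifier such as a Random Forest, and the feature vector would be one row of its training matrix; the toy values above only exercise the arithmetic.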
Full text: 1
Collections: 01-internacional
Database: MEDLINE
Main subject: Computational Biology / Machine Learning
Type of study: Prognostic_studies / Risk_factors_studies
Language: En
Journal: J Chem Inf Model
Journal subject: MEDICAL INFORMATICS / CHEMISTRY
Year of publication: 2020
Document type: Article