Identification of D Modification Sites Using a Random Forest Model Based on Nucleotide Chemical Properties.

Zhu, Huan; Ao, Chun-Yan; Ding, Yi-Jie; Hao, Hong-Xia; Yu, Liang

Affiliation

Zhu H; School of Computer Science and Technology, Xidian University, Xi'an 710071, China.
Ao CY; School of Computer Science and Technology, Xidian University, Xi'an 710071, China.
Ding YJ; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China.
Hao HX; School of Computer Science and Technology, Xidian University, Xi'an 710071, China.
Yu L; School of Computer Science and Technology, Xidian University, Xi'an 710071, China.

Int J Mol Sci; 23(6), 2022 Mar 11.

Article in English | MEDLINE | ID: mdl-35328461

ABSTRACT

Dihydrouridine (D) is an abundant post-transcriptional modification present in transfer RNA from eukaryotes, bacteria, and archaea. D has contributed to treatments for cancerous diseases, so the precise detection of D modification sites can enable further understanding of its functional roles. Traditional experimental techniques for identifying D are laborious and time-consuming, and few computational tools exist for such analysis. In this study, we utilized eleven sequence-derived feature extraction methods and implemented five popular machine learning algorithms to identify an optimal model. During data preprocessing, the data were partitioned into training and testing sets, and oversampling was adopted to reduce the effect of the imbalance between positive and negative samples. The best-performing model combined a random forest classifier with nucleotide chemical property features. In independent tests, the optimized model achieved a sensitivity of 0.9688 and a specificity of 0.9706, and it surpassed published tools. Furthermore, a series of validations across several aspects was conducted to demonstrate the robustness and reliability of our model.
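The two preprocessing ideas named in the abstract, nucleotide chemical property (NCP) encoding and oversampling, can be illustrated with a minimal sketch. The abstract does not give the exact encoding table, window length, or oversampling method, so everything below is an assumption: the triplets are the commonly used NCP convention (ring structure, functional group, hydrogen bonding), the oversampler is plain random duplication, and the function names `ncp_encode` and `random_oversample` as well as the toy sequences are hypothetical.

```python
import random

# Commonly used NCP triplets: (ring structure, functional group, hydrogen bonding).
# Assumption: the abstract does not list the exact encoding the authors used.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def ncp_encode(seq):
    """Flatten an RNA sequence into a feature vector of length 3 * len(seq)."""
    features = []
    for base in seq.upper().replace("T", "U"):
        # Unknown symbols (e.g. N) are encoded as zeros.
        features.extend(NCP.get(base, (0, 0, 0)))
    return features


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    by_label = {}
    for row, label in zip(X, y):
        by_label.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_label.values())
    X_out, y_out = [], []
    for label, rows in by_label.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy windows centred on a candidate uridine (not real data).
sequences = ["GAUCA", "CCUGG", "AAUUC", "GGUAC"]
labels = [1, 0, 0, 0]

X = [ncp_encode(s) for s in sequences]
X_bal, y_bal = random_oversample(X, labels)
```

The balanced feature matrix could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`. Oversampling should be applied only to the training partition so that duplicated samples do not leak into the independent test set.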

Subjects

Algorithms; Nucleotides; Computational Biology/methods; RNA, Transfer; Reproducibility of Results

Keywords

dihydrouridine; nucleotide chemical properties; oversample; prediction; random forest


Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Algorithms / Nucleotides | Type of study: Clinical_trials / Diagnostic_studies / Prognostic_studies | Language: En | Year of publication: 2022 | Document type: Article
