MLm5C: A high-precision human RNA 5-methylcytosine sites predictor based on a combination of hybrid machine learning models.
Kurata, Hiroyuki; Harun-Or-Roshid, Md; Mehedi Hasan, Md; Tsukiyama, Sho; Maeda, Kazuhiro; Manavalan, Balachandran.
Affiliation
  • Kurata H; Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan. Electronic address: kurata@bio.kyutech.ac.jp.
  • Harun-Or-Roshid M; Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
  • Mehedi Hasan M; Division of Biotechnology and Molecular Medicine, Department of Pathobiological Science, School of Veterinary Medicine, Louisiana State University, Baton Rouge, LA 70803, USA.
  • Tsukiyama S; Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
  • Maeda K; Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
  • Manavalan B; Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Republic of Korea. Electronic address: bala2022@skku.edu.
Methods; 227: 37-47, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38729455
ABSTRACT
RNA modification serves as a pivotal component in numerous biological processes. Among the prevalent modifications, 5-methylcytosine (m5C) significantly influences mRNA export, translation efficiency, and cell differentiation, and is also associated with human diseases, including Alzheimer's disease, autoimmune disease, cancer, and cardiovascular diseases. Identification of m5C is critical for understanding the RNA modification mechanisms and the epigenetic regulation of associated diseases. However, large-scale experimental identification of m5C presents significant challenges due to labor intensity and time requirements. Several computational tools based on machine learning have been developed to supplement experimental methods, but their identification of these sites lacks accuracy and efficiency. In this study, we introduce a new predictor, MLm5C, for precise prediction of m5C sites using sequence data. Briefly, we evaluated eleven RNA sequence-derived features with four basic machine learning algorithms to generate 44 baseline models. We ranked these models by performance and subsequently stacked the top 20 baseline models into the best model, named MLm5C. MLm5C outperformed the state-of-the-art predictors. Notably, optimizing the sequence length surrounding the modification sites significantly improved the prediction performance. MLm5C is an invaluable tool for accelerating the detection of m5C sites within the human genome, thereby facilitating the characterization of their roles in post-transcriptional regulation.
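The model-selection step described in the abstract (eleven features crossed with four algorithms, ranked, top 20 kept for stacking) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the features or algorithms, so `feature_1`, `algorithm_1`, and so on are placeholders, the scores are random stand-ins for real validation performance, and `extract_window` and `select_top_models` are hypothetical helpers, not functions from MLm5C.

```python
from itertools import product
import random

N_FEATURES = 11    # number of sequence-derived features (from the abstract)
N_ALGORITHMS = 4   # number of basic learning algorithms (from the abstract)
TOP_K = 20         # number of baseline models that are stacked (from the abstract)

# Placeholder names: the abstract does not list the actual features or algorithms.
features = [f"feature_{i}" for i in range(1, N_FEATURES + 1)]
algorithms = [f"algorithm_{j}" for j in range(1, N_ALGORITHMS + 1)]


def extract_window(sequence, center, flank):
    """Return the `flank` nucleotides on each side of `center` plus the center itself.

    Returns None when the window would run past either end of the sequence.
    The flank length is the quantity the abstract reports as worth optimizing.
    """
    start, end = center - flank, center + flank + 1
    if start < 0 or end > len(sequence):
        return None
    return sequence[start:end]


def select_top_models(scores, k):
    """Rank (feature, algorithm) pairs by score, highest first, and keep the top k."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# One baseline model per (feature, algorithm) pair: 11 x 4 = 44.
baseline_models = list(product(features, algorithms))

# Assumption: random numbers stand in for each model's validation score.
rng = random.Random(0)
scores = {model: rng.random() for model in baseline_models}

# The selected models would supply the inputs to a stacked meta-learner.
stacked_inputs = select_top_models(scores, TOP_K)

# Toy sequence with a cytosine at index 7; a 3-nt flank gives a 7-nt window.
window = extract_window("AUGGCUACGCAUUGCA", center=7, flank=3)
```

In a real pipeline the random scores would be replaced by cross-validated metrics, and the predictions of the selected models would be fed to a meta-classifier.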

Full text: 1 Databases: MEDLINE Main subject: RNA / 5-Methylcytosine / Machine Learning Limit: Humans Language: En Journal: Methods Journal subject: BIOCHEMISTRY Year of publication: 2024 Document type: Article
