MLm5C: A high-precision human RNA 5-methylcytosine sites predictor based on a combination of hybrid machine learning models.
Kurata, Hiroyuki; Harun-Or-Roshid, Md; Mehedi Hasan, Md; Tsukiyama, Sho; Maeda, Kazuhiro; Manavalan, Balachandran.
Affiliation
  • Kurata H; Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan. Electronic address: kurata@bio.kyutech.ac.jp.
  • Harun-Or-Roshid M; Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
  • Mehedi Hasan M; Division of Biotechnology and Molecular Medicine, Department of Pathobiological Science, School of Veterinary Medicine, Louisiana State University, Baton Rouge, LA 70803, USA.
  • Tsukiyama S; Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
  • Maeda K; Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
  • Manavalan B; Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Republic of Korea. Electronic address: bala2022@skku.edu.
Methods; 227: 37-47, 2024 Jul.
Article in En | MEDLINE | ID: mdl-38729455
ABSTRACT
RNA modification serves as a pivotal component in numerous biological processes. Among the prevalent modifications, 5-methylcytosine (m5C) significantly influences mRNA export, translation efficiency, and cell differentiation, and is also associated with human diseases, including Alzheimer's disease, autoimmune disease, cancer, and cardiovascular diseases. Identification of m5C sites is critical for understanding RNA modification mechanisms and the epigenetic regulation of associated diseases. However, large-scale experimental identification of m5C presents significant challenges because it is labor-intensive and time-consuming. Several computational tools based on machine learning have been developed to supplement experimental methods, but they still lack accuracy and efficiency in identifying these sites. In this study, we introduce a new predictor, MLm5C, for precise prediction of m5C sites from sequence data. Briefly, we evaluated eleven RNA sequence-derived features with four basic machine learning algorithms to generate 44 baseline models. We ranked these models by performance and stacked the top 20 baseline models to obtain the best model, named MLm5C. MLm5C outperformed the state-of-the-art predictors. Notably, optimizing the sequence length surrounding the modification sites significantly improved prediction performance. MLm5C is an invaluable tool for accelerating the detection of m5C sites within the human genome, thereby facilitating the characterization of their roles in post-transcriptional regulation.
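The model-selection step described in the abstract (11 features x 4 algorithms = 44 baseline models, ranked, with the top 20 kept for stacking) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the feature and algorithm names, the helper functions, and the scores are all hypothetical placeholders, since the abstract does not name the encodings, the learners, the ranking metric, or the window length. The stacking meta-learner itself is not shown.

```python
from itertools import product

# Hypothetical placeholder names: the abstract gives only the counts (11 and 4).
FEATURES = [f"feature_{i}" for i in range(1, 12)]
ALGORITHMS = [f"algorithm_{j}" for j in range(1, 5)]


def baseline_grid(features, algorithms):
    """Enumerate every (feature, algorithm) pair, one baseline model per pair."""
    return list(product(features, algorithms))


def select_top(scores, k):
    """Return the k model keys with the highest validation score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def window(seq, center, flank, pad="N"):
    """Extract 2*flank+1 nucleotides centered on a candidate cytosine.

    Positions that fall outside the sequence are filled with `pad`
    (an assumed convention, not taken from the abstract).
    """
    left = max(0, center - flank)
    right = min(len(seq), center + flank + 1)
    left_pad = pad * (flank - (center - left))
    right_pad = pad * (flank - (right - 1 - center))
    return left_pad + seq[left:right] + right_pad


grid = baseline_grid(FEATURES, ALGORITHMS)

# Synthetic scores used only to make the sketch runnable; they are NOT results
# from the paper. In practice each score would come from cross-validation.
placeholder_scores = {pair: i / len(grid) for i, pair in enumerate(grid)}

top_models = select_top(placeholder_scores, 20)
```

In a real pipeline the predictions of the selected baseline models would become the inputs to a meta-learner, and the `flank` value would be tuned, since the abstract reports that the sequence length around the site strongly affects performance.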
Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: RNA / 5-Methylcytosine / Machine Learning Limits: Humans Language: En Journal: Methods Journal subject: BIOQUIMICA Year: 2024 Document type: Article