Prediction of protein-coding small ORFs in multi-species using integrated sequence-derived features and the random forest model.
Yu, Jiafeng; Jiang, Wenwen; Zhu, Sen-Bin; Liao, Zhen; Dou, Xianghua; Liu, Jian; Guo, Feng-Biao; Dong, Chuan.
Affiliation
  • Yu J; Shandong Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China.
  • Jiang W; Department of Bioinformatics, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.
  • Zhu SB; School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China.
  • Liao Z; School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China.
  • Dou X; Shandong Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China.
  • Liu J; Shandong Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China.
  • Guo FB; School of Pharmaceutical Sciences, Wuhan University, Wuhan 430071, China. Electronic address: fbguoy@whu.edu.cn.
  • Dong C; School of Pharmaceutical Sciences, Wuhan University, Wuhan 430071, China. Electronic address: chuand@whu.edu.cn.
Methods; 210: 10-19, 2023 Feb.
Article in En | MEDLINE | ID: mdl-36621557
ABSTRACT
Proteins encoded by small open reading frames (sORFs) can serve as functional elements that play important roles in vivo. Such sORFs also constitute a potential pool for de novo gene birth, driving evolutionary innovation and species diversity. Their theoretical and experimental identification has therefore become a critical issue. Here, we propose a method for predicting protein-coding sORFs based solely on integrated sequence-derived features. Its prediction performance is better than or comparable to that of nine other prevalent methods, which shows that it can provide a relatively reliable research tool for predicting protein-coding sORFs. The method also allows users to estimate the potential expression of a queried sORF, as demonstrated by a correlation analysis between our possibility estimate and the codon adaptation index (CAI). Using these features, we show that the sequence features of protein-coding sORFs differ significantly between the two domains, eukaryotes and prokaryotes, implying that cross-domain prediction may be relatively hard. We therefore developed domain-specific models, which allow users to predict protein-coding sORFs in both eukaryotes and prokaryotes. Finally, we developed a web server to facilitate research in this field; it is freely available at http://guolab.whu.edu.cn/codingCapacity/index.html.
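The abstract names the codon adaptation index (CAI) but does not say how it was computed or which reference gene set was used. As a rough illustration only, the sketch below follows the commonly used definition: the geometric mean of per-codon relative adaptiveness weights, where each weight is a codon's count in a reference set divided by the count of the most frequent synonymous codon. The function names `relative_adaptiveness` and `cai`, the 0.5 pseudocount for unseen codons, and the exclusion of Met and Trp codons are assumptions of this sketch, not details taken from the paper.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a DNA sequence into in-frame codons, dropping any trailing bases."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs):
    """Weight per codon: its count divided by the top synonymous codon's count.

    Assumption: codons never seen in the reference get a pseudocount of 0.5
    so that their weight is small but non-zero.
    """
    counts = Counter(
        c for s in reference_seqs for c in codons(s) if c in CODON_TABLE
    )
    weights = {}
    for aa in set(AMINO) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        top = max(counts[c] for c in family)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in family:
            weights[c] = max(counts[c], 0.5) / top
    return weights


def cai(seq, weights):
    """Geometric mean of codon weights; Met, Trp and stop codons are skipped."""
    logs = []
    for c in codons(seq):
        aa = CODON_TABLE.get(c)
        if aa in (None, "*", "M", "W"):
            continue
        w = weights.get(c)
        if w is None:
            continue
        logs.append(math.log(w))
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

With weights from a set of highly expressed genes for the organism of interest, `cai(sorf_sequence, weights)` gives a value in (0, 1] that could then be compared with a classifier's output score, for example by rank correlation.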

Full text: 1 | Database: MEDLINE | Main subject: Random Forest | Type of study: Clinical_trials / Prognostic_studies / Risk_factors_studies | Language: En | Journal: Methods | Journal subject: Biochemistry | Year: 2023 | Type: Article | Affiliation country: China
