D-sORF: Accurate Ab Initio Classification of Experimentally Detected Small Open Reading Frames (sORFs) Associated with Translational Machinery.
Perdikopanis, Nikos; Giannakakis, Antonis; Kavakiotis, Ioannis; Hatzigeorgiou, Artemis G.
Affiliation
  • Perdikopanis N; Department of Electrical and Computer Engineering, University of Thessaly, 38221 Volos, Greece.
  • Giannakakis A; Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, 15784 Athens, Greece.
  • Kavakiotis I; Department of Computer Science and Biomedical Informatics, University of Thessaly, 38221 Volos, Greece.
  • Hatzigeorgiou AG; Department of Molecular Biology and Genetics, Democritus University of Thrace, 68100 Alexandroupolis, Greece.
Biology (Basel); 13(8), 2024 Jul 26.
Article in En | MEDLINE | ID: mdl-39194501
ABSTRACT
Small open reading frames (sORFs; <300 nucleotides or <100 amino acids) are widespread across all genomes, and a growing number of them appear to be translated from non-genic regions. Over the past few decades, peptides produced from sORFs have been identified as functional in various organisms, from bacteria to humans. Despite recent advances in next-generation sequencing and proteomics, accurate annotation and classification of sORFs remain a rate-limiting step toward reliable, high-throughput detection of small proteins from non-genic regions. In addition, computational methods based on machine learning are less costly than biological experiments and can be used to detect candidate sORFs, laying the groundwork for experimental validation. We present D-sORF, a machine-learning framework that integrates the statistical nucleotide context and motif information around the start codon to predict coding sORFs. D-sORF directly scores coding identity and requires only the underlying genomic sequence; it does not incorporate features such as conservation, which, for sORFs, may increase the dispersion of scores within the significantly less conserved non-genic regions. D-sORF achieves 94.74% precision and 92.37% accuracy for small ORFs (using the 99 nt medium-length window). When D-sORF is applied to sORFs associated with ribosomes, its identification of peptide-producing transcripts (annotated by Ensembl IDs) is comparable or superior to that of experimental methodologies based on ribosome profiling (Ribo-Seq). In parallel, its recognition of putative negative data, such as intron-containing transcripts that associate with ribosomes, remains remarkably low, indicating that D-sORF could be applied efficiently to filter out false-positive sORFs arising from non-productive ribosomal binding or from noise inherent in Ribo-Seq protocols.
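The abstract does not list D-sORF's exact features, so the following is only a minimal sketch of the kind of sequence-only input such a classifier works from. It assumes ATG-initiated ORFs shorter than 300 nt (counted including the stop codon), a 99 nt window centered on the start codon, and simple illustrative features (base composition plus the Kozak-context positions -3 and +4). The function names `find_sorfs`, `start_codon_window` and `window_features` are hypothetical, and none of this is the authors' implementation.

```python
from collections import Counter

# Standard-code stop codons (assumption: nuclear genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, max_len_nt=300):
    """Return (start, end) pairs of ATG-initiated ORFs shorter than max_len_nt.

    `end` is exclusive and includes the stop codon. ORFs without an in-frame
    stop codon are ignored. Hypothetical helper, not part of D-sORF.
    """
    seq = seq.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame of this ATG until the first stop.
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                if end - start < max_len_nt:
                    orfs.append((start, end))
                break
    return orfs


def start_codon_window(seq, start, width=99):
    """Extract a window of `width` nt centered on the ATG at `start`.

    Positions that fall outside the sequence are padded with 'N'.
    Centering is an assumption; the abstract only gives the 99 nt width.
    """
    seq = seq.upper()
    flank = (width - 3) // 2  # 48 nt on each side of the 3-nt start codon
    left = start - flank
    right = start + 3 + flank
    pad_left = "N" * max(0, -left)
    pad_right = "N" * max(0, right - len(seq))
    return pad_left + seq[max(0, left):min(len(seq), right)] + pad_right


def window_features(window):
    """Toy features: base fractions plus Kozak positions -3 and +4."""
    flank = (len(window) - 3) // 2  # index of the A in ATG
    counts = Counter(window)
    n = len(window)
    feats = {f"frac_{b}": counts[b] / n for b in "ACGT"}
    # -3 relative to the A of ATG (A = +1); a purine here favors initiation.
    feats["kozak_minus3_purine"] = int(window[flank - 3] in "AG")
    # +4 is the first base after ATG; G is the consensus base.
    feats["kozak_plus4_G"] = int(window[flank + 3] == "G")
    return feats


if __name__ == "__main__":
    seq = "GCCACCATGGCTTAAGG"  # Kozak-like context around a tiny ORF
    for start, end in find_sorfs(seq):
        print(start, end, window_features(start_codon_window(seq, start)))
```

In a real pipeline, vectors like these (or richer k-mer and motif statistics) would be passed to a trained classifier; the choice of features and model here is purely illustrative.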
Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: Biology (Basel) Year: 2024 Document type: Article Affiliation country: Greece Country of publication: Switzerland

Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: Biology (Basel) Year: 2024 Document type: Article Affiliation country: Greece Country of publication: Switzerland