ProsmORF-pred: a machine learning-based method for the identification of small ORFs in prokaryotic genomes.
Khanduja, Akshay; Kumar, Manish; Mohanty, Debasisa.
Affiliation
  • Khanduja A; National Institute of Immunology, Aruna Asaf Ali Marg, New Delhi 110067, India.
  • Kumar M; National Institute of Immunology, Aruna Asaf Ali Marg, New Delhi 110067, India.
  • Mohanty D; National Institute of Immunology, Aruna Asaf Ali Marg, New Delhi 110067, India.
Brief Bioinform; 24(3), 2023-05-19.
Article in English | MEDLINE | ID: mdl-36988160
ABSTRACT
Small open reading frames (smORFs) encoding proteins of fewer than 100 amino acids (aa) are known to be important regulators of key cellular processes. However, their computational identification remains a challenge. Based on a comprehensive analysis of known prokaryotic small ORFs, we have developed the ProsmORF-pred resource, which uses a machine learning (ML)-based method for prediction of smORFs in prokaryotic genome sequences. ProsmORF-pred consists of two ML models: one recognizes initiation sites in nucleic acid sequences upstream of putative start codons, and the other uses translated amino acid sequences to identify functional protein-like sequences. The nucleotide sequence-based initiation site recognition model has been trained using longer ORFs (>100 aa) in the same genome, while the ML model for identification of protein-like sequences has been trained using annotated smORFs from Escherichia coli. Comprehensive benchmarking of ProsmORF-pred reveals that its performance is comparable to other state-of-the-art approaches on the annotated smORF set derived from 32 prokaryotic genomes. Its performance is distinctly superior to other tools like PRODIGAL and RANSEPS for prediction of newly identified smORFs in the 10-30 aa length range, where prediction of smORFs has been a major challenge. Apart from identification of smORFs in genomic sequences, ProsmORF-pred can also aid in functional annotation of the predicted smORFs based on sequence similarity and genomic neighbourhood similarity searches in ProsmORFDB, a well-curated database of known smORFs. ProsmORF-pred, along with its backend database ProsmORFDB, is available as a user-friendly web server (http://www.nii.ac.in/prosmorfpred.html).
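The abstract does not say how candidate ORFs are enumerated before the two models score them. As a rough illustration only, and not the authors' implementation, the sketch below lists candidate smORFs on the forward strand and extracts the upstream window that an initiation-site model would examine and the coding sequence that a protein-level model would translate. The function name `find_candidate_smorfs`, the start codon set (ATG/GTG/TTG), the 30-nt upstream window, and the 10 aa lower bound are all assumptions made for this example; only the "fewer than 100 aa" upper bound comes from the abstract.

```python
# Hypothetical sketch: enumerate candidate small ORFs on the forward strand.
# Codon sets, window size and length bounds are illustrative assumptions,
# not parameters reported for ProsmORF-pred.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_candidate_smorfs(seq, min_aa=10, max_aa=100, upstream=30):
    """Return candidate ORFs whose length is min_aa <= n < max_aa codons."""
    seq = seq.upper()
    candidates = []
    for frame in range(3):
        starts = []  # start codons seen since the last stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in START_CODONS:
                starts.append(i)
            elif codon in STOP_CODONS:
                for s in starts:
                    n_aa = (i - s) // 3  # codons from start up to the stop
                    if min_aa <= n_aa < max_aa:
                        candidates.append({
                            "start": s,            # 0-based index of the start codon
                            "end": i + 3,          # exclusive, includes the stop codon
                            "aa_length": n_aa,
                            "upstream": seq[max(0, s - upstream):s],
                            "orf_nt": seq[s:i + 3],
                        })
                starts = []
    return candidates


if __name__ == "__main__":
    demo = "CC" + "ATG" + "GCT" * 12 + "TAA"
    for hit in find_candidate_smorfs(demo):
        print(hit["start"], hit["end"], hit["aa_length"])
```

A complete scan would also cover the reverse-complement strand. Each candidate's `upstream` and `orf_nt` fields would then be passed to whatever classifiers are in use.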

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Proteins / Genome | Type of study: Diagnostic_studies / Prognostic_studies | Language: En | Journal: Brief Bioinform | Journal subject: BIOLOGY / MEDICAL INFORMATICS | Year: 2023 | Document type: Article | Affiliation country: India