miWords: transformer-based composite deep learning for highly accurate discovery of pre-miRNA regions across plant genomes.
Gupta, Sagar; Shankar, Ravi.
Affiliation
  • Gupta S; Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC supported by DBT, India), CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, Himachal Pradesh 176061, India.
  • Shankar R; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India.
Brief Bioinform; 24(2), 2023 Mar 19.
Article in English | MEDLINE | ID: mdl-36907657
ABSTRACT
Discovering precursor microRNAs (pre-miRNAs) is the core of microRNA (miRNA) discovery. Many tools based on traditional sequence and structural features have been published to discover miRNAs. However, in practical applications such as genome annotation, their actual performance has been very low. The problem is more severe in plants, where, unlike in animals, pre-miRNAs are much more complex and difficult to identify. A large gap exists between animals and plants in the software available for miRNA discovery and in species-specific miRNA information. Here, we present miWords, a composite deep learning system of transformers and convolutional neural networks that treats the genome as a pool of sentences made of words with specific occurrence preferences and contexts, to accurately identify pre-miRNA regions across plant genomes. Comprehensive benchmarking was done against >10 software tools representing different genres and on many experimentally validated datasets. miWords emerged as the best performer, exceeding 98% accuracy with a performance lead of ~10%. miWords was also evaluated across the Arabidopsis genome, where it again outperformed the compared tools. As a demonstration, miWords was run across the tea genome, reporting 803 pre-miRNA regions, all validated by small RNA-seq reads from multiple samples, and most of them functionally supported by degradome sequencing data. miWords is freely available as stand-alone source code at https://scbb.ihbt.res.in/miWords/index.php.
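The "genome as sentences made of words" idea can be illustrated with a minimal sketch, which is not the miWords implementation. It assumes that words are overlapping k-mers and that each sequence window becomes one sentence of integer word ids. The function names (`kmer_words`, `build_vocabulary`, `encode`), the default k = 3 and the padding scheme are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from collections import Counter


def kmer_words(sequence, k=3, stride=1):
    """Split a nucleotide sequence into overlapping k-mer "words".

    k and stride are illustrative assumptions, not values from the paper.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocabulary(sentences):
    """Map each distinct word to an integer id; id 0 is reserved for padding."""
    counts = Counter(word for sentence in sentences for word in sentence)
    # Most frequent words get the smallest ids; ties are broken alphabetically.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: idx for idx, word in enumerate(ordered, start=1)}


def encode(sentence, vocab, length):
    """Convert a sentence to a fixed-length list of ids.

    Unknown words map to 0, the same id used for padding (a simplification).
    """
    ids = [vocab.get(word, 0) for word in sentence][:length]
    return ids + [0] * (length - len(ids))


if __name__ == "__main__":
    window = "ACGUACGGAU"  # toy sequence, not real data
    sentence = kmer_words(window, k=3)
    vocab = build_vocabulary([sentence])
    print(sentence)
    print(encode(sentence, vocab, length=12))
```

Fixed-length id sequences of this kind are the usual input to an embedding layer in transformer or convolutional models, which is why the sketch pads or truncates each sentence to a common length.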

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Arabidopsis / MicroRNAs / Deep Learning Limits: Animals Language: En Journal: Brief Bioinform Journal subject: BIOLOGY / MEDICAL INFORMATICS Year: 2023 Document type: Article Affiliation country: India
