MetageNN: a memory-efficient neural network taxonomic classifier robust to sequencing errors and missing genomes.
Peres da Silva, Rafael; Suphavilai, Chayaporn; Nagarajan, Niranjan.
Affiliation
  • Peres da Silva R; School of Computing, National University of Singapore, Singapore, 117417, Republic of Singapore. rperesdasilva@gis.a-star.edu.sg.
  • Suphavilai C; Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), Singapore, 138672, Republic of Singapore. rperesdasilva@gis.a-star.edu.sg.
  • Nagarajan N; Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), Singapore, 138672, Republic of Singapore.
BMC Bioinformatics ; 25(Suppl 1): 153, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38627615
ABSTRACT

BACKGROUND:

With the rapid increase in throughput of long-read sequencing technologies, recent studies have explored their potential for taxonomic classification, using alignment-based approaches to reduce the impact of higher sequencing error rates. Alignment-based methods are generally slower; k-mer-based taxonomic classifiers can overcome this limitation, potentially at the expense of lower sensitivity for strains and species that are absent from the database.

RESULTS:

We present MetageNN, a memory-efficient long-read taxonomic classifier that is robust to sequencing errors and missing genomes. MetageNN is a neural network model that uses short k-mer profiles of sequences to reduce the impact of distribution shifts on error-prone long reads. Benchmarked against another machine-learning approach to taxonomic classification (GeNet), MetageNN showed substantial improvements on long-read data (a 20% improvement in F1 score). On nanopore sequencing data, MetageNN exhibits improved sensitivity when the reference database is incomplete: it outperforms the alignment-based tools MetaMaps and MEGAN-LR and the k-mer-based tool Kraken2, with read-level improvements of 100%, 36%, and 23%, respectively. At the community level, MetageNN also consistently achieved higher sensitivity than these tools. Furthermore, MetageNN requires less than one-quarter of the database storage used by Kraken2, MEGAN-LR, and MMseqs2, and is more than 7× faster than MetaMaps and GeNet and more than 2× faster than MEGAN-LR and MMseqs2.
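
The abstract does not specify how the k-mer profiles are built. The following is a minimal sketch, assuming a normalized frequency vector over all 4^k DNA k-mers. The default k = 6 and the helper names `kmer_profile` and `kmers_disrupted` are hypothetical and chosen for illustration only. The second helper illustrates one common intuition for why short k-mers tolerate sequencing errors (our gloss, not a claim made in the abstract): a single substitution can change at most k overlapping k-mers, so the share of a profile that it perturbs shrinks as k gets smaller.

```python
from itertools import product


def kmer_profile(seq: str, k: int = 6) -> list[float]:
    """Hypothetical helper: normalized k-mer frequency vector.

    Dimensions are all 4**k DNA k-mers in lexicographic order (AA..., AC..., ...).
    K-mers containing non-ACGT characters (e.g. 'N') are skipped.
    """
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers with ambiguous bases
            counts[j] += 1
            total += 1
    if total == 0:
        return [0.0] * len(index)
    return [c / total for c in counts]


def kmers_disrupted(read_len: int, k: int, pos: int) -> int:
    """Hypothetical helper: number of k-mers in a read that overlap position pos.

    A substitution at pos can alter at most this many k-mers.
    """
    first = max(0, pos - k + 1)        # leftmost k-mer start covering pos
    last = min(pos, read_len - k)      # rightmost k-mer start covering pos
    return max(0, last - first + 1)


if __name__ == "__main__":
    # One substitution in the middle of a 10 kb read perturbs at most k k-mers.
    for k in (6, 31):
        print(k, kmers_disrupted(10_000, k, 5_000))
```

For a 10 kb read, a mid-read substitution touches at most 6 of roughly 10,000 k-mers when k = 6, versus 31 when k = 31, so a short-k profile shifts less per error. This is only an illustration of the trade-off, not MetageNN's actual feature pipeline.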

CONCLUSION:

This proof-of-concept work demonstrates the utility of machine-learning-based methods for taxonomic classification using long reads. MetageNN can be applied to sequences left unclassified by conventional methods and offers an alternative approach to memory-efficient classification that can be optimized further.

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Viverridae / Metagenomics | Limits: Animals | Language: English | Journal: BMC Bioinformatics | Journal subject: Medical Informatics | Year: 2024 | Document type: Article