MetageNN: a memory-efficient neural network taxonomic classifier robust to sequencing errors and missing genomes.
BMC Bioinformatics
; 25(Suppl 1): 153, 2024 Apr 16.
Article
em En
| MEDLINE
| ID: mdl-38627615
ABSTRACT
BACKGROUND:
With the rapid increase in throughput of long-read sequencing technologies, recent studies have explored their potential for taxonomic classification by using alignment-based approaches to reduce the impact of higher sequencing error rates. While alignment-based methods are generally slower, k-mer-based taxonomic classifiers can overcome this limitation, potentially at the expense of lower sensitivity for strains and species that are not in the database.RESULTS:
We present MetageNN, a memory-efficient long-read taxonomic classifier that is robust to sequencing errors and missing genomes. MetageNN is a neural network model that uses short k-mer profiles of sequences to reduce the impact of distribution shifts on error-prone long reads. Benchmarking MetageNN against other machine learning approaches for taxonomic classification (GeNet) showed substantial improvements with long-read data (20% improvement in F1 score). By utilizing nanopore sequencing data, MetageNN exhibits improved sensitivity in situations where the reference database is incomplete. It surpasses the alignment-based MetaMaps and MEGAN-LR, as well as the k-mer-based Kraken2 tools, with improvements of 100%, 36%, and 23% respectively at the read-level analysis. Notably, at the community level, MetageNN consistently demonstrated higher sensitivities than the previously mentioned tools. Furthermore, MetageNN requires < 1/4th of the database storage used by Kraken2, MEGAN-LR and MMseqs2 and is > 7× faster than MetaMaps and GeNet and > 2× faster than MEGAN-LR and MMseqs2.CONCLUSION:
This proof of concept work demonstrates the utility of machine-learning-based methods for taxonomic classification using long reads. MetageNN can be used on sequences not classified by conventional methods and offers an alternative approach for memory-efficient classifiers that can be optimized further.Palavras-chave
Texto completo:
1
Bases de dados:
MEDLINE
Assunto principal:
Viverridae
/
Metagenômica
Limite:
Animals
Idioma:
En
Revista:
BMC Bioinformatics
/
BMC bioinformatics (Online)
Assunto da revista:
INFORMATICA MEDICA
Ano de publicação:
2024
Tipo de documento:
Article