VirRep: a hybrid language representation learning framework for identifying viruses from human gut metagenomes.
Genome Biol; 25(1): 177, 2024 Jul 04.
Article in En | MEDLINE | ID: mdl-38965579
ABSTRACT
Identifying viruses from metagenomes is a common step in exploring the viral composition of the human gut. Here, we introduce VirRep, a hybrid language representation learning framework, for identifying viruses from human gut metagenomes. VirRep combines a context-aware encoder and an evolution-aware encoder to improve sequence representation by incorporating k-mer patterns and sequence homologies. Benchmarking on both simulated and real datasets with varying viral proportions demonstrates that VirRep outperforms state-of-the-art methods. When applied to fecal metagenomes from a colorectal cancer cohort, VirRep identifies 39 high-quality viral species associated with the disease, many of which cannot be detected by existing methods.
Database: MEDLINE
Main subject: Metagenome / Gastrointestinal Microbiome
Limits: Humans
Language: En
Journal: Genome Biol
Year: 2024
Document type: Article