ABSTRACT
For taxonomy-based classification of contigs assembled from metagenomic data, current methods use sequence similarity to identify the most likely taxonomy of each contig. However, in the related field of metagenomic binning, contigs are routinely clustered using information from both the contig sequences and their abundance. We introduce Taxometer, a neural-network-based method that improves the annotations and estimates the quality of any taxonomic classifier using contig abundance profiles and tetranucleotide frequencies. We apply Taxometer to five short-read CAMI2 datasets and find that it increases the average share of correct species-level contig annotations of the MMSeqs2 tool from 66.6% to 86.2%. Additionally, it reduces the share of wrong species-level annotations in the CAMI2 Rhizosphere dataset by two-fold on average for Metabuli, Centrifuge, and Kraken2. Furthermore, we use Taxometer to benchmark taxonomic classifiers on two complex long-read metagenomics datasets where the ground truth is not known. Taxometer is available as open-source software and can enhance any taxonomic annotation of metagenomic contigs.
Subjects
Metagenomics , Software , Metagenomics/methods , Neural Networks, Computer , Classification/methods , Metagenome/genetics , Algorithms , Contig Mapping/methods , Rhizosphere
ABSTRACT
Assembly of reads from metagenomic samples is a hard problem, often resulting in highly fragmented genome assemblies. Metagenomic binning allows us to reconstruct genomes by regrouping the sequences by their organism of origin, making it a crucial processing step when exploring the biological diversity of metagenomic samples. Here we present Adversarial Autoencoders for Metagenomics Binning (AAMB), an ensemble deep learning approach that integrates sequence co-abundances and tetranucleotide frequencies into a common denoised space that enables precise clustering of sequences into microbial genomes. When benchmarked, AAMB performed similarly to or better than the state-of-the-art reference-free binner VAMB, reconstructing ~7% more near-complete (NC) genomes across simulated and real data. In addition, genomes reconstructed using AAMB had higher completeness and greater taxonomic diversity than those reconstructed using VAMB. Finally, we implemented a pipeline integrating VAMB and AAMB that enabled improved binning, recovering 20% and 29% more simulated and real NC genomes, respectively, than VAMB alone, with moderate additional runtime.
Subjects
Genome, Microbial , Metagenome , Metagenomics/methods , Cluster Analysis , Benchmarking
ABSTRACT
The application of multiple omics technologies in biomedical cohorts has the potential to reveal patient-level disease characteristics and individualized response to treatment. However, the scale and heterogeneous nature of multi-modal data make integration and inference a non-trivial task. We developed a deep-learning-based framework, multi-omics variational autoencoders (MOVE), to integrate such data and applied it to a cohort of 789 people with newly diagnosed type 2 diabetes with deep multi-omics phenotyping from the DIRECT consortium. Using in silico perturbations, we identified drug-omics associations across the multi-modal datasets for the 20 most prevalent drugs given to people with type 2 diabetes, with substantially higher sensitivity than univariate statistical tests. Among these, we identified novel associations between metformin and the gut microbiota, as well as opposite molecular responses for the two statins, simvastatin and atorvastatin. We used the associations to quantify drug-drug similarities and assess the degree of polypharmacy, and we conclude that drug effects are distributed across the multi-omics modalities.
Subjects
Deep Learning , Diabetes Mellitus, Type 2 , Humans , Algorithms , Diabetes Mellitus, Type 2/drug therapy , Diabetes Mellitus, Type 2/genetics
ABSTRACT
Despite the accelerating number of uncultivated virus sequences discovered in metagenomics and their apparent importance for health and disease, the human gut virome and its interactions with bacteria in the gastrointestinal tract are not well understood. This is partly due to a paucity of whole-virome datasets and limitations in current approaches for identifying viral sequences in metagenomics data. Here, combining a deep-learning-based metagenomics binning algorithm with paired metagenome and metavirome datasets, we develop Phages from Metagenomics Binning (PHAMB), an approach that allows the binning of thousands of viral genomes directly from bulk metagenomics data, while simultaneously enabling clustering of viral genomes into accurate taxonomic viral populations. When applied to the Human Microbiome Project 2 (HMP2) dataset, PHAMB recovered 6,077 high-quality genomes from 1,024 viral populations and identified viral-microbial host interactions. PHAMB can be advantageously applied to existing and future metagenomes to illuminate viral ecological dynamics with other microbiome constituents.
Subjects
Bacteriophages/classification , Gastrointestinal Microbiome/genetics , Gastrointestinal Tract/virology , Metagenome/genetics , Virome/genetics , Bacteriophages/genetics , Gastrointestinal Microbiome/physiology , Genome, Viral/genetics , Humans , Metagenomics , Virome/physiology
ABSTRACT
Despite recent advances in metagenomic binning, reconstruction of microbial species from metagenomics data remains challenging. Here we develop variational autoencoders for metagenomic binning (VAMB), a program that uses deep variational autoencoders to encode sequence coabundance and k-mer distribution information before clustering. We show that a variational autoencoder is able to integrate these two distinct data types without any previous knowledge of the datasets. VAMB outperforms existing state-of-the-art binners, reconstructing 29-98% and 45% more near-complete (NC) genomes on simulated and real data, respectively. Furthermore, VAMB is able to separate closely related strains up to 99.5% average nucleotide identity (ANI), and reconstructed 255 and 91 NC Bacteroides vulgatus and Bacteroides dorei sample-specific genomes as two distinct clusters from a dataset of 1,000 human gut microbiome samples. We use 2,606 NC bins from this dataset to show that species of the human gut microbiome have different geographical distribution patterns. VAMB can be run on standard hardware and is freely available at https://github.com/RasmussenLab/vamb .
Subjects
Genome, Bacterial/genetics , Metagenome/genetics , Molecular Sequence Annotation , Software , Bacteroides/genetics , Humans , Metagenomics , Microbiota/genetics
ABSTRACT
One Health surveillance of antimicrobial resistance (AMR) depends on a harmonized method for detection of AMR. Metagenomics-based surveillance offers the possibility to compare resistomes within and between different target populations. Its potential to be embedded into policy in the future calls for a timely and integrated knowledge dissemination strategy. We developed a blended training (e-learning and a workshop) on the use of metagenomics in surveillance of pathogens and AMR. The objectives were to highlight the potential of metagenomics in the context of integrated surveillance, to demonstrate its applicability through hands-on training, and to raise awareness of bias factors. The target participants included staff of competent authorities responsible for AMR monitoring and academic staff. The training was organized in modules covering the workflow, requirements, benefits, and challenges of surveillance by metagenomics. The training had 41 participants. The face-to-face workshop was essential for understanding the participants' expectations about the transition to metagenomics-based surveillance. After revising the e-learning, we released it as a Massive Open Online Course (MOOC), now available at https://www.coursera.org/learn/metagenomics. This course has run in more than 20 sessions, with more than 3,000 learners enrolled from more than 120 countries. Blended learning and MOOCs are useful tools to deliver knowledge globally and across disciplines. The released MOOC can be a reference knowledge source for international players in the application of metagenomics in surveillance.