ABSTRACT
We present a global atlas of 4,728 metagenomic samples from mass-transit systems in 60 cities over 3 years, representing the first systematic, worldwide catalog of the urban microbial ecosystem. This atlas provides an annotated, geospatial profile of microbial strains, functional characteristics, antimicrobial resistance (AMR) markers, and genetic elements, including 10,928 viruses, 1,302 bacteria, 2 archaea, and 838,532 CRISPR arrays not found in reference databases. We identified 4,246 known species of urban microorganisms and a consistent set of 31 species found in 97% of samples that were distinct from human commensal organisms. Profiles of AMR genes varied widely in type and density across cities. Cities showed distinct microbial taxonomic signatures that were driven by climate and geographic differences. These results constitute a high-resolution global metagenomic atlas that enables discovery of organisms and genes, highlights potential public health and forensic applications, and provides a culture-independent view of AMR burden in cities.
Subject(s)
Drug Resistance, Bacterial/genetics , Metagenomics , Microbiota/genetics , Urban Population , Biodiversity , Databases, Genetic , Humans
ABSTRACT
Among arthropod vectors, ticks transmit the most diverse human and animal pathogens, leading to an increasing number of new challenges worldwide. Here we sequenced and assembled high-quality genomes of six ixodid tick species and further resequenced 678 tick specimens to understand three key aspects of ticks: genetic diversity, population structure, and pathogen distribution. We explored the genetic basis common to ticks, including heme and hemoglobin digestion, iron metabolism, and reactive oxygen species, and unveiled for the first time that genetic structure and pathogen composition in different tick species are mainly shaped by ecological and geographic factors. We further identified species-specific determinants associated with different host ranges, life cycles, and distributions. The findings of this study are an invaluable resource for research and control of ticks and tick-borne diseases.
Subject(s)
Genetic Variation/genetics , Tick-Borne Diseases/microbiology , Ticks/genetics , Animals , Cell Line , Disease Vectors , Host Specificity/genetics
ABSTRACT
Ocean microbial communities strongly influence the biogeochemistry, food webs, and climate of our planet. Despite recent advances in understanding their taxonomic and genomic compositions, little is known about how their transcriptomes vary globally. Here, we present a dataset of 187 metatranscriptomes and 370 metagenomes from 126 globally distributed sampling stations and establish a resource of 47 million genes to study community-level transcriptomes across depth layers from pole-to-pole. We examine gene expression changes and community turnover as the underlying mechanisms shaping community transcriptomes along these axes of environmental variation and show how their individual contributions differ for multiple biogeochemically relevant processes. Furthermore, we find the relative contribution of gene expression changes to be significantly lower in polar than in non-polar waters and hypothesize that in polar regions, alterations in community activity in response to ocean warming will be driven more strongly by changes in organismal composition than by gene regulatory mechanisms.
Subject(s)
Gene Expression Regulation , Metagenome , Oceans and Seas , Transcriptome/genetics , Geography , Microbiota/genetics , Molecular Sequence Annotation , RNA, Messenger/genetics , RNA, Messenger/metabolism , Seawater/microbiology , Temperature
ABSTRACT
The intestinal microbiota undergoes diurnal compositional and functional oscillations that affect metabolic homeostasis, but the mechanisms by which the rhythmic microbiota influences host circadian activity remain elusive. Using integrated multi-omics and imaging approaches, we demonstrate that the gut microbiota features oscillating biogeographical localization and metabolome patterns that determine the rhythmic exposure of the intestinal epithelium to different bacterial species and their metabolites over the course of a day. This diurnal microbial behavior drives, in turn, the global programming of the host circadian transcriptional, epigenetic, and metabolite oscillations. Surprisingly, disruption of homeostatic microbiome rhythmicity not only abrogates normal chromatin and transcriptional oscillations of the host, but also incites genome-wide de novo oscillations in both intestine and liver, thereby impacting diurnal fluctuations of host physiology and disease susceptibility. As such, the rhythmic biogeography and metabolome of the intestinal microbiota regulate the temporal organization and functional outcome of host transcriptional and epigenetic programs.
Subject(s)
Circadian Rhythm , Colon/microbiology , Gastrointestinal Microbiome , Transcriptome , Animals , Chromatin/metabolism , Colon/metabolism , Germ-Free Life , Liver/metabolism , Mice , Microscopy, Electron, Scanning
ABSTRACT
Accurate taxonomic profiling of microbial taxa in a metagenomic sample is vital to gain insights into microbial ecology. Recent advances in sequencing technologies have contributed tremendously toward understanding these microbes at species resolution through a whole-genome shotgun metagenomic approach. In this study, we developed a new bioinformatics tool, coverage-based analysis for identification of microbiome (CAIM), for accurate taxonomic classification and quantification within both long- and short-read metagenomic samples using an alignment-based method. CAIM relies on two different containment techniques to identify species in metagenomic samples, using their genome coverage information, rather than the traditional relative-abundance cutoff, to filter out false positives. In addition, we propose a nucleotide-count-based abundance estimation, which yields a lower root mean square error than the traditional read-count approach. We evaluated the performance of CAIM on 28 metagenomic mock communities and 2 synthetic datasets by comparing it with other top-performing tools. CAIM performed consistently better than the other tools across datasets, both in identifying microbial taxa and in estimating relative abundances. CAIM was then applied to a real dataset sequenced on both Nanopore (with and without amplification) and Illumina sequencing platforms, and the taxonomic profiles were highly similar between the sequencing platforms. Lastly, CAIM was applied to fecal shotgun metagenomic datasets of 232 colorectal cancer patients and 229 controls obtained from 4 different countries and 44 primary liver cancer patients and 76 controls. Models using the genome-coverage cutoff outperformed those using relative-abundance cutoffs in discriminating colorectal cancer and primary liver cancer patients from healthy controls, and did so with high-confidence species markers.
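The two ideas in this abstract, a genome-coverage cutoff for false positives and abundance counted in nucleotides instead of reads, can be illustrated with a small sketch. This is not CAIM's implementation: the function names, the 0.5 breadth cutoff, and the toy alignments are all assumptions made for illustration, and the genomes are given equal lengths so that no length normalization is needed.

```python
from collections import defaultdict


def coverage_breadth(intervals, genome_length):
    """Fraction of genome positions covered by at least one alignment.

    intervals: list of half-open (start, end) alignment coordinates.
    """
    covered = 0
    current_end = 0
    for start, end in sorted(intervals):
        start = max(start, current_end)  # skip positions already counted
        if end > start:
            covered += end - start
            current_end = end
    return covered / genome_length


def profile(alignments, genome_lengths, min_breadth=0.5):
    """Filter species by coverage breadth, then report two abundance estimates.

    alignments: list of (species, start, end) tuples, one per aligned read.
    min_breadth: hypothetical cutoff, not a value taken from the paper.
    """
    by_species = defaultdict(list)
    for species, start, end in alignments:
        by_species[species].append((start, end))

    # Keep only species whose reads are spread over enough of the genome.
    kept = {
        sp: iv
        for sp, iv in by_species.items()
        if coverage_breadth(iv, genome_lengths[sp]) >= min_breadth
    }

    read_total = sum(len(iv) for iv in kept.values())
    nt_total = sum(end - start for iv in kept.values() for start, end in iv)
    return {
        sp: {
            "read_fraction": len(iv) / read_total,
            "nucleotide_fraction": sum(end - start for start, end in iv) / nt_total,
        }
        for sp, iv in kept.items()
    }


# Toy data: one long read on A, four short reads on B, three stacked reads on C.
alignments = [
    ("A", 0, 800),
    ("B", 0, 150), ("B", 150, 300), ("B", 300, 450), ("B", 450, 600),
    ("C", 0, 100), ("C", 0, 100), ("C", 0, 100),
]
genome_lengths = {"A": 1000, "B": 1000, "C": 1000}
result = profile(alignments, genome_lengths)
```

In this toy case species C is dropped because its three reads pile up on one tenth of the genome. Species A, supported by a single long read, receives 20% of the abundance by read count but about 57% by aligned nucleotides, which shows why the two estimates can diverge when read lengths vary.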
Subject(s)
Metagenomics , Microbiota , Humans , Microbiota/genetics , Metagenomics/methods , Computational Biology/methods , Metagenome , High-Throughput Nucleotide Sequencing/methods , Software , Algorithms , Sequence Analysis, DNA/methods
ABSTRACT
Beneficial bacteria remain largely unexplored. Lacking systematic methods, understanding probiotic community traits becomes challenging, leading to various conclusions about their probiotic effects among different publications. We developed language model-based metaProbiotics to rapidly detect probiotic bins from metagenomes, demonstrating superior performance in simulated benchmark datasets. Testing on gut metagenomes from probiotic-treated individuals, it revealed the probioticity of intervention strains-derived bins and other probiotic-associated bins beyond the training data, such as a plasmid-like bin. Analyses of these bins revealed various probiotic mechanisms and bai operon as probiotic Ruminococcaceae's potential marker. In different health-disease cohorts, these bins were more common in healthy individuals, signifying their probiotic role, but relevant health predictions based on the abundance profiles of these bins faced cross-disease challenges. To better understand the heterogeneous nature of probiotics, we used metaProbiotics to construct a comprehensive probiotic genome set from global gut metagenomic data. Module analysis of this set shows that diseased individuals often lack certain probiotic gene modules, with significant variation of the missing modules across different diseases. Additionally, different gene modules on the same probiotic have heterogeneous effects on various diseases. We thus believe that gene function integrity of the probiotic community is more crucial in maintaining gut homeostasis than merely increasing specific gene abundance, and adding probiotics indiscriminately might not boost health. We expect that the innovative language model-based metaProbiotics tool will promote novel probiotic discovery using large-scale metagenomic data and facilitate systematic research on bacterial probiotic effects. The metaProbiotics program can be freely downloaded at https://github.com/zhenchengfang/metaProbiotics.
Subject(s)
Metagenome , Probiotics , Humans , Algorithms , Metagenomics/methods , Bacteria/genetics , Language
ABSTRACT
Bacteriophages are viruses that infect bacterial cells. They are the most diverse biological entities on Earth and play important roles in the microbiome. According to their lifestyle, phages can be divided into virulent phages and temperate phages. Classifying virulent and temperate phages is crucial for further understanding phage-host interactions. Although several methods have been designed for phage lifestyle classification, they consider either sequence features or gene features alone, leading to low accuracy. A new computational method, DeePhafier, is proposed to improve classification performance on phage lifestyle. Built from several multilayer self-attention neural networks and a global self-attention neural network, and combined with protein features from the Position-Specific Scoring Matrix, DeePhafier improves classification accuracy and outperforms two benchmark methods. The accuracy of DeePhafier in five-fold cross-validation is as high as 87.54% for sequences longer than 2000 bp.
Subject(s)
Bacteriophages , Neural Networks, Computer , Bacteriophages/genetics , Computational Biology/methods , Viral Proteins/genetics , Viral Proteins/metabolism , Algorithms
ABSTRACT
Viruses are the most abundant biological entities on Earth and are important components of microbial communities. A metagenome contains the genetic material of all microorganisms in an environmental sample. Correctly identifying viruses among these mixed sequences is critical in viral analyses. It is common to identify long viral sequences that have already passed through assembly and binning pipelines. Existing deep learning-based methods divide these long sequences into short subsequences and identify them separately. This discards the relationships between the subsequences, leading to poor performance in identifying long viral sequences. In this paper, VirGrapher is proposed to improve the identification of long viral sequences by modeling the relationships among the short subsequences derived from long ones. VirGrapher treats a long sequence as a graph and uses a Graph Convolutional Network (GCN) model to learn multilayer connections between nodes from sequences after a GCN-based node embedding model. VirGrapher achieves a higher AUC and accuracy on the validation set than three benchmark methods.
Subject(s)
Metagenome , Microbiota , Microbiota/genetics , Benchmarking
ABSTRACT
High-throughput profiling of microbial functional traits involved in various biogeochemical cycling pathways using shotgun metagenomic sequencing has been routinely applied in microbial ecology and environmental science. Multiple bioinformatics data processing approaches are available, including assembly-based (single-sample assembly and multi-sample assembly) and read-based (merged reads and raw data). However, it remains unclear how these different approaches may differ in data analyses and affect result interpretation. In this study, using two typical shotgun metagenome datasets recovered from geographically distant coastal sediments, the performance of different data processing approaches was comparatively investigated from both technical and biological/ecological perspectives. Microbially mediated biogeochemical cycling pathways, including nitrogen cycling, sulfur cycling and B12 biosynthesis, were analyzed. As a result, multi-sample assembly provided the largest amount of usable information for targeted functional traits, at a high cost of computational resources and running time. Single-sample assembly and read-based analysis were comparable in obtaining usable information, but the former was much more time- and resource-consuming. Critically, different approaches introduced much stronger variations in microbial profiles than biological differences. However, community-level differences between the two sampling sites could be consistently observed regardless of the approach used. In choosing an appropriate approach, researchers should balance the trade-offs between multiple factors, including the scientific question, the amount of usable information, computational resources and time cost. This study is expected to provide valuable technical insights and guidelines for the various approaches used for metagenomic data analysis.
Subject(s)
Metagenome , Metagenomics , High-Throughput Nucleotide Sequencing
ABSTRACT
Recovering high-quality metagenome-assembled genomes (HQ-MAGs) is critical for exploring microbial compositions and microbe-phenotype associations. However, multiple sequencing platforms and computational tools for this purpose may confuse researchers and thus call for extensive evaluation. Here, we systematically evaluated a total of 40 combinations of popular computational tools and sequencing platforms (i.e. strategies), involving eight assemblers, eight metagenomic binners and four sequencing technologies, including short-, long-read and metaHiC sequencing. We identified the best tools for the individual tasks (e.g. the assembly and binning) and combinations (e.g. generating more HQ-MAGs) depending on the availability of the sequencing data. We found that the combination of the hybrid assemblies and metaHiC-based binning performed best, followed by the hybrid and long-read assemblies. More importantly, both long-read and metaHiC sequencings link more mobile elements and antibiotic resistance genes to bacterial hosts and improve the quality of public human gut reference genomes with 32% (34/105) HQ-MAGs that were either of better quality than those in the Unified Human Gastrointestinal Genome catalog version 2 or novel.
Subject(s)
Metagenome , Metagenomics , Humans , Sequence Analysis, DNA , Bacteria/genetics , Gastrointestinal Tract
ABSTRACT
Protein function prediction based on amino acid sequence alone is an extremely challenging but important task, especially in the metagenomics/metatranscriptomics field, in which the number of novel proteins uncovered from new microorganisms has grown exponentially. Many of them have extremely low homology to known proteins and cannot be annotated with homology-based or information-integrative methods. To overcome this problem, we propose a Homology Independent protein Function annotation method (HiFun) based on a unified deep-learning model that reassembles the sequence as protein language. The robustness of HiFun was evaluated using the benchmark datasets and metrics of the CAFA3 challenge. To demonstrate the utility of HiFun, we annotated 2 212 663 unknown proteins and discovered novel motifs in the UHGP-50 catalog. We showed that HiFun can extract latent function-related structural features, which enable it to annotate the functions of non-homologous proteins. HiFun can substantially improve the annotation of new proteins and expand our understanding of microorganisms' adaptation to various ecological niches. Moreover, we provide a free and accessible web service at http://www.unimd.org/HiFun, requiring only protein sequences as input, offering researchers an efficient and practical platform for predicting protein functions.
Subject(s)
Benchmarking , Language , Amino Acid Sequence , Metagenomics , Molecular Sequence Annotation
ABSTRACT
Factor analysis, ranging from principal component analysis to nonnegative matrix factorization, represents a foremost approach in analyzing multi-dimensional data to extract valuable patterns, and is increasingly being applied in the context of multi-dimensional omics datasets represented in tensor form. However, traditional analytical methods are heavily dependent on the format and structure of the data itself, and if these change even slightly, the analyst must change their data analysis strategy and techniques and spend a considerable amount of time on data preprocessing. Additionally, many traditional methods cannot be applied as-is in the presence of missing values in the data. We present a new statistical framework, unified nonnegative matrix factorization (UNMF), for finding informative patterns in messy biological data sets. UNMF is designed for tidy data format and structure, making data analysis easier and simplifying the development of data analysis tools. UNMF can handle a wide range of data structures and formats, and works seamlessly with tensor data including missing observations and repeated measurements. The usefulness of UNMF is demonstrated through its application to several multi-dimensional omics data, offering user-friendly and unified features for analysis and integration. Its application holds great potential for the life science community. UNMF is implemented with R and is available from GitHub (https://github.com/abikoushi/moltenNMF).
Subject(s)
Algorithms , Multiomics , Principal Component Analysis , Factor Analysis, Statistical
ABSTRACT
Microbial genome recovery from metagenomes can further explain microbial ecosystem structures, functions and dynamics. This study therefore developed the Additional Clustering Refiner (ACR) to enhance the recovery of high-purity prokaryotic and eukaryotic metagenome-assembled genomes (MAGs). ACR refines low-quality MAGs by subjecting them to iterative k-means clustering based on contig abundance and by increasing bin purity through validated universal marker genes. ACR's effectiveness was evaluated on synthetic and real-world metagenomic datasets, including short- and long-read sequences. The results demonstrated improved MAG purity and a significant increase in high- and medium-quality MAG recovery rates. In addition, ACR seamlessly integrates with various binning algorithms, augmenting their strengths without modifying core features. Furthermore, its compatibility with multiple sequencing technologies expands its applicability. By efficiently recovering high-quality prokaryotic and eukaryotic genomes, ACR is a promising tool for deepening our understanding of microbial communities through genome-centric metagenomics.
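The clustering step described in this abstract can be sketched as splitting one mixed bin into sub-bins by k-means on contig abundance. The following is a minimal illustration under stated assumptions, not ACR's code: the function name, the choice of k = 2, the deterministic initialization, and the toy coverage values are hypothetical, and the marker-gene validation step is omitted.

```python
def kmeans_1d(values, k=2, iterations=50):
    """Cluster scalar values with Lloyd's algorithm; returns one label per value."""
    lo, hi = min(values), max(values)
    # Deterministic initialization (an assumption): centres spread from min to max.
    if k > 1:
        centres = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    else:
        centres = [sum(values) / len(values)]

    labels = None
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centres[c])) for v in values
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels


# Toy bin: mean read coverage per contig (hypothetical numbers).
contig_abundance = {"c1": 10.2, "c2": 9.8, "c3": 11.0, "c4": 48.5, "c5": 51.0}
names = list(contig_abundance)
labels = kmeans_1d([contig_abundance[n] for n in names], k=2)

sub_bins = {}
for name, label in zip(names, labels):
    sub_bins.setdefault(label, []).append(name)
```

Contigs from the same genome tend to share a similar sequencing depth, so a bin whose contigs fall into two well-separated abundance groups is a candidate for splitting. In a fuller pipeline each resulting sub-bin would then be checked for completeness and contamination before being accepted.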
Subject(s)
Metagenome , Microbiota , Eukaryota/genetics , Microbiota/genetics , Algorithms , Metagenomics/methods , Cluster Analysis
ABSTRACT
Metagenome assembly is an efficient approach to reconstruct microbial genomes from metagenomic sequencing data. Although short-read sequencing has been widely used for metagenome assembly, linked- and long-read sequencing have shown their advancements in assembly by providing long-range DNA connectedness. Many metagenome assembly tools were developed to simplify the assembly graphs and resolve the repeats in microbial genomes. However, there remains no comprehensive evaluation of metagenomic sequencing technologies, and there is a lack of practical guidance on selecting the appropriate metagenome assembly tools. This paper presents a comprehensive benchmark of 19 commonly used assembly tools applied to metagenomic sequencing datasets obtained from simulation, mock communities or human gut microbiomes. These datasets were generated using mainstream sequencing platforms, such as Illumina and BGISEQ short-read sequencing, 10x Genomics linked-read sequencing, and PacBio and Oxford Nanopore long-read sequencing. The assembly tools were extensively evaluated against many criteria, which revealed that long-read assemblers generated high contig contiguity but failed to reveal some medium- and high-quality metagenome-assembled genomes (MAGs). Linked-read assemblers obtained the highest number of overall near-complete MAGs from the human gut microbiomes. Hybrid assemblers using both short- and long-read sequencing were promising methods to improve both total assembly length and the number of near-complete MAGs. This paper also discussed the running time and peak memory consumption of these assembly tools and provided practical guidance on selecting them.
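Contig contiguity is commonly summarized with the N50 statistic. The abstract does not say which contiguity metric the benchmark used, so the sketch below is only an illustration of the general idea, with made-up contig lengths.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


# Two hypothetical assemblies with the same total length (1000 bp).
fragmented = [200, 200, 200, 200, 200]
contiguous = [900, 50, 50]

fragmented_n50 = n50(fragmented)
contiguous_n50 = n50(contiguous)
```

Both toy assemblies contain the same number of bases, yet the second has a much higher N50. High contiguity alone says nothing about how many genomes were recovered, which is consistent with the abstract's point that long-read assemblers can score well on contiguity while still missing some MAGs.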
Subject(s)
Metagenome , Microbiota , Humans , Benchmarking , Microbiota/genetics , Metagenomics/methods , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
ABSTRACT
BACKGROUND: Dyslipidemia is treated effectively with statins, but treatment has the potential to induce new-onset type-2 diabetes. Gut microbiota may contribute to this outcome variability. We assessed the associations of gut microbiota diversity and composition with statins. Bacterial associations with statin-associated new-onset type-2 diabetes (T2D) risk were also prospectively evaluated. METHODS: We examined shallow-shotgun-sequenced fecal samples from 5755 individuals in the FINRISK-2002 population cohort with a 17+-year-long register-based follow-up. Alpha-diversity was quantified using Shannon index and beta-diversity with Aitchison distance. Species-specific differential abundances were analyzed using general multivariate regression. Prospective associations were assessed with Cox regression. Applicable results were validated using gradient boosting. RESULTS: Statin use was associated with differing taxonomic composition (R2, 0.02%; q=0.02) and 13 differentially abundant species in fully adjusted models (MaAsLin; q<0.05). The strongest positive association was with Clostridium sartagoforme (β=0.37; SE=0.13; q=0.02) and the strongest negative association with Bacteroides cellulosilyticus (β=-0.31; SE=0.11; q=0.02). Twenty-five microbial features had significant associations with incident T2D in statin users, of which only Bacteroides vulgatus (HR, 1.286 [1.136-1.457]; q=0.03) was consistent regardless of model adjustment. Finally, higher statin-associated T2D risk was seen with [Ruminococcus] torques (ΔHRstatins, +0.11; q=0.03), Blautia obeum (ΔHRstatins, +0.06; q=0.01), Blautia sp. KLE 1732 (ΔHRstatins, +0.05; q=0.01), and beta-diversity principal component 1 (ΔHRstatins, +0.07; q=0.03) but only when adjusting for demographic covariates. CONCLUSIONS: Statin users have compositionally differing microbiotas from nonusers. The human gut microbiota is associated with incident T2D risk in statin users and possibly has additive effects on statin-associated new-onset T2D risk.
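The two diversity measures named in the METHODS can be written down compactly. The sketch below is a generic illustration, not the study's pipeline: the pseudocount used to handle zero counts and the toy species counts are assumptions, since the abstract does not state how zeros were treated.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def clr(counts, pseudocount=1.0):
    """Centred log-ratio transform; the pseudocount is an assumed zero-handling choice."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def aitchison(a, b, pseudocount=1.0):
    """Aitchison distance: Euclidean distance between CLR-transformed samples."""
    return math.dist(clr(a, pseudocount), clr(b, pseudocount))


# Toy species counts for two fecal samples (hypothetical numbers).
sample_a = [10, 10, 10, 10]
sample_b = [37, 1, 1, 1]

alpha_a = shannon(sample_a)   # even community, highest possible H for 4 taxa
alpha_b = shannon(sample_b)   # one dominant species, lower H
beta_ab = aitchison(sample_a, sample_b)
```

The Shannon index describes diversity within one sample, while the Aitchison distance compares the composition of two samples. Because it is built on log-ratios, the Aitchison distance is unaffected by sequencing depth when no pseudocount is needed: multiplying every count in a sample by a constant leaves the distance unchanged.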
Subject(s)
Diabetes Mellitus, Type 2 , Dyslipidemias , Gastrointestinal Microbiome , Hydroxymethylglutaryl-CoA Reductase Inhibitors , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/adverse effects , Cross-Sectional Studies , Diabetes Mellitus, Type 2/diagnosis , Diabetes Mellitus, Type 2/epidemiology , Dyslipidemias/diagnosis , Dyslipidemias/drug therapy , Dyslipidemias/epidemiology
ABSTRACT
BACKGROUND: The rumen microbiome plays an essential role in maintaining ruminants' growth and performance even under extreme environmental conditions, however, which factors influence rumen microbiome stability when ruminants are reared in such habitats throughout the year is unclear. Hence, the rumen microbiome of yak (less domesticated) and cattle (domesticated) reared on the Qinghai-Tibetan Plateau through the year were assessed to evaluate temporal changes in their composition, function, and stability. RESULTS: Rumen fermentation characteristics and pH significantly shifted across seasons in both cattle and yak, but the patterns differed between the two ruminant species. Ruminal enzyme activity varied with season, and production of xylanase and cellulase was greater in yak compared to cattle in both fall and winter. The rumen bacterial community varied with season in both yak and cattle, with higher alpha diversity and similarity (beta diversity) in yak than cattle. The diversity indices of eukaryotic community did not change with season in both ruminant species, but higher similarity was observed in yak. In addition, the similarity of rumen microbiome functional community was higher in yak than cattle across seasons. Moreover, yak rumen microbiome encoded more genes (GH2 and GH3) related to cellulose and hemicellulose degradation compared to cattle, and a new enzyme family (GH160) gene involved in oligosaccharides was uniquely detected in yak rumen. The season affected microbiome attenuation and buffering values (stability), with higher buffering value in yak rumen microbiome than cattle. Positive correlations between antimicrobial resistance gene (dfrF) and CAZyme family (GH113) and microbiome stability were identified in yak, but such relationship was negatively correlated in cattle. 
CONCLUSIONS: The differences in cellulose degradation potential, and in the relationship between rumen microbial stability and functional gene abundance, across seasons and between yak and cattle provide insight into the mechanisms that may underpin their divergent adaptation patterns to the harsh climate of the Qinghai-Tibetan Plateau. These results lay a solid foundation for developing strategies to maintain and improve rumen microbiome stability and for identifying candidate lignocellulolytic enzymes in the yak rumen to enhance ruminants' performance under extreme environmental conditions.
Subject(s)
Gastrointestinal Microbiome , Rumen , Seasons , Animals , Cattle , Rumen/microbiology , Gastrointestinal Microbiome/physiology , Microbiota , Adaptation, Physiological , Bacteria/genetics , Bacteria/classification
ABSTRACT
Ningxiang (NX) pig has been recognized as one of the most famous Chinese indigenous breeds due to its characteristics in stress resistance. However, intestinal microbial feature and gene profiling in NX piglets have not been studied. Here, we compared the intestinal microbiome and transcriptome between NX and Duroc × Landrace × Large white (DLY) piglets and found the high enrichment of several colonic Bacteroides, Prevotella and Clostridium species in NX piglets. Further functional analyses revealed their predominant function in methane, glycolysis and gluconeogenesis metabolism. Our mRNA-sequencing data unraveled the distinct colonic gene expression between these two breeds. In particular, we showed that the improved intestinal function in NX piglets may be determined by enhanced intestinal barrier gene expression and varied immune gene expression through modulating the composition of the gut microbes. Together, our study revealed the intestinal characteristics of NX piglets, providing their potential application in improving breeding strategies and developing dietary interventions.
Subject(s)
Gastrointestinal Microbiome , Transcriptome , Animals , Swine
ABSTRACT
BACKGROUND: The application of reduced metagenomic sequencing approaches holds promise as a middle ground between targeted amplicon sequencing and whole metagenome sequencing approaches but has not been widely adopted as a technique. A major barrier to adoption is the lack of read simulation software built to handle characteristic features of these novel approaches. Reduced metagenomic sequencing (RMS) produces unique patterns of fragmentation per genome that are sensitive to restriction enzyme choice, and the non-uniform size selection of these fragments may introduce novel challenges to taxonomic assignment as well as relative abundance estimates. RESULTS: Through the development and application of simulation software, readsynth, we compare simulated metagenomic sequencing libraries with existing RMS data to assess the influence of multiple library preparation and sequencing steps on downstream analytical results. Based on read depth per position, readsynth achieved 0.79 Pearson's correlation and 0.94 Spearman's correlation to these benchmarks. Application of a novel estimation approach, fixed length taxonomic ratios, improved quantification accuracy of simulated human gut microbial communities when compared to estimates of mean or median coverage. CONCLUSIONS: We investigate the possible strengths and weaknesses of applying the RMS technique to profiling microbial communities via simulations with readsynth. The choice of restriction enzymes and size selection steps in library prep are non-trivial decisions that bias downstream profiling and quantification. The simulations investigated in this study illustrate the possible limits of preparing metagenomic libraries with a reduced representation sequencing approach, but also allow for the development of strategies for producing and handling the sequence data produced by this promising application.
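The dependence of RMS libraries on enzyme choice and size selection can be seen in a toy in-silico digest. This sketch is not readsynth: it uses a single enzyme (EcoRI, which cuts G^AATTC), one strand only, a hard size window, and a made-up genome string, all of which are simplifying assumptions.

```python
def digest(genome, site, cut_offset):
    """Cut a sequence at every occurrence of a recognition site.

    cut_offset: position of the cut within the site (1 for EcoRI, G^AATTC).
    """
    cuts = []
    pos = genome.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = genome.find(site, pos + 1)
    bounds = [0] + cuts + [len(genome)]
    return [genome[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments, min_len, max_len):
    """Keep fragments inside a hard size window (real selection is not this sharp)."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy genome with two EcoRI sites (hypothetical sequence).
genome = "AAAA" + "GAATTC" + "C" * 10 + "GAATTC" + "TT"
fragments = digest(genome, site="GAATTC", cut_offset=1)
library = size_select(fragments, min_len=10, max_len=20)
```

Only the fragment between the two sites falls inside the size window, so only that part of the genome would be sequenced. A genome with few or badly spaced sites contributes few fragments regardless of its true abundance, which is the kind of bias the abstract attributes to restriction enzyme choice and size selection.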
Subject(s)
Metagenome , Metagenomics , Software , Metagenome/genetics , Metagenomics/methods , Humans , Sequence Analysis, DNA/methods , Gastrointestinal Microbiome/genetics , High-Throughput Nucleotide Sequencing/methods
ABSTRACT
BACKGROUND: Pan-virus detection, and virome investigation in general, can be challenging, mainly due to the lack of universally conserved genetic elements in viruses. Metagenomic next-generation sequencing can offer a promising solution to this problem by providing an unbiased overview of the microbial community, enabling detection of any viruses without prior target selection. However, a major challenge in utilising metagenomic next-generation sequencing for virome investigation is that data analysis can be highly complex, involving numerous data processing steps. RESULTS: Here, we present Entourage to address this challenge. Entourage enables short-read sequence assembly, viral sequence search with or without reference virus targets using contig-based approaches, and intrasample sequence variation quantification. Several workflows are implemented in Entourage to facilitate end-to-end virus sequence detection analysis through a single command line, from read cleaning, sequence assembly, to virus sequence searching. The results generated are comprehensive, allowing for thorough quality control, reliability assessment, and interpretation. We illustrate Entourage's utility as a streamlined workflow for virus detection by employing it to comprehensively search for target virus sequences and beyond in raw sequence read data generated from HeLa cell culture samples spiked with viruses. Furthermore, we showcase its flexibility and performance on a real-world dataset by analysing a preassembled Tara Oceans dataset. Overall, our results show that Entourage performs well even with low virus sequencing depth in single digits, and it can be used to discover novel viruses effectively. Additionally, by using sequence data generated from a patient with chronic SARS-CoV-2 infection, we demonstrate Entourage's capability to quantify virus intrasample genetic variations, and generate publication-quality figures illustrating the results. 
CONCLUSIONS: Entourage is an all-in-one, versatile, and streamlined bioinformatics software for virome investigation, developed with a focus on ease of use. Entourage is available at https://codeberg.org/CENMIG/Entourage under the MIT license.
Subject(s)
Genome, Viral , High-Throughput Nucleotide Sequencing , SARS-CoV-2 , Software , Genome, Viral/genetics , Humans , High-Throughput Nucleotide Sequencing/methods , SARS-CoV-2/genetics , Metagenomics/methods , Viruses/genetics , COVID-19/virology , Virome/genetics , HeLa Cells
ABSTRACT
BACKGROUND: Metagenomic sequencing technologies offered unprecedented opportunities and also challenges to microbiology and microbial ecology particularly. The technology has revolutionized the studies of microbes and enabled the high-profile human microbiome and earth microbiome projects. The terminology-change from microbes to microbiomes signals that our capability to count and classify microbes (microbiomes) has achieved the same or similar level as we can for the biomes (macrobiomes) of plants and animals (macrobes). While the traditional investigations of macrobiomes have usually been conducted through naturalists' (Linnaeus & Darwin) naked eyes, and aerial and satellite images (remote-sensing), the large-scale investigations of microbiomes have been made possible by DNA-sequencing-based metagenomic technologies. Two major types of metagenomic sequencing technologies-amplicon sequencing and whole-genome (shotgun sequencing)-respectively generate two contrastingly different categories of metagenomic reads (data)-OTU (operational taxonomic unit) tables representing microorganisms and OMU (operational metagenomic unit), a new term coined in this article to represent various cluster units of metagenomic genes. RESULTS: The ecological science of microbiomes based on the OTU representing microbes has been unified with the classic ecology of macrobes (macrobiomes), but the unification based on OMU representing metagenomes has been rather limited. In a previous series of studies, we have demonstrated the applications of several classic ecological theories (diversity, composition, heterogeneity, and biogeography) to the studies of metagenomes. Here I push the envelope for the unification of OTU and OMU again by demonstrating the applications of metacommunity assembly and ecological networks to the metagenomes of human gut microbiomes. 
Specifically, the neutral theory of biodiversity (Sloan's near neutral model), Ning et al.'s stochasticity framework, core-periphery network, high-salience skeleton network, special trio-motif, and positive-to-negative ratio are applied to analyze the OMU tables from whole-genome sequencing technologies, and demonstrated with seven human gut metagenome datasets from the human microbiome project. CONCLUSIONS: All of the ecological theories demonstrated previously and in this article, including diversity, composition, heterogeneity, stochasticity, and complex network analyses, are equally applicable to OMU metagenomic analyses, just as to OTU analyses. Consequently, I strongly advocate the unification of OTU/OMU (microbiomes) with classic ecology of plants and animals (macrobiomes) in the context of medical ecology.