Results 1 - 20 of 38
1.
Cell ; 168(3): 517-526.e18, 2017 01 26.
Article in English | MEDLINE | ID: mdl-28111075

ABSTRACT

The gut microbiota modulate host biology in numerous ways, but little is known about the molecular mediators of these interactions. Previously, we found a widely distributed family of nonribosomal peptide synthetase gene clusters in gut bacteria. Here, by expressing a subset of these clusters in Escherichia coli or Bacillus subtilis, we show that they encode pyrazinones and dihydropyrazinones. At least one of the 47 clusters is present in 88% of the National Institutes of Health Human Microbiome Project (NIH HMP) stool samples, and they are transcribed under conditions of host colonization. We present evidence that the active form of these molecules is the initially released peptide aldehyde, which bears potent protease inhibitory activity and selectively targets a subset of cathepsins in human cell proteomes. Our findings show that an approach combining bioinformatics, synthetic biology, and heterologous gene cluster expression can rapidly expand our knowledge of the metabolic potential of the microbiota while avoiding the challenges of cultivating fastidious commensals.


Subject(s)
Bacteria/metabolism , Gastrointestinal Microbiome , Microbiota , Peptide Synthases/metabolism , Pyrazines/metabolism , Animals , Bacillus subtilis/genetics , Bacteria/classification , Bacteria/genetics , Escherichia coli/genetics , Feces/microbiology , Humans , Peptide Synthases/genetics , Phylogeny
2.
Cell ; 166(5): 1103-1116, 2016 Aug 25.
Article in English | MEDLINE | ID: mdl-27565341

ABSTRACT

Shotgun metagenomics and computational analysis are used to compare the taxonomic and functional profiles of microbial communities. Leveraging this approach to understand roles of microbes in human biology and other environments requires quantitative data summaries whose values are comparable across samples and studies. Comparability is currently hampered by the use of abundance statistics that do not estimate a meaningful parameter of the microbial community and biases introduced by experimental protocols and data-cleaning approaches. Addressing these challenges, along with improving study design, data access, metadata standardization, and analysis tools, will enable accurate comparative metagenomics. We envision a future in which microbiome studies are replicable and new metagenomes are easily and rapidly integrated with existing data. Only then can the potential of metagenomics for predictive ecological modeling, well-powered association studies, and effective microbiome medicine be fully realized.


Subject(s)
Metagenome , Metagenomics/standards , Microbiota/genetics , Classification , Computational Biology , Humans , Models, Statistical
3.
Nature ; 622(7983): 594-602, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37821698

ABSTRACT

Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities [1,2]. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database [3]. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.


Subject(s)
Metagenome , Metagenomics , Microbiology , Proteins , Cluster Analysis , Metagenome/genetics , Metagenomics/methods , Proteins/chemistry , Proteins/classification , Proteins/genetics , Databases, Protein , Protein Conformation
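The abstract above mentions graph-based clustering of proteins into families with a minimum member count. As a rough illustration only, the sketch below groups sequences by connected components of a similarity graph using union-find and then applies a size threshold. Connected components is a deliberately simple stand-in, not the clustering algorithm used in the paper, and the function name, the edge list, and the `min_size` parameter are all hypothetical.

```python
from collections import defaultdict


def cluster_by_similarity(edges, min_size):
    """Group items linked by similarity edges; keep groups of at least min_size.

    `edges` is an iterable of (a, b) pairs meaning "a is similar to b".
    Assumption: single-linkage (connected components), chosen for brevity.
    """
    parent = {}

    def find(x):
        # Register unseen items as their own root, then walk up to the root,
        # halving the path on the way to keep later lookups short.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for item in list(parent):
        groups[find(item)].add(item)

    # Largest clusters first; ties broken by member names for determinism.
    kept = [sorted(g) for g in groups.values() if len(g) >= min_size]
    return sorted(kept, key=lambda g: (-len(g), g))


# Toy similarity hits between hypothetical protein identifiers.
toy_edges = [("p1", "p2"), ("p2", "p3"), ("p4", "p5"), ("p6", "p6")]
families = cluster_by_similarity(toy_edges, min_size=2)
```

In this toy input, `p6` only matches itself and is dropped by the size filter, which mirrors the idea of reporting only clusters above a membership threshold.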
4.
PLoS Biol ; 21(4): e3002083, 2023 04.
Article in English | MEDLINE | ID: mdl-37083735

ABSTRACT

The extraordinary diversity of viruses infecting bacteria and archaea is now primarily studied through metagenomics. While metagenomes enable high-throughput exploration of the viral sequence space, metagenome-derived sequences lack key information compared to isolated viruses, in particular host association. Different computational approaches are available to predict the host(s) of uncultivated viruses based on their genome sequences, but thus far individual approaches are limited either in precision or in recall, i.e., for a number of viruses they yield erroneous predictions or no prediction at all. Here, we describe iPHoP, a two-step framework that integrates multiple methods to reliably predict host taxonomy at the genus rank for a broad range of viruses infecting bacteria and archaea, while retaining a low false discovery rate. Based on a large dataset of metagenome-derived virus genomes from the IMG/VR database, we illustrate how iPHoP can provide extensive host prediction and guide further characterization of uncultivated viruses.


Subject(s)
Archaea , Viruses , Archaea/genetics , Metagenome/genetics , Viruses/genetics , Bacteria/genetics , Metagenomics/methods , Machine Learning , Genome, Viral/genetics
5.
Nucleic Acids Res ; 52(D1): D164-D173, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37930866

ABSTRACT

Plasmids are mobile genetic elements found in many clades of Archaea and Bacteria. They drive horizontal gene transfer, impacting ecological and evolutionary processes within microbial communities, and hold substantial importance in human health and biotechnology. To support plasmid research and provide scientists with data on an unprecedented diversity of plasmid sequences, we introduce the IMG/PR database, a new resource encompassing 699 973 plasmid sequences derived from genomes, metagenomes and metatranscriptomes. IMG/PR is the first database to provide data on plasmids that were systematically identified from diverse microbiome samples. IMG/PR plasmids are associated with rich metadata that includes geographical and ecosystem information, host taxonomy, similarity to other plasmids, functional annotation, and presence of genes involved in conjugation and antibiotic resistance. The database offers diverse methods for exploring its extensive plasmid collection, enabling users to navigate plasmids through metadata-centric queries, plasmid comparisons and BLAST searches. The web interface for IMG/PR is accessible at https://img.jgi.doe.gov/pr. Plasmid metadata and sequences can be downloaded from https://genome.jgi.doe.gov/portal/IMG_PR.


Subject(s)
Metagenome , Microbiota , Humans , Metadata , Software , Databases, Genetic , Plasmids/genetics
6.
Nature ; 568(7753): 505-510, 2019 04.
Article in English | MEDLINE | ID: mdl-30867587

ABSTRACT

The genome sequences of many species of the human gut microbiome remain unknown, largely owing to challenges in cultivating microorganisms under laboratory conditions. Here we address this problem by reconstructing 60,664 draft prokaryotic genomes from 3,810 faecal metagenomes, from geographically and phenotypically diverse humans. These genomes provide reference points for 2,058 newly identified species-level operational taxonomic units (OTUs), which represents a 50% increase over the previously known phylogenetic diversity of sequenced gut bacteria. On average, the newly identified OTUs comprise 33% of richness and 28% of species abundance per individual, and are enriched in humans from rural populations. A meta-analysis of clinical gut-microbiome studies pinpointed numerous disease associations for the newly identified OTUs, which have the potential to improve predictive models. Finally, our analysis revealed that uncultured gut species have undergone genome reduction that has resulted in the loss of certain biosynthetic pathways, which may offer clues for improving cultivation strategies in the future.


Subject(s)
Bacteria/classification , Bacteria/genetics , Gastrointestinal Microbiome/genetics , Genome, Bacterial/genetics , Metagenome/genetics , Bacteria/growth & development , Bacteria/isolation & purification , Bacterial Physiological Phenomena/genetics , Biosynthetic Pathways/genetics , Disease , Feces/microbiology , Gastrointestinal Microbiome/physiology , Genomics , Geographic Mapping , Humans , Phylogeny , Rural Population , Species Specificity
7.
Nucleic Acids Res ; 51(D1): D733-D743, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36399502

ABSTRACT

Viruses are widely recognized as critical members of all microbiomes. Metagenomics enables large-scale exploration of the global virosphere, progressively revealing the extensive genomic diversity of viruses on Earth and highlighting the myriad of ways by which viruses impact biological processes. IMG/VR provides access to the largest collection of viral sequences obtained from (meta)genomes, along with functional annotation and rich metadata. A web interface enables users to efficiently browse and search viruses based on genome features and/or sequence similarity. Here, we present the fourth version of IMG/VR, composed of >15 million virus genomes and genome fragments, a ≈6-fold increase in size compared to the previous version. These clustered into 8.7 million viral operational taxonomic units, including 231 408 with at least one high-quality representative. Viral sequences in IMG/VR are now systematically identified from genomes, metagenomes, and metatranscriptomes using a new detection approach (geNomad), and IMG standard annotations are complemented with genome quality estimation using CheckV, taxonomic classification reflecting the latest taxonomic standards, and microbial host taxonomy prediction. IMG/VR v4 is available at https://img.jgi.doe.gov/vr, and the underlying data are available to download at https://genome.jgi.doe.gov/portal/IMG_VR.


Subject(s)
Databases, Genetic , Genome, Viral , Metadata , Metagenomics , Software
8.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36321886

ABSTRACT

SUMMARY: The Metagenomic Intra-Species Diversity Analysis System (MIDAS) is a scalable metagenomic pipeline that identifies single nucleotide variants (SNVs) and gene copy number variants in microbial populations. Here, we present MIDAS2, which addresses the computational challenges presented by increasingly large reference genome databases, while adding functionality for building custom databases and leveraging paired-end reads to improve SNV accuracy. This fast and scalable reengineering of the MIDAS pipeline enables thousands of metagenomic samples to be efficiently genotyped. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/czbiohub/MIDAS2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Metagenome , Software , Metagenomics , Genotype , Databases, Factual
9.
Nucleic Acids Res ; 49(D1): D764-D775, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33137183

ABSTRACT

Viruses are integral components of all ecosystems and microbiomes on Earth. Through pervasive infections of their cellular hosts, viruses can reshape microbial community structure and drive global nutrient cycling. Over the past decade, viral sequences identified from genomes and metagenomes have provided an unprecedented view of viral genome diversity in nature. Since 2016, the IMG/VR database has provided access to the largest collection of viral sequences obtained from (meta)genomes. Here, we present the third version of IMG/VR, composed of 18 373 cultivated and 2 314 329 uncultivated viral genomes (UViGs), nearly tripling the total number of sequences compared to the previous version. These clustered into 935 362 viral Operational Taxonomic Units (vOTUs), including 188 930 with two or more members. UViGs in IMG/VR are now reported as single viral contigs, integrated proviruses or genome bins, and are annotated with a new standardized pipeline including genome quality estimation using CheckV, taxonomic classification reflecting the latest ICTV update, and expanded host taxonomy prediction. The new IMG/VR interface enables users to efficiently browse, search, and select UViGs based on genome features and/or sequence similarity. IMG/VR v3 is available at https://img.jgi.doe.gov/vr, and the underlying data are available to download at https://genome.jgi.doe.gov/portal/IMG_VR.


Subject(s)
Databases, Genetic , Ecosystem , Evolution, Molecular , Genome, Viral , Viruses/genetics , Base Sequence , Cluster Analysis , Geography , Molecular Sequence Annotation , Sequence Homology, Nucleic Acid , User-Computer Interface
10.
Genome Res ; 26(11): 1612-1625, 2016 11.
Article in English | MEDLINE | ID: mdl-27803195

ABSTRACT

We present the Metagenomic Intra-species Diversity Analysis System (MIDAS), which is an integrated computational pipeline for quantifying bacterial species abundance and strain-level genomic variation, including gene content and single-nucleotide polymorphisms (SNPs), from shotgun metagenomes. Our method leverages a database of more than 30,000 bacterial reference genomes that we clustered into species groups. These cover the majority of abundant species in the human microbiome but only a small proportion of microbes in other environments, including soil and seawater. We applied MIDAS to stool metagenomes from 98 Swedish mothers and their infants over one year and used rare SNPs to track strains between hosts. Using this approach, we found that although species compositions of mothers and infants converged over time, strain-level similarity diverged. Specifically, early colonizing bacteria were often transmitted from an infant's mother, while late colonizing bacteria were often transmitted from other sources in the environment and were enriched for spore-formation genes. We also applied MIDAS to 198 globally distributed marine metagenomes and used gene content to show that many prevalent bacterial species have population structure that correlates with geographic location. Strain-level genetic variants present in metagenomes clearly reveal extensive structure and dynamics that are obscured when data are analyzed at a coarser taxonomic resolution.


Subject(s)
DNA Fingerprinting/methods , Genome, Bacterial , Infectious Disease Transmission, Vertical , Metagenome , Metagenomics/methods , Adult , Bacteria/classification , Bacteria/genetics , Bacterial Infections/transmission , DNA Fingerprinting/standards , Feces/microbiology , Female , Humans , Infant , Metagenomics/standards , Phylogeny , Phylogeography , Polymorphism, Single Nucleotide , Reference Standards , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , Soil Microbiology , Water Microbiology
11.
PLoS Comput Biol ; 14(8): e1006242, 2018 08.
Article in English | MEDLINE | ID: mdl-30091981

ABSTRACT

The mechanisms by which different microbes colonize the healthy human gut versus other body sites, the gut in disease states, or other environments remain largely unknown. Identifying microbial genes influencing fitness in the gut could lead to new ways to engineer probiotics or disrupt pathogenesis. We approach this problem by measuring the statistical association between a species having a gene and the probability that the species is present in the gut microbiome. The challenge is that closely related species tend to be jointly present or absent in the microbiome and also share many genes, only a subset of which are involved in gut adaptation. We show that this phylogenetic correlation indeed leads to many false discoveries and propose phylogenetic linear regression as a powerful solution. To apply this method across the bacterial tree of life, where most species have not been experimentally phenotyped, we use metagenomes from hundreds of people to quantify each species' prevalence in and specificity for the gut microbiome. This analysis reveals thousands of genes potentially involved in adaptation to the gut across species, including many novel candidates as well as processes known to contribute to fitness of gut bacteria, such as acid tolerance in Bacteroidetes and sporulation in Firmicutes. We also find microbial genes associated with a preference for the gut over other body sites, which are significantly enriched for genes linked to fitness in an in vivo competition experiment. Finally, we identify gene families associated with higher prevalence in patients with Crohn's disease, including Proteobacterial genes involved in conjugation and fimbria regulation, processes previously linked to inflammation. These gene targets may represent new avenues for modulating host colonization and disease. Our strategy of combining metagenomics with phylogenetic modeling is general and can be used to identify genes associated with adaptation to any environment.


Subject(s)
Gastrointestinal Microbiome/genetics , Metagenomics/methods , Bacteria/genetics , Gastrointestinal Microbiome/physiology , Gene Expression Regulation, Bacterial/genetics , Genes, Microbial/genetics , Humans , Metagenome , Microbiota/genetics , Phylogeny
12.
Bioinformatics ; 31(20): 3368-70, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26104745

ABSTRACT

Microbiome researchers frequently want to know how abundant a particular microbial gene or pathway is across different human hosts, including its association with disease and its co-occurrence with other genes or microbial taxa. With thousands of publicly available metagenomes, these questions should be easy to answer. However, computational barriers prevent most researchers from conducting such analyses. We address this problem with MetaQuery, a web application for rapid and quantitative analysis of specific genes in the human gut microbiome. The user inputs one or more query genes, and our software returns the estimated abundance of these genes across 1267 publicly available fecal metagenomes from American, European and Chinese individuals. In addition, our application performs downstream statistical analyses to identify features that are associated with gene variation, including other query genes (i.e. gene co-variation), taxa, clinical variables (e.g. inflammatory bowel disease and diabetes) and average genome size. The speed and accessibility of MetaQuery are a step toward democratizing metagenomics research, which should allow many researchers to query the abundance and variation of specific genes in the human gut microbiome. AVAILABILITY AND IMPLEMENTATION: http://metaquery.docpollard.org. CONTACT: snayfach@gmail.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gastrointestinal Microbiome/genetics , Metagenomics/methods , Software , Diabetes Mellitus/microbiology , Feces/microbiology , Genes, Bacterial , Humans , Inflammatory Bowel Diseases/microbiology , Internet , Molecular Sequence Annotation
13.
PLoS Comput Biol ; 11(11): e1004573, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26565399

ABSTRACT

Shotgun metagenomic DNA sequencing is a widely applicable tool for characterizing the functions that are encoded by microbial communities. Several bioinformatic tools can be used to functionally annotate metagenomes, allowing researchers to draw inferences about the functional potential of the community and to identify putative functional biomarkers. However, little is known about how decisions made during annotation affect the reliability of the results. Here, we use statistical simulations to rigorously assess how to optimize annotation accuracy and speed, given parameters of the input data like read length and library size. We identify best practices in metagenome annotation and use them to guide the development of the Shotgun Metagenome Annotation Pipeline (ShotMAP). ShotMAP is an analytically flexible, end-to-end annotation pipeline that can be implemented either on a local computer or a cloud compute cluster. We use ShotMAP to assess how different annotation databases impact the interpretation of how marine metagenome and metatranscriptome functional capacity changes across seasons. We also apply ShotMAP to data obtained from a clinical microbiome investigation of inflammatory bowel disease. This analysis finds that gut microbiota collected from Crohn's disease patients are functionally distinct from gut microbiota collected from either ulcerative colitis patients or healthy controls, with differential abundance of metabolic pathways related to host-microbiome interactions that may serve as putative biomarkers of disease.


Subject(s)
Chromosome Mapping/methods , Metagenome/genetics , Metagenomics/methods , Microbiota/genetics , Computer Simulation , Crohn Disease/microbiology , Genetic Markers/genetics , Humans , Models, Genetic
14.
STAR Protoc ; 4(1): 101964, 2023 03 17.
Article in English | MEDLINE | ID: mdl-36856771

ABSTRACT

Genotyping single-nucleotide polymorphisms (SNPs) in microbiomes enables strain-level quantification. In this protocol, we describe a computational pipeline that performs fast and accurate SNP genotyping using metagenomic data. We first demonstrate how to use Maast to catalog SNPs from microbial genomes. Then we use GT-Pro to extract unique SNP-covering k-mers, optimize a data structure for storing these k-mers, and finally perform metagenotyping. For proof of concept, the protocol leverages public whole-genome sequences to metagenotype a synthetic community. For complete details on the use and execution of this protocol, please refer to Shi et al. (2022a) and Shi et al. (2022b).


Subject(s)
Genome , Microbiota , Microbiota/genetics , Polymorphism, Single Nucleotide/genetics
15.
Genome Biol ; 24(1): 186, 2023 08 10.
Article in English | MEDLINE | ID: mdl-37563669

ABSTRACT

Existing single nucleotide polymorphism (SNP) genotyping algorithms do not scale for species with thousands of sequenced strains, nor do they account for conspecific redundancy. Here we present a bioinformatics tool, Maast, which empowers population genetic meta-analysis of microbes at an unrivaled scale. Maast implements a novel algorithm to heuristically identify a minimal set of diverse conspecific genomes, then constructs a reliable SNP panel for each species, and enables rapid and accurate genotyping using a hybrid of whole-genome alignment and k-mer exact matching. We demonstrate Maast's utility by genotyping thousands of Helicobacter pylori strains and tracking SARS-CoV-2 diversification.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , Genotype , SARS-CoV-2/genetics , Genome , Algorithms , Polymorphism, Single Nucleotide , Genotyping Techniques
16.
Nat Biotechnol ; 2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37735266

ABSTRACT

Identifying and characterizing mobile genetic elements in sequencing data is essential for understanding their diversity, ecology, biotechnological applications and impact on public health. Here we introduce geNomad, a classification and annotation framework that combines information from gene content and a deep neural network to identify sequences of plasmids and viruses. geNomad uses a dataset of more than 200,000 marker protein profiles to provide functional gene annotation and taxonomic assignment of viral genomes. Using a conditional random field model, geNomad also detects proviruses integrated into host genomes with high precision. In benchmarks, geNomad achieved high classification performance for diverse plasmids and viruses (Matthews correlation coefficient of 77.8% and 95.3%, respectively), substantially outperforming other tools. Leveraging geNomad's speed and scalability, we processed over 2.7 trillion base pairs of sequencing data, leading to the discovery of millions of viruses and plasmids that are available through the IMG/VR and IMG/PR databases. geNomad is available at https://portal.nersc.gov/genomad.
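The benchmark figures above are reported as Matthews correlation coefficients (MCC). For readers unfamiliar with the metric, the sketch below computes the standard binary MCC from a confusion matrix. The counts in the example are invented for illustration and have nothing to do with the geNomad benchmarks; the function name is likewise an assumption.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Binary Matthews correlation coefficient from confusion-matrix counts.

    Returns a value in [-1, 1]; by the usual convention, 0.0 is returned
    when any marginal total is zero and the ratio is undefined.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Invented counts: 90 true positives, 5 false positives,
# 95 true negatives, 10 false negatives.
mcc = matthews_corrcoef(tp=90, fp=5, tn=95, fn=10)
print(f"MCC = {mcc:.3f} ({mcc * 100:.1f}%)")
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the classes are imbalanced. Multiplying by 100 gives the percentage form used in the abstract.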

17.
Nat Biotechnol ; 40(4): 507-516, 2022 04.
Article in English | MEDLINE | ID: mdl-34949778

ABSTRACT

Single nucleotide polymorphisms (SNPs) in metagenomics are used to quantify population structure, track strains and identify genetic determinants of microbial phenotypes. However, existing alignment-based approaches for metagenomic SNP detection require high-performance computing and enough read coverage to distinguish SNPs from sequencing errors. To address these issues, we developed the GenoTyper for Prokaryotes (GT-Pro), a suite of methods to catalog SNPs from genomes and use unique k-mers to rapidly genotype these SNPs from metagenomes. Compared to methods that use read alignment, GT-Pro is more accurate and two orders of magnitude faster. Using high-quality genomes, we constructed a catalog of 104 million SNPs in 909 human gut species and used unique k-mers targeting this catalog to characterize the global population structure of gut microbes from 7,459 samples. GT-Pro enables fast and memory-efficient metagenotyping of millions of SNPs on a personal computer.


Subject(s)
Gastrointestinal Microbiome , Microbiota , Gastrointestinal Microbiome/genetics , Genotype , Humans , Metagenome/genetics , Metagenomics/methods , Microbiota/genetics , Software
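The idea of genotyping SNPs with unique k-mers, as described in the GT-Pro abstract above, can be sketched on a toy scale. This is not GT-Pro's implementation or data structure: the function names, the choice of `k`, and the sequences are hypothetical. The sketch only checks uniqueness among SNP-covering k-mers, whereas a real tool would also need to exclude k-mers that occur elsewhere in any genome.

```python
from collections import Counter


def build_snp_kmer_index(reference, snps, k):
    """Map each SNP-covering k-mer to its (position, allele).

    `snps` maps a 0-based reference position to a tuple of alleles.
    K-mers shared by more than one (position, allele) pair are discarded.
    """
    index = {}
    ambiguous = set()
    for pos, alleles in snps.items():
        for allele in alleles:
            seq = reference[:pos] + allele + reference[pos + 1:]
            # Every length-k window that overlaps the SNP position.
            first = max(0, pos - k + 1)
            last = min(pos, len(seq) - k)
            for start in range(first, last + 1):
                kmer = seq[start:start + k]
                key = (pos, allele)
                if kmer in index and index[kmer] != key:
                    ambiguous.add(kmer)
                index[kmer] = key
    for kmer in ambiguous:
        del index[kmer]
    return index


def genotype_reads(reads, index, k):
    """Count exact k-mer hits per (position, allele) across all reads."""
    counts = Counter()
    for read in reads:
        for start in range(len(read) - k + 1):
            hit = index.get(read[start:start + k])
            if hit is not None:
                counts[hit] += 1
    return counts


# Toy data: one biallelic SNP (G/T) at position 7 of a made-up reference.
reference = "ACGTACGGTCATTGCA"
index = build_snp_kmer_index(reference, {7: ("G", "T")}, k=5)
counts = genotype_reads(["TACGTTCA", "CGGTCAT"], index, k=5)
```

Because lookup is an exact dictionary match rather than an alignment, each read is processed in time proportional to its length, which is the intuition behind the speed advantage the abstract describes.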
18.
ISME J ; 16(5): 1337-1347, 2022 05.
Article in English | MEDLINE | ID: mdl-34969995

ABSTRACT

With advances in DNA sequencing and miniaturized molecular biology workflows, rapid and affordable sequencing of single-cell genomes has become a reality. Compared to 16S rRNA gene surveys and shotgun metagenomics, large-scale application of single-cell genomics to whole microbial communities provides an integrated snapshot of community composition and function, directly links mobile elements to their hosts, and enables analysis of population heterogeneity of the dominant community members. To that end, we sequenced nearly 500 single-cell genomes from a low diversity hot spring sediment sample from Dewar Creek, British Columbia, and compared this approach to 16S rRNA gene amplicon and shotgun metagenomics applied to the same sample. We found that the broad taxonomic profiles were similar across the three sequencing approaches, though several lineages were missing from the 16S rRNA gene amplicon dataset, likely the result of primer mismatches. At the functional level, we detected a large array of mobile genetic elements present in the single-cell genomes but absent from the corresponding same species metagenome-assembled genomes. Moreover, we performed a single-cell population genomic analysis of the three most abundant community members, revealing differences in population structure based on mutation and recombination profiles. While the average pairwise nucleotide identities were similar across the dominant species-level lineages, we observed differences in the extent of recombination between these dominant populations. Most intriguingly, the creek's Hydrogenobacter sp. population appeared to be so recombinogenic that it more closely resembled a sexual species than a clonally evolving microbe. Together, this work demonstrates that a randomized single-cell approach can be useful for the exploration of previously uncultivated microbes from community composition to population structure.


Subject(s)
Hot Springs , Bacteria/genetics , Metagenome , Metagenomics , RNA, Ribosomal, 16S/genetics
19.
Cell Genom ; 2(12): 100213, 2022 Dec 14.
Article in English | MEDLINE | ID: mdl-36778052

ABSTRACT

The phylum Actinobacteria includes important human pathogens like Mycobacterium tuberculosis and Corynebacterium diphtheriae and renowned producers of secondary metabolites of commercial interest, yet only a small part of its diversity is represented by sequenced genomes. Here, we present 824 actinobacterial isolate genomes in the context of a phylum-wide analysis of 6,700 genomes including public isolates and metagenome-assembled genomes (MAGs). We estimate that only 30%-50% of projected actinobacterial phylogenetic diversity possesses genomic representation via isolates and MAGs. A comparison of gene functions reveals novel determinants of host-microbe interaction as well as environment-specific adaptations such as potential antimicrobial peptides. We identify plasmids and prophages across isolates and uncover extensive prophage diversity structured mainly by host taxonomy. Analysis of >80,000 biosynthetic gene clusters reveals that horizontal gene transfer and gene loss shape secondary metabolite repertoire across taxa. Our observations illustrate the essential role of and need for high-quality isolate genome sequences.

20.
Annu Rev Biomed Data Sci ; 4: 369-391, 2021 07 20.
Article in English | MEDLINE | ID: mdl-34465172

ABSTRACT

Viruses are the most abundant biological entity on Earth, infect cellular organisms from all domains of life, and are central players in the global biosphere. Over the last century, the discovery and characterization of viruses have progressed steadily alongside much of modern biology. In terms of outright numbers of novel viruses discovered, however, the last few years have been by far the most transformative for the field. Advances in methods for identifying viral sequences in genomic and metagenomic datasets, coupled to the exponential growth of environmental sequencing, have greatly expanded the catalog of known viruses and fueled the tremendous growth of viral sequence databases. Development and implementation of new standards, along with careful study of the newly discovered viruses, have transformed and will continue to transform our understanding of microbial evolution, ecology, and biogeochemical cycles, leading to new biotechnological innovations across many diverse fields, including environmental, agricultural, and biomedical sciences.


Subject(s)
Metagenomics , Viruses , Ecology , Genome, Viral , Metagenome , Viruses/genetics