ABSTRACT
The DOE Joint Genome Institute (JGI) Metagenome Workflow performs metagenome data processing, including assembly; structural, functional, and taxonomic annotation; and binning of metagenomic data sets that are subsequently included in the Integrated Microbial Genomes and Microbiomes (IMG/M) (I.-M. A. Chen, K. Chu, K. Palaniappan, A. Ratner, et al., Nucleic Acids Res, 49:D751-D763, 2021, https://doi.org/10.1093/nar/gkaa939) comparative analysis system and provided for download via the JGI data portal (https://genome.jgi.doe.gov/portal/). This workflow scales to run on thousands of metagenome samples per year, which can vary by the complexity of microbial communities and sequencing depth. Here, we describe the different tools, databases, and parameters used at different steps of the workflow to help with the interpretation of metagenome data available in IMG and to enable researchers to apply this workflow to their own data. We use 20 publicly available sediment metagenomes to illustrate the computing requirements for the different steps and highlight the typical results of data processing. The workflow modules for read filtering and metagenome assembly are available as a workflow description language (WDL) file (https://code.jgi.doe.gov/BFoster/jgi_meta_wdl). The workflow modules for annotation and binning are provided as a service to the user community at https://img.jgi.doe.gov/submit and require filling out the project and associated metadata descriptions in the Genomes OnLine Database (GOLD) (S. Mukherjee, D. Stamatis, J. Bertsch, G. Ovchinnikova, et al., Nucleic Acids Res, 49:D723-D733, 2021, https://doi.org/10.1093/nar/gkaa983).
IMPORTANCE The DOE JGI Metagenome Workflow is designed for processing metagenomic data sets starting from Illumina fastq files. It performs data preprocessing, error correction, assembly, structural and functional annotation, and binning. The results of processing are provided in several standard formats, such as fasta and gff, and can be used for subsequent integration into the Integrated Microbial Genomes and Microbiomes (IMG/M) system where they can be compared to a comprehensive set of publicly available metagenomes. As of 30 July 2020, 7,155 JGI metagenomes have been processed by the DOE JGI Metagenome Workflow. Here, we present a metagenome workflow developed at the JGI that generates rich data in standard formats and has been optimized for downstream analyses ranging from assessment of the functional and taxonomic composition of microbial communities to genome-resolved metagenomics and the identification and characterization of novel taxa. This workflow is currently being used to analyze thousands of metagenomic data sets in a consistent and standardized manner.
ABSTRACT
We combined viral genome sequencing with contact tracing to investigate introduction and evolution of severe acute respiratory syndrome coronavirus 2 lineages in Santa Clara County, California, from 27 January to 21 March 2020. From 558 persons with coronavirus disease 2019, 101 genomes from 143 available clinical samples comprised 17 lineages, including SCC1 (n = 41), WA1 (n = 9; including the first 2 reported deaths in the United States, with postmortem diagnosis), D614G (n = 4), ancestral Wuhan Hu-1 (n = 21), and 13 others (n = 26). Public health intervention may have curtailed the persistence of lineages that appeared transiently during February and March. By August, only D614G lineages introduced after 21 March were circulating in Santa Clara County.
Subjects
COVID-19/epidemiology , COVID-19/transmission , SARS-CoV-2/genetics , Adult , Aged , COVID-19/prevention & control , California/epidemiology , Contact Tracing , Female , Genetic Variation , Viral Genome/genetics , Genotype , Humans , Male , Middle Aged , Phylogeny , Risk Factors , SARS-CoV-2/classification , Travel , Young Adult
ABSTRACT
[This corrects the article DOI: 10.3389/fmicb.2020.582590.]
ABSTRACT
BACKGROUND: Horizontal gene transfer (HGT) plays a central role in microbial evolution. Our understanding of the mechanisms, frequency, and taxonomic range of HGT in polymicrobial environments is limited, as we currently rely on historical HGT events inferred from genome sequencing and studies involving cultured microorganisms. We lack approaches to observe ongoing HGT in microbial communities. RESULTS: To address this knowledge gap, we developed a DNA sequencing-based "transductomics" approach that detects and characterizes microbial DNA transferred via transduction. We validated our approach using model systems representing a range of transduction modes and show that we can detect numerous classes of transducing DNA. Additionally, we show that we can use this methodology to obtain insights into DNA transduction among all major taxonomic groups of the intestinal microbiome. CONCLUSIONS: The transductomics approach that we present here allows for the detection and characterization of genes that are potentially transferred between microbes in complex microbial communities at the time of measurement and thus provides insights into real-time ongoing horizontal gene transfer. This work extends the genomic toolkit for the broader study of mobile DNA within microbial communities and could be used to understand how phenotypes spread within microbiomes.
Subjects
Bacterial DNA/analysis , Bacterial DNA/genetics , Horizontal Gene Transfer/genetics , Genomics , Microbiota/genetics , Genetic Transduction , Animals , Gastrointestinal Microbiome/genetics , Male , Mice , C57BL Inbred Mice , Reproducibility of Results
ABSTRACT
Ionizing radiation (IR) is lethal to most organisms at high doses, damaging every cellular macromolecule via induction of reactive oxygen species (ROS). Utilizing experimental evolution and continuing previous work, we have generated the most IR-resistant Escherichia coli populations developed to date. After 100 cycles of selection, the dose required to kill 99% of the four replicate populations (IR9-100, IR10-100, IR11-100, and IR12-100) has increased from 750 Gy to approximately 3,000 Gy. Fitness trade-offs, specialization, and clonal interference are evident. Long-lived competing sub-populations are present in three of the four lineages. In IR9, one lineage accumulates the heme precursor, porphyrin, leading to generation of yellow-brown colonies. Major genomic alterations are present. IR9 and IR10 exhibit major deletions and/or duplications proximal to the chromosome replication terminus. Contributions to IR resistance have expanded beyond the alterations in DNA repair systems documented previously. Variants of proteins involved in ATP synthesis (AtpA), iron-sulfur cluster biogenesis (SufD) and cadaverine synthesis (CadA) each contribute to IR resistance in IR9-100. Major genomic and physiological changes are emerging. An isolate from IR10 exhibits protein protection from ROS similar to the extremely radiation resistant bacterium Deinococcus radiodurans, without evident changes in cellular metal homeostasis. Selection is continuing with no limit to IR resistance in evidence as our E. coli populations approach levels of IR resistance typical of D. radiodurans.
ABSTRACT
The coronavirus disease 2019 (COVID-19) pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has spread globally, with >365,000 cases in California as of 17 July 2020. We investigated the genomic epidemiology of SARS-CoV-2 in Northern California from late January to mid-March 2020, using samples from 36 patients spanning nine counties and the Grand Princess cruise ship. Phylogenetic analyses revealed the cryptic introduction of at least seven different SARS-CoV-2 lineages into California, including epidemic WA1 strains associated with Washington state, with lack of a predominant lineage and limited transmission among communities. Lineages associated with outbreak clusters in two counties were defined by a single base substitution in the viral genome. These findings support contact tracing, social distancing, and travel restrictions to contain the spread of SARS-CoV-2 in California and other states.
Subjects
Betacoronavirus/genetics , Coronavirus Infections/epidemiology , Coronavirus Infections/virology , Viral Genome , Phylogeny , Viral Pneumonia/epidemiology , Viral Pneumonia/virology , COVID-19 , California/epidemiology , Coronavirus Infections/transmission , Epidemiological Monitoring , Humans , Pandemics , Viral Pneumonia/transmission , SARS-CoV-2 , Sequence Alignment , Ships , Travel , Washington
ABSTRACT
In early-to-mid March 2020, 20 of 46 (43%) COVID-19 cases at a tertiary care hospital in San Francisco, California were travel related. Cases were significantly associated with travel to either Europe (odds ratio, 6.1) or New York (odds ratio, 32.9). Viral genomes recovered from 9 of 12 (75%) cases co-clustered with lineages circulating in Europe.
Subjects
COVID-19 , Europe , Humans , New York , SARS-CoV-2 , San Francisco/epidemiology , Travel , Travel-Related Illness
ABSTRACT
In previous work (D. R. Harris et al., J Bacteriol 191:5240-5252, 2009, https://doi.org/10.1128/JB.00502-09; B. T. Byrne et al., Elife 3:e01322, 2014, https://doi.org/10.7554/eLife.01322), we demonstrated that Escherichia coli could acquire substantial levels of resistance to ionizing radiation (IR) via directed evolution. Major phenotypic contributions involved adaptation of organic systems for DNA repair. We have now undertaken an extended effort to generate E. coli populations that are as resistant to IR as Deinococcus radiodurans. After an initial 50 cycles of selection using high-energy electron beam IR, four replicate populations exhibit major increases in IR resistance but have not yet reached IR resistance equivalent to D. radiodurans. Regular deep sequencing reveals complex evolutionary patterns with abundant clonal interference. Prominent IR resistance mechanisms involve novel adaptations to DNA repair systems and alterations in RNA polymerase. Adaptation is highly specialized to resist IR exposure, since isolates from the evolved populations exhibit highly variable patterns of resistance to other forms of DNA damage. Sequenced isolates from the populations possess between 184 and 280 mutations. IR resistance in one isolate, IR9-50-1, is derived largely from four novel mutations affecting DNA and RNA metabolism: RecD A90E, RecN K429Q, and RpoB S72N/RpoC K1172I. Additional mechanisms of IR resistance are evident.
IMPORTANCE Some bacterial species exhibit astonishing resistance to ionizing radiation, with Deinococcus radiodurans being the archetype. As natural IR sources rarely exceed mGy levels, the capacity of Deinococcus to survive 5,000 Gy has been attributed to desiccation resistance. To understand the molecular basis of true extreme IR resistance, we are using experimental evolution to generate strains of Escherichia coli with IR resistance levels comparable to Deinococcus. Experimental evolution has previously generated moderate radioresistance for multiple bacterial species. However, these efforts could not take advantage of modern genomic sequencing technologies. In this report, we examine four replicate bacterial populations after 50 selection cycles. Genomic sequencing allows us to follow the genesis of mutations in populations throughout selection. Novel mutations affecting genes encoding DNA repair proteins and RNA polymerase enhance radioresistance. However, more contributors are apparent.
Subjects
Biological Evolution , Escherichia coli/genetics , Escherichia coli/radiation effects , Radiation Tolerance , Ionizing Radiation , Genetic Selection , DNA Mutational Analysis , DNA Repair Enzymes/genetics , DNA-Directed RNA Polymerases/genetics , Deinococcus/growth & development , Deinococcus/radiation effects , Escherichia coli/growth & development , High-Throughput Nucleotide Sequencing , Mutation
ABSTRACT
The dysregulation of intestinal microbial communities is associated with inflammatory bowel diseases (IBD). Studies aimed at understanding the contribution of the microbiota to inflammatory diseases have primarily focused on bacteria, yet the intestine harbours a viral component dominated by prokaryotic viruses known as bacteriophages (phages). Phage numbers are elevated at the intestinal mucosal surface and phages increase in abundance during IBD, suggesting that phages play an unidentified role in IBD. We used a sequence-independent approach for the selection of viral contigs and then applied quantitative metagenomics to study intestinal phages in a mouse model of colitis. We discovered that during colitis the intestinal phage population is altered and transitions from an ordered state to a stochastic dysbiosis. We identified phages specific to pathobiotic hosts associated with intestinal disease, whose abundances are altered during colitis. Additionally, phage populations in healthy and diseased mice overlapped with phages from healthy humans and humans with IBD. Our findings indicate that intestinal phage communities are altered during inflammatory disease, establishing a platform for investigating phage involvement in IBD.
Subjects
Bacteriophages/isolation & purification , Colitis/pathology , Colitis/virology , Dysbiosis/virology , Gastrointestinal Microbiome/physiology , Intestinal Mucosa/virology , Animals , Bacteriophages/genetics , Cultured Cells , Animal Disease Models , Viral Genome/genetics , Humans , Male , Mice , C57BL Inbred Mice , Knockout Mice
ABSTRACT
Merging paired-end shotgun reads generated on high-throughput sequencing platforms can substantially improve various subsequent bioinformatics processes, including genome assembly, binning, mapping, annotation, and clustering for taxonomic analysis. With the inexorable growth of sequence data volume and CPU core counts, the speed and scalability of read-processing tools becomes ever-more important. The accuracy of shotgun read merging is crucial as well, as errors introduced by incorrect merging percolate through to reduce the quality of downstream analysis. Thus, we designed a new tool to maximize accuracy and minimize processing time, allowing the use of read merging on larger datasets, and in analyses highly sensitive to errors. We present BBMerge, a new merging tool for paired-end shotgun sequence data. We benchmark BBMerge by comparison with eight other widely used merging tools, assessing speed, accuracy and scalability. Evaluations of both synthetic and real-world datasets demonstrate that BBMerge produces merged shotgun reads with greater accuracy and at higher speed than any existing merging tool examined. BBMerge also provides the ability to merge non-overlapping shotgun read pairs by using k-mer frequency information to assemble the unsequenced gap between reads, achieving a significantly higher merge rate while maintaining or increasing accuracy.
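As a rough illustration of what merging an overlapping read pair means, the following minimal Python sketch reverse-complements the second read and joins the pair at the longest overlap that passes a mismatch threshold. It is not BBMerge's algorithm: the function names, the `min_overlap` and `max_mismatch_rate` parameters, and the longest-overlap-first rule are assumptions made for this example only, and it ignores base qualities and the k-mer based gap assembly described in the abstract.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_rate=0.0):
    """Merge a forward read and its reverse-strand mate, or return None.

    Hypothetical helper for illustration: tries the longest candidate
    overlap first and accepts the first one within the mismatch budget.
    """
    mate = reverse_complement(read2)  # bring the mate onto read1's strand
    longest = min(len(read1), len(mate))
    for overlap_len in range(longest, min_overlap - 1, -1):
        tail = read1[-overlap_len:]   # end of the forward read
        head = mate[:overlap_len]     # start of the reoriented mate
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * overlap_len:
            # Keep read1 whole and append the non-overlapping part of the mate.
            return read1 + mate[overlap_len:]
    return None


# Simulate a 30-bp fragment sequenced as two 20-bp reads with a 10-bp overlap.
fragment = "ACGTTGCAAGGCTTAACCGGATCGATTGCA"
forward = fragment[:20]
reverse = reverse_complement(fragment[10:])
merged = merge_pair(forward, reverse, min_overlap=8)
```

Accepting the first overlap that passes is the simplest possible policy; a production merger has to weigh competing overlaps and base qualities, which is where incorrect merges of the kind the abstract warns about arise.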
Subjects
High-Throughput Nucleotide Sequencing/methods , Algorithms , Computational Biology
ABSTRACT
Over the past decade, high-throughput short-read 16S rRNA gene amplicon sequencing has eclipsed clone-dependent long-read Sanger sequencing for microbial community profiling. The transition to new technologies has provided more quantitative information at the expense of taxonomic resolution with implications for inferring metabolic traits in various ecosystems. We applied single-molecule real-time sequencing for microbial community profiling, generating full-length 16S rRNA gene sequences at high throughput, which we propose to name PhyloTags. We benchmarked and validated this approach using a defined microbial community. When further applied to samples from the water column of meromictic Sakinaw Lake, we show that while community structures at the phylum level are comparable between PhyloTags and Illumina V4 16S rRNA gene sequences (iTags), variance increases with community complexity at greater water depths. PhyloTags moreover allowed less ambiguous classification. Last, a platform-independent comparison of PhyloTags and in silico generated partial 16S rRNA gene sequences demonstrated significant differences in community structure and phylogenetic resolution across multiple taxonomic levels, including a severe underestimation in the abundance of specific microbial genera involved in nitrogen and methane cycling across the Lake's water column. Thus, PhyloTags provide a reliable adjunct or alternative to cost-effective iTags, enabling more accurate phylogenetic resolution of microbial communities and predictions on their metabolic potential.
Subjects
Archaea/classification , Bacteria/classification , Genetic Variation , Phylogeny , Archaea/genetics , Archaea/isolation & purification , Bacteria/genetics , Bacteria/isolation & purification , British Columbia , High-Throughput Nucleotide Sequencing , Lakes/microbiology , 16S Ribosomal RNA/genetics , DNA Sequence Analysis
ABSTRACT
Multiple models describe the formation and evolution of distinct microbial phylogenetic groups. These evolutionary models make different predictions regarding how adaptive alleles spread through populations and how genetic diversity is maintained. Processes predicted by competing evolutionary models, for example, genome-wide selective sweeps vs gene-specific sweeps, could be captured in natural populations using time-series metagenomics if the approach were applied over a sufficiently long time frame. Direct observations of either process would help resolve how distinct microbial groups evolve. Here, from a 9-year metagenomic study of a freshwater lake (2005-2013), we explore changes in single-nucleotide polymorphism (SNP) frequencies and patterns of gene gain and loss in 30 bacterial populations. SNP analyses revealed substantial genetic heterogeneity within these populations, although the degree of heterogeneity varied by >1000-fold among populations. SNP allele frequencies also changed dramatically over time within some populations. Interestingly, nearly all SNP variants were slowly purged over several years from one population of green sulfur bacteria, while at the same time multiple genes either swept through or were lost from this population. These patterns were consistent with a genome-wide selective sweep in progress, a process predicted by the 'ecotype model' of speciation but not previously observed in nature. In contrast, other populations contained large, SNP-free genomic regions that appear to have swept independently through the populations prior to the study without purging diversity elsewhere in the genome. Evidence for both genome-wide and gene-specific sweeps suggests that different models of bacterial speciation may apply to different populations coexisting in the same environment.
Subjects
Bacteria/genetics , Bacterial Genome/genetics , Metagenomics , Single Nucleotide Polymorphism , Bacteria/classification , Bacteria/isolation & purification , Biological Evolution , Gene Frequency , Genetic Variation , Phylogeny
ABSTRACT
MOTIVATION: Metagenomic sequencing allows reconstruction of microbial genomes directly from environmental samples. Omega (overlap-graph metagenome assembler) was developed for assembling and scaffolding Illumina sequencing data of microbial communities. RESULTS: Omega found overlaps between reads using a prefix/suffix hash table. The overlap graph of reads was simplified by removing transitive edges and trimming short branches. Unitigs were generated based on minimum cost flow analysis of the overlap graph and then merged to contigs and scaffolds using mate-pair information. In comparison with three de Bruijn graph assemblers (SOAPdenovo, IDBA-UD and MetaVelvet), Omega provided comparable overall performance on a HiSeq 100-bp dataset and superior performance on a MiSeq 300-bp dataset. In comparison with Celera on the MiSeq dataset, Omega provided more continuous assemblies overall using a fraction of the computing time of existing overlap-layout-consensus assemblers. This indicates Omega can more efficiently assemble longer Illumina reads, and at deeper coverage, for metagenomic datasets. AVAILABILITY AND IMPLEMENTATION: Implemented in C++ with source code and binaries freely available at http://omega.omicsbio.org.
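The first two steps named in the abstract, overlap detection with a prefix hash table and removal of transitive edges, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Omega's C++ implementation: it handles exact, same-strand overlaps only, the function names and the `min_overlap` parameter are hypothetical, and the transitive-edge test is the naive "a two-step path exists" check rather than a length-aware reduction.

```python
from collections import defaultdict


def find_overlaps(reads, min_overlap):
    """Return {(i, j): overlap_len} where a suffix of reads[i] equals a prefix of reads[j]."""
    # Hash table keyed by the first min_overlap bases of every read.
    prefix_index = defaultdict(list)
    for j, read in enumerate(reads):
        if len(read) >= min_overlap:
            prefix_index[read[:min_overlap]].append(j)

    overlaps = {}
    for i, read in enumerate(reads):
        # Suffix start positions run left to right, so longer overlaps are found first.
        for start in range(1, len(read) - min_overlap + 1):
            suffix = read[start:]
            for j in prefix_index.get(suffix[:min_overlap], ()):
                if j != i and (i, j) not in overlaps and reads[j].startswith(suffix):
                    overlaps[(i, j)] = len(suffix)
    return overlaps


def remove_transitive_edges(overlaps):
    """Drop edge i->k whenever some read j gives a path i->j->k."""
    successors = defaultdict(set)
    for i, j in overlaps:
        successors[i].add(j)

    reduced = {}
    for (i, k), length in overlaps.items():
        is_transitive = any(
            k in successors.get(j, ()) for j in successors[i] if j != k
        )
        if not is_transitive:
            reduced[(i, k)] = length
    return reduced


# Three reads tiled along one sequence: read 0 overlaps reads 1 and 2, read 1 overlaps read 2.
reads = ["AAACCCGGG", "ACCCGGGTT", "CCGGGTTTA"]
overlaps = find_overlaps(reads, min_overlap=3)
graph = remove_transitive_edges(overlaps)
```

With these three reads, `overlaps` contains the edges 0->1, 0->2 and 1->2, and the reduction removes 0->2 because it is implied by the path through read 1. The later steps in the abstract (branch trimming, minimum cost flow, mate-pair scaffolding) are not covered by this sketch.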