ABSTRACT
Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.
Subject(s)
Metagenomics , Software , Algorithms , Benchmarking , DNA Sequence Analysis
ABSTRACT
BACKGROUND: Ruminants are important contributors to global methane emissions via microbial fermentation in their reticulo-rumens. This study is part of a larger program characterising the rumen microbiomes of sheep that vary naturally in methane yield (g CH4/kg DM/day), and aims to define differences in microbial communities, and in gene and transcript abundances, that can explain the animal methane phenotype.
METHODS: Rumen microbiome metagenomic and metatranscriptomic data were analysed by Gene Set Enrichment, sparse partial least squares regression and the Wilcoxon Rank Sum test to estimate correlations between specific KEGG bacterial pathways/genes and high methane yield in sheep. KEGG genes enriched in high methane yield sheep were reassembled from raw reads and existing contigs and analysed by MEGAN to predict their phylogenetic origin. Protein coding sequences from Succinivibrio dextrinosolvens strains were analysed using Effective DB to predict bacterial type III secreted proteins. The effect of S. dextrinosolvens strain H5 growth on methane formation by rumen methanogens was explored using co-cultures.
RESULTS: Detailed analysis of the rumen microbiomes of high methane yield sheep shows that gene and transcript abundances of bacterial type III secretion system genes are positively correlated with methane yield in sheep. Most of the bacterial type III secretion system genes could not be assigned to a particular bacterial group, but several genes were affiliated with the genus Succinivibrio, and searches of bacterial genome sequences found that strains of S. dextrinosolvens were part of a small group of rumen bacteria that encode this type of secretion system. In co-culture experiments, S. dextrinosolvens strain H5 showed a growth-enhancing effect on a methanogen belonging to the order Methanomassiliicoccales, and inhibition of a representative of the Methanobrevibacter gottschalkii clade.
CONCLUSIONS: This is the first report of bacterial type III secretion system genes being associated with high methane emissions in ruminants, and identifies these secretion systems as potential new targets for methane mitigation research. The effects of S. dextrinosolvens on the growth of rumen methanogens in co-cultures indicate that bacteria-methanogen interactions are important modulators of methane production in ruminant animals.
Subject(s)
Bacteria/genetics , Bacterial Proteins/genetics , Bacterial Gene Expression Regulation , Methane/biosynthesis , Transcriptome , Type III Secretion Systems/genetics , Animals , Bacteria/classification , Bacteria/isolation & purification , Bacteria/metabolism , Bacterial Proteins/metabolism , Culture Media/chemistry , Fermentation , Gastrointestinal Microbiome/genetics , Gene Ontology , Metabolic Networks and Pathways/genetics , Metagenome , Methanobrevibacter/genetics , Methanobrevibacter/isolation & purification , Methanobrevibacter/metabolism , Molecular Sequence Annotation , Phylogeny , Rumen/microbiology , Sheep , Succinivibrionaceae/genetics , Succinivibrionaceae/isolation & purification , Succinivibrionaceae/metabolism , Type III Secretion Systems/metabolism
ABSTRACT
Large insert mate pair reads have been used in de novo assembly and discovery of structural variants. We developed a new approach, Cre-LoxP inverse PCR paired end (CLIP-PE), which exploits the advantages of (1) the Cre-LoxP recombination system to efficiently circularize large DNA fragments, (2) inverse PCR to enrich for the desired products that contain both ends of the large DNA fragments, and (3) the use of restriction enzymes to introduce a recognizable junction site between ligated fragment ends. We have successfully created CLIP-PE libraries of up to 22 kb jumping pairs and demonstrated their ability to improve genome assemblies. The CLIP-PE methodology can be implemented with existing and future next-generation sequencing platforms.
Subject(s)
Gene Library , Fungal Genome , High-Throughput Nucleotide Sequencing/methods , Integrases/genetics , Genetic Recombination , Saccharomyces cerevisiae/genetics , DNA Restriction Enzymes/metabolism , Fungal DNA/genetics , Fungal DNA/metabolism , High-Throughput Nucleotide Sequencing/instrumentation , Integrases/metabolism , Plasmids/chemistry , Plasmids/metabolism , Polymerase Chain Reaction/methods , Saccharomyces cerevisiae/metabolism , DNA Sequence Analysis/methods
ABSTRACT
Fosmid end sequencing has been widely utilized in genome sequence assemblies and genome structural variation studies. We have developed a new approach to construct fosmid paired-end libraries that is suitable for the Illumina sequencing platform. This approach employs a newly modified fosmid vector (pFosClip) which contains two loxP sites with identical orientation and two inverse Illumina adaptor priming sites flanking the cloning site. DNA prepared from the fosmid library constructed with pFosClip can be treated with the Cre recombinase to remove most of the vector DNA, leaving only 107 bp of the vector sequence with insert DNA. Frequent-cutting restriction enzymes and ligase are used to digest the fosmid DNA to small (less than 1 kb) fragments and recircularize the fosmid ends and all the internal fragments. Finally, an inverse PCR step with the Illumina primers is used to enrich the fosmid paired ends (PEs) for sequencing. The advantages of this approach are the following: (1) the circularization of short fragments with sticky ends is efficient, so the success rate is higher than that of other approaches that attempt to join both blunt ends of large fosmid vectors; (2) the restriction enzyme cutting generates an identifiable junction tag for splitting the paired reads; and (3) multiple restriction enzymes can be used to overcome possible enzyme-cutting bias. Our results have shown that this approach has produced mostly fosmid-size (30-40 kb) pairs from the targeted fungi and plant genomes and has drastically increased the scaffold sizes in the assembled genomes.
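As a rough illustration of how an identifiable junction tag allows a read that spans the ligation point to be split into its two ends, here is a minimal Python sketch. The function name `split_at_junction` and the example site `GATC` are assumptions made for illustration and are not taken from the pFosClip protocol; a real pipeline would also need to handle sequencing errors, read orientation and reads with more than one occurrence of the site.

```python
from typing import Optional, Tuple


def split_at_junction(read: str, junction: str) -> Optional[Tuple[str, str]]:
    """Split a read at the first occurrence of the junction tag.

    Returns (left_end, right_end) with the tag itself removed,
    or None when the read does not contain the tag.
    """
    pos = read.find(junction)
    if pos == -1:
        return None
    return read[:pos], read[pos + len(junction):]


# Hypothetical read: two fragment ends joined by a GATC junction tag.
example_read = "ACGTTGCA" + "GATC" + "TTAGGCAT"
ends = split_at_junction(example_read, "GATC")
print(ends)
```

Reads for which the function returns `None` carry no junction and would be treated as ordinary single-end sequence in this simplified view.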
Subject(s)
Gene Library , Fungal Genome , High-Throughput Nucleotide Sequencing/methods , Integrases/genetics , Genetic Recombination , Saccharomyces cerevisiae/genetics , Basidiomycota/genetics , Basidiomycota/metabolism , DNA Restriction Enzymes/metabolism , Circular DNA/genetics , Circular DNA/metabolism , Fungal DNA/genetics , Fungal DNA/metabolism , High-Throughput Nucleotide Sequencing/instrumentation , Integrases/metabolism , Plasmids/chemistry , Plasmids/metabolism , Polymerase Chain Reaction/methods , Saccharomyces cerevisiae/metabolism , DNA Sequence Analysis/methods
ABSTRACT
BACKGROUND: Enteric fermentation by farmed ruminant animals is a major source of methane and constitutes the second largest anthropogenic contributor to global warming. Reducing methane emissions from ruminants is needed to ensure sustainable animal production in the future. Methane yield varies naturally in sheep and is a heritable trait that can be used to select animals that yield less methane per unit of feed eaten. We previously demonstrated elevated expression of hydrogenotrophic methanogenesis pathway genes of methanogenic archaea in the rumens of high methane yield (HMY) sheep compared to their low methane yield (LMY) counterparts. Methane production in the rumen is strongly connected to microbial hydrogen production through fermentation processes. In this study, we investigate the contribution that rumen bacteria make to methane yield phenotypes in sheep.
RESULTS: Using deep sequence metagenome and metatranscriptome datasets in combination with 16S rRNA gene amplicon sequencing from HMY and LMY sheep, we show enrichment of lactate-producing Sharpea spp. in LMY sheep bacterial communities. Increased gene and transcript abundances for sugar import and utilisation and production of lactate, propionate and butyrate were also observed in LMY animals. Sharpea azabuensis and Megasphaera spp. act as important drivers of lactate production and utilisation according to phylogenetic analysis and read mappings.
CONCLUSIONS: Our findings show that the rumen microbiome in LMY animals supports a rapid heterofermentative growth, leading to lactate production. We postulate that lactate is subsequently metabolised mainly to butyrate in LMY animals, producing 2 mol of hydrogen and 0.5 mol of methane per mol hexose, which represents 24% less than the 0.66 mol of methane formed from the 2.66 mol of hydrogen produced if hexose fermentation were directly to acetate and butyrate. These findings are consistent with the theory that a smaller rumen size with a higher turnover rate, where rapid heterofermentative growth would be an advantage, results in lower hydrogen production and lower methane formation. Together with previous methanogen gene expression data, this builds a strong concept of how animal traits and microbial communities shape the methane phenotype in sheep.
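The methane figures quoted in the conclusions can be checked with a few lines of arithmetic. The sketch below assumes the standard stoichiometry of hydrogenotrophic methanogenesis (4 H2 + CO2 -> CH4 + 2 H2O), which the abstract implies but does not state; the function and variable names are illustrative only.

```python
H2_PER_CH4 = 4.0  # assumed stoichiometry: 4 H2 + CO2 -> CH4 + 2 H2O


def methane_from_hydrogen(h2_mol: float) -> float:
    """Mol of CH4 that can be formed from a given amount of H2."""
    return h2_mol / H2_PER_CH4


def relative_reduction(lower: float, reference: float) -> float:
    """Fractional decrease of `lower` relative to `reference`."""
    return (reference - lower) / reference


# Hydrogen yields per mol hexose, as quoted in the abstract.
ch4_via_lactate = methane_from_hydrogen(2.0)   # lactate -> butyrate route
ch4_direct = methane_from_hydrogen(2.66)       # direct acetate/butyrate route

print(f"CH4 via lactate route: {ch4_via_lactate:.2f} mol per mol hexose")
print(f"CH4 via direct route:  {ch4_direct:.3f} mol per mol hexose")

# The abstract reports the direct-route value as 0.66 mol; with that
# quoted figure the reduction comes out at roughly 24%.
print(f"Reduction: {relative_reduction(0.5, 0.66):.1%}")
```

Using the unrounded value for the direct route gives a slightly larger reduction, so the 24% figure depends on the rounding applied to 0.66 mol.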
Subject(s)
Bacteria/classification , Bacteria/metabolism , Hexoses/metabolism , Lactic Acid/metabolism , Lactobacillaceae/metabolism , Methane/metabolism , Rumen/microbiology , Animals , Bacteria/genetics , Base Sequence , Butyrates/metabolism , Fatty Acids/metabolism , Fermentation , Global Warming , High-Throughput Nucleotide Sequencing , Lactobacillaceae/genetics , Metagenome/genetics , Microbiota/genetics , Propionates/metabolism , 16S Ribosomal RNA/genetics , Rumen/physiology , DNA Sequence Analysis , Sheep
ABSTRACT
DNA methylation acts in concert with restriction enzymes to protect the integrity of prokaryotic genomes. Studies in a limited number of organisms suggest that methylation also contributes to prokaryotic genome regulation, but the prevalence and properties of such non-restriction-associated methylation systems remain poorly understood. Here, we used single-molecule, real-time sequencing to map DNA modifications including m6A, m4C, and m5C across the genomes of 230 diverse bacterial and archaeal species. We observed DNA methylation in nearly all (93%) organisms examined, and identified a total of 834 distinct reproducibly methylated motifs. These data enabled annotation of the DNA binding specificities of 620 DNA methyltransferases (MTases), doubling known specificities for previously hard-to-study Type I, IIG and III MTases, and revealing their extraordinary diversity. Strikingly, 48% of organisms harbor active Type II MTases with no apparent cognate restriction enzyme. These active 'orphan' MTases are present in diverse bacterial and archaeal phyla and show motif specificities and methylation patterns consistent with functions in gene regulation and DNA replication. Our results reveal the pervasive presence of DNA methylation throughout the prokaryotic kingdoms, as well as the diversity of sequence specificities and potential functions of DNA methylation systems.
Subject(s)
Epigenomics , Prokaryotic Cells/metabolism , Conserved Sequence , DNA Methylation/genetics , DNA Replication/genetics , DNA Restriction-Modification Enzymes/classification , DNA Restriction-Modification Enzymes/metabolism , Molecular Evolution , Gene Expression Regulation , Genome , Methyltransferases/metabolism , Molecular Sequence Annotation , Multigene Family , Nucleotide Motifs/genetics , Phylogeny , Substrate Specificity
ABSTRACT
Multiple models describe the formation and evolution of distinct microbial phylogenetic groups. These evolutionary models make different predictions regarding how adaptive alleles spread through populations and how genetic diversity is maintained. Processes predicted by competing evolutionary models, for example, genome-wide selective sweeps vs gene-specific sweeps, could be captured in natural populations using time-series metagenomics if the approach were applied over a sufficiently long time frame. Direct observations of either process would help resolve how distinct microbial groups evolve. Here, from a 9-year metagenomic study of a freshwater lake (2005-2013), we explore changes in single-nucleotide polymorphism (SNP) frequencies and patterns of gene gain and loss in 30 bacterial populations. SNP analyses revealed substantial genetic heterogeneity within these populations, although the degree of heterogeneity varied by >1000-fold among populations. SNP allele frequencies also changed dramatically over time within some populations. Interestingly, nearly all SNP variants were slowly purged over several years from one population of green sulfur bacteria, while at the same time multiple genes either swept through or were lost from this population. These patterns were consistent with a genome-wide selective sweep in progress, a process predicted by the 'ecotype model' of speciation but not previously observed in nature. In contrast, other populations contained large, SNP-free genomic regions that appear to have swept independently through the populations prior to the study without purging diversity elsewhere in the genome. Evidence for both genome-wide and gene-specific sweeps suggests that different models of bacterial speciation may apply to different populations coexisting in the same environment.
Subject(s)
Bacteria/genetics , Bacterial Genome/genetics , Metagenomics , Single Nucleotide Polymorphism , Bacteria/classification , Bacteria/isolation & purification , Biological Evolution , Gene Frequency , Genetic Variation , Phylogeny
ABSTRACT
Grouping large genomic fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Because of the complex nature of these communities, existing metagenome binning methods often miss a large number of microbial species. In addition, most of the tools are not scalable to large datasets. Here we introduce automated software called MetaBAT that integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning. MetaBAT outperforms alternative methods in accuracy and computational efficiency on both synthetic and real metagenome datasets. It automatically forms hundreds of high-quality genome bins on a very large assembly consisting of millions of contigs in a matter of hours on a single node. MetaBAT is open source software and available at https://bitbucket.org/berkeleylab/metabat.
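To make the tetranucleotide frequency signal concrete, the following Python sketch counts overlapping 4-mers in a contig and normalises them to relative frequencies. It is only an illustration under simplifying assumptions: the function name is hypothetical, reverse complements are not merged, and windows containing ambiguous bases are skipped. It is not MetaBAT's implementation, which according to the abstract combines a composition signal of this kind with genome abundance through empirical probabilistic distances.

```python
from collections import Counter
from typing import Dict

VALID_BASES = set("ACGT")


def tetranucleotide_frequencies(contig: str) -> Dict[str, float]:
    """Relative frequency of each overlapping 4-mer in a contig.

    Windows containing characters other than A, C, G or T are ignored.
    Returns an empty dict if no valid window exists.
    """
    seq = contig.upper()
    counts = Counter(
        seq[i:i + 4]
        for i in range(len(seq) - 3)
        if set(seq[i:i + 4]) <= VALID_BASES
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# Toy contig; real contigs are typically thousands of bases long.
freqs = tetranucleotide_frequencies("ACGTACGTAC")
for kmer, freq in sorted(freqs.items()):
    print(kmer, round(freq, 3))
```

Contigs from the same genome tend to have similar profiles of this kind, which is why composition can complement abundance when grouping contigs into bins.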
ABSTRACT
Ruminant livestock represent the single largest anthropogenic source of the potent greenhouse gas methane, which is generated by methanogenic archaea residing in ruminant digestive tracts. While differences between individual animals of the same breed in the amount of methane produced have been observed, the basis for this variation remains to be elucidated. To explore the mechanistic basis of this methane production, we measured methane yields from 22 sheep, which revealed that methane yields are a reproducible, quantitative trait. Deep metagenomic and metatranscriptomic sequencing demonstrated a similar abundance of methanogens and methanogenesis pathway genes in high and low methane emitters. However, transcription of methanogenesis pathway genes was substantially increased in sheep with high methane yields. These results identify a discrete set of rumen methanogens whose methanogenesis pathway transcription profiles correlate with methane yields and provide new targets for CH4 mitigation at the levels of microbiota composition and transcriptional regulation.
Subject(s)
Archaeal Proteins/genetics , Metagenome , Methane/biosynthesis , Microbiota , Rumen/microbiology , Sheep/microbiology , Animals , Archaea/genetics , Archaea/metabolism , Archaeal Proteins/metabolism , Base Sequence , Molecular Sequence Data , Phenotype , Heritable Quantitative Trait , Rumen/metabolism , Sheep/metabolism , Transcriptome
ABSTRACT
Although recent nucleotide sequencing technologies have significantly enhanced our understanding of microbial genomes, the function of ~35% of genes identified in a genome currently remains unknown. To improve the understanding of microbial genomes and consequently of microbial processes it will be crucial to assign a function to this "genomic dark matter." Due to the urgent need for additional carbohydrate-active enzymes for improved production of transportation fuels from lignocellulosic biomass, we screened the genomes of more than 5,500 microorganisms for hypothetical proteins that are located in the proximity of already known cellulases. We identified, synthesized and expressed a total of 17 putative cellulase genes with insufficient sequence similarity to currently known cellulases to be identified as such using traditional sequence annotation techniques that rely on significant sequence similarity. The recombinant proteins of the newly identified putative cellulases were subjected to enzymatic activity assays to verify their hydrolytic activity towards cellulose and lignocellulosic biomass. Eleven (65%) of the tested enzymes had significant activity towards at least one of the substrates. This high success rate highlights that a gene context-based approach can be used to assign function to genes that are otherwise categorized as "genomic dark matter" and to identify biomass-degrading enzymes that have little sequence similarity to already known cellulases. The ability to assign function to genes that have no related sequence representatives with functional annotation will be important to enhance our understanding of microbial processes and to identify microbial proteins for a wide range of applications.
Subject(s)
Bacteria/genetics , Bacterial Proteins/genetics , Cellulases/genetics , Bacteria/metabolism , Bacterial Proteins/metabolism , Biomass , Cellulases/metabolism , Cellulose/metabolism , Molecular Cloning , Bacterial Genes , Genomics , Hydrolysis , Lignin/metabolism
ABSTRACT
Large insert mate pair reads have a major impact on the overall success of de novo assembly and the discovery of inherited and acquired structural variants. The positional information of mate pair reads generally improves genome assembly by resolving repeat elements and/or ordering contigs. Currently available methods for building such libraries have one or more limitations, such as relatively small insert size, inability to distinguish the junction of the two ends, and/or low throughput. We developed a new approach, Cre-LoxP Inverse PCR Paired-End (CLIP-PE), which exploits the advantages of (1) the Cre-LoxP recombination system to efficiently circularize large DNA fragments, (2) inverse PCR to enrich for the desired products that contain both ends of the large DNA fragments, and (3) the use of restriction enzymes to introduce a recognizable junction site between ligated fragment ends and to improve the self-ligation efficiency. We have successfully created CLIP-PE libraries of up to 22 kb that are rich in informative read pairs and low in small-fragment background. These libraries have demonstrated the ability to improve genome assemblies. The CLIP-PE methodology can be implemented with existing and future next-generation sequencing platforms.
Subject(s)
Microbiological Attachment Sites/genetics , Integrases/metabolism , Insertional Mutagenesis/genetics , Polymerase Chain Reaction/methods , Euryarchaeota/genetics , Gene Library , Reference Standards , Saccharomyces cerevisiae/genetics , Sequence Alignment , DNA Sequence Analysis
ABSTRACT
The predominance of rRNAs in the transcriptome is a major technical challenge in sequence-based analysis of cDNAs from microbial isolates and communities. Several approaches have been applied to deplete rRNAs from (meta)transcriptomes, but no systematic investigation of potential biases introduced by any of these approaches has been reported. Here we validated the effectiveness and fidelity of the two most commonly used approaches, subtractive hybridization and exonuclease digestion, as well as combinations of these treatments, on two synthetic five-microorganism metatranscriptomes using massively parallel sequencing. We found that the effectiveness of rRNA removal was a function of community composition and RNA integrity for these treatments. Subtractive hybridization alone introduced the least bias in relative transcript abundance, whereas exonuclease and, in particular, combined treatments greatly compromised mRNA abundance fidelity. Illumina sequencing itself can also compromise quantitative data analysis by introducing a G+C bias between runs.