Results 1 - 3 of 3
1.
Proc Natl Acad Sci U S A ; 111(13): 4904-9, 2014 Apr 01.
Article in English | MEDLINE | ID: mdl-24632729

ABSTRACT

The large volumes of sequencing data required to deeply sample the microbial communities of complex environments pose new challenges to sequence analysis. De novo metagenomic assembly effectively reduces the total amount of data to be analyzed but requires substantial computational resources. We combine two preassembly filtering approaches (digital normalization and partitioning) to generate previously intractable large metagenome assemblies. Using a human-gut mock community dataset, we demonstrate that these methods result in assemblies nearly identical to assemblies from unprocessed data. We then assemble two large soil metagenomes totaling 398 billion bp (equivalent to 88,000 Escherichia coli genomes) from matched Iowa corn and native prairie soils. The resulting assembled contigs could be used to identify molecular interactions and reaction networks of known metabolic pathways using the Kyoto Encyclopedia of Genes and Genomes Orthology database. Nonetheless, more than 60% of predicted proteins in assemblies could not be annotated against known databases. Many of these unknown proteins were abundant in both corn and prairie soils, highlighting the benefits of assembly for the discovery and characterization of novelty in soil biodiversity. Moreover, 80% of the sequencing data could not be assembled because of low coverage, suggesting that considerably more sequencing data are needed to characterize the functional content of soil.


Subject(s)
Biodiversity , Metagenome/genetics , Soil Microbiology , Soil , Gastrointestinal Tract/microbiology , Humans , Iowa , Species Specificity , Zea mays/genetics
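Digital normalization, one of the two preassembly filters named in the abstract above, is commonly described as a streaming rule: a read is discarded when the median abundance of its k-mers, counted over the reads kept so far, already reaches a coverage cutoff, so redundant high-coverage reads are dropped while low-coverage reads survive. The sketch below illustrates that rule under stated assumptions: an exact `Counter` stands in for the probabilistic counter a real tool would use, reverse complements are ignored, and the `k` and `cutoff` values are illustrative rather than the study's settings. Partitioning is not shown.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping k-mers of seq (assumption: no reverse-complement handling)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads, k=20, cutoff=20):
    """Keep a read only if the median count of its k-mers, over reads
    kept so far, is below `cutoff`; only kept reads update the counts."""
    counts = Counter()  # exact counter; a real tool would use a probabilistic sketch
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: no coverage estimate, so discard it
        if median([counts[km] for km in read_kmers]) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


reads = ["ACGTACGTAC"] * 5
print(digital_normalization(reads, k=4, cutoff=2))  # ['ACGTACGTAC']
```

With five identical reads, `k=4` and `cutoff=2`, only the first copy is kept: after it is counted, every later copy already has a median k-mer count of 2, so it adds no new coverage information.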
2.
mBio ; 5(2): e00889, 2014 Apr 22.
Article in English | MEDLINE | ID: mdl-24757212

ABSTRACT

Butyrate-producing bacteria have recently gained attention, since they are important for a healthy colon and, when altered, contribute to emerging diseases, such as ulcerative colitis and type II diabetes. This guild is polyphyletic and cannot be accurately detected by 16S rRNA gene sequencing. Consequently, approaches targeting the terminal genes of the main butyrate-producing pathway have been developed. However, since additional pathways exist and alternative, newly recognized enzymes catalyzing the terminal reaction have been described, previous investigations are often incomplete. We undertook a broad analysis of butyrate-producing pathways and individual genes by screening 3,184 sequenced bacterial genomes from the Integrated Microbial Genome database. Genomes of 225 bacteria with a potential to produce butyrate were identified, including many previously unknown candidates. The majority of candidates belong to distinct families within the Firmicutes, but members of nine other phyla, especially from Actinobacteria, Bacteroidetes, Fusobacteria, Proteobacteria, Spirochaetes, and Thermotogae, were also identified as potential butyrate producers. The established gene catalogue (3,055 entries) was used to screen for butyrate synthesis pathways in 15 metagenomes derived from stool samples of healthy individuals provided by the HMP (Human Microbiome Project) consortium. A high percentage of total genomes exhibited a butyrate-producing pathway (mean, 19.1%; range, 3.2% to 39.4%), where the acetyl-coenzyme A (CoA) pathway was the most prevalent (mean, 79.7% of all pathways), followed by the lysine pathway (mean, 11.2%). Diversity analysis for the acetyl-CoA pathway showed that the same few firmicute groups associated with several Lachnospiraceae and Ruminococcaceae were dominating in most individuals, whereas the other pathways were associated primarily with Bacteroidetes.
IMPORTANCE Microbiome research has revealed new, important roles of our gut microbiota for maintaining health, but an understanding of the effects of specific microbial functions on the host is in its infancy, partly because in-depth functional microbial analyses are rare and publicly available databases are often incomplete or misannotated. In this study, we focused on production of butyrate, the main energy source for colonocytes, which plays a critical role in health and disease. We have provided a complete database of genes from major known butyrate-producing pathways, using in-depth genomic analysis of publicly available genomes, filling an important gap to accurately assess the butyrate-producing potential of complex microbial communities from "-omics"-derived data. Furthermore, a reference data set containing the abundance and diversity of butyrate synthesis pathways from the healthy gut microbiota was established through a metagenomics-based assessment. This study will help in understanding the role of butyrate producers in health and disease and may assist the development of treatments for functional dysbiosis.


Subject(s)
Bacteria/genetics , Bacteria/metabolism , Butyrates/metabolism , Feces/microbiology , Metabolic Networks and Pathways/genetics , Metagenomics , Microbiota , Healthy Volunteers , Humans
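The prevalence figures in the abstract above (share of genomes carrying a butyrate pathway, each pathway's share of all detected pathways, and a mean and range across samples) can be illustrated with a toy calculation. This is a sketch under loud assumptions: each sample is reduced to a list of per-genome sets of detected gene labels, a pathway counts as present only when every gene in its set is detected, and the gene labels in `PATHWAYS` are hypothetical placeholders, not entries from the study's 3,055-entry catalogue. The abstract does not describe how the authors estimated these percentages from metagenomic data.

```python
from statistics import mean

# HYPOTHETICAL pathway definitions: placeholder gene labels only,
# not the study's actual gene catalogue.
PATHWAYS = {
    "acetyl-CoA": {"gene_a1", "gene_a2", "gene_a3"},
    "lysine": {"gene_l1", "gene_l2"},
}


def pathways_in(genome_genes):
    """Pathways whose complete (hypothetical) gene set is detected in one genome."""
    return {name for name, genes in PATHWAYS.items() if genes <= genome_genes}


def summarize(sample):
    """sample: list of per-genome sets of detected gene labels.
    Returns (% of genomes with any pathway, {pathway: % of all pathway hits})."""
    hits = [pathways_in(genome) for genome in sample]
    pct_genomes = 100 * sum(1 for h in hits if h) / len(sample)
    total_hits = sum(len(h) for h in hits)
    shares = {
        name: 100 * sum(name in h for h in hits) / total_hits if total_hits else 0.0
        for name in PATHWAYS
    }
    return pct_genomes, shares


def across_samples(samples):
    """Mean, minimum and maximum of the per-sample genome percentages."""
    pcts = [summarize(s)[0] for s in samples]
    return mean(pcts), min(pcts), max(pcts)


sample_1 = [{"gene_a1", "gene_a2", "gene_a3"}, {"gene_l1", "gene_l2"}, {"gene_a1"}, set()]
sample_2 = [{"gene_a1", "gene_a2", "gene_a3"}, set(), set(), set()]
print(summarize(sample_1))                   # (50.0, {'acetyl-CoA': 50.0, 'lysine': 50.0})
print(across_samples([sample_1, sample_2]))  # (37.5, 25.0, 50.0)
```

The all-genes-present rule is a simplifying choice for illustration; a genome carrying only `gene_a1` is not counted, which is why it contributes nothing in `sample_1`.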
3.
PLoS One ; 9(7): e101271, 2014.
Article in English | MEDLINE | ID: mdl-25062443

ABSTRACT

K-mer abundance analysis is widely used for many purposes in nucleotide sequence analysis, including data preprocessing for de novo assembly, repeat detection, and sequencing coverage estimation. We present the khmer software package for fast and memory-efficient online counting of k-mers in sequencing data sets. Unlike previous methods based on data structures such as hash tables, suffix arrays, and trie structures, khmer relies entirely on a simple probabilistic data structure, a Count-Min Sketch. The Count-Min Sketch permits online updating and retrieval of k-mer counts in memory, which is necessary to support online k-mer analysis algorithms. On sparse data sets this data structure is considerably more memory-efficient than any exact data structure. In exchange, the use of a Count-Min Sketch introduces a systematic overcount for k-mers; moreover, only the counts, and not the k-mers, are stored. Here we analyze the speed, the memory usage, and the miscount rate of khmer for generating k-mer frequency distributions and retrieving k-mer counts for individual k-mers. We also compare the performance of khmer to several other k-mer counting packages, including Tallymer, Jellyfish, BFCounter, DSK, KMC, Turtle and KAnalyze. Finally, we examine the effectiveness of profiling sequencing error, k-mer abundance trimming, and digital normalization of reads in the context of high khmer false positive rates. khmer is implemented in C++ wrapped in a Python interface, offers a tested and robust API, and is freely available under the BSD license at github.com/ged-lab/khmer.


Subject(s)
Computational Biology , Nucleotides , Sequence Analysis, DNA , Software , Algorithms , Humans
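The Count-Min Sketch at the center of the abstract above can be illustrated with a minimal sketch. It keeps `depth` rows of `width` counters; each k-mer increments one counter per row, chosen by a per-row hash, and a query returns the minimum of those counters. Collisions can only inflate counters, so the estimate never undercounts but may overcount, and only counts are stored, so the k-mers themselves cannot be recovered: the same trade-offs the abstract describes. The hashing scheme (SHA-256 salted with the row index) and the `width`/`depth` values are assumptions for illustration; this is not khmer's implementation or API.

```python
import hashlib
from collections import Counter


class CountMinSketch:
    """Approximate counter: `depth` rows of `width` integer cells."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cell(self, item, row):
        # One hash function per row, derived by salting SHA-256 with the row index.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, item):
        for row in range(self.depth):
            self.rows[row][self._cell(item, row)] += 1

    def count(self, item):
        # Collisions only add, so the minimum over rows is >= the true count.
        return min(self.rows[row][self._cell(item, row)] for row in range(self.depth))


def count_kmers(reads, k, sketch):
    """Online counting: stream every k-mer of every read into the sketch."""
    for read in reads:
        for i in range(len(read) - k + 1):
            sketch.add(read[i:i + k])
    return sketch


reads = ["ACGTACGTAC", "TTGACGTACC"]
sketch = count_kmers(reads, k=4, sketch=CountMinSketch(width=64, depth=3))
exact = Counter(r[i:i + 4] for r in reads for i in range(len(r) - 3))
for kmer, true_count in exact.items():
    assert sketch.count(kmer) >= true_count  # never an undercount
print(sketch.count("ACGT"), exact["ACGT"])
```

Shrinking `width` raises the collision rate and therefore the overcount, which is the memory-versus-accuracy trade-off behind the miscount rates the abstract analyzes.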