Results 1 - 12 of 12
1.
Sci Data ; 4: 170035, 2017 04 11.
Article in English | MEDLINE | ID: mdl-28398290

ABSTRACT

Advances in high-throughput sequencing are reshaping how we perceive microbial communities inhabiting the human body, with implications for therapeutic interventions. Several large-scale datasets derived from hundreds of human microbiome samples sourced from multiple studies are now publicly available. However, idiosyncratic data processing methods between studies introduce systematic differences that confound comparative analyses. To overcome these challenges, we developed GutCyc, a compendium of environmental pathway genome databases (ePGDBs) constructed from 418 assembled human microbiome datasets using MetaPathways, enabling reproducible functional metagenomic annotation. We also generated metabolic network reconstructions for each metagenome using the Pathway Tools software, empowering researchers and clinicians interested in visualizing and interpreting metabolic pathways encoded by the human gut microbiome. For the first time, GutCyc provides consistent annotations and metabolic pathway predictions, making possible comparative community analyses between health and disease states in inflammatory bowel disease, Crohn's disease, and type 2 diabetes. GutCyc data products are searchable online, or may be downloaded and explored locally using MetaPathways and Pathway Tools.


Subject(s)
Databases, Genetic , Gastrointestinal Microbiome , Metabolic Networks and Pathways , Crohn Disease/microbiology , Diabetes Mellitus, Type 2/microbiology , Geography, Medical , Humans , Inflammatory Bowel Diseases/microbiology , Metagenome , Metagenomics
2.
Bioinformatics ; 32(23): 3535-3542, 2016 12 01.
Article in English | MEDLINE | ID: mdl-27515739

ABSTRACT

MOTIVATION: A perennial problem in the analysis of environmental sequence information is the assignment of reads or assembled sequences, e.g. contigs or scaffolds, to discrete taxonomic bins. In the absence of reference genomes for most environmental microorganisms, the use of intrinsic nucleotide patterns and phylogenetic anchors can improve assembly-dependent binning needed for more accurate taxonomic and functional annotation in communities of microorganisms, and assist in identifying mobile genetic elements or lateral gene transfer events. RESULTS: Here, we present a statistic called LCA* inspired by Information and Voting theories that uses the NCBI Taxonomic Database hierarchy to assign taxonomy to contigs assembled from environmental sequence information. The LCA* algorithm identifies a sufficiently strong majority on the hierarchy while minimizing entropy changes to the observed taxonomic distribution resulting in improved statistical properties. Moreover, we apply results from the order-statistic literature to formulate a likelihood-ratio hypothesis test and P-value for testing the supremacy of the assigned LCA* taxonomy. Using simulated and real-world datasets, we empirically demonstrate that voting-based methods, majority vote and LCA*, in the presence of known reference annotations, are consistently more accurate in identifying contig taxonomy than the lowest common ancestor algorithm popularized by MEGAN, and that LCA* taxonomy strikes a balance between specificity and confidence to provide an estimate appropriate to the available information in the data. AVAILABILITY AND IMPLEMENTATION: The LCA* has been implemented as a stand-alone Python library compatible with the MetaPathways pipeline; both of which are available on GitHub with installation instructions and use-cases (http://www.github.com/hallamlab/LCAStar/). CONTACT: shallam@mail.ubc.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Metagenome , Phylogeny , Entropy , Models, Statistical
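
The contrast drawn in the abstract above between the lowest common ancestor (LCA) rule and a simple majority vote can be illustrated with a toy example. This is only a sketch under stated assumptions: the `PARENT` table is a hypothetical six-node taxonomy rather than the NCBI hierarchy, the function names are invented for illustration, and neither function implements LCA* itself, whose entropy-based criterion is not spelled out in the abstract.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); NOT the NCBI hierarchy.
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Alphaproteobacteria": "Proteobacteria",
    "Firmicutes": "Bacteria",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, taxon first."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def lowest_common_ancestor(taxa):
    """Deepest node shared by the lineages of all annotations."""
    common = set(lineage(taxa[0]))
    for taxon in taxa[1:]:
        common &= set(lineage(taxon))
    # The deepest shared node is the one with the longest lineage.
    return max(common, key=lambda t: len(lineage(t)))


def majority_vote(taxa):
    """Taxon carried by more than half of the annotations, else None."""
    taxon, count = Counter(taxa).most_common(1)[0]
    return taxon if count > len(taxa) / 2 else None


# Five gene annotations on one contig; a single one disagrees.
orf_hits = ["Gammaproteobacteria"] * 4 + ["Firmicutes"]
lca = lowest_common_ancestor(orf_hits)
vote = majority_vote(orf_hits)
print(lca, vote)
```

In this toy case the single Firmicutes hit pulls the LCA assignment up to Bacteria, while the majority vote keeps the more specific Gammaproteobacteria label. That is the kind of specificity loss the abstract attributes to the plain LCA rule.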
3.
Curr Opin Microbiol ; 31: 209-216, 2016 06.
Article in English | MEDLINE | ID: mdl-27183115

ABSTRACT

A revolution is unfolding in microbial ecology where petabytes of 'multi-omics' data are produced using next generation sequencing and mass spectrometry platforms. This cornucopia of biological information has enormous potential to reveal the hidden metabolic powers of microbial communities in natural and engineered ecosystems. However, to realize this potential, the development of new technologies and interpretative frameworks grounded in ecological design principles are needed to overcome computational and analytical bottlenecks. Here we explore the relationship between microbial ecology and information science in the era of cloud-based computation. We consider microorganisms as individual information processing units implementing a distributed metabolic algorithm and describe developments in ecoinformatics and ubiquitous computing with the potential to eliminate bottlenecks and empower knowledge creation and translation.


Subject(s)
Ecological and Environmental Phenomena , Electronic Data Processing/methods , Information Science/methods , Information Services , Microbial Consortia/genetics , Ecosystem , High-Throughput Nucleotide Sequencing , Internet
4.
Bioinformatics ; 31(20): 3345-7, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26076725

ABSTRACT

Next-generation sequencing is producing vast amounts of sequence information from natural and engineered ecosystems. Although this data deluge has an enormous potential to transform our lives, knowledge creation and translation need software applications that scale with increasing data processing and analysis requirements. Here, we present improvements to MetaPathways, an annotation and analysis pipeline for environmental sequence information that expedites this transformation. We specifically address pathway prediction hazards through integration of a weighted taxonomic distance and enable quantitative comparison of assembled annotations through a normalized read-mapping measure. Additionally, we improve LAST homology searches through BLAST-equivalent E-values and output formats that are natively compatible with prevailing software applications. Finally, an updated graphical user interface allows for keyword annotation query and projection onto user-defined functional gene hierarchies, including the Carbohydrate-Active Enzyme database. AVAILABILITY AND IMPLEMENTATION: MetaPathways v2.5 is available on GitHub: http://github.com/hallamlab/metapathways2. CONTACT: shallam@mail.ubc.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Information Storage and Retrieval , Molecular Sequence Annotation , Phylogeny , Software , Algorithms , Databases, Genetic , Humans , Sequence Analysis, DNA/methods
5.
Environ Microbiol ; 17(12): 4979-93, 2015 Dec.
Article in English | MEDLINE | ID: mdl-25857222

ABSTRACT

Enhanced biological phosphorus removal (EBPR) relies on diverse but specialized microbial communities to mediate the cycling and ultimate removal of phosphorus from municipal wastewaters. However, little is known about microbial activity and dynamics in relation to process fluctuations in EBPR ecosystems. Here, we monitored temporal changes in microbial community structure and potential activity across each bioreactor zone in a pilot-scale EBPR treatment plant by examining the ratio of small subunit ribosomal RNA (SSU rRNA) to SSU rRNA gene (rDNA) over a 120 day study period. Although the majority of operational taxonomic units (OTUs) in the EBPR ecosystem were rare, many maintained high potential activities based on SSU rRNA : rDNA ratios, suggesting that rare OTUs contribute substantially to protein synthesis potential in EBPR ecosystems. Few significant differences in OTU abundance and activity were observed between bioreactor redox zones, although differences in temporal activity were observed among phylogenetically cohesive OTUs. Moreover, observed temporal activity patterns could not be explained by measured process parameters, suggesting that other ecological drivers, such as grazing or viral lysis, modulated community interactions. Taken together, these results point towards complex interactions selected for within the EBPR ecosystem and highlight a previously unrecognized functional potential among low abundance microorganisms in engineered ecosystems.


Subject(s)
Bacteria/classification , DNA, Ribosomal/genetics , Phosphorus/metabolism , RNA, Ribosomal/genetics , Water Pollutants, Chemical/metabolism , Bacteria/genetics , Bacteria/isolation & purification , Biodegradation, Environmental , Bioreactors/microbiology , Ecosystem , Phylogeny , Wastewater
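
The SSU rRNA : rDNA ratio used above as a proxy for potential activity can be sketched as a ratio of relative abundances. This is a minimal illustration, assuming each library is normalized by its own total read count; the abstract does not state how the ratios were computed, and the function name, OTU labels, and counts below are invented.

```python
def activity_ratios(rrna_counts, rdna_counts):
    """Ratio of rRNA to rDNA relative abundance for each OTU.

    Assumption: each library is normalized by its own total count.
    OTUs with no rDNA reads are skipped to avoid division by zero.
    """
    rna_total = sum(rrna_counts.values())
    dna_total = sum(rdna_counts.values())
    ratios = {}
    for otu, dna in rdna_counts.items():
        if dna == 0:
            continue
        rna_rel = rrna_counts.get(otu, 0) / rna_total
        dna_rel = dna / dna_total
        ratios[otu] = rna_rel / dna_rel
    return ratios


# Invented counts: OTU_3 is rare in the rDNA library but well
# represented in the rRNA library.
rdna = {"OTU_1": 900, "OTU_2": 90, "OTU_3": 10}
rrna = {"OTU_1": 500, "OTU_2": 400, "OTU_3": 100}
ratios = activity_ratios(rrna, rdna)
for otu, ratio in sorted(ratios.items()):
    print(otu, round(ratio, 2))
```

With these made-up numbers the rarest OTU has the highest ratio, which is the pattern the abstract describes for rare but potentially active community members.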
6.
BMC Genomics ; 15: 619, 2014 Jul 22.
Article in English | MEDLINE | ID: mdl-25048541

ABSTRACT

BACKGROUND: A convergence of high-throughput sequencing and computational power is transforming biology into information science. Despite these technological advances, converting bits and bytes of sequence information into meaningful insights remains a challenging enterprise. Biological systems operate on multiple hierarchical levels from genomes to biomes. Holistic understanding of biological systems requires agile software tools that permit comparative analyses across multiple information levels (DNA, RNA, protein, and metabolites) to identify emergent properties, diagnose system states, or predict responses to environmental change. RESULTS: Here we adopt the MetaPathways annotation and analysis pipeline and Pathway Tools to construct environmental pathway/genome databases (ePGDBs) that describe microbial community metabolism using MetaCyc, a highly curated database of metabolic pathways and components covering all domains of life. We evaluate Pathway Tools' performance on three datasets with different complexity and coding potential, including simulated metagenomes, a symbiotic system, and the Hawaii Ocean Time-series. We define accuracy and sensitivity relationships between read length, coverage and pathway recovery and evaluate the impact of taxonomic pruning on ePGDB construction and interpretation. Resulting ePGDBs provide interactive metabolic maps, predict emergent metabolic pathways associated with biosynthesis and energy production and differentiate between genomic potential and phenotypic expression across defined environmental gradients. CONCLUSIONS: This multi-tiered analysis provides the user community with specific operating guidelines, performance metrics and prediction hazards for more reliable ePGDB construction and interpretation. Moreover, it demonstrates the power of Pathway Tools in predicting metabolic interactions in natural and engineered ecosystems.


Subject(s)
Genomics/methods , Metabolic Networks and Pathways , Metabolic Networks and Pathways/genetics , Molecular Sequence Annotation
7.
ISME J ; 8(2): 455-68, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24030600

ABSTRACT

Marine Group A (MGA) is a deeply branching and uncultivated phylum of bacteria. Although their functional roles remain elusive, MGA subgroups are particularly abundant and diverse in oxygen minimum zones and permanent or seasonally stratified anoxic basins, suggesting metabolic adaptation to oxygen-deficiency. Here, we expand a previous survey of MGA diversity in O2-deficient waters of the Northeast subarctic Pacific Ocean (NESAP) to include Saanich Inlet (SI), an anoxic fjord with seasonal O2 gradients and periodic sulfide accumulation. Phylogenetic analysis of small subunit ribosomal RNA (16S rRNA) gene clone libraries recovered five previously described MGA subgroups and defined three novel subgroups (SHBH1141, SHBH391, and SHAN400) in SI. To discern the functional properties of MGA residing along gradients of O2 in the NESAP and SI, we identified and sequenced to completion 14 fosmids harboring MGA-associated 16S rRNA genes from a collection of 46 fosmid libraries sourced from NESAP and SI waters. Comparative analysis of these fosmids, in addition to four publicly available MGA-associated large-insert DNA fragments from Hawaii Ocean Time-series and Monterey Bay, revealed widespread genomic differentiation proximal to the ribosomal RNA operon that did not consistently reflect subgroup partitioning patterns observed in 16S rRNA gene clone libraries. Predicted protein-coding genes associated with adaptation to O2-deficiency and sulfur-based energy metabolism were detected on multiple fosmids, including polysulfide reductase (psrABC), implicated in dissimilatory polysulfide reduction to hydrogen sulfide and dissimilatory sulfur oxidation. These results posit a potential role for specific MGA subgroups in the marine sulfur cycle.


Subject(s)
Bacteria/classification , Bacteria/genetics , Biodiversity , Phylogeny , Aquatic Organisms/classification , Aquatic Organisms/genetics , Aquatic Organisms/metabolism , Bacteria/metabolism , Genome, Bacterial/genetics , Genomics , Molecular Sequence Data , Oxygen/analysis , Pacific Ocean , RNA, Ribosomal, 16S/genetics , Seawater/chemistry
8.
Environ Sci Technol ; 47(18): 10708-17, 2013 Sep 17.
Article in English | MEDLINE | ID: mdl-23889694

ABSTRACT

Oil in subsurface reservoirs is biodegraded by resident microbial communities. Water-mediated, anaerobic conversion of hydrocarbons to methane and CO2, catalyzed by syntrophic bacteria and methanogenic archaea, is thought to be one of the dominant processes. We compared 160 microbial community compositions in ten hydrocarbon resource environments (HREs) and sequenced twelve metagenomes to characterize their metabolic potential. Although anaerobic communities were common, cores from oil sands and coal beds had unexpectedly high proportions of aerobic hydrocarbon-degrading bacteria. Likewise, most metagenomes had high proportions of genes for enzymes involved in aerobic hydrocarbon metabolism. Hence, although HREs may have been strictly anaerobic and typically methanogenic for much of their history, this may not hold today for coal beds and for the Alberta oil sands, one of the largest remaining oil reservoirs in the world. This finding may influence strategies to recover energy or chemicals from these HREs by in situ microbial processes.


Subject(s)
Archaea/genetics , Bacteria/genetics , Oil and Gas Fields/microbiology , RNA, Archaeal/genetics , Aerobiosis , Alberta , Archaea/classification , Archaea/metabolism , Bacteria/classification , Bacteria/metabolism , Genes, Archaeal , Genes, Bacterial , Hydrocarbons/metabolism , Metagenomics , RNA, Archaeal/metabolism , RNA, Bacterial/genetics , RNA, Ribosomal, 16S/genetics
9.
Proc Natl Acad Sci U S A ; 110(28): 11463-8, 2013 Jul 09.
Article in English | MEDLINE | ID: mdl-23801761

ABSTRACT

Planktonic bacteria dominate surface ocean biomass and influence global biogeochemical processes, but remain poorly characterized owing to difficulties in cultivation. Using large-scale single cell genomics, we obtained insight into the genome content and biogeography of many bacterial lineages inhabiting the surface ocean. We found that, compared with existing cultures, natural bacterioplankton have smaller genomes, fewer gene duplications, and are depleted in guanine and cytosine, noncoding nucleotides, and genes encoding transcription, signal transduction, and noncytoplasmic proteins. These findings provide strong evidence that genome streamlining and oligotrophy are prevalent features among diverse, free-living bacterioplankton, whereas existing laboratory cultures consist primarily of copiotrophs. The apparent ubiquity of metabolic specialization and mixotrophy, as predicted from single cell genomes, also may contribute to the difficulty in bacterioplankton cultivation. Using metagenome fragment recruitment against single cell genomes, we show that the global distribution of surface ocean bacterioplankton correlates with temperature and latitude and is not limited by dispersal at the time scales required for nucleotide substitution to exceed the current operational definition of bacterial species. Single cell genomes with highly similar small subunit rRNA gene sequences exhibited significant genomic and biogeographic variability, highlighting challenges in the interpretation of individual gene surveys and metagenome assemblies in environmental microbiology. Our study demonstrates the utility of single cell genomics for gaining an improved understanding of the composition and dynamics of natural microbial assemblages.


Subject(s)
Bacteria/classification , Genome, Bacterial , Marine Biology , Plankton/classification , Water Microbiology , Bacteria/genetics , Geography , Oceans and Seas , Plankton/genetics
10.
BMC Bioinformatics ; 14: 202, 2013 Jun 21.
Article in English | MEDLINE | ID: mdl-23800136

ABSTRACT

BACKGROUND: A central challenge to understanding the ecological and biogeochemical roles of microorganisms in natural and human engineered ecosystems is the reconstruction of metabolic interaction networks from environmental sequence information. The dominant paradigm in metabolic reconstruction is to assign functional annotations using BLAST. Functional annotations are then projected onto symbolic representations of metabolism in the form of KEGG pathways or SEED subsystems. RESULTS: Here we present MetaPathways, an open source pipeline for pathway inference that uses the PathoLogic algorithm to map functional annotations onto the MetaCyc collection of reactions and pathways, and construct environmental Pathway/Genome Databases (ePGDBs) compatible with the editing and navigation features of Pathway Tools. The pipeline accepts assembled or unassembled nucleotide sequences, performs quality assessment and control, predicts and annotates noncoding genes and open reading frames, and produces inputs to PathoLogic. In addition to constructing ePGDBs, MetaPathways uses MLTreeMap to build phylogenetic trees for selected taxonomic anchor and functional gene markers, converts General Feature Format (GFF) files into concatenated GenBank files for ePGDB construction based on third-party annotations, and generates useful file formats including Sequin files for direct GenBank submission and gene feature tables summarizing annotations, MLTreeMap trees, and ePGDB pathway coverage summaries for statistical comparisons. CONCLUSIONS: MetaPathways provides users with a modular annotation and analysis pipeline for predicting metabolic interaction networks from environmental sequence information using an alternative to KEGG pathways and SEED subsystems mapping. It is extensible to genomic and transcriptomic datasets from a wide range of sequencing platforms, and generates useful data products for microbial community structure and function analysis. 
The MetaPathways software package, installation instructions, and example data can be obtained from http://hallam.microbiology.ubc.ca/MetaPathways.


Subject(s)
Databases, Genetic , Environment , Software , Algorithms , Animals , Databases, Nucleic Acid , Ecosystem , Forecasting , Genomics , Humans , Phylogeny
11.
BMC Genomics ; 14 Suppl 1: S3, 2013.
Article in English | MEDLINE | ID: mdl-23368516

ABSTRACT

BACKGROUND: Pairwise comparison of time series data for both local and time-lagged relationships is a computationally challenging problem relevant to many fields of inquiry. The Local Similarity Analysis (LSA) statistic identifies the existence of local and lagged relationships, but determining significance through a p-value has been algorithmically cumbersome due to an intensive permutation test, shuffling rows and columns and repeatedly calculating the statistic. Furthermore, this p-value is calculated with the assumption of normality -- a statistical luxury dissociated from most real world datasets. RESULTS: To improve the performance of LSA on big datasets, an asymptotic upper bound on the p-value calculation was derived without the assumption of normality. This change in the bound calculation markedly improved computational speed from O(pm²n) to O(m²n), where p is the number of permutations in a permutation test, m is the number of time series, and n is the length of each time series. The bounding process is implemented as a computationally efficient software package, FASTLSA, written in C and optimized for threading on multi-core computers, improving its practical computation time. We computationally compare our approach to previous implementations of LSA, demonstrate broad applicability by analyzing time series data from public health, microbial ecology, and social media, and visualize resulting networks using the Cytoscape software. CONCLUSIONS: The FASTLSA software package expands the boundaries of LSA allowing analysis on datasets with millions of co-varying time series. Mapping metadata onto force-directed graphs derived from FASTLSA allows investigators to view correlated cliques and explore previously unrecognized network relationships. The software is freely available for download at: http://www.cmde.science.ubc.ca/hallam/fastLSA/.


Subject(s)
Software , Algorithms , Computational Biology , Female , Humans , Internet , Intestines/microbiology , Male , Metagenome , Mouth/microbiology , Saccharomyces cerevisiae/genetics , Skin/microbiology , User-Computer Interface
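
The local similarity statistic discussed above is usually described as a dynamic program over two normalized time series with a bounded time lag. The sketch below follows that commonly described recurrence as an assumption; it is not taken from the abstract, it is not the FASTLSA implementation, and it does not compute the asymptotic p-value bound. The function name and test series are invented for illustration.

```python
def local_similarity(x, y, max_delay):
    """Local similarity score of two equal-length, pre-normalized series.

    pos[i][j] tracks the best positively correlated run ending at
    x[i-1], y[j-1]; neg[i][j] does the same for anti-correlated runs.
    Only index pairs with |i - j| <= max_delay are considered.
    """
    n = len(x)
    pos = [[0.0] * (n + 1) for _ in range(n + 1)]
    neg = [[0.0] * (n + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if abs(i - j) > max_delay:
                continue
            prod = x[i - 1] * y[j - 1]
            pos[i][j] = max(0.0, pos[i - 1][j - 1] + prod)
            neg[i][j] = max(0.0, neg[i - 1][j - 1] - prod)
            best = max(best, pos[i][j], neg[i][j])
    return best / n


x = [0.0, 1.0, 2.0, 0.0]
y = [1.0, 2.0, 0.0, 0.0]          # x shifted one step earlier
no_lag = local_similarity(x, y, max_delay=0)
with_lag = local_similarity(x, y, max_delay=1)
print(no_lag, with_lag)
```

Allowing a one-step lag raises the score because the shared peak in the two series lines up. A permutation test would repeat this computation p times for each of the roughly m² series pairs, which is the O(pm²n) cost the abstract says the asymptotic bound reduces to O(m²n).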
12.
Proc Natl Acad Sci U S A ; 109(20): 7665-70, 2012 May 15.
Article in English | MEDLINE | ID: mdl-22547789

ABSTRACT

We present a programmable droplet-based microfluidic device that combines the reconfigurable flow-routing capabilities of integrated microvalve technology with the sample compartmentalization and dispersion-free transport that is inherent to droplets. The device allows for the execution of user-defined multistep reaction protocols in 95 individually addressable nanoliter-volume storage chambers by consecutively merging programmable sequences of picoliter-volume droplets containing reagents or cells. This functionality is enabled by "flow-controlled wetting," a droplet docking and merging mechanism that exploits the physics of droplet flow through a channel to control the precise location of droplet wetting. The device also allows for automated cross-contamination-free recovery of reaction products from individual chambers into standard microfuge tubes for downstream analysis. The combined features of programmability, addressability, and selective recovery provide a general hardware platform that can be reprogrammed for multiple applications. We demonstrate this versatility by implementing multiple single-cell experiment types with this device: bacterial cell sorting and cultivation, taxonomic gene identification, and high-throughput single-cell whole genome amplification and sequencing using common laboratory strains. Finally, we apply the device to genome analysis of single cells and microbial consortia from diverse environmental samples including a marine enrichment culture, deep-sea sediments, and the human oral cavity. The resulting datasets capture genotypic properties of individual cells and illuminate known and potentially unique partnerships between microbial community members.


Subject(s)
Hydrodynamics , Metagenome/genetics , Microfluidic Analytical Techniques/instrumentation , Microfluidic Analytical Techniques/methods , Base Sequence , DNA Primers/genetics , Genotype , Geologic Sediments/microbiology , Humans , Image Processing, Computer-Assisted , Metagenomics/methods , Microscopy, Fluorescence , Molecular Sequence Data , Mouth/microbiology , Polymerase Chain Reaction , RNA, Ribosomal, 16S/genetics , Sequence Analysis, DNA , Surface-Active Agents , Wettability