Results 1 - 12 of 12
1.
Bioinformatics ; 32(23): 3535-3542, 2016 Dec 01.
Article in English | MEDLINE | ID: mdl-27515739

ABSTRACT

MOTIVATION: A perennial problem in the analysis of environmental sequence information is the assignment of reads or assembled sequences, e.g. contigs or scaffolds, to discrete taxonomic bins. In the absence of reference genomes for most environmental microorganisms, the use of intrinsic nucleotide patterns and phylogenetic anchors can improve assembly-dependent binning needed for more accurate taxonomic and functional annotation in communities of microorganisms, and assist in identifying mobile genetic elements or lateral gene transfer events. RESULTS: Here, we present a statistic called LCA* inspired by Information and Voting theories that uses the NCBI Taxonomic Database hierarchy to assign taxonomy to contigs assembled from environmental sequence information. The LCA* algorithm identifies a sufficiently strong majority on the hierarchy while minimizing entropy changes to the observed taxonomic distribution resulting in improved statistical properties. Moreover, we apply results from the order-statistic literature to formulate a likelihood-ratio hypothesis test and P-value for testing the supremacy of the assigned LCA* taxonomy. Using simulated and real-world datasets, we empirically demonstrate that voting-based methods, majority vote and LCA*, in the presence of known reference annotations, are consistently more accurate in identifying contig taxonomy than the lowest common ancestor algorithm popularized by MEGAN, and that LCA* taxonomy strikes a balance between specificity and confidence to provide an estimate appropriate to the available information in the data. AVAILABILITY AND IMPLEMENTATION: The LCA* has been implemented as a stand-alone Python library compatible with the MetaPathways pipeline; both of which are available on GitHub with installation instructions and use-cases (http://www.github.com/hallamlab/LCAStar/). CONTACT: shallam@mail.ubc.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms; Metagenome; Phylogeny; Entropy; Models, Statistical
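
The contrast drawn in the abstract above between the lowest common ancestor rule and voting-based assignment can be illustrated with a small sketch. This is not the LCA* algorithm or the LCAStar library API: the toy lineages, the function names, and the simple "more than half of the votes" threshold are all assumptions made for illustration, and the entropy criterion and hypothesis test described in the abstract are not reproduced.

```python
from collections import Counter

# Hypothetical, simplified lineages (root -> leaf); not read from the NCBI database.
LINEAGES = {
    "Escherichia coli": ["root", "Bacteria", "Proteobacteria", "Escherichia", "Escherichia coli"],
    "Escherichia albertii": ["root", "Bacteria", "Proteobacteria", "Escherichia", "Escherichia albertii"],
    "Bacillus subtilis": ["root", "Bacteria", "Firmicutes", "Bacillus", "Bacillus subtilis"],
}


def lowest_common_ancestor(taxa):
    """Return the deepest node shared by the lineages of all annotations."""
    lca = "root"
    for nodes in zip(*(LINEAGES[t] for t in taxa)):
        if len(set(nodes)) != 1:
            break
        lca = nodes[0]
    return lca


def majority_vote(taxa):
    """Return the taxon holding strictly more than half of the votes, else None."""
    taxon, count = Counter(taxa).most_common(1)[0]
    return taxon if count > len(taxa) / 2 else None


def deepest_supported_node(taxa, threshold=0.5):
    """Return the deepest node whose subtree collects more than `threshold` of the votes."""
    votes, depth = Counter(), {}
    for t in taxa:
        for d, node in enumerate(LINEAGES[t]):
            votes[node] += 1  # a vote for a leaf also counts for each of its ancestors
            depth[node] = d
    supported = [n for n, v in votes.items() if v > threshold * len(taxa)]
    return max(supported, key=depth.get)


# Hypothetical per-gene annotations for one contig.
hits = ["Escherichia coli"] * 3 + ["Escherichia albertii"] * 2 + ["Bacillus subtilis"]
print(lowest_common_ancestor(hits))   # Bacteria
print(majority_vote(hits))            # None (no single species exceeds half)
print(deepest_supported_node(hits))   # Escherichia
```

In this toy case a single discordant annotation pushes the lowest common ancestor up to the domain level, whereas a vote propagated along the hierarchy settles on the genus, which is the kind of balance between specificity and confidence the abstract describes.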
2.
Bioinformatics ; 31(20): 3345-7, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26076725

ABSTRACT

Next-generation sequencing is producing vast amounts of sequence information from natural and engineered ecosystems. Although this data deluge has an enormous potential to transform our lives, knowledge creation and translation need software applications that scale with increasing data processing and analysis requirements. Here, we present improvements to MetaPathways, an annotation and analysis pipeline for environmental sequence information that expedites this transformation. We specifically address pathway prediction hazards through integration of a weighted taxonomic distance and enable quantitative comparison of assembled annotations through a normalized read-mapping measure. Additionally, we improve LAST homology searches through BLAST-equivalent E-values and output formats that are natively compatible with prevailing software applications. Finally, an updated graphical user interface allows for keyword annotation query and projection onto user-defined functional gene hierarchies, including the Carbohydrate-Active Enzyme database. AVAILABILITY AND IMPLEMENTATION: MetaPathways v2.5 is available on GitHub: http://github.com/hallamlab/metapathways2. CONTACT: shallam@mail.ubc.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics/methods; High-Throughput Nucleotide Sequencing/methods; Information Storage and Retrieval; Molecular Sequence Annotation; Phylogeny; Software; Algorithms; Databases, Genetic; Humans; Sequence Analysis, DNA/methods
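
For readers unfamiliar with what a BLAST-style E-value is, the sketch below applies the standard Karlin-Altschul relation, E = K·m·n·e^(−λS), and its bit-score form. It does not show how MetaPathways or LAST compute their values, which the abstract above does not specify. The function names are hypothetical, the λ and K values are illustrative placeholders, and effective-length corrections are ignored.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a bit score: S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least S: E = K*m*n*exp(-lambda*S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def e_value_from_bits(bits, query_len, db_len):
    """Equivalent form using the bit score: E = m*n*2^(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


# Illustrative placeholder parameters (assumptions, not values from the paper).
LAM, K = 0.267, 0.041
bits = bit_score(100, LAM, K)
print(e_value(100, 300, 1_000_000, LAM, K))
print(e_value_from_bits(bits, 300, 1_000_000))  # same quantity via the bit score
```

Because the bit score absorbs λ and K, E-values expressed this way can be compared across search tools that use different scoring systems, which is the practical point of reporting them in a BLAST-equivalent form.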
3.
Proc Natl Acad Sci U S A ; 110(28): 11463-8, 2013 Jul 09.
Article in English | MEDLINE | ID: mdl-23801761

ABSTRACT

Planktonic bacteria dominate surface ocean biomass and influence global biogeochemical processes, but remain poorly characterized owing to difficulties in cultivation. Using large-scale single cell genomics, we obtained insight into the genome content and biogeography of many bacterial lineages inhabiting the surface ocean. We found that, compared with existing cultures, natural bacterioplankton have smaller genomes, fewer gene duplications, and are depleted in guanine and cytosine, noncoding nucleotides, and genes encoding transcription, signal transduction, and noncytoplasmic proteins. These findings provide strong evidence that genome streamlining and oligotrophy are prevalent features among diverse, free-living bacterioplankton, whereas existing laboratory cultures consist primarily of copiotrophs. The apparent ubiquity of metabolic specialization and mixotrophy, as predicted from single cell genomes, also may contribute to the difficulty in bacterioplankton cultivation. Using metagenome fragment recruitment against single cell genomes, we show that the global distribution of surface ocean bacterioplankton correlates with temperature and latitude and is not limited by dispersal at the time scales required for nucleotide substitution to exceed the current operational definition of bacterial species. Single cell genomes with highly similar small subunit rRNA gene sequences exhibited significant genomic and biogeographic variability, highlighting challenges in the interpretation of individual gene surveys and metagenome assemblies in environmental microbiology. Our study demonstrates the utility of single cell genomics for gaining an improved understanding of the composition and dynamics of natural microbial assemblages.


Subjects
Bacteria/classification; Genome, Bacterial; Marine Biology; Plankton/classification; Water Microbiology; Bacteria/genetics; Geography; Oceans and Seas; Plankton/genetics
4.
Environ Microbiol ; 17(12): 4979-93, 2015 Dec.
Article in English | MEDLINE | ID: mdl-25857222

ABSTRACT

Enhanced biological phosphorus removal (EBPR) relies on diverse but specialized microbial communities to mediate the cycling and ultimate removal of phosphorus from municipal wastewaters. However, little is known about microbial activity and dynamics in relation to process fluctuations in EBPR ecosystems. Here, we monitored temporal changes in microbial community structure and potential activity across each bioreactor zone in a pilot-scale EBPR treatment plant by examining the ratio of small subunit ribosomal RNA (SSU rRNA) to SSU rRNA gene (rDNA) over a 120 day study period. Although the majority of operational taxonomic units (OTUs) in the EBPR ecosystem were rare, many maintained high potential activities based on SSU rRNA : rDNA ratios, suggesting that rare OTUs contribute substantially to protein synthesis potential in EBPR ecosystems. Few significant differences in OTU abundance and activity were observed between bioreactor redox zones, although differences in temporal activity were observed among phylogenetically cohesive OTUs. Moreover, observed temporal activity patterns could not be explained by measured process parameters, suggesting that other ecological drivers, such as grazing or viral lysis, modulated community interactions. Taken together, these results point towards complex interactions selected for within the EBPR ecosystem and highlight a previously unrecognized functional potential among low abundance microorganisms in engineered ecosystems.


Subjects
Bacteria/classification; DNA, Ribosomal/genetics; Phosphorus/metabolism; RNA, Ribosomal/genetics; Water Pollutants, Chemical/metabolism; Bacteria/genetics; Bacteria/isolation & purification; Biodegradation, Environmental; Bioreactors/microbiology; Ecosystem; Phylogeny; Wastewater
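
The SSU rRNA : rDNA ratio used in the abstract above as a proxy for potential activity can be sketched as a ratio of relative abundances per OTU. The abstract does not state how counts were normalized, so the normalization, the function names, and the toy counts below are all assumptions for illustration only.

```python
def relative_abundance(counts):
    """Scale raw read counts so that they sum to 1 across OTUs."""
    total = sum(counts.values())
    return {otu: n / total for otu, n in counts.items()}


def rrna_rdna_ratios(rna_counts, dna_counts):
    """Ratio of rRNA to rDNA relative abundance for every OTU detected in the rDNA library."""
    rna = relative_abundance(rna_counts)
    dna = relative_abundance(dna_counts)
    return {otu: rna.get(otu, 0.0) / dna[otu] for otu in dna if dna[otu] > 0}


# Hypothetical read counts: OTU_3 is rare in the rDNA library but well represented in rRNA.
dna_counts = {"OTU_1": 900, "OTU_2": 90, "OTU_3": 10}
rna_counts = {"OTU_1": 500, "OTU_2": 100, "OTU_3": 400}

for otu, ratio in rrna_rdna_ratios(rna_counts, dna_counts).items():
    print(otu, round(ratio, 2))
```

In this toy example the rarest OTU has by far the highest ratio, mirroring the abstract's observation that low-abundance OTUs can carry high potential activity.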
5.
Proc Natl Acad Sci U S A ; 109(20): 7665-70, 2012 May 15.
Article in English | MEDLINE | ID: mdl-22547789

ABSTRACT

We present a programmable droplet-based microfluidic device that combines the reconfigurable flow-routing capabilities of integrated microvalve technology with the sample compartmentalization and dispersion-free transport that is inherent to droplets. The device allows for the execution of user-defined multistep reaction protocols in 95 individually addressable nanoliter-volume storage chambers by consecutively merging programmable sequences of picoliter-volume droplets containing reagents or cells. This functionality is enabled by "flow-controlled wetting," a droplet docking and merging mechanism that exploits the physics of droplet flow through a channel to control the precise location of droplet wetting. The device also allows for automated cross-contamination-free recovery of reaction products from individual chambers into standard microfuge tubes for downstream analysis. The combined features of programmability, addressability, and selective recovery provide a general hardware platform that can be reprogrammed for multiple applications. We demonstrate this versatility by implementing multiple single-cell experiment types with this device: bacterial cell sorting and cultivation, taxonomic gene identification, and high-throughput single-cell whole genome amplification and sequencing using common laboratory strains. Finally, we apply the device to genome analysis of single cells and microbial consortia from diverse environmental samples including a marine enrichment culture, deep-sea sediments, and the human oral cavity. The resulting datasets capture genotypic properties of individual cells and illuminate known and potentially unique partnerships between microbial community members.


Subjects
Hydrodynamics; Metagenome/genetics; Microfluidic Analytical Techniques/instrumentation; Microfluidic Analytical Techniques/methods; Base Sequence; DNA Primers/genetics; Genotype; Geologic Sediments/microbiology; Humans; Image Processing, Computer-Assisted; Metagenomics/methods; Microscopy, Fluorescence; Molecular Sequence Data; Mouth/microbiology; Polymerase Chain Reaction; RNA, Ribosomal, 16S/genetics; Sequence Analysis, DNA; Surface-Active Agents; Wettability
6.
BMC Genomics ; 15: 619, 2014 Jul 22.
Article in English | MEDLINE | ID: mdl-25048541

ABSTRACT

BACKGROUND: A convergence of high-throughput sequencing and computational power is transforming biology into information science. Despite these technological advances, converting bits and bytes of sequence information into meaningful insights remains a challenging enterprise. Biological systems operate on multiple hierarchical levels from genomes to biomes. Holistic understanding of biological systems requires agile software tools that permit comparative analyses across multiple information levels (DNA, RNA, protein, and metabolites) to identify emergent properties, diagnose system states, or predict responses to environmental change. RESULTS: Here we adopt the MetaPathways annotation and analysis pipeline and Pathway Tools to construct environmental pathway/genome databases (ePGDBs) that describe microbial community metabolism using MetaCyc, a highly curated database of metabolic pathways and components covering all domains of life. We evaluate Pathway Tools' performance on three datasets with different complexity and coding potential, including simulated metagenomes, a symbiotic system, and the Hawaii Ocean Time-series. We define accuracy and sensitivity relationships between read length, coverage and pathway recovery and evaluate the impact of taxonomic pruning on ePGDB construction and interpretation. Resulting ePGDBs provide interactive metabolic maps, predict emergent metabolic pathways associated with biosynthesis and energy production and differentiate between genomic potential and phenotypic expression across defined environmental gradients. CONCLUSIONS: This multi-tiered analysis provides the user community with specific operating guidelines, performance metrics and prediction hazards for more reliable ePGDB construction and interpretation. Moreover, it demonstrates the power of Pathway Tools in predicting metabolic interactions in natural and engineered ecosystems.


Subjects
Genomics/methods; Metabolic Networks and Pathways; Metabolic Networks and Pathways/genetics; Molecular Sequence Annotation
7.
BMC Bioinformatics ; 14: 202, 2013 Jun 21.
Article in English | MEDLINE | ID: mdl-23800136

ABSTRACT

BACKGROUND: A central challenge to understanding the ecological and biogeochemical roles of microorganisms in natural and human engineered ecosystems is the reconstruction of metabolic interaction networks from environmental sequence information. The dominant paradigm in metabolic reconstruction is to assign functional annotations using BLAST. Functional annotations are then projected onto symbolic representations of metabolism in the form of KEGG pathways or SEED subsystems. RESULTS: Here we present MetaPathways, an open source pipeline for pathway inference that uses the PathoLogic algorithm to map functional annotations onto the MetaCyc collection of reactions and pathways, and construct environmental Pathway/Genome Databases (ePGDBs) compatible with the editing and navigation features of Pathway Tools. The pipeline accepts assembled or unassembled nucleotide sequences, performs quality assessment and control, predicts and annotates noncoding genes and open reading frames, and produces inputs to PathoLogic. In addition to constructing ePGDBs, MetaPathways uses MLTreeMap to build phylogenetic trees for selected taxonomic anchor and functional gene markers, converts General Feature Format (GFF) files into concatenated GenBank files for ePGDB construction based on third-party annotations, and generates useful file formats including Sequin files for direct GenBank submission and gene feature tables summarizing annotations, MLTreeMap trees, and ePGDB pathway coverage summaries for statistical comparisons. CONCLUSIONS: MetaPathways provides users with a modular annotation and analysis pipeline for predicting metabolic interaction networks from environmental sequence information using an alternative to KEGG pathways and SEED subsystems mapping. It is extensible to genomic and transcriptomic datasets from a wide range of sequencing platforms, and generates useful data products for microbial community structure and function analysis. 
The MetaPathways software package, installation instructions, and example data can be obtained from http://hallam.microbiology.ubc.ca/MetaPathways.


Subjects
Databases, Genetic; Environment; Software; Algorithms; Animals; Databases, Nucleic Acid; Ecosystem; Forecasting; Genomics; Humans; Phylogeny
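
The abstract above mentions converting General Feature Format (GFF) files from third-party annotations. As a minimal sketch of what such input looks like, the parser below reads one nine-column, tab-separated GFF line. It assumes GFF3-style `key=value` attributes, and the class and function names are hypothetical; it is not MetaPathways code and does not perform the GenBank conversion.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote


class GffFeature(NamedTuple):
    seqid: str
    source: str
    feature_type: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    score: str
    strand: str
    phase: str
    attributes: dict


def parse_gff3_line(line: str) -> Optional[GffFeature]:
    """Parse one GFF line; return None for blank lines and '#' comment/directive lines."""
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    attributes = {}
    for pair in cols[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    return GffFeature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                      cols[5], cols[6], cols[7], attributes)


# Hypothetical record for a predicted open reading frame.
record = "contig_1\texample\tCDS\t10\t400\t.\t+\t0\tID=orf_1;product=hypothetical%20protein"
feature = parse_gff3_line(record)
print(feature.seqid, feature.start, feature.end, feature.attributes["product"])
```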
8.
BMC Genomics ; 14 Suppl 1: S3, 2013.
Article in English | MEDLINE | ID: mdl-23368516

ABSTRACT

BACKGROUND: Pairwise comparison of time series data for both local and time-lagged relationships is a computationally challenging problem relevant to many fields of inquiry. The Local Similarity Analysis (LSA) statistic identifies the existence of local and lagged relationships, but determining significance through a p-value has been algorithmically cumbersome due to an intensive permutation test, shuffling rows and columns and repeatedly calculating the statistic. Furthermore, this p-value is calculated with the assumption of normality -- a statistical luxury dissociated from most real world datasets. RESULTS: To improve the performance of LSA on big datasets, an asymptotic upper bound on the p-value calculation was derived without the assumption of normality. This change in the bound calculation markedly improved computational speed from O(pm²n) to O(m²n), where p is the number of permutations in a permutation test, m is the number of time series, and n is the length of each time series. The bounding process is implemented as a computationally efficient software package, FASTLSA, written in C and optimized for threading on multi-core computers, improving its practical computation time. We computationally compare our approach to previous implementations of LSA, demonstrate broad applicability by analyzing time series data from public health, microbial ecology, and social media, and visualize resulting networks using the Cytoscape software. CONCLUSIONS: The FASTLSA software package expands the boundaries of LSA allowing analysis on datasets with millions of co-varying time series. Mapping metadata onto force-directed graphs derived from FASTLSA allows investigators to view correlated cliques and explore previously unrecognized network relationships. The software is freely available for download at: http://www.cmde.science.ubc.ca/hallam/fastLSA/.


Subjects
Software; Algorithms; Computational Biology; Female; Humans; Internet; Intestines/microbiology; Male; Metagenome; Mouth/microbiology; Saccharomyces cerevisiae/genetics; Skin/microbiology; User-Computer Interface
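
To make the cost of the permutation test described in the abstract above concrete, the sketch below computes a local similarity score for one pair of series with a banded dynamic program and then estimates a p-value by shuffling. It is a simplified illustration, not the FASTLSA implementation or its asymptotic bound: the z-score normalization, the function names, and the default parameters are assumptions.

```python
import random
import statistics


def normalize(series):
    """Z-score a series (assumes the series is not constant)."""
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    return [(v - mean) / sd for v in series]


def local_similarity(x, y, max_delay=1):
    """Best positively or negatively correlated local run within a time lag of `max_delay`."""
    n = len(x)
    x, y = normalize(x), normalize(y)
    pos = [[0.0] * (n + 1) for _ in range(n + 1)]
    neg = [[0.0] * (n + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        # Only cells with |i - j| <= max_delay are visited, so one pair costs O(n * max_delay).
        for j in range(max(1, i - max_delay), min(n, i + max_delay) + 1):
            prod = x[i - 1] * y[j - 1]
            pos[i][j] = max(0.0, pos[i - 1][j - 1] + prod)
            neg[i][j] = max(0.0, neg[i - 1][j - 1] - prod)
            best = max(best, pos[i][j], neg[i][j])
    return best / n


def permutation_p_value(x, y, permutations=200, max_delay=1, seed=0):
    """Fraction of shuffles of x that score at least as high as the observed pair."""
    rng = random.Random(seed)
    observed = local_similarity(x, y, max_delay)
    shuffled = list(x)
    hits = 0
    for _ in range(permutations):  # this loop is the factor p in O(p m^2 n)
        rng.shuffle(shuffled)
        if local_similarity(shuffled, y, max_delay) >= observed:
            hits += 1
    return hits / permutations


x = [1, 2, 3, 4, 5, 4, 3, 2]
y = [2, 1, 2, 3, 4, 5, 4, 3]  # x delayed by one time step
print(local_similarity(x, y), permutation_p_value(x, y))
```

Repeating the permutation loop for every one of the m² pairs of series is what gives the O(pm²n) cost quoted in the abstract; replacing it with a bound evaluated once per pair removes the factor p.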
9.
Environ Sci Technol ; 47(18): 10708-17, 2013 Sep 17.
Article in English | MEDLINE | ID: mdl-23889694

ABSTRACT

Oil in subsurface reservoirs is biodegraded by resident microbial communities. Water-mediated, anaerobic conversion of hydrocarbons to methane and CO2, catalyzed by syntrophic bacteria and methanogenic archaea, is thought to be one of the dominant processes. We compared 160 microbial community compositions in ten hydrocarbon resource environments (HREs) and sequenced twelve metagenomes to characterize their metabolic potential. Although anaerobic communities were common, cores from oil sands and coal beds had unexpectedly high proportions of aerobic hydrocarbon-degrading bacteria. Likewise, most metagenomes had high proportions of genes for enzymes involved in aerobic hydrocarbon metabolism. Hence, although HREs may have been strictly anaerobic and typically methanogenic for much of their history, this may not hold today for coal beds and for the Alberta oil sands, one of the largest remaining oil reservoirs in the world. This finding may influence strategies to recover energy or chemicals from these HREs by in situ microbial processes.


Subjects
Archaea/genetics; Bacteria/genetics; Oil and Gas Fields/microbiology; RNA, Archaeal/genetics; Aerobiosis; Alberta; Archaea/classification; Archaea/metabolism; Bacteria/classification; Bacteria/metabolism; Genes, Archaeal; Genes, Bacterial; Hydrocarbons/metabolism; Metagenomics; RNA, Archaeal/metabolism; RNA, Bacterial/genetics; RNA, Ribosomal, 16S/genetics
10.
Sci Data ; 4: 170035, 2017 Apr 11.
Article in English | MEDLINE | ID: mdl-28398290

ABSTRACT

Advances in high-throughput sequencing are reshaping how we perceive microbial communities inhabiting the human body, with implications for therapeutic interventions. Several large-scale datasets derived from hundreds of human microbiome samples sourced from multiple studies are now publicly available. However, idiosyncratic data processing methods between studies introduce systematic differences that confound comparative analyses. To overcome these challenges, we developed GutCyc, a compendium of environmental pathway genome databases (ePGDBs) constructed from 418 assembled human microbiome datasets using MetaPathways, enabling reproducible functional metagenomic annotation. We also generated metabolic network reconstructions for each metagenome using the Pathway Tools software, empowering researchers and clinicians interested in visualizing and interpreting metabolic pathways encoded by the human gut microbiome. For the first time, GutCyc provides consistent annotations and metabolic pathway predictions, making possible comparative community analyses between health and disease states in inflammatory bowel disease, Crohn's disease, and type 2 diabetes. GutCyc data products are searchable online, or may be downloaded and explored locally using MetaPathways and Pathway Tools.


Subjects
Databases, Genetic; Gastrointestinal Microbiome; Metabolic Networks and Pathways; Crohn Disease/microbiology; Diabetes Mellitus, Type 2/microbiology; Geography, Medical; Humans; Inflammatory Bowel Diseases/microbiology; Metagenome; Metagenomics
11.
Curr Opin Microbiol ; 31: 209-216, 2016 Jun.
Article in English | MEDLINE | ID: mdl-27183115

ABSTRACT

A revolution is unfolding in microbial ecology where petabytes of 'multi-omics' data are produced using next generation sequencing and mass spectrometry platforms. This cornucopia of biological information has enormous potential to reveal the hidden metabolic powers of microbial communities in natural and engineered ecosystems. However, to realize this potential, the development of new technologies and interpretative frameworks grounded in ecological design principles are needed to overcome computational and analytical bottlenecks. Here we explore the relationship between microbial ecology and information science in the era of cloud-based computation. We consider microorganisms as individual information processing units implementing a distributed metabolic algorithm and describe developments in ecoinformatics and ubiquitous computing with the potential to eliminate bottlenecks and empower knowledge creation and translation.


Subjects
Ecological and Environmental Phenomena; Electronic Data Processing/methods; Information Science/methods; Information Services; Microbial Consortia/genetics; Ecosystem; High-Throughput Nucleotide Sequencing; Internet
12.
ISME J ; 8(2): 455-68, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24030600

ABSTRACT

Marine Group A (MGA) is a deeply branching and uncultivated phylum of bacteria. Although their functional roles remain elusive, MGA subgroups are particularly abundant and diverse in oxygen minimum zones and permanent or seasonally stratified anoxic basins, suggesting metabolic adaptation to oxygen-deficiency. Here, we expand a previous survey of MGA diversity in O2-deficient waters of the Northeast subarctic Pacific Ocean (NESAP) to include Saanich Inlet (SI), an anoxic fjord with seasonal O2 gradients and periodic sulfide accumulation. Phylogenetic analysis of small subunit ribosomal RNA (16S rRNA) gene clone libraries recovered five previously described MGA subgroups and defined three novel subgroups (SHBH1141, SHBH391, and SHAN400) in SI. To discern the functional properties of MGA residing along gradients of O2 in the NESAP and SI, we identified and sequenced to completion 14 fosmids harboring MGA-associated 16S rRNA genes from a collection of 46 fosmid libraries sourced from NESAP and SI waters. Comparative analysis of these fosmids, in addition to four publicly available MGA-associated large-insert DNA fragments from Hawaii Ocean Time-series and Monterey Bay, revealed widespread genomic differentiation proximal to the ribosomal RNA operon that did not consistently reflect subgroup partitioning patterns observed in 16S rRNA gene clone libraries. Predicted protein-coding genes associated with adaptation to O2-deficiency and sulfur-based energy metabolism were detected on multiple fosmids, including polysulfide reductase (psrABC), implicated in dissimilatory polysulfide reduction to hydrogen sulfide and dissimilatory sulfur oxidation. These results posit a potential role for specific MGA subgroups in the marine sulfur cycle.


Subjects
Bacteria/classification; Bacteria/genetics; Biodiversity; Phylogeny; Aquatic Organisms/classification; Aquatic Organisms/genetics; Aquatic Organisms/metabolism; Bacteria/metabolism; Genome, Bacterial/genetics; Genomics; Molecular Sequence Data; Oxygen/analysis; Pacific Ocean; RNA, Ribosomal, 16S/genetics; Seawater/chemistry