ABSTRACT
MOTIVATION: A perennial problem in the analysis of environmental sequence information is the assignment of reads or assembled sequences, e.g. contigs or scaffolds, to discrete taxonomic bins. In the absence of reference genomes for most environmental microorganisms, the use of intrinsic nucleotide patterns and phylogenetic anchors can improve assembly-dependent binning needed for more accurate taxonomic and functional annotation in communities of microorganisms, and assist in identifying mobile genetic elements or lateral gene transfer events. RESULTS: Here, we present a statistic called LCA* inspired by Information and Voting theories that uses the NCBI Taxonomic Database hierarchy to assign taxonomy to contigs assembled from environmental sequence information. The LCA* algorithm identifies a sufficiently strong majority on the hierarchy while minimizing entropy changes to the observed taxonomic distribution resulting in improved statistical properties. Moreover, we apply results from the order-statistic literature to formulate a likelihood-ratio hypothesis test and P-value for testing the supremacy of the assigned LCA* taxonomy. Using simulated and real-world datasets, we empirically demonstrate that voting-based methods, majority vote and LCA*, in the presence of known reference annotations, are consistently more accurate in identifying contig taxonomy than the lowest common ancestor algorithm popularized by MEGAN, and that LCA* taxonomy strikes a balance between specificity and confidence to provide an estimate appropriate to the available information in the data. AVAILABILITY AND IMPLEMENTATION: The LCA* has been implemented as a stand-alone Python library compatible with the MetaPathways pipeline; both of which are available on GitHub with installation instructions and use-cases (http://www.github.com/hallamlab/LCAStar/). CONTACT: shallam@mail.ubc.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
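The contrast the abstract draws between the lowest common ancestor rule and voting-based assignment can be illustrated with a minimal sketch. This is not the LCA* implementation: the toy taxonomy, the function names, and the "deepest node holding more than a threshold share of the votes" rule are assumptions made here for illustration, and the entropy and hypothesis-testing parts of LCA* are not shown.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); not real NCBI data.
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Alphaproteobacteria": "Proteobacteria",
    "Firmicutes": "Bacteria",
}

def lineage(taxon):
    """Return the path from a taxon up to the root, taxon first."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path

def lowest_common_ancestor(taxa):
    """Deepest node shared by the lineages of all annotations."""
    shared = set(lineage(taxa[0]))
    for t in taxa[1:]:
        shared &= set(lineage(t))
    # Walking up from any annotation, the first shared node is the deepest.
    return next(node for node in lineage(taxa[0]) if node in shared)

def majority_vote(taxa, threshold=0.5):
    """Deepest node whose subtree holds more than `threshold` of the votes.

    Illustrative rule only (an assumption), not the published LCA* statistic.
    """
    support = Counter()
    for t in taxa:
        for node in lineage(t):
            support[node] += 1
    winners = [n for n, c in support.items() if c / len(taxa) > threshold]
    return max(winners, key=lambda n: len(lineage(n)))

# Hypothetical per-gene annotations on one contig.
votes = ["Gammaproteobacteria"] * 4 + ["Alphaproteobacteria", "Firmicutes"]
print(lowest_common_ancestor(votes))  # a single outlier pulls the answer up
print(majority_vote(votes))           # the vote keeps the well-supported taxon
```

In this toy case one Firmicutes annotation is enough to push the common-ancestor answer up to the domain level, while the vote stays at the taxon supported by four of six annotations. Raising `threshold` trades specificity for confidence.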
Subjects
Algorithms , Metagenome , Phylogeny , Entropy , Statistical Models
ABSTRACT
UNLABELLED: Next-generation sequencing is producing vast amounts of sequence information from natural and engineered ecosystems. Although this data deluge has an enormous potential to transform our lives, knowledge creation and translation need software applications that scale with increasing data processing and analysis requirements. Here, we present improvements to MetaPathways, an annotation and analysis pipeline for environmental sequence information that expedites this transformation. We specifically address pathway prediction hazards through integration of a weighted taxonomic distance and enable quantitative comparison of assembled annotations through a normalized read-mapping measure. Additionally, we improve LAST homology searches through BLAST-equivalent E-values and output formats that are natively compatible with prevailing software applications. Finally, an updated graphical user interface allows for keyword annotation query and projection onto user-defined functional gene hierarchies, including the Carbohydrate-Active Enzyme database. AVAILABILITY AND IMPLEMENTATION: MetaPathways v2.5 is available on GitHub: http://github.com/hallamlab/metapathways2. CONTACT: shallam@mail.ubc.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Information Storage and Retrieval , Molecular Sequence Annotation , Phylogeny , Software , Algorithms , Genetic Databases , Humans , DNA Sequence Analysis/methods
ABSTRACT
We present a programmable droplet-based microfluidic device that combines the reconfigurable flow-routing capabilities of integrated microvalve technology with the sample compartmentalization and dispersion-free transport that is inherent to droplets. The device allows for the execution of user-defined multistep reaction protocols in 95 individually addressable nanoliter-volume storage chambers by consecutively merging programmable sequences of picoliter-volume droplets containing reagents or cells. This functionality is enabled by "flow-controlled wetting," a droplet docking and merging mechanism that exploits the physics of droplet flow through a channel to control the precise location of droplet wetting. The device also allows for automated cross-contamination-free recovery of reaction products from individual chambers into standard microfuge tubes for downstream analysis. The combined features of programmability, addressability, and selective recovery provide a general hardware platform that can be reprogrammed for multiple applications. We demonstrate this versatility by implementing multiple single-cell experiment types with this device: bacterial cell sorting and cultivation, taxonomic gene identification, and high-throughput single-cell whole genome amplification and sequencing using common laboratory strains. Finally, we apply the device to genome analysis of single cells and microbial consortia from diverse environmental samples including a marine enrichment culture, deep-sea sediments, and the human oral cavity. The resulting datasets capture genotypic properties of individual cells and illuminate known and potentially unique partnerships between microbial community members.
Subjects
Hydrodynamics , Metagenome/genetics , Microfluidic Analytical Techniques/instrumentation , Microfluidic Analytical Techniques/methods , Base Sequence , DNA Primers/genetics , Genotype , Geologic Sediments/microbiology , Humans , Computer-Assisted Image Processing , Metagenomics/methods , Fluorescence Microscopy , Molecular Sequence Data , Mouth/microbiology , Polymerase Chain Reaction , 16S Ribosomal RNA/genetics , DNA Sequence Analysis , Surface-Active Agents , Wettability
ABSTRACT
BACKGROUND: A convergence of high-throughput sequencing and computational power is transforming biology into information science. Despite these technological advances, converting bits and bytes of sequence information into meaningful insights remains a challenging enterprise. Biological systems operate on multiple hierarchical levels from genomes to biomes. Holistic understanding of biological systems requires agile software tools that permit comparative analyses across multiple information levels (DNA, RNA, protein, and metabolites) to identify emergent properties, diagnose system states, or predict responses to environmental change. RESULTS: Here we adopt the MetaPathways annotation and analysis pipeline and Pathway Tools to construct environmental pathway/genome databases (ePGDBs) that describe microbial community metabolism using MetaCyc, a highly curated database of metabolic pathways and components covering all domains of life. We evaluate Pathway Tools' performance on three datasets with different complexity and coding potential, including simulated metagenomes, a symbiotic system, and the Hawaii Ocean Time-series. We define accuracy and sensitivity relationships between read length, coverage and pathway recovery and evaluate the impact of taxonomic pruning on ePGDB construction and interpretation. Resulting ePGDBs provide interactive metabolic maps, predict emergent metabolic pathways associated with biosynthesis and energy production and differentiate between genomic potential and phenotypic expression across defined environmental gradients. CONCLUSIONS: This multi-tiered analysis provides the user community with specific operating guidelines, performance metrics and prediction hazards for more reliable ePGDB construction and interpretation. Moreover, it demonstrates the power of Pathway Tools in predicting metabolic interactions in natural and engineered ecosystems.
Subjects
Genomics/methods , Metabolic Networks and Pathways , Metabolic Networks and Pathways/genetics , Molecular Sequence Annotation
ABSTRACT
Despite recent advances in metagenomic and single-cell genomic sequencing to investigate uncultivated microbial diversity and metabolic potential, fundamental questions related to population structure, interactions, and biogeochemical roles of candidate divisions remain. Numerous molecular surveys suggest that stratified ecosystems manifesting anoxic, sulfidic, and/or methane-rich conditions are enriched in these enigmatic microbes. Here we describe diversity, abundance, and cooccurrence patterns of uncultivated microbial communities inhabiting the permanently stratified waters of meromictic Sakinaw Lake, British Columbia, Canada, using 454 sequencing of the small-subunit rRNA gene with three-domain resolution. Operational taxonomic units (OTUs) were affiliated with 64 phyla, including more than 25 candidate divisions. Pronounced trends in community structure were observed for all three domains with eukaryotic sequences vanishing almost completely below the mixolimnion, followed by a rapid and sustained increase in methanogen-affiliated (~10%) and unassigned (~60%) archaeal sequences as well as bacterial OTUs affiliated with Chloroflexi (~22%) and candidate divisions (~28%). Network analysis revealed highly correlated, depth-dependent cooccurrence patterns between Chloroflexi, candidate divisions WWE1, OP9/JS1, OP8, and OD1, methanogens, and unassigned archaeal OTUs indicating niche partitioning and putative syntrophic growth modes. Indeed, pathway reconstruction using recently published Sakinaw Lake single-cell genomes affiliated with OP9/JS1 and OP8 revealed complete coverage of the Wood-Ljungdahl pathway with potential to drive syntrophic acetate oxidation to hydrogen and carbon dioxide under methanogenic conditions. Taken together, these observations point to previously unrecognized syntrophic networks in meromictic lake ecosystems with the potential to inform design and operation of anaerobic methanogenic bioreactors.
Subjects
Archaea/classification , Bacteria/classification , Biota , Eukaryota/classification , Lakes/microbiology , Archaea/genetics , Bacteria/genetics , British Columbia , Cluster Analysis , Ribosomal DNA/chemistry , Ribosomal DNA/genetics , Eukaryota/genetics , Molecular Sequence Data , Phylogeny , DNA Sequence Analysis
ABSTRACT
The reconstruction of complete microbial metabolic pathways using 'omics data from environmental samples remains challenging. Computational pipelines for pathway reconstruction that utilize machine learning methods to predict the presence or absence of KEGG modules in incomplete genomes are lacking. Here, we present MetaPathPredict, a software tool that incorporates machine learning models to predict the presence of complete KEGG modules within bacterial genomic datasets. Using gene annotation data and information from the KEGG module database, MetaPathPredict employs deep learning models to predict the presence of KEGG modules in a genome. MetaPathPredict can be used as a command line tool or as a Python module, and both options are designed to be run locally or on a compute cluster. Benchmarks show that MetaPathPredict makes robust predictions of KEGG module presence within highly incomplete genomes.
Subjects
Bacterial Genome , Metabolic Networks and Pathways , Software , Metabolic Networks and Pathways/genetics , Computational Biology/methods , Machine Learning , Bacteria/genetics , Bacteria/metabolism , Bacteria/classification
ABSTRACT
BACKGROUND: A central challenge to understanding the ecological and biogeochemical roles of microorganisms in natural and human engineered ecosystems is the reconstruction of metabolic interaction networks from environmental sequence information. The dominant paradigm in metabolic reconstruction is to assign functional annotations using BLAST. Functional annotations are then projected onto symbolic representations of metabolism in the form of KEGG pathways or SEED subsystems. RESULTS: Here we present MetaPathways, an open source pipeline for pathway inference that uses the PathoLogic algorithm to map functional annotations onto the MetaCyc collection of reactions and pathways, and construct environmental Pathway/Genome Databases (ePGDBs) compatible with the editing and navigation features of Pathway Tools. The pipeline accepts assembled or unassembled nucleotide sequences, performs quality assessment and control, predicts and annotates noncoding genes and open reading frames, and produces inputs to PathoLogic. In addition to constructing ePGDBs, MetaPathways uses MLTreeMap to build phylogenetic trees for selected taxonomic anchor and functional gene markers, converts General Feature Format (GFF) files into concatenated GenBank files for ePGDB construction based on third-party annotations, and generates useful file formats including Sequin files for direct GenBank submission and gene feature tables summarizing annotations, MLTreeMap trees, and ePGDB pathway coverage summaries for statistical comparisons. CONCLUSIONS: MetaPathways provides users with a modular annotation and analysis pipeline for predicting metabolic interaction networks from environmental sequence information using an alternative to KEGG pathways and SEED subsystems mapping. It is extensible to genomic and transcriptomic datasets from a wide range of sequencing platforms, and generates useful data products for microbial community structure and function analysis. 
The MetaPathways software package, installation instructions, and example data can be obtained from http://hallam.microbiology.ubc.ca/MetaPathways.
Subjects
Genetic Databases , Environment , Software , Algorithms , Animals , Nucleic Acid Databases , Ecosystem , Forecasting , Genomics , Humans , Phylogeny
ABSTRACT
BACKGROUND: Pairwise comparison of time series data for both local and time-lagged relationships is a computationally challenging problem relevant to many fields of inquiry. The Local Similarity Analysis (LSA) statistic identifies the existence of local and lagged relationships, but determining significance through a p-value has been algorithmically cumbersome due to an intensive permutation test, shuffling rows and columns and repeatedly calculating the statistic. Furthermore, this p-value is calculated with the assumption of normality -- a statistical luxury dissociated from most real world datasets. RESULTS: To improve the performance of LSA on big datasets, an asymptotic upper bound on the p-value calculation was derived without the assumption of normality. This change in the bound calculation markedly improved computational speed from O(pm²n) to O(m²n), where p is the number of permutations in a permutation test, m is the number of time series, and n is the length of each time series. The bounding process is implemented as a computationally efficient software package, FASTLSA, written in C and optimized for threading on multi-core computers, improving its practical computation time. We computationally compare our approach to previous implementations of LSA, demonstrate broad applicability by analyzing time series data from public health, microbial ecology, and social media, and visualize resulting networks using the Cytoscape software. CONCLUSIONS: The FASTLSA software package expands the boundaries of LSA allowing analysis on datasets with millions of co-varying time series. Mapping metadata onto force-directed graphs derived from FASTLSA allows investigators to view correlated cliques and explore previously unrecognized network relationships. The software is freely available for download at: http://www.cmde.science.ubc.ca/hallam/fastLSA/.
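To make the cost argument concrete, the following is a minimal sketch of a local similarity score computed by dynamic programming, together with the naive permutation test that the abstract describes as the bottleneck. It is not the FASTLSA code (which is written in C) and does not implement the asymptotic bound: the function names, the unsigned score, and the choice to shuffle one series are assumptions made here, and the inputs are assumed to be already normalized.

```python
import random

def local_similarity(x, y, max_delay):
    """Highest-scoring local (possibly lagged) alignment of two series.

    Sketch of the commonly described dynamic program; returns the
    magnitude of the score only, so inverse relationships score high too.
    """
    n = len(x)
    pos = [[0.0] * (n + 1) for _ in range(n + 1)]  # positive association
    neg = [[0.0] * (n + 1) for _ in range(n + 1)]  # negative association
    best = 0.0
    for i in range(1, n + 1):
        # Only cells within `max_delay` of the diagonal are considered.
        for j in range(max(1, i - max_delay), min(n, i + max_delay) + 1):
            prod = x[i - 1] * y[j - 1]
            pos[i][j] = max(0.0, pos[i - 1][j - 1] + prod)
            neg[i][j] = max(0.0, neg[i - 1][j - 1] - prod)
            best = max(best, pos[i][j], neg[i][j])
    return best / n

def permutation_p_value(x, y, max_delay, permutations, seed=0):
    """Naive significance test: recompute the score on shuffled copies."""
    rng = random.Random(seed)
    observed = local_similarity(x, y, max_delay)
    shuffled = list(x)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if local_similarity(shuffled, y, max_delay) >= observed:
            hits += 1
    return hits / permutations

x = [0.0, 1.0, 2.0, 0.0]
y = [1.0, 2.0, 0.0, 0.0]  # same pattern as x, one step earlier
print(local_similarity(x, y, max_delay=0))
print(local_similarity(x, y, max_delay=1))  # the lag is picked up
print(permutation_p_value(x, y, max_delay=1, permutations=200))
```

With a fixed maximum delay, one score costs time proportional to n, so scoring all pairs of m series costs on the order of m²n. The permutation test repeats that work p times per pair, which is the factor of p that a closed-form bound on the p-value removes.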
Subjects
Software , Algorithms , Computational Biology , Female , Humans , Internet , Intestines/microbiology , Male , Metagenome , Mouth/microbiology , Saccharomyces cerevisiae/genetics , Skin/microbiology , User-Computer Interface
ABSTRACT
Oil in subsurface reservoirs is biodegraded by resident microbial communities. Water-mediated, anaerobic conversion of hydrocarbons to methane and CO2, catalyzed by syntrophic bacteria and methanogenic archaea, is thought to be one of the dominant processes. We compared 160 microbial community compositions in ten hydrocarbon resource environments (HREs) and sequenced twelve metagenomes to characterize their metabolic potential. Although anaerobic communities were common, cores from oil sands and coal beds had unexpectedly high proportions of aerobic hydrocarbon-degrading bacteria. Likewise, most metagenomes had high proportions of genes for enzymes involved in aerobic hydrocarbon metabolism. Hence, although HREs may have been strictly anaerobic and typically methanogenic for much of their history, this may not hold today for coal beds and for the Alberta oil sands, one of the largest remaining oil reservoirs in the world. This finding may influence strategies to recover energy or chemicals from these HREs by in situ microbial processes.
Subjects
Archaea/genetics , Bacteria/genetics , Oil and Gas Fields/microbiology , Archaeal RNA/genetics , Aerobiosis , Alberta , Archaea/classification , Archaea/metabolism , Bacteria/classification , Bacteria/metabolism , Archaeal Genes , Bacterial Genes , Hydrocarbons/metabolism , Metagenomics , Archaeal RNA/metabolism , Bacterial RNA/genetics , 16S Ribosomal RNA/genetics
ABSTRACT
Vineyards in wine regions around the world are reservoirs of yeast with oenological potential. Saccharomyces cerevisiae ferments grape sugars to ethanol and generates flavor and aroma compounds in wine. Wineries place a high value on identifying yeast native to their region to develop a region-specific wine program. Commercial wine strains are genetically very similar due to a population bottleneck and inbreeding compared to the diversity of S. cerevisiae from the wild and other industrial processes. We have isolated and microsatellite-typed hundreds of S. cerevisiae strains from spontaneous fermentations of grapes from the Okanagan Valley wine region in British Columbia, Canada. We chose 75 S. cerevisiae strains, based on our microsatellite clustering data, for whole genome sequencing using Illumina paired-end reads. Phylogenetic analysis shows that British Columbian S. cerevisiae strains cluster into 4 clades: Wine/European, Transpacific Oak, Beer 1/Mixed Origin, and a new clade that we have designated as Pacific West Coast Wine. The Pacific West Coast Wine clade has high nucleotide diversity and shares genomic characteristics with wild North American oak strains but also has gene flow from Wine/European and Ecuadorian clades. We analyzed gene copy number variations to find evidence of domestication and found that strains in the Wine/European and Pacific West Coast Wine clades have gene copy number variation reflective of adaptations to the wine-making environment. The "wine circle/Region B", a cluster of 5 genes acquired by horizontal gene transfer into the genome of commercial wine strains, is also present in the majority of the British Columbian strains in the Wine/European clade but in a minority of the Pacific West Coast Wine clade strains. Previous studies have shown that S. cerevisiae strains isolated from Mediterranean Oak trees may be the living ancestors of European wine yeast strains. This study is the first to isolate S. cerevisiae strains with genetic similarity to nonvineyard North American Oak strains from spontaneous wine fermentations.
Subjects
Saccharomyces cerevisiae , Wine , DNA Copy Number Variations , Fermentation , Phylogeny , Canada , Plant Breeding , Whole Genome Sequencing
ABSTRACT
Although often neglected in gut microbiota studies, recent evidence suggests that imbalanced, or dysbiotic, gut mycobiota (fungal microbiota) communities in infancy coassociate with states of bacterial dysbiosis linked to inflammatory diseases such as asthma. In the present study, we (i) characterized the infant gut mycobiota at 3 months and 1 year of age in 343 infants from the CHILD Cohort Study, (ii) defined associations among gut mycobiota community composition and environmental factors for the development of inhalant allergic sensitization (atopy) at age 5 years, and (iii) built a predictive model for inhalant atopy status at age 5 years using these data. We show that in Canadian infants, fungal communities shift dramatically in composition over the first year of life. Early-life environmental factors known to affect gut bacterial communities were also associated with differences in gut fungal community alpha diversity, beta diversity, and/or the relative abundance of specific fungal taxa. Moreover, these metrics differed among healthy infants and those who developed inhalant allergic sensitization (atopy) by age 5 years. Using a rationally selected set of early-life environmental factors in combination with fungal community composition at 1 year of age, we developed a machine learning logistic regression model that predicted inhalant atopy status at 5 years of age with 81% accuracy. Together, these data suggest an important role for the infant gut mycobiota in early-life immune development and indicate that early-life behavioral or therapeutic interventions have the potential to modify infant gut fungal communities, with implications for an infant's long-term health. IMPORTANCE Recent evidence suggests an immunomodulatory role for commensal fungi (mycobiota) in the gut, yet little is known about the composition and dynamics of early-life gut fungal communities. 
In this work, we show for the first time that the composition of the gut mycobiota of Canadian infants changes dramatically over the course of the first year of life, is associated with environmental factors such as geographical location, diet, and season of birth, and can be used in conjunction with knowledge of a small number of key early-life factors to predict inhalant atopy status at age 5 years. Our study highlights the importance of considering fungal communities as indicators or inciters of immune dysfunction preceding the onset of allergic disease and can serve as a benchmark for future studies aiming to examine infant gut fungal communities across birth cohorts.
Subjects
Environment , Fungi/genetics , Gastrointestinal Microbiome/genetics , Hypersensitivity/etiology , Hypersensitivity/microbiology , Mycobiome/genetics , Asthma/etiology , Asthma/microbiology , Preschool Child , Cohort Studies , Dysbiosis , Feces/microbiology , Female , Fungi/classification , Gastrointestinal Microbiome/physiology , Humans , Hypersensitivity/complications , Infant , Male , Mycobiome/physiology
ABSTRACT
Advances in high-throughput sequencing are reshaping how we perceive microbial communities inhabiting the human body, with implications for therapeutic interventions. Several large-scale datasets derived from hundreds of human microbiome samples sourced from multiple studies are now publicly available. However, idiosyncratic data processing methods between studies introduce systematic differences that confound comparative analyses. To overcome these challenges, we developed GutCyc, a compendium of environmental pathway genome databases (ePGDBs) constructed from 418 assembled human microbiome datasets using MetaPathways, enabling reproducible functional metagenomic annotation. We also generated metabolic network reconstructions for each metagenome using the Pathway Tools software, empowering researchers and clinicians interested in visualizing and interpreting metabolic pathways encoded by the human gut microbiome. For the first time, GutCyc provides consistent annotations and metabolic pathway predictions, making possible comparative community analyses between health and disease states in inflammatory bowel disease, Crohn's disease, and type 2 diabetes. GutCyc data products are searchable online, or may be downloaded and explored locally using MetaPathways and Pathway Tools.
Subjects
Genetic Databases , Gastrointestinal Microbiome , Metabolic Networks and Pathways , Crohn Disease/microbiology , Type 2 Diabetes Mellitus/microbiology , Medical Geography , Humans , Inflammatory Bowel Diseases/microbiology , Metagenome , Metagenomics
ABSTRACT
A revolution is unfolding in microbial ecology where petabytes of 'multi-omics' data are produced using next generation sequencing and mass spectrometry platforms. This cornucopia of biological information has enormous potential to reveal the hidden metabolic powers of microbial communities in natural and engineered ecosystems. However, to realize this potential, the development of new technologies and interpretative frameworks grounded in ecological design principles are needed to overcome computational and analytical bottlenecks. Here we explore the relationship between microbial ecology and information science in the era of cloud-based computation. We consider microorganisms as individual information processing units implementing a distributed metabolic algorithm and describe developments in ecoinformatics and ubiquitous computing with the potential to eliminate bottlenecks and empower knowledge creation and translation.
Subjects
Ecological and Environmental Phenomena , Electronic Data Processing/methods , Information Science/methods , Information Services , Microbial Consortia/genetics , Ecosystem , High-Throughput Nucleotide Sequencing , Internet
ABSTRACT
Marine Group A (MGA) is a deeply branching and uncultivated phylum of bacteria. Although their functional roles remain elusive, MGA subgroups are particularly abundant and diverse in oxygen minimum zones and permanent or seasonally stratified anoxic basins, suggesting metabolic adaptation to oxygen-deficiency. Here, we expand a previous survey of MGA diversity in O2-deficient waters of the Northeast subarctic Pacific Ocean (NESAP) to include Saanich Inlet (SI), an anoxic fjord with seasonal O2 gradients and periodic sulfide accumulation. Phylogenetic analysis of small subunit ribosomal RNA (16S rRNA) gene clone libraries recovered five previously described MGA subgroups and defined three novel subgroups (SHBH1141, SHBH391, and SHAN400) in SI. To discern the functional properties of MGA residing along gradients of O2 in the NESAP and SI, we identified and sequenced to completion 14 fosmids harboring MGA-associated 16S RNA genes from a collection of 46 fosmid libraries sourced from NESAP and SI waters. Comparative analysis of these fosmids, in addition to four publicly available MGA-associated large-insert DNA fragments from Hawaii Ocean Time-series and Monterey Bay, revealed widespread genomic differentiation proximal to the ribosomal RNA operon that did not consistently reflect subgroup partitioning patterns observed in 16S rRNA gene clone libraries. Predicted protein-coding genes associated with adaptation to O2-deficiency and sulfur-based energy metabolism were detected on multiple fosmids, including polysulfide reductase (psrABC), implicated in dissimilatory polysulfide reduction to hydrogen sulfide and dissimilatory sulfur oxidation. These results posit a potential role for specific MGA subgroups in the marine sulfur cycle.
Subjects
Bacteria/classification , Bacteria/genetics , Biodiversity , Phylogeny , Aquatic Organisms/classification , Aquatic Organisms/genetics , Aquatic Organisms/metabolism , Bacteria/metabolism , Bacterial Genome/genetics , Genomics , Molecular Sequence Data , Oxygen/analysis , Pacific Ocean , 16S Ribosomal RNA/genetics , Seawater/chemistry
ABSTRACT
Marine Group A (MGA) is a candidate phylum of Bacteria that is ubiquitous and abundant in the ocean. Despite being prevalent, the structural and functional properties of MGA populations remain poorly constrained. Here, we quantified MGA diversity and population structure in relation to nutrients and O2 concentrations in the oxygen minimum zone (OMZ) of the Northeast subarctic Pacific Ocean using a combination of catalyzed reporter deposition fluorescence in situ hybridization (CARD-FISH) and 16S small subunit ribosomal RNA (16S rRNA) gene sequencing (clone libraries and 454-pyrotags). Estimates of MGA abundance as a proportion of total bacteria were similar across all three methods although estimates based on CARD-FISH were consistently lower in the OMZ (5.6%±1.9%) than estimates based on 16S rRNA gene clone libraries (11.0%±3.9%) or pyrotags (9.9%±1.8%). Five previously defined MGA subgroups were recovered in 16S rRNA gene clone libraries and five novel subgroups were defined (HF770D10, P262000D03, P41300E03, P262000N21 and A714018). Rarefaction analysis of pyrotag data indicated that the ultimate richness of MGA was very nearly sampled. Spearman's rank analysis of MGA abundances by CARD-FISH and O2 concentrations resulted in significant correlation. Analyzed in more detail by 16S rRNA pyrotag sequencing, MGA operational taxonomic units affiliated with subgroups Arctic95A-2 and A714018 comprised 0.3-2.4% of total bacterial sequences and displayed strong correlations with decreasing O2 concentration. This study is the first comprehensive description of MGA diversity using complementary techniques. These results provide a phylogenetic framework for interpreting future studies on ecotype selection among MGA subgroups, and suggest a potentially important role for MGA in the ecology and biogeochemistry of OMZs.
Subjects
Bacteria/classification , Biodiversity , Phylogeny , Seawater/microbiology , Bacteria/genetics , Base Sequence , Bacterial DNA/genetics , Gene Library , Molecular Sequence Data , Pacific Ocean , 16S Ribosomal RNA/genetics , DNA Sequence Analysis , Water Microbiology
ABSTRACT
Dissolved oxygen concentration is a crucial organizing principle in marine ecosystems. As oxygen levels decline, energy is increasingly diverted away from higher trophic levels into microbial metabolism, leading to loss of fixed nitrogen and to production of greenhouse gases, including nitrous oxide and methane. In this Review, we describe current efforts to explore the fundamental factors that control the ecological and microbial biodiversity in oxygen-starved regions of the ocean, termed oxygen minimum zones. We also discuss how recent advances in microbial ecology have provided information about the potential interactions in distributed co-occurrence and metabolic networks in oxygen minimum zones, and we provide new insights into coupled biogeochemical processes in the ocean.
Subjects
Biota , Energy Metabolism , Oxygen/metabolism , Seawater/microbiology , Greenhouse Effect , Methane/metabolism , Nitrous Oxide/metabolism
ABSTRACT
String barcoding is a recently introduced technique for genomic based identification of microorganisms. In this paper, we describe the engineering of highly scalable algorithms for robust string barcoding. Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size, on a well equipped workstation. Experimental results on both randomly generated and NCBI genomic data show that whole-genome based selection results in a number of distinguishers nearly matching the information theoretic lower bounds for the problem.
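The distinguisher-selection problem can be sketched in a few lines. The abstract does not state which selection algorithm is used, so the greedy heuristic below is an assumption made here for illustration, as are the function names, the fixed substring length `k`, and the toy sequences. A set of distinguishers gives each genome a presence/absence barcode, and since t binary distinguishers yield at most 2^t distinct barcodes, at least ceil(log2(n)) are needed to separate n genomes.

```python
from itertools import combinations
from math import ceil, log2

def candidate_substrings(genomes, k):
    """All length-k substrings occurring in at least one genome."""
    return {g[i:i + k] for g in genomes for i in range(len(g) - k + 1)}

def separated_pairs(s, genomes, pairs):
    """Pairs of genomes in which exactly one member contains s."""
    return {(a, b) for a, b in pairs if (s in genomes[a]) != (s in genomes[b])}

def greedy_distinguishers(genomes, k):
    """Pick substrings until every pair of genomes is told apart.

    Greedy heuristic (an assumption, not the paper's method): at each
    step take the candidate that separates the most unresolved pairs.
    """
    unresolved = set(combinations(range(len(genomes)), 2))
    candidates = sorted(candidate_substrings(genomes, k))
    chosen = []
    while unresolved:
        best = max(candidates,
                   key=lambda s: len(separated_pairs(s, genomes, unresolved)))
        gain = separated_pairs(best, genomes, unresolved)
        if not gain:
            raise ValueError("some genomes cannot be distinguished at this k")
        chosen.append(best)
        unresolved -= gain
    return chosen

def barcode(genome, distinguishers):
    """Presence/absence vector of the distinguishers in one genome."""
    return tuple(int(s in genome) for s in distinguishers)

# Hypothetical toy "genomes"; real inputs would be whole genomic sequences.
genomes = ["ACGTAC", "ACGGGT", "TTGTAC", "TTGGGA"]
chosen = greedy_distinguishers(genomes, k=2)
print(chosen, [barcode(g, chosen) for g in genomes])
print("lower bound:", ceil(log2(len(genomes))))
```

This naive version rescans every genome for every candidate, which would not scale to hundreds of bacterial-size genomes; it is meant only to show what a distinguisher set and the lower bound look like.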