1.
Nature ; 468(7320): 60-6, 2010 Nov 04.
Article in English | MEDLINE | ID: mdl-21048761

ABSTRACT

The understanding of marine microbial ecology and metabolism has been hampered by the paucity of sequenced reference genomes. To this end, we report the sequencing of 137 diverse marine isolates collected from around the world. We analysed these sequences, along with previously published marine prokaryotic genomes, in the context of marine metagenomic data, to gain insights into the ecology of the surface ocean prokaryotic picoplankton (0.1-3.0 µm size range). The results suggest that the sequenced genomes define two microbial groups: one composed of only a few taxa that are nearly always abundant in picoplanktonic communities, and the other consisting of many microbial taxa that are rarely abundant. The genomic content of the second group suggests that these microbes are capable of slow growth and survival in energy-limited environments, and rapid growth in energy-rich environments. By contrast, the abundant and cosmopolitan picoplanktonic prokaryotes for which there is genomic representation have smaller genomes, are probably capable of only slow growth and seem to be relatively unable to sense or rapidly acclimate to energy-rich conditions. Their genomic features also lead us to propose that one method used to avoid predation by viruses and/or bacterivores is by means of slow growth and the maintenance of low biomass.


Subject(s)
Aquatic Organisms/genetics , Genomics , Metagenome , Plankton/genetics , Prokaryotic Cells/metabolism , Aquatic Organisms/classification , Aquatic Organisms/isolation & purification , Aquatic Organisms/virology , Biodiversity , Biomass , Databases, Protein , Genome, Bacterial/genetics , Models, Biological , Oceans and Seas , Phylogeny , Plankton/growth & development , Plankton/isolation & purification , Plankton/metabolism , Prokaryotic Cells/classification , Prokaryotic Cells/virology , RNA, Ribosomal, 16S/genetics , Water Microbiology
2.
Nucleic Acids Res ; 37(Database issue): D1018-24, 2009 Jan.
Article in English | MEDLINE | ID: mdl-19036787

ABSTRACT

The HuRef Genome Browser is a web application for the navigation and analysis of the previously published genome of a human individual, termed HuRef. The browser provides a comparative view between the NCBI human reference sequence and the HuRef assembly, and it enables the navigation of the HuRef genome in the context of HuRef, NCBI and Ensembl annotations. Single nucleotide polymorphisms, indels, inversions, structural and copy-number variations are shown in the context of existing functional annotations on either genome in the comparative view. Demonstrated here are some potential uses of the browser to enable a better understanding of individual human genetic variation. The browser provides full access to the underlying reads with sequence and quality information, the genome assembly and the evidence supporting the identification of DNA polymorphisms. The HuRef Browser is a unique and versatile tool for browsing genome assemblies and studying individual human sequence variation in a diploid context. The browser is available online at http://huref.jcvi.org.


Subject(s)
Databases, Nucleic Acid , Genetic Variation , Genome, Human , Genomics , Humans , Internet , Software
3.
PLoS Biol ; 5(3): e16, 2007 Mar.
Article in English | MEDLINE | ID: mdl-17355171

ABSTRACT

Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.


Subject(s)
Proteins/chemistry , Expressed Sequence Tags , Oceans and Seas , Proteins/genetics , Water Microbiology
4.
PLoS Biol ; 5(10): e254, 2007 Sep 04.
Article in English | MEDLINE | ID: mdl-17803354

ABSTRACT

Presented here is a genome sequence of an individual human. It was produced from approximately 32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2-206 bp), 292,102 heterozygous insertion/deletion events (indels) (1-571 bp), 559,473 homozygous indels (1-82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, these events involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.


Subject(s)
Chromosome Mapping , Diploidy , Genome, Human , Sequence Analysis, DNA , Base Sequence , Chromosome Mapping/instrumentation , Chromosome Mapping/methods , Chromosomes, Human , Chromosomes, Human, Y/genetics , Gene Dosage , Genotype , Haplotypes , Human Genome Project , Humans , INDEL Mutation , In Situ Hybridization, Fluorescence , Male , Microarray Analysis , Middle Aged , Molecular Sequence Data , Pedigree , Phenotype , Polymorphism, Single Nucleotide , Reproducibility of Results , Sequence Analysis, DNA/instrumentation , Sequence Analysis, DNA/methods
5.
PLoS Biol ; 5(3): e77, 2007 Mar.
Article in English | MEDLINE | ID: mdl-17355176

ABSTRACT

The world's oceans contain a complex mixture of micro-organisms that are, for the most part, uncharacterized both genetically and biochemically. We report here a metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition. These samples, collected across a several-thousand km transect from the North Atlantic through the Panama Canal and ending in the South Pacific, yielded an extensive dataset consisting of 7.7 million sequencing reads (6.3 billion bp). Though a few major microbial clades dominate the planktonic marine niche, the dataset contains great diversity with 85% of the assembled sequence and 57% of the unassembled data being unique at a 98% sequence identity cutoff. Using the metadata associated with each sample and sequencing library, we developed new comparative genomic and assembly methods. One comparative genomic method, termed "fragment recruitment," addressed questions of genome structure, evolution, and taxonomic or phylogenetic diversity, as well as the biochemical diversity of genes and gene families. A second method, termed "extreme assembly," made possible the assembly and reconstruction of large segments of abundant but clearly nonclonal organisms. Within all abundant populations analyzed, we found extensive intra-ribotype diversity in several forms: (1) extensive sequence variation within orthologous regions throughout a given genome; despite coverage of individual ribotypes approaching 500-fold, most individual sequencing reads are unique; (2) numerous changes in gene content, some with direct adaptive implications; and (3) hypervariable genomic islands that are too variable to assemble. The intra-ribotype diversity is organized into genetically isolated populations that have overlapping but independent distributions, implying distinct environmental preference. We present novel methods for measuring the genomic similarity between metagenomic samples and show how they may be grouped into several community types. Specific functional adaptations can be identified both within individual ribotypes and across the entire community, including proteorhodopsin spectral tuning and the presence or absence of the phosphate-binding gene PstS.


Subject(s)
Water Microbiology , Computational Biology , Food Chain , Oceans and Seas , Plankton , Species Specificity
6.
Appl Environ Microbiol ; 75(18): 5821-30, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19633107

ABSTRACT

Using a metagenomics approach, we have cloned a piece of environmental DNA from the Sargasso Sea that encodes an [NiFe] hydrogenase showing 60% identity to the large subunit and 64% to the small subunit of a Thiocapsa roseopersicina O2-tolerant [NiFe] hydrogenase. The DNA sequence of the hydrogenase identified by the metagenomic approach was subsequently found to be 99% identical to the hyaA and hyaB genes of an Alteromonas macleodii hydrogenase, indicating that it belongs to the Alteromonas clade. We were able to express our new Alteromonas hydrogenase in T. roseopersicina. Expression was accomplished by coexpressing only two accessory genes, hyaD and hupH, without the need to express any of the hyp accessory genes (hypABCDEF). These results suggest that the native accessory proteins in T. roseopersicina could substitute for the Alteromonas counterparts that are absent in the host to facilitate the assembly of a functional Alteromonas hydrogenase. To further compare the complex assembly machineries of these two [NiFe] hydrogenases, we performed complementation experiments by introducing the new Alteromonas hyaD gene into the T. roseopersicina hynD mutant. Interestingly, Alteromonas endopeptidase HyaD could complement T. roseopersicina HynD to cleave endoproteolytically the C-terminal end of the T. roseopersicina HynL hydrogenase large subunit and activate the enzyme. This study refines our knowledge on the selectivity and pleiotropy of the elements of the [NiFe] hydrogenase assembly machineries. It also provides a model for functionally analyzing novel enzymes from environmental microbes in a culture-independent manner.


Subject(s)
DNA, Bacterial/genetics , DNA, Bacterial/isolation & purification , Hydrogenase/genetics , Seawater/microbiology , Thiocapsa roseopersicina/genetics , Alteromonas/genetics , Cloning, Molecular , Gene Deletion , Gene Expression , Genetic Complementation Test , Sequence Analysis, DNA , Sequence Homology, Amino Acid
8.
PLoS One ; 6(3): e18011, 2011 Mar 18.
Article in English | MEDLINE | ID: mdl-21437252

ABSTRACT

BACKGROUND: Most of our knowledge about the ancient evolutionary history of organisms has been derived from data associated with specific known organisms (i.e., organisms that we can study directly such as plants, metazoans, and culturable microbes). Recently, however, a new source of data for such studies has arrived: DNA sequence data generated directly from environmental samples. Such metagenomic data has enormous potential in a variety of areas including, as we argue here, in studies of very early events in the evolution of gene families and of species. METHODOLOGY/PRINCIPAL FINDINGS: We designed and implemented new methods for analyzing metagenomic data and used them to search the Global Ocean Sampling (GOS) expedition data set for novel lineages in three gene families commonly used in phylogenetic studies of known and unknown organisms: small subunit rRNA and the recA and rpoB superfamilies. Though the methods available could not accurately identify very deeply branched ss-rRNAs (largely due to difficulties in making robust sequence alignments for novel rRNA fragments), our analysis revealed the existence of multiple novel branches in the recA and rpoB gene families. Analysis of available sequence data likely from the same genomes as these novel recA and rpoB homologs was then used to further characterize the possible organismal source of the novel sequences. CONCLUSIONS/SIGNIFICANCE: Of the novel recA and rpoB homologs identified in the metagenomic data, some likely come from uncharacterized viruses while others may represent ancient paralogs not yet seen in any cultured organism. A third possibility is that some come from novel cellular lineages that are only distantly related to any organisms for which sequence data is currently available. If there exist any major, but so-far-undiscovered, deeply branching lineages in the tree of life, we suggest that methods such as those described herein currently offer the best way to search for them.


Subject(s)
Bacterial Proteins/genetics , Databases, Genetic , Evolution, Molecular , Metagenomics , Phylogeny , RNA, Ribosomal/genetics , Rec A Recombinases/genetics , Base Sequence , Multigene Family/genetics , Oceans and Seas
9.
PLoS One ; 5(3): e9773, 2010 Mar 19.
Article in English | MEDLINE | ID: mdl-20333304

ABSTRACT

The Yellowstone caldera contains the most numerous and diverse geothermal systems on Earth, yielding an extensive array of unique high-temperature environments that host a variety of deeply-rooted and understudied Archaea, Bacteria and Eukarya. The combination of extreme temperature and chemical conditions encountered in geothermal environments often results in considerably less microbial diversity than other terrestrial habitats and offers a tremendous opportunity for studying the structure and function of indigenous microbial communities and for establishing linkages between putative metabolisms and element cycling. Metagenome sequence (14-15,000 Sanger reads per site) was obtained for five high-temperature (>65 °C) chemotrophic microbial communities sampled from geothermal springs (or pools) in Yellowstone National Park (YNP) that exhibit a wide range in geochemistry including pH, dissolved sulfide, dissolved oxygen and ferrous iron. Metagenome data revealed significant differences in the predominant phyla associated with each of these geochemical environments. Novel members of the Sulfolobales are dominant in low pH environments, while other Crenarchaeota including distantly-related Thermoproteales and Desulfurococcales populations dominate in suboxic sulfidic sediments. Several novel archaeal groups are well represented in an acidic (pH 3) Fe-oxyhydroxide mat, where a higher O2 influx is accompanied by an increase in archaeal diversity. The presence or absence of genes and pathways important in S oxidation-reduction, H2-oxidation, and aerobic respiration (terminal oxidation) provides insight regarding the metabolic strategies of indigenous organisms present in geothermal systems. Multiple-pathway and protein-specific functional analysis of metagenome sequence data corroborated results from phylogenetic analyses and clearly demonstrates major differences in metabolic potential across sites. The distribution of functional genes involved in electron transport is consistent with the hypothesis that geochemical parameters (e.g., pH, sulfide, Fe, O2) control microbial community structure and function in YNP geothermal springs.


Subject(s)
Hot Springs/microbiology , Hot Temperature , Metagenome , Archaea/genetics , Bacteria/genetics , Geology/methods , Heme/chemistry , Hydrogen-Ion Concentration , Iron/chemistry , Oxidoreductases/genetics , Oxygen/chemistry , Phylogeny , RNA, Ribosomal, 16S/genetics , Sulfides/chemistry , Temperature , Water Microbiology
10.
PLoS One ; 3(1): e1456, 2008 Jan 23.
Article in English | MEDLINE | ID: mdl-18213365

ABSTRACT

Viruses are the most abundant biological entities on our planet. Interactions between viruses and their hosts impact several important biological processes in the world's oceans such as horizontal gene transfer, microbial diversity and biogeochemical cycling. Interrogation of microbial metagenomic sequence data collected as part of the Sorcerer II Global Ocean Expedition (GOS) revealed a high abundance of viral sequences, representing approximately 3% of the total predicted proteins. Cluster analyses of the viral sequences revealed hundreds to thousands of viral genes encoding various metabolic and cellular functions. Quantitative analyses of viral genes of host origin performed on the viral fraction of aquatic samples confirmed the viral nature of these sequences and suggested that significant portions of aquatic viral communities behave as reservoirs of such genetic material. Distributional and phylogenetic analyses of these host-derived viral sequences also suggested that viral acquisition of environmentally relevant genes of host origin is a more abundant and widespread phenomenon than previously appreciated. The predominant viral sequences identified within microbial fractions originated from tailed bacteriophages and exhibited varying global distributions according to viral family. Recruitment of GOS viral sequence fragments against 27 complete aquatic viral genomes revealed that only one reference bacteriophage genome was highly abundant and was closely related, but not identical, to the cyanomyovirus P-SSM4. The co-distribution across all sampling sites of P-SSM4-like sequences with the dominant ecotype of its host, Prochlorococcus, supports the classification of the viral sequences as P-SSM4-like and suggests that this virus may influence the abundance, distribution and diversity of one of the most dominant components of picophytoplankton in oligotrophic oceans. In summary, the abundance and broad geographical distribution of viral sequences within microbial fractions, the prevalence of genes among viral sequences that encode microbial physiological function and their distinct phylogenetic distribution lend strong support to the notion that viral-mediated gene acquisition is a common and ongoing mechanism for generating microbial diversity in the marine environment.


Subject(s)
Genome, Viral , Water Microbiology , Genetic Linkage , Oceans and Seas , Phylogeny
11.
Proc Natl Acad Sci U S A ; 103(30): 11240-5, 2006 Jul 25.
Article in English | MEDLINE | ID: mdl-16840556

ABSTRACT

Since its introduction a decade ago, whole-genome shotgun sequencing (WGS) has been the main approach for producing cost-effective and high-quality genome sequence data. Until now, the Sanger sequencing technology that has served as a platform for WGS has not been truly challenged by emerging technologies. The recent introduction of the pyrosequencing-based 454 sequencing platform (454 Life Sciences, Branford, CT) offers a very promising sequencing technology alternative for incorporation in WGS. In this study, we evaluated the utility and cost-effectiveness of a hybrid sequencing approach using 3730xl Sanger data and 454 data to generate higher-quality lower-cost assemblies of microbial genomes compared to current Sanger sequencing strategies alone.


Subject(s)
Biotechnology/methods , Genes, Bacterial , Genome, Bacterial , Sequence Analysis, DNA/methods , Biotechnology/trends , Computational Biology/methods , Contig Mapping
12.
Science ; 300(5617): 290-3, 2003 Apr 11.
Article in English | MEDLINE | ID: mdl-12690188

ABSTRACT

The systems biology revolution is proceeding along multiple pathways as different science agencies and the private sector have adopted strategies suited to their particular needs and cultures. To meet this challenge, the U.S. Department of Energy has developed the Genomes to Life (GTL) program. A central focus of GTL is environmental microbial biology as a way to approach global environmental problems, and its key goal is to achieve, over the next 10 to 20 years, a basic understanding of thousands of microbes and microbial systems in their native environments. This focus demands that we address huge gaps in knowledge, technology, computing, data storage and manipulation, and systems-level integration.


Subject(s)
Computational Biology , Environmental Microbiology , Genetics, Microbial , Genomics , Biotechnology , Climate , Energy-Generating Resources , Environment , Environmental Pollution , Federal Government , Genome, Bacterial , Genome, Fungal , Government Agencies , Models, Biological , Proteome/analysis , Proteomics , United States
13.
Article in English | MEDLINE | ID: mdl-16826641

ABSTRACT

Genome to life (GTL), the U.S. Department of Energy Office of Science's systems biology program, focuses on environmental microbiology. Over the next 10 to 20 years, GTL's key goal is to understand the life processes of thousands of microbes and microbial systems in their native environments. This focus demands that we address huge gaps in knowledge, technology, computing, data capture and analysis, and systems-level integration. Distinguishing features include (1) strategies for unprecedented, comprehensive, and high-throughput data collection; (2) advanced computing, mathematics, algorithms, and data-management technologies; (3) a focus on potential microbial capabilities to help solve energy and environmental challenges; and (4) new research and management models that link production-scale systems biology facilities in an accessible environment. This unprecedented opportunity to provide the scientific foundation for solving urgent problems in energy, global climate change, and environmental cleanup demands that we take bold steps to achieve a much faster, more efficient pace of biological discovery.


Subject(s)
Biological Science Disciplines/trends , Chromosome Mapping/trends , Computational Biology/trends , Genome, Bacterial/genetics , Genomics/trends , Government Programs/organization & administration , Research/trends , Government Agencies/organization & administration , United States