ABSTRACT
The understanding of marine microbial ecology and metabolism has been hampered by the paucity of sequenced reference genomes. To this end, we report the sequencing of 137 diverse marine isolates collected from around the world. We analysed these sequences, along with previously published marine prokaryotic genomes, in the context of marine metagenomic data, to gain insights into the ecology of the surface ocean prokaryotic picoplankton (0.1-3.0 µm size range). The results suggest that the sequenced genomes define two microbial groups: one composed of only a few taxa that are nearly always abundant in picoplanktonic communities, and the other consisting of many microbial taxa that are rarely abundant. The genomic content of the second group suggests that these microbes are capable of slow growth and survival in energy-limited environments, and rapid growth in energy-rich environments. By contrast, the abundant and cosmopolitan picoplanktonic prokaryotes for which there is genomic representation have smaller genomes, are probably capable of only slow growth and seem to be relatively unable to sense or rapidly acclimate to energy-rich conditions. Their genomic features also lead us to propose that one method used to avoid predation by viruses and/or bacterivores is by means of slow growth and the maintenance of low biomass.
Subjects
Aquatic Organisms/genetics, Genomics, Metagenome, Plankton/genetics, Prokaryotic Cells/metabolism, Aquatic Organisms/classification, Aquatic Organisms/isolation & purification, Aquatic Organisms/virology, Biodiversity, Biomass, Protein Databases, Bacterial Genome/genetics, Biological Models, Oceans and Seas, Phylogeny, Plankton/growth & development, Plankton/isolation & purification, Plankton/metabolism, Prokaryotic Cells/classification, Prokaryotic Cells/virology, 16S Ribosomal RNA/genetics, Water Microbiology
ABSTRACT
Presented here is a genome sequence of an individual human. It was produced from approximately 32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2-206 bp), 292,102 heterozygous insertion/deletion events (indels) (1-571 bp), 559,473 homozygous indels (1-82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, it involves 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
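The 22% figure for non-SNP events can be checked against the counts quoted in the abstract. The short Python sketch below does that arithmetic. It assumes that the tally consists only of the five enumerated variant classes, because the abstract gives no counts for segmental duplications or copy number variation regions. The dictionary keys and the function name are illustrative, not taken from the paper.

```python
# Variant counts as quoted in the abstract. Segmental duplications and
# copy number variation regions are left out because the abstract gives
# no counts for them (an assumption of this sketch).
variant_counts = {
    "snp": 3_213_401,
    "block_substitution": 53_823,
    "heterozygous_indel": 292_102,
    "homozygous_indel": 559_473,
    "inversion": 90,
}


def non_snp_event_fraction(counts):
    """Return the fraction of variant events that are not SNPs."""
    total = sum(counts.values())
    non_snp = total - counts["snp"]
    return non_snp / total


total_events = sum(variant_counts.values())
fraction = non_snp_event_fraction(variant_counts)
print(f"total events: {total_events:,}")
print(f"non-SNP share of events: {100 * fraction:.1f}%")
```

Under that assumption the enumerated classes sum to just over 4.1 million events, and the non-SNP share rounds to 22%, consistent with the abstract. The 74% share of variant bases cannot be recomputed this way, since the abstract does not report base counts per class.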
Subjects
Chromosome Mapping, Diploidy, Human Genome, DNA Sequence Analysis, Base Sequence, Chromosome Mapping/instrumentation, Chromosome Mapping/methods, Human Chromosomes, Human Y Chromosomes/genetics, Gene Dosage, Genotype, Haplotypes, Human Genome Project, Humans, INDEL Mutation, Fluorescence In Situ Hybridization, Male, Microarray Analysis, Middle Aged, Molecular Sequence Data, Pedigree, Phenotype, Single Nucleotide Polymorphism, Reproducibility of Results, DNA Sequence Analysis/instrumentation, DNA Sequence Analysis/methods
ABSTRACT
The world's oceans contain a complex mixture of micro-organisms that are, for the most part, uncharacterized both genetically and biochemically. We report here a metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition. These samples, collected across a several-thousand km transect from the North Atlantic through the Panama Canal and ending in the South Pacific, yielded an extensive dataset consisting of 7.7 million sequencing reads (6.3 billion bp). Though a few major microbial clades dominate the planktonic marine niche, the dataset contains great diversity, with 85% of the assembled sequence and 57% of the unassembled data being unique at a 98% sequence identity cutoff. Using the metadata associated with each sample and sequencing library, we developed new comparative genomic and assembly methods. One comparative genomic method, termed "fragment recruitment," addressed questions of genome structure, evolution, and taxonomic or phylogenetic diversity, as well as the biochemical diversity of genes and gene families. A second method, termed "extreme assembly," made possible the assembly and reconstruction of large segments of abundant but clearly nonclonal organisms. Within all abundant populations analyzed, we found extensive intra-ribotype diversity in several forms: (1) extensive sequence variation within orthologous regions throughout a given genome; despite coverage of individual ribotypes approaching 500-fold, most individual sequencing reads are unique; (2) numerous changes in gene content, some with direct adaptive implications; and (3) hypervariable genomic islands that are too variable to assemble. The intra-ribotype diversity is organized into genetically isolated populations that have overlapping but independent distributions, implying distinct environmental preference.
We present novel methods for measuring the genomic similarity between metagenomic samples and show how they may be grouped into several community types. Specific functional adaptations can be identified both within individual ribotypes and across the entire community, including proteorhodopsin spectral tuning and the presence or absence of the phosphate-binding gene PstS.
Subjects
Water Microbiology, Computational Biology, Food Chain, Oceans and Seas, Plankton, Species Specificity
ABSTRACT
Since only a small fraction of environmental bacteria are amenable to laboratory culture, there is great interest in genomic sequencing directly from single cells. Sufficient DNA for sequencing can be obtained from one cell by the Multiple Displacement Amplification (MDA) method, thereby eliminating the need to develop culture methods. Here we used a microfluidic device to isolate individual Escherichia coli and amplify genomic DNA by MDA in 60-nl reactions. Our results confirm a report that reduced MDA reaction volume lowers nonspecific synthesis that can result from contaminant DNA templates and unfavourable interaction between primers. The quality of the genome amplification was assessed by qPCR and compared favourably to single-cell amplifications performed in standard 50-µl volumes. Amplification bias was greatly reduced in nanoliter volumes, thereby providing a more even representation of all sequences. Single-cell amplicons from both microliter and nanoliter volumes provided high-quality sequence data by high-throughput pyrosequencing, thereby demonstrating a straightforward route to sequencing genomes from single cells.
Subjects
Gene Amplification, Genome, Nanotechnology, Fluorescence In Situ Hybridization, Microfluidics, RNA Probes
ABSTRACT
BACKGROUND: Polymerase chain reaction (PCR) is used in directed sequencing for the discovery of novel polymorphisms. As the first step in PCR directed sequencing, effective PCR primer design is crucial for obtaining high-quality sequence data for target regions. Since current computational primer design tools are not fully tuned with stable underlying laboratory protocols, researchers may still be forced to iteratively optimize protocols for failed amplifications after the primers have been ordered. Furthermore, potentially identifiable factors which contribute to PCR failures have yet to be elucidated. This inefficient approach to primer design is further intensified in a high-throughput laboratory, where hundreds of genes may be targeted in one experiment. RESULTS: We have developed a fully integrated computational PCR primer design pipeline that plays a key role in our high-throughput directed sequencing pipeline. Investigators may specify target regions defined through a rich set of descriptors, such as Ensembl accessions and arbitrary genomic coordinates. Primer pairs are then selected computationally to produce a minimal amplicon set capable of tiling across the specified target regions. As part of the tiling process, primer pairs are computationally screened to meet the criteria for success with one of two PCR amplification protocols. In the process of improving our sequencing success rate, which currently exceeds 95% for exons, we have discovered novel and accurate computational methods capable of identifying primers that may lead to PCR failures. We reveal the laboratory protocols and their associated, empirically determined computational parameters, as well as describe the novel computational methods which may benefit others in future primer design research. 
CONCLUSION: The high-throughput PCR primer design pipeline has been very successful in providing the basis for high-quality directed sequencing results and for minimizing costs associated with labor and reprocessing. The modular architecture of the primer design software has made it possible to readily integrate additional primer critique tests based on iterative feedback from the laboratory. As a result, the primer design software, coupled with the laboratory protocols, serves as a powerful tool for low and high-throughput primer design to enable successful directed sequencing.
Subjects
Algorithms, DNA Primers/genetics, Polymerase Chain Reaction/methods, Sequence Alignment/methods, DNA Sequence Analysis/methods, Base Sequence, Molecular Sequence Data, Reproducibility of Results, Sensitivity and Specificity
ABSTRACT
The study of microbial diversity patterns is hampered by the enormous diversity of microbial communities and the lack of resources to sample them exhaustively. For many questions about richness and evenness, however, one only needs to know the relative order of diversity among samples rather than total diversity. We used 16S libraries from the Global Ocean Survey to investigate the ability of 10 diversity statistics (including rarefaction, non-parametric, parametric, curve extrapolation and diversity indices) to assess the relative diversity of six aquatic bacterial communities. Overall, we found that the statistics yielded remarkably similar rankings of the samples for a given sequence similarity cut-off. This correspondence, despite the different underlying assumptions of the statistics, suggests that diversity statistics are a useful tool for ranking samples of microbial diversity. In addition, sequence similarity cut-off influenced the diversity ranking of the samples, demonstrating that diversity statistics can also be used to detect differences in phylogenetic structure among microbial communities. Finally, a subsampling analysis suggests that further sequencing from these particular clone libraries would not have substantially changed the richness rankings of the samples.
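The idea of ranking samples by several diversity statistics and comparing the resulting orders can be illustrated with a minimal Python sketch. It uses two common statistics, the bias-corrected Chao1 richness estimator (a non-parametric estimator) and the Shannon index (a diversity index). The abstract does not name which of the 10 statistics the study used, so these two are stand-ins. The sample names and clone counts are invented toy data, not values from the study, and the function names are hypothetical.

```python
import math


def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate from per-OTU clone counts."""
    observed = len(otu_counts)
    singletons = sum(1 for c in otu_counts if c == 1)
    doubletons = sum(1 for c in otu_counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def shannon(otu_counts):
    """Shannon diversity index (natural log) from per-OTU clone counts."""
    total = sum(otu_counts)
    return -sum((c / total) * math.log(c / total) for c in otu_counts)


def rank_samples(samples, statistic):
    """Return sample names ordered from most to least diverse."""
    return sorted(samples, key=lambda name: statistic(samples[name]), reverse=True)


# Invented toy data: number of clones assigned to each OTU in a library.
toy_samples = {
    "site_A": [10, 5, 2, 1, 1, 1],
    "site_B": [12, 6, 2],
    "site_C": [4, 3, 3, 2, 2, 1, 1, 1, 1],
}

chao1_ranking = rank_samples(toy_samples, chao1)
shannon_ranking = rank_samples(toy_samples, shannon)
print("Chao1 ranking:  ", chao1_ranking)
print("Shannon ranking:", shannon_ranking)
print("Rankings agree: ", chao1_ranking == shannon_ranking)
```

On this toy data both statistics place the samples in the same order, which is the kind of agreement the abstract reports. Repeating the comparison with OTUs defined at different sequence similarity cut-offs would mirror the cut-off analysis described above.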
Subjects
Bacteria/genetics, Biodiversity, 16S Ribosomal RNA/genetics, Statistics as Topic/methods, Water Microbiology, Bacteria/classification, Bacterial DNA/genetics, Environmental Monitoring, Gene Library, rRNA Genes, Sample Size, Sequence Alignment, DNA Sequence Analysis
ABSTRACT
The oldest extant human maternal lineages include mitochondrial haplogroups L0d and L0k found in the southern African click-speaking forager peoples broadly classified as Khoesan. Profiling these early mitochondrial lineages allows for better understanding of modern human evolution. In this study, we profile 77 new early-diverged complete mitochondrial genomes and sub-classify another 105 L0d/L0k individuals from southern Africa. We use this data to refine basal phylogenetic divergence, coalescence times and Khoesan prehistory. Our results confirm L0d as the earliest diverged lineage (~172 kya, 95% CI: 149-199 kya), followed by L0k (~159 kya, 95% CI: 136-183 kya) and a new lineage we name L0g (~94 kya, 95% CI: 72-116 kya). We identify two new L0d1 subclades we name L0d1d and L0d1c4/L0d1e, and estimate L0d2 and L0d1 divergence at ~93 kya (95% CI: 76-112 kya). We concur that the earliest emerging L0d1'2 sublineage L0d1b (~49 kya, 95% CI: 37-58 kya) is widely distributed across southern Africa. Concomitantly, we find the most recent sublineage L0d2a (~17 kya, 95% CI: 10-27 kya) to be equally common. While we agree that lineages L0d1c and L0k1a are restricted to contemporary inland Khoesan populations, our observed predominance of L0d2a and L0d1a in non-Khoesan populations suggests a once independent coastal Khoesan prehistory. The distribution of early-diverged human maternal lineages within contemporary southern Africans suggests a rich history of human existence prior to any archaeological evidence of migration into the region. For the first time, we provide genetic evidence for significant modern human evolution in southern Africa at the time of the Last Glacial Maximum, between ~21 and 17 kya, coinciding with the emergence of major lineages L0d1a, L0d2b, L0d2d and L0d2a.
Subjects
Biological Evolution, Mitochondrial Genome, Phylogeny, Southern Africa, Black People/genetics, Mitochondrial DNA/genetics, Female, Population Genetics, Haplotypes, Humans, Molecular Sequence Data
ABSTRACT
BACKGROUND: Next generation sequencing (NGS) platforms are currently being utilized for targeted sequencing of candidate genes or genomic intervals to perform sequence-based association studies. To evaluate these platforms for this application, we analyzed human sequence generated by the Roche 454, Illumina GA, and the ABI SOLiD technologies for the same 260 kb in four individuals. RESULTS: Local sequence characteristics contribute to systematic variability in sequence coverage (>100-fold difference in per-base coverage), resulting in patterns for each NGS technology that are highly correlated between samples. A comparison of the base calls to 88 kb of overlapping ABI 3730xL Sanger sequence generated for the same samples showed that the NGS platforms all have high sensitivity, identifying >95% of variant sites. At high coverage depth, base calling errors are systematic, resulting from local sequence contexts; as the coverage is lowered, additional 'random sampling' errors in base calling occur. CONCLUSIONS: Our study provides important insights into systematic biases and data variability that need to be considered when utilizing NGS platforms for population targeted sequencing studies.
Subjects
Population Genetics, DNA Sequence Analysis/instrumentation, Base Sequence, Computer Simulation, False Positive Reactions, Genotype, Humans, Oligonucleotide Array Sequence Analysis, Single Nucleotide Polymorphism/genetics, Sequence Alignment
ABSTRACT
It is well established that epigenetic modulation of genome accessibility in chromatin occurs during biological processes. Here we describe a method based on restriction enzymes and next-generation sequencing for identifying accessible DNA elements using a small amount of starting material, and use it to examine myeloid differentiation of primary human CD34+ cells. The accessibility of several classes of cis-regulatory elements was a predictive marker of in vivo DNA binding by transcription factors, and was associated with distinct patterns of histone posttranslational modifications. We also mapped large chromosomal domains with differential accessibility in progenitors and maturing cells. Accessibility became restricted during differentiation, correlating with a decreased number of expressed genes and loss of regulatory potential. Our data suggest that a permissive chromatin structure in multipotent cells is progressively and selectively closed during differentiation, and illustrate the use of our method for the identification of functional cis-regulatory elements.
Subjects
Cell Differentiation/genetics, Chromatin/genetics, Genetic Epigenesis, Genome-Wide Association Study/methods, CD34 Antigens/metabolism, Cultured Cells, DNA Restriction Enzymes, Developmental Gene Expression Regulation, Hematopoietic Stem Cells/cytology, Hematopoietic Stem Cells/metabolism, Histones/metabolism, Humans, Myelopoiesis/genetics, Transcription Factors/metabolism
ABSTRACT
It is now clear that tyrosine kinases represent attractive targets for therapeutic intervention in cancer. Recent advances in DNA sequencing technology now provide the opportunity to survey mutational changes in cancer in a high-throughput and comprehensive manner. Here we report on the sequence analysis of members of the receptor tyrosine kinase (RTK) gene family in the genomes of glioblastoma brain tumors. Previous studies have identified a number of molecular alterations in glioblastoma, including amplification of the RTK epidermal growth factor receptor. We have identified mutations in two other RTKs: (i) fibroblast growth factor receptor 1, including the first mutations in the kinase domain in this gene observed in any cancer, and (ii) a frameshift mutation in the platelet-derived growth factor receptor-alpha gene. Fibroblast growth factor receptor 1, platelet-derived growth factor receptor-alpha, and epidermal growth factor receptor are all potential entry points to the phosphatidylinositol 3-kinase and mitogen-activated protein kinase intracellular signaling pathways already known to be important for neoplasia. Our results demonstrate the utility of applying DNA sequencing technology to systematically assess the coding sequence of genes within cancer genomes.
Subjects
Brain Neoplasms/genetics, Molecular Evolution, Glioblastoma/genetics, Molecular Models, Mutation/genetics, Receptor Protein-Tyrosine Kinases/genetics, Adult, Amino Acid Sequence, Base Sequence, Child, Female, Genomics/methods, Humans, Male, Genetic Models, Molecular Sequence Data, Fibroblast Growth Factor Receptor Type 1/genetics, Platelet-Derived Growth Factor Receptor alpha/genetics, DNA Sequence Analysis
ABSTRACT
The hyperthermophile Nanoarchaeum equitans is an obligate symbiont growing in coculture with the crenarchaeon Ignicoccus. Ribosomal protein and rRNA-based phylogenies place its branching point early in the archaeal lineage, representing the new archaeal kingdom Nanoarchaeota. The N. equitans genome (490,885 base pairs) encodes the machinery for information processing and repair, but lacks genes for lipid, cofactor, amino acid, or nucleotide biosyntheses. It is the smallest microbial genome sequenced to date, and also one of the most compact, with 95% of the DNA predicted to encode proteins or stable RNAs. Its limited biosynthetic and catabolic capacity indicates that N. equitans' symbiotic relationship to Ignicoccus is parasitic, making it the only known archaeal parasite. Unlike the small genomes of bacterial parasites that are undergoing reductive evolution, N. equitans has few pseudogenes or extensive regions of noncoding DNA. This organism represents a basal archaeal lineage and has a highly reduced genome.