2.
Nature ; 625(7993): 92-100, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38057664

ABSTRACT

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders [1-4], but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.


Subject(s)
Genome, Human , Genomics , Models, Genetic , Mutation , Humans , Access to Information , Databases, Genetic , Datasets as Topic , Gene Frequency , Genome, Human/genetics , Mutation/genetics , Selection, Genetic
4.
Am J Hum Genet ; 108(4): 656-668, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33770507

ABSTRACT

Genetic studies in underrepresented populations identify disproportionate numbers of novel associations. However, most genetic studies use genotyping arrays and sequenced reference panels that best capture variation most common in European ancestry populations. To compare data generation strategies best suited for underrepresented populations, we sequenced the whole genomes of 91 individuals to high coverage as part of the Neuropsychiatric Genetics of African Population-Psychosis (NeuroGAP-Psychosis) study with participants from Ethiopia, Kenya, South Africa, and Uganda. We used a downsampling approach to evaluate the quality of two cost-effective data generation strategies, GWAS arrays versus low-coverage sequencing, by calculating the concordance of imputed variants from these technologies with those from deep whole-genome sequencing data. We show that low-coverage sequencing at a depth of ≥4× captures variants of all frequencies more accurately than all commonly used GWAS arrays investigated and at a comparable cost. Lower depths of sequencing (0.5-1×) performed comparably to commonly used low-density GWAS arrays. Low-coverage sequencing is also sensitive to novel variation; 4× sequencing detects 45% of singletons and 95% of common variants identified in high-coverage African whole genomes. Low-coverage sequencing approaches surmount the problems induced by the ascertainment of common genotyping arrays, effectively identify novel variation particularly in underrepresented populations, and present opportunities to enhance variant discovery at a cost similar to traditional approaches.


Subject(s)
DNA Mutational Analysis/economics , DNA Mutational Analysis/standards , Genetic Variation/genetics , Genetics, Population/economics , Africa , DNA Mutational Analysis/methods , Genetics, Population/methods , Genome, Human/genetics , Genome-Wide Association Study , Health Equity , Humans , Microbiota , Whole Genome Sequencing/economics , Whole Genome Sequencing/standards
6.
Nature ; 581(7809): 434-443, 2020 05.
Article in English | MEDLINE | ID: mdl-32461654

ABSTRACT

Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes [1]. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.


Subject(s)
Exome/genetics , Genes, Essential/genetics , Genetic Variation/genetics , Genome, Human/genetics , Adult , Brain/metabolism , Cardiovascular Diseases/genetics , Cohort Studies , Databases, Genetic , Female , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Humans , Loss of Function Mutation/genetics , Male , Mutation Rate , Proprotein Convertase 9/genetics , RNA, Messenger/genetics , Reproducibility of Results , Exome Sequencing , Whole Genome Sequencing
7.
BMC Genomics ; 19(1): 332, 2018 May 08.
Article in English | MEDLINE | ID: mdl-29739332

ABSTRACT

BACKGROUND: Here we present an in-depth characterization of the mechanism of sequencer-induced sample contamination due to the phenomenon of index swapping that impacts Illumina sequencers employing patterned flow cells with Exclusion Amplification (ExAmp) chemistry (HiSeqX, HiSeq4000, and NovaSeq). We also present a remediation method that minimizes the impact of such swaps. RESULTS: Leveraging data collected over a two-year period, we demonstrate the widespread prevalence of index swapping in patterned flow cell data. We calculate mean swap rates across multiple sample preparation methods and sequencer models, demonstrating that different library methods can have vastly different swapping rates and that even non-ExAmp chemistry instruments display trace levels of index swapping. We provide methods for eliminating sample data cross contamination by utilizing non-redundant dual indexing for complete filtering of index swapped reads, and share the sequences for 96 non-combinatorial dual indexes we have validated across various library preparation methods and sequencer models. Finally, using computational methods we provide a greater insight into the mechanism of index swapping. CONCLUSIONS: Index swapping in pooled libraries is a prevalent phenomenon that we observe at a rate of 0.2 to 6% in all sequencing runs on HiSeqX, HiSeq 4000/3000, and NovaSeq. Utilizing non-redundant dual indexing allows for the removal (flagging/filtering) of these swapped reads and eliminates swapping induced sample contamination, which is critical for sensitive applications such as RNA-seq, single cell, blood biopsy using circulating tumor DNA, or clinical sequencing.


Subject(s)
High-Throughput Nucleotide Sequencing , Sequence Analysis/methods , DNA/chemistry , DNA/isolation & purification , DNA/metabolism , Gene Library , Genome, Human , Humans , Sequence Analysis, DNA
8.
PLoS One ; 9(5): e96953, 2014.
Article in English | MEDLINE | ID: mdl-24824441

ABSTRACT

Photobacterium profundum is a cosmopolitan marine bacterium capable of growth at low temperature and high hydrostatic pressure. Multiple strains of P. profundum have been isolated from different depths of the ocean and display remarkable differences in their physiological responses to pressure. The genome sequence of the deep-sea piezopsychrophilic strain Photobacterium profundum SS9 has provided some clues regarding the genetic features required for growth in the deep sea. The sequenced genome of Photobacterium profundum strain 3TCK, a non-piezophilic strain isolated from a shallow-water environment, is now available and its analysis expands the identification of unique genomic features that correlate to environmental differences and define the Hutchinsonian niche of each strain. These differences range from variations in gene content to specific gene sequences under positive selection. Genome plasticity between Photobacterium bathytypes was investigated when strain 3TCK-specific genes involved in photorepair were introduced to SS9, demonstrating that horizontal gene transfer can provide a mechanism for rapid colonisation of new environments.


Subject(s)
Ecotype , Gene Expression Regulation, Bacterial , Genetic Variation , Genome, Bacterial , Photobacterium/genetics
9.
J Bacteriol ; 194(8): 2119-20, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22461554

ABSTRACT

Here, we present the draft genome sequence of "Candidatus Nitrosoarchaeum limnia" BG20, an ammonia-oxidizing archaeon enriched in culture from low-salinity sediments of the San Francisco Bay estuary. The genome sequence revealed many similarities to the previously sequenced genome of "Ca. Nitrosoarchaeum limnia" SFB1 (enriched from a nearby site in San Francisco Bay) and is representative of a clade of ammonia-oxidizing archaea (AOA) found in low-salinity habitats worldwide.


Subject(s)
Ammonia/metabolism , Archaea/classification , Archaea/genetics , Genome, Archaeal , Base Sequence , Gene Expression Regulation, Archaeal , Geologic Sediments/microbiology , Molecular Sequence Data , Nitrogen/metabolism , Oceans and Seas , Oxidation-Reduction
10.
J Bacteriol ; 194(8): 2121-2, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22461555

ABSTRACT

Ammonia-oxidizing archaea (AOA) play important roles in nitrogen and carbon cycling in marine and terrestrial ecosystems. Here, we present the draft genome sequence for the ammonia-oxidizing archaeon "Candidatus Nitrosopumilus salaria" BD31, which was enriched in culture from sediments of the San Francisco Bay estuary. The genome sequences revealed many similarities to the genome of Nitrosopumilus maritimus.


Subject(s)
Ammonia/metabolism , Archaea/classification , Archaea/genetics , Genome, Archaeal , Base Sequence , Gene Expression Regulation, Archaeal , Geologic Sediments/microbiology , Molecular Sequence Data , Nitrogen/metabolism , Oceans and Seas , Oxidation-Reduction
11.
Stand Genomic Sci ; 5(1): 135-43, 2011 Oct 15.
Article in English | MEDLINE | ID: mdl-22180817

ABSTRACT

Caminibacter mediatlanticus strain TB-2(T) [1], the type strain of the species, is a thermophilic, anaerobic, chemolithoautotrophic bacterium isolated from the walls of an active deep-sea hydrothermal vent chimney on the Mid-Atlantic Ridge. C. mediatlanticus is a Gram-negative member of the Epsilonproteobacteria (order Nautiliales) that grows chemolithoautotrophically with H₂ as the energy source and CO₂ as the carbon source. Nitrate or sulfur is used as the terminal electron acceptor, with resulting production of ammonium and hydrogen sulfide, respectively. In view of the widespread distribution, importance and physiological characteristics of thermophilic Epsilonproteobacteria in deep-sea geothermal environments, it is likely that these organisms provide a relevant contribution to both primary productivity and the biogeochemical cycling of carbon, nitrogen and sulfur at hydrothermal vents. Here we report the main features of the genome of C. mediatlanticus strain TB-2(T).

12.
J Bacteriol ; 193(20): 5881-2, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21952547

ABSTRACT

Here we report the full genome sequence of the marine phototrophic bacterium Erythrobacter sp. strain NAP1. The 3.3-Mb genome contains a full set of photosynthetic genes organized in one 38.9-kb cluster; however, it does not contain genes for CO₂ or N₂ fixation, thereby confirming that the organism is a photoheterotroph.


Subject(s)
Genome, Bacterial , Seawater/microbiology , Sphingomonadaceae/genetics , Bacterial Proteins/genetics , Base Sequence , Molecular Sequence Data , Sphingomonadaceae/classification , Sphingomonadaceae/isolation & purification
13.
Genome Biol Evol ; 3: 601-13, 2011.
Article in English | MEDLINE | ID: mdl-21697100

ABSTRACT

Gene duplication may be an important mechanism for the evolution of new functions and for the adaptive modulation of gene expression via dosage effects. Here, we analyzed the fate of gene duplicates for two strains of a novel group of cyanobacteria (genus Acaryochloris) that produces the far-red light absorbing chlorophyll d as its main photosynthetic pigment. The genomes of both strains contain an unusually high number of gene duplicates for bacteria. As has been observed for eukaryotic genomes, we find that the demography of gene duplicates can be well modeled by a birth-death process. Most duplicated Acaryochloris genes are of comparatively recent origin, are strain-specific, and tend to be located on different genetic elements. Analyses of selection on duplicates of different divergence classes suggest that a minority of paralogs exhibit near neutral evolutionary dynamics immediately following duplication but that most duplicate pairs (including those which have been retained for long periods) are under strong purifying selection against amino acid change. The likelihood of duplicate retention varied among gene functional classes, and the pronounced differences between strains in the pool of retained recent duplicates likely reflects differences in the nutrient status and other characteristics of their respective environments. We conclude that most duplicates are quickly purged from Acaryochloris genomes and that those which are retained likely make important contributions to organism ecology by conferring fitness benefits via gene dosage effects. The mechanism of enhanced duplication may involve homologous recombination between genetic elements mediated by paralogous copies of recA.


Subject(s)
Chlorophyll/biosynthesis , Cyanobacteria/genetics , Gene Duplication , Genome, Bacterial/genetics , Amino Acid Sequence , Chromosomes, Bacterial/genetics , Contig Mapping , Cyanobacteria/classification , Cyanobacteria/metabolism , DNA, Bacterial/chemistry , DNA, Bacterial/genetics , Ecosystem , Evolution, Molecular , Genes, Bacterial/genetics , Genes, Duplicate/genetics , Genetic Variation , Molecular Sequence Data , Open Reading Frames/genetics , Phylogeny , Selection, Genetic , Sequence Analysis, DNA , Sequence Homology, Amino Acid , Species Specificity
14.
J Bacteriol ; 193(6): 1485-6, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21183675

ABSTRACT

Bacteria are the primary food source of choanoflagellates, the closest known relatives of animals. Studying signaling interactions between the Gram-negative Bacteroidetes bacterium Algoriphagus sp. PR1 and its predator, the choanoflagellate Salpingoeca rosetta, provides a promising avenue for testing hypotheses regarding the involvement of bacteria in animal evolution. Here we announce the complete genome sequence of Algoriphagus sp. PR1 and initial findings from its annotation.


Subject(s)
Bacteroidetes/genetics , DNA, Bacterial/chemistry , DNA, Bacterial/genetics , Genome, Bacterial , Choanoflagellata/physiology , Molecular Sequence Data , Sequence Analysis, DNA
15.
Proc Natl Acad Sci U S A ; 106(37): 15527-33, 2009 Sep 15.
Article in English | MEDLINE | ID: mdl-19805210

ABSTRACT

Many marine bacteria have evolved to grow optimally at either high (copiotrophic) or low (oligotrophic) nutrient concentrations, enabling different species to colonize distinct trophic habitats in the oceans. Here, we compare the genome sequences of two bacteria, Photobacterium angustum S14 and Sphingopyxis alaskensis RB2256, that serve as useful model organisms for copiotrophic and oligotrophic modes of life and specifically relate the genomic features to trophic strategy for these organisms and define their molecular mechanisms of adaptation. We developed a model for predicting trophic lifestyle from genome sequence data and tested >400,000 proteins representing >500 million nucleotides of sequence data from 126 genome sequences with metagenome data of whole environmental samples. When applied to available oceanic metagenome data (e.g., the Global Ocean Survey data) the model demonstrated that oligotrophs, and not the more readily isolatable copiotrophs, dominate the ocean's free-living microbial populations. Using our model, it is now possible to define the types of bacteria that specific ocean niches are capable of sustaining.


Subject(s)
Bacteria/growth & development , Bacteria/genetics , Genome, Bacterial , Ecosystem , Marine Biology , Models, Biological , Molecular Sequence Data , Photobacterium/genetics , Photobacterium/growth & development , Sphingomonadaceae/genetics , Sphingomonadaceae/growth & development
16.
PLoS Genet ; 3(12): e231, 2007 Dec.
Article in English | MEDLINE | ID: mdl-18159947

ABSTRACT

Prochlorococcus is a marine cyanobacterium that numerically dominates the mid-latitude oceans and is the smallest known oxygenic phototroph. Numerous isolates from diverse areas of the world's oceans have been studied and shown to be physiologically and genetically distinct. All isolates described thus far can be assigned to either a tightly clustered high-light (HL)-adapted clade, or a more divergent low-light (LL)-adapted group. The 16S rRNA sequences of the entire Prochlorococcus group differ by at most 3%, and the four initially published genomes revealed patterns of genetic differentiation that help explain physiological differences among the isolates. Here we describe the genomes of eight newly sequenced isolates and combine them with the first four genomes for a comprehensive analysis of the core (shared by all isolates) and flexible genes of the Prochlorococcus group, and the patterns of loss and gain of the flexible genes over the course of evolution. There are 1,273 genes that represent the core shared by all 12 genomes. They are apparently sufficient, according to metabolic reconstruction, to encode a functional cell. We describe a phylogeny for all 12 isolates by subjecting their complete proteomes to three different phylogenetic analyses. For each non-core gene, we used a maximum parsimony method to estimate which ancestor likely first acquired or lost each gene. Many of the genetic differences among isolates, especially for genes involved in outer membrane synthesis and nutrient transport, are found within the same clade. Nevertheless, we identified some genes defining HL and LL ecotypes, and clades within these broad ecotypes, helping to demonstrate the basis of HL and LL adaptations in Prochlorococcus. Furthermore, our estimates of gene gain events allow us to identify highly variable genomic islands that are not apparent through simple pairwise comparisons. These results emphasize the functional roles, especially those connected to outer membrane synthesis and transport that dominate the flexible genome and set it apart from the core. Besides identifying islands and demonstrating their role throughout the history of Prochlorococcus, reconstruction of past gene gains and losses shows that much of the variability exists at the "leaves of the tree," between the most closely related strains. Finally, the identification of core and flexible genes from this 12-genome comparison is largely consistent with the relative frequency of Prochlorococcus genes found in global ocean metagenomic databases, further closing the gap between our understanding of these organisms in the lab and the wild.


Subject(s)
Biological Evolution , Genome, Bacterial , Prochlorococcus/genetics , Chromosomes, Bacterial/genetics , Ecosystem , Genes, Bacterial , Phylogeny , Prochlorococcus/classification , Prochlorococcus/isolation & purification , Prochlorococcus/metabolism , RNA, Bacterial/genetics , RNA, Ribosomal, 16S/genetics , Species Specificity , Synechococcus/classification , Synechococcus/genetics