Results 1 - 20 of 86
1.
Cell ; 187(2): 464-480.e10, 2024 01 18.
Article in English | MEDLINE | ID: mdl-38242088

ABSTRACT

Primary open-angle glaucoma (POAG), the leading cause of irreversible blindness worldwide, disproportionately affects individuals of African ancestry. We conducted a genome-wide association study (GWAS) for POAG in 11,275 individuals of African ancestry (6,003 cases; 5,272 controls). We detected 46 risk loci associated with POAG at genome-wide significance. Replication and post-GWAS analyses, including functionally informed fine-mapping, multiple trait co-localization, and in silico validation, implicated two previously undescribed variants (rs1666698 mapping to DBF4P2; rs34957764 mapping to ROCK1P1) and one previously associated variant (rs11824032 mapping to ARHGEF12) as likely causal. For individuals of African ancestry, a polygenic risk score (PRS) for POAG from our mega-analysis (African ancestry individuals) outperformed a PRS from summary statistics of a much larger GWAS derived from European ancestry individuals. This study quantifies the genetic architecture similarities and differences between African and non-African ancestry populations for this blinding disease.


Subject(s)
Genome-Wide Association Study , Glaucoma, Open-Angle , Humans , Genetic Predisposition to Disease , Glaucoma, Open-Angle/genetics , Black People/genetics , Polymorphism, Single Nucleotide/genetics
2.
Cell ; 185(18): 3426-3440.e19, 2022 09 01.
Article in English | MEDLINE | ID: mdl-36055201

ABSTRACT

The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.


Subject(s)
Genome, Human , Whole Genome Sequencing , Female , High-Throughput Nucleotide Sequencing/methods , Humans , INDEL Mutation , Male , Polymorphism, Single Nucleotide
3.
Cell ; 183(1): 197-210.e32, 2020 10 01.
Article in English | MEDLINE | ID: mdl-33007263

ABSTRACT

Cancer genomes often harbor hundreds of somatic DNA rearrangement junctions, many of which cannot be easily classified into simple (e.g., deletion) or complex (e.g., chromothripsis) structural variant classes. Applying a novel genome graph computational paradigm to analyze the topology of junction copy number (JCN) across 2,778 tumor whole-genome sequences, we uncovered three novel complex rearrangement phenomena: pyrgo, rigma, and tyfonas. Pyrgo are "towers" of low-JCN duplications associated with early-replicating regions, superenhancers, and breast or ovarian cancers. Rigma comprise "chasms" of low-JCN deletions enriched in late-replicating fragile sites and gastrointestinal carcinomas. Tyfonas are "typhoons" of high-JCN junctions and fold-back inversions associated with expressed protein-coding fusions, breakend hypermutation, and acral, but not cutaneous, melanomas. Clustering of tumors according to genome graph-derived features identified subgroups associated with DNA repair defects and poor prognosis.


Subject(s)
Genomic Structural Variation/genetics , Genomics/methods , Neoplasms/genetics , Chromosome Inversion/genetics , Chromothripsis , DNA Copy Number Variations/genetics , Gene Rearrangement/genetics , Genome, Human/genetics , Humans , Mutation/genetics , Whole Genome Sequencing/methods
4.
Cell ; 171(3): 710-722.e12, 2017 Oct 19.
Article in English | MEDLINE | ID: mdl-28965761

ABSTRACT

To further our understanding of the genetic etiology of autism, we generated and analyzed genome sequence data from 516 idiopathic autism families (2,064 individuals). This resource includes >59 million single-nucleotide variants (SNVs) and 9,212 private copy number variants (CNVs), of which 133,992 and 88 are de novo mutations (DNMs), respectively. We estimate a mutation rate of ∼1.5 × 10⁻⁸ SNVs per site per generation with a significantly higher mutation rate in repetitive DNA. Comparing probands and unaffected siblings, we observe several DNM trends. Probands carry more gene-disruptive CNVs and SNVs, resulting in severe missense mutations and mapping to predicted fetal brain promoters and embryonic stem cell enhancers. These differences become more pronounced for autism genes (p = 1.8 × 10⁻³, OR = 2.2). Patients are more likely to carry multiple coding and noncoding DNMs in different genes, which are enriched for expression in striatal neurons (p = 3 × 10⁻³), suggesting a path forward for genetically characterizing more complex cases of autism.


Subject(s)
Autistic Disorder/genetics , DNA Copy Number Variations , Polymorphism, Single Nucleotide , Animals , DNA Mutational Analysis , Female , Genome-Wide Association Study , Humans , INDEL Mutation , Male , Mice
5.
Nature ; 583(7814): 83-89, 2020 07.
Article in English | MEDLINE | ID: mdl-32460305

ABSTRACT

A key goal of whole-genome sequencing for studies of human genetics is to interrogate all forms of variation, including single-nucleotide variants, small insertion or deletion (indel) variants and structural variants. However, tools and resources for the study of structural variants have lagged behind those for smaller variants. Here we used a scalable pipeline to map and characterize structural variants in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest, to our knowledge, whole-genome-sequencing-based structural variant resource so far. On average, individuals carry 2.9 rare structural variants that alter coding regions; these variants affect the dosage or structure of 4.2 genes and account for 4.0-11.2% of rare high-impact coding alleles. Using a computational model, we estimate that structural variants account for 17.2% of rare alleles genome-wide, with predicted deleterious effects that are equivalent to loss-of-function coding alleles; approximately 90% of such structural variants are noncoding deletions (mean 19.1 per genome). We report 158,991 ultra-rare structural variants and show that 2% of individuals carry ultra-rare megabase-scale structural variants, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and noncoding elements, and reveal trends that relate to element class and conservation. This work will help to guide the analysis and interpretation of structural variants in the era of whole-genome sequencing.


Subject(s)
Genetic Variation , Genome, Human/genetics , Whole Genome Sequencing , Alleles , Case-Control Studies , Epigenesis, Genetic , Female , Gene Dosage/genetics , Genetics, Population , High-Throughput Nucleotide Sequencing , Humans , Male , Molecular Sequence Annotation , Quantitative Trait Loci , Racial Groups/genetics , Software
6.
Am J Hum Genet ; 109(4): 631-646, 2022 04 07.
Article in English | MEDLINE | ID: mdl-35290762

ABSTRACT

Studies of de novo mutation (DNM) have typically excluded some of the most repetitive and complex regions of the genome because these regions cannot be unambiguously mapped with short-read sequencing data. To better understand the genome-wide pattern of DNM, we generated long-read sequence data from an autism parent-child quad with an affected female where no pathogenic variant had been discovered in short-read Illumina sequence data. We deeply sequenced all four individuals by using three sequencing platforms (Illumina, Oxford Nanopore, and Pacific Biosciences) and three complementary technologies (Strand-seq, optical mapping, and 10X Genomics). Using long-read sequencing, we initially discovered and validated 171 DNMs across two children, a 20% increase in the number of de novo single-nucleotide variants (SNVs) and indels when compared to short-read callsets. The number of DNMs further increased by 5% when considering a more complete human reference (T2T-CHM13) because of the recovery of events in regions absent from GRCh38 (e.g., three DNMs in heterochromatic satellites). In total, we validated 195 de novo germline mutations and 23 potential post-zygotic mosaic mutations across both children; the overall true substitution rate based on this integrated callset is at least 1.41 × 10⁻⁸ substitutions per nucleotide per generation. We also identified six de novo insertions and deletions in tandem repeats, two of which represent structural variants. We demonstrate that long-read sequencing and assembly, especially when combined with a more complete reference genome, increases the number of DNMs by >25% compared to previous studies, providing a more complete catalog of DNM compared to short-read data alone.


Subject(s)
Genomics , High-Throughput Nucleotide Sequencing , Female , Humans , Mutation/genetics , Nucleotides , Sequence Analysis, DNA , Software
7.
Bioinformatics ; 37(13): 1918-1919, 2021 07 27.
Article in English | MEDLINE | ID: mdl-33241313

ABSTRACT

SUMMARY: We present a new version of the popular somatic variant caller, Lancet, that supports the analysis of linked-reads sequencing data. By seamlessly integrating barcodes and haplotype read assignments within the colored De Bruijn graph local-assembly framework, Lancet computes a barcode-aware coverage and identifies variants that disagree with the local haplotype structure. AVAILABILITY AND IMPLEMENTATION: Lancet is implemented in C++ and available for academic and non-commercial research purposes as an open-source package at https://github.com/nygenome/lancet. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing , Software , Algorithms , Diploidy , Sequence Analysis, DNA
8.
Hum Genomics ; 15(1): 44, 2021 07 13.
Article in English | MEDLINE | ID: mdl-34256850

ABSTRACT

BACKGROUND: Previous research in autism and other neurodevelopmental disorders (NDDs) has indicated an important contribution of protein-coding (coding) de novo variants (DNVs) within specific genes. The role of de novo noncoding variation has been observable as a general increase in genetic burden but has yet to be resolved to individual functional elements. In this study, we assessed whole-genome sequencing data in 2671 families with autism (discovery cohort of 516 families, replication cohort of 2155 families). We focused on DNVs in enhancers with characterized in vivo activity in the brain and identified an excess of DNVs in an enhancer named hs737. RESULTS: We adapted the fitDNM statistical model to work in noncoding regions and tested enhancers for excess of DNVs in families with autism. We found only one enhancer (hs737) with nominal significance in the discovery (p = 0.0172), replication (p = 2.5 × 10⁻³), and combined dataset (p = 1.1 × 10⁻⁴). Each individual with a DNV in hs737 had shared phenotypes including being male, intact cognitive function, and hypotonia or motor delay. Our in vitro assessment of the DNVs showed they all reduce enhancer activity in a neuronal cell line. By epigenomic analyses, we found that hs737 is brain-specific and targets the transcription factor gene EBF3 in human fetal brain. EBF3 is genome-wide significant for coding DNVs in NDDs (missense p = 8.12 × 10⁻³⁵, loss-of-function p = 2.26 × 10⁻¹³) and is widely expressed in the body. Through characterization of promoters bound by EBF3 in neuronal cells, we saw enrichment for binding to NDD genes (p = 7.43 × 10⁻⁶, OR = 1.87) involved in gene regulation. Individuals with coding DNVs have greater phenotypic severity (hypotonia, ataxia, and delayed development syndrome [HADDS]) in comparison to individuals with noncoding DNVs that have autism and hypotonia. CONCLUSIONS: In this study, we identify DNVs in the hs737 enhancer in individuals with autism. Through multiple approaches, we find hs737 targets the gene EBF3 that is genome-wide significant in NDDs. By assessment of noncoding variation and the genes they affect, we are beginning to understand their impact on gene regulatory networks in NDDs.


Subject(s)
Autistic Disorder/genetics , Genetic Predisposition to Disease , Muscle Hypotonia/genetics , Neurodevelopmental Disorders/genetics , Transcription Factors/genetics , Autistic Disorder/epidemiology , Autistic Disorder/pathology , Enhancer Elements, Genetic/genetics , Exome/genetics , Female , Gene Regulatory Networks/genetics , Humans , Male , Muscle Hypotonia/epidemiology , Muscle Hypotonia/pathology , Mutation/genetics , Neurodevelopmental Disorders/epidemiology , Neurodevelopmental Disorders/pathology , Neurons/metabolism , Neurons/pathology
9.
Genome Res ; 28(5): 751-758, 2018 05.
Article in English | MEDLINE | ID: mdl-29588360

ABSTRACT

High-throughput sequencing is a revolutionary technology for the analysis of metagenomic samples. However, querying large volumes of reads against comprehensive DNA/RNA databases in a sensitive manner can be compute-intensive. Here, we present taxMaps, a highly efficient, sensitive, and fully scalable taxonomic classification tool. Using a combination of simulated and real metagenomics data sets, we demonstrate that taxMaps is more sensitive and more precise than widely used taxonomic classifiers and is capable of delivering classification accuracy comparable to that of BLASTN, but at up to three orders of magnitude less computational cost.


Subject(s)
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Metagenomics/methods , Software , Bacteria/classification , Bacteria/genetics , Databases, Nucleic Acid , Humans , Microbiota/genetics , Reproducibility of Results , Rivers/microbiology , Species Specificity , Water Microbiology
10.
Genome Res ; 28(9): 1364-1371, 2018 09.
Article in English | MEDLINE | ID: mdl-30093547

ABSTRACT

DNA methylation patterns in the genome both reflect and help to mediate transcriptional regulatory processes. The digital nature of DNA methylation, present or absent on each allele, makes this assay capable of quantifying events in subpopulations of cells, whereas genome-wide chromatin studies lack the same quantitative capacity. Testing DNA methylation throughout the genome is possible using whole-genome bisulfite sequencing (WGBS), but the high costs associated with the assay have made it impractical for studies involving more than limited numbers of samples. We have optimized a new transposase-based library preparation assay for the Illumina HiSeq X platform suitable for limited amounts of DNA and providing a major cost reduction for WGBS. By incorporating methylated cytosines during fragment end repair, we reveal an end-repair artifact affecting 1%-2% of reads that we can remove analytically. We show that the use of a high (G + C) content spike-in performs better than PhiX in terms of bisulfite sequencing quality. As expected, the loci with transposase-accessible chromatin are DNA hypomethylated and enriched in flanking regions by post-translational modifications of histones usually associated with positive effects on gene expression. Using these transposase-accessible loci to represent the cis-regulatory loci in the genome, we compared the representation of these loci between WGBS and other genome-wide DNA methylation assays, showing WGBS to outperform substantially all of the alternatives. We conclude that it is now technologically and financially feasible to perform WGBS in larger numbers of samples with greater accuracy than previously possible.


Subject(s)
Whole Genome Sequencing/methods , Base Composition , Cell Line , Costs and Cost Analysis , DNA Methylation , Histone Code , Humans , Reproducibility of Results , Sulfites/chemistry , Whole Genome Sequencing/economics , Whole Genome Sequencing/standards
11.
Am J Respir Crit Care Med ; 202(7): 962-972, 2020 10 01.
Article in English | MEDLINE | ID: mdl-32459537

ABSTRACT

Rationale: Puerto Ricans have the highest childhood asthma prevalence in the United States (23.6%); however, the etiology is uncertain. Objectives: In this study, we sought to uncover the genetic architecture of lung function in Puerto Rican youth with and without asthma who were recruited from the island (n = 836). Methods: We used admixture-mapping and whole-genome sequencing data to discover genomic regions associated with lung function. Functional roles of the prioritized candidate SNPs were examined with chromatin immunoprecipitation sequencing, RNA sequencing, and expression quantitative trait loci data. Measurements and Main Results: We discovered a genomic region at 1q32 that was significantly associated with a 0.12-L decrease in the lung volume of exhaled air (95% confidence interval, -0.17 to -0.07; P = 6.62 × 10⁻⁸) with each allele of African ancestry. Within this region, two SNPs were expression quantitative trait loci of TMEM9 in nasal airway epithelial cells and MROH3P in esophagus mucosa. The minor alleles of these SNPs were associated with significantly decreased lung function and decreased TMEM9 gene expression. Another admixture-mapping peak was observed on chromosome 5q35.1, indicating that each Native American ancestry allele was associated with a 0.15-L increase in lung function (95% confidence interval, 0.08-0.21; P = 5.03 × 10⁻⁶). The region-based association tests identified four suggestive windows that harbored candidate rare variants associated with lung function. Conclusions: We identified common and rare genetic variants that may play a critical role in lung function among Puerto Rican youth. We independently validated an inflammatory pathway that could potentially be used to develop more targeted treatments and interventions for patients with asthma.


Subject(s)
Asthma/genetics , Black People/genetics , Chromosomes, Human, Pair 1/genetics , Chromosomes, Human, Pair 5/genetics , Forced Expiratory Volume/genetics , Indians, North American/genetics , Lung/physiopathology , Adolescent , Asthma/physiopathology , Bronchi/cytology , Case-Control Studies , Cell Line , Child , Chromatin Immunoprecipitation , Chromosome Mapping , Esophageal Mucosa/metabolism , Female , Gene Expression , Humans , Linkage Disequilibrium , Lung/physiology , Male , Membrane Proteins/genetics , Membrane Proteins/metabolism , Myocytes, Smooth Muscle , Nasal Mucosa/metabolism , Polymorphism, Single Nucleotide , Puerto Rico , Quantitative Trait Loci , Sequence Analysis, RNA , White People/genetics , Whole Genome Sequencing , Young Adult
12.
Am J Hum Genet ; 98(1): 58-74, 2016 Jan 07.
Article in English | MEDLINE | ID: mdl-26749308

ABSTRACT

We performed whole-genome sequencing (WGS) of 208 genomes from 53 families affected by simplex autism. For the majority of these families, no copy-number variant (CNV) or candidate de novo gene-disruptive single-nucleotide variant (SNV) had been detected by microarray or whole-exome sequencing (WES). We integrated multiple CNV and SNV analyses and extensive experimental validation to identify additional candidate mutations in eight families. We report that compared to control individuals, probands showed a significant (p = 0.03) enrichment of de novo and private disruptive mutations within fetal CNS DNase I hypersensitive sites (i.e., putative regulatory regions). This effect was only observed within 50 kb of genes that have been previously associated with autism risk, including genes where dosage sensitivity has already been established by recurrent disruptive de novo protein-coding mutations (ARID1B, SCN2A, NR3C2, PRKCA, and DSCAM). In addition, we provide evidence of gene-disruptive CNVs (in DISC1, WNT7A, RBFOX1, and MBD5), as well as smaller de novo CNVs and exon-specific SNVs missed by exome sequencing in neurodevelopmental genes (e.g., CANX, SAE1, and PIK3CA). Our results suggest that the detection of smaller, often multiple CNVs affecting putative regulatory elements might help explain additional risk of simplex autism.


Subject(s)
Autistic Disorder/genetics , DNA/genetics , Genome, Human , Exome , Female , Humans , Male , Pedigree , Polymorphism, Single Nucleotide
13.
Genet Med ; 21(7): 1611-1620, 2019 07.
Article in English | MEDLINE | ID: mdl-30504930

ABSTRACT

PURPOSE: To maximize the discovery of potentially pathogenic variants to better understand the diagnostic utility of genome sequencing (GS) and to assess how the presence of multiple risk events might affect the phenotypic severity in autism spectrum disorders (ASD). METHODS: GS was applied to 180 simplex and multiplex ASD families (578 individuals, 213 patients) with exome sequencing and array comparative genomic hybridization further applied to a subset for validation and cross-platform comparisons. RESULTS: We found that 40.8% of patients carried variants with evidence of disease risk, including a de novo frameshift variant in NR4A2 and two de novo missense variants in SYNCRIP, while 21.1% carried clinically relevant pathogenic or likely pathogenic variants. Patients with more than one risk variant (9.9%) were more severely affected with respect to cognitive ability compared with patients with a single or no-risk variant. We observed no instance among the 27 multiplex families where a pathogenic or likely pathogenic variant was transmitted to all affected members in the family. CONCLUSION: The study demonstrates the diagnostic utility of GS, especially for multiple risk variants that contribute to the phenotypic severity, shows the genetic heterogeneity in multiplex families, and provides evidence for new genes for follow up.


Subject(s)
Autistic Disorder/genetics , Exome Sequencing , Child , Comparative Genomic Hybridization , DNA Copy Number Variations , DNA Mutational Analysis , Female , Humans , Male , Phenotype
14.
Nature ; 484(7392): 55-61, 2012 Apr 04.
Article in English | MEDLINE | ID: mdl-22481358

ABSTRACT

Marine stickleback fish have colonized and adapted to thousands of streams and lakes formed since the last ice age, providing an exceptional opportunity to characterize genomic mechanisms underlying repeated ecological adaptation in nature. Here we develop a high-quality reference genome assembly for threespine sticklebacks. By sequencing the genomes of twenty additional individuals from a global set of marine and freshwater populations, we identify a genome-wide set of loci that are consistently associated with marine-freshwater divergence. Our results indicate that reuse of globally shared standing genetic variation, including chromosomal inversions, has an important role in repeated evolution of distinct marine and freshwater sticklebacks, and in the maintenance of divergent ecotypes during early stages of reproductive isolation. Both coding and regulatory changes occur in the set of loci underlying marine-freshwater evolution, but regulatory changes appear to predominate in this well known example of repeated adaptive evolution in nature.


Subject(s)
Adaptation, Physiological/genetics , Biological Evolution , Genome/genetics , Smegmamorpha/genetics , Alaska , Animals , Aquatic Organisms/genetics , Chromosome Inversion/genetics , Chromosomes/genetics , Conserved Sequence/genetics , Ecotype , Female , Fresh Water , Genetic Variation/genetics , Genomics , Molecular Sequence Data , Seawater , Sequence Analysis, DNA
15.
J Virol ; 90(2): 862-72, 2016 01 15.
Article in English | MEDLINE | ID: mdl-26512086

ABSTRACT

UNLABELLED: The introduction of West Nile virus (WNV) into North America in 1999 is a classic example of viral emergence in a new environment, with its subsequent dispersion across the continent having a major impact on local bird populations. Despite the importance of this epizootic, the pattern, dynamics, and determinants of WNV spread in its natural hosts remain uncertain. In particular, it is unclear whether the virus encountered major barriers to transmission, or spread in an unconstrained manner, and if specific viral lineages were favored over others indicative of intrinsic differences in fitness. To address these key questions in WNV evolution and ecology, we sequenced the complete genomes of approximately 300 avian isolates sampled across the United States between 2001 and 2012. Phylogenetic analysis revealed a relatively star-like tree structure, indicative of explosive viral spread in the United States, although with some replacement of viral genotypes through time. These data are striking in that viral sequences exhibit relatively limited clustering according to geographic region, particularly for those viruses sampled from birds, and no strong phylogenetic association with well-sampled avian species. The genome sequence data analyzed here also contain relatively little evidence for adaptive evolution, particularly of structural proteins, suggesting that most viral lineages are of similar fitness and that WNV is well adapted to the ecology of mosquito vectors and diverse avian hosts in the United States. In sum, the molecular evolution of WNV in North America depicts a largely unfettered expansion within a permissive host and geographic population with little evidence of major adaptive barriers. IMPORTANCE: How viruses spread in new host and geographic environments is central to understanding the emergence and evolution of novel infectious diseases and for predicting their likely impact. The emergence of the vector-borne West Nile virus (WNV) in North America in 1999 represents a classic example of this process. Using approximately 300 new viral genomes sampled from wild birds, we show that WNV experienced an explosive spread with little geographical or host constraints within birds and relatively low levels of adaptive evolution. From its introduction into the state of New York, WNV spread across the United States, reaching California and Florida within 4 years, a migration that is clearly reflected in our genomic sequence data, and with a general absence of distinct geographical clusters of bird viruses. However, some geographically distinct viral lineages were found to circulate in mosquitoes, likely reflecting their limited long-distance movement compared to avian species.


Subject(s)
Bird Diseases/epidemiology , Bird Diseases/transmission , Disease Transmission, Infectious , Phylogeography , West Nile Fever/veterinary , Animals , Bird Diseases/virology , Cluster Analysis , Evolution, Molecular , Genetic Variation , Genome, Viral , Genotype , Molecular Epidemiology , Molecular Sequence Data , Sequence Analysis, DNA , Sequence Homology , United States/epidemiology , West Nile Fever/epidemiology , West Nile Fever/transmission , West Nile virus/classification , West Nile virus/genetics , West Nile virus/isolation & purification
16.
Bioinformatics ; 32(20): 3196-3198, 2016 10 15.
Article in English | MEDLINE | ID: mdl-27354699

ABSTRACT

MOTIVATION: Sequencing of matched tumor and normal samples is the standard study design for reliable detection of somatic alterations. However, even very low levels of cross-sample contamination significantly impact calling of somatic mutations, because contaminant germline variants can be incorrectly interpreted as somatic. There are currently no sequence-only based methods that reliably estimate contamination levels in tumor samples, which frequently display copy number changes. As a solution, we developed Conpair, a tool for detection of sample swaps and cross-individual contamination in whole-genome and whole-exome tumor-normal sequencing experiments. RESULTS: On a ladder of in silico contaminated samples, we demonstrated that Conpair reliably measures contamination levels as low as 0.1%, even in presence of copy number changes. We also estimated contamination levels in glioblastoma WGS and WXS tumor-normal datasets from TCGA and showed that they strongly correlate with tumor-normal concordance, as well as with the number of germline variants called as somatic by several widely-used somatic callers. AVAILABILITY AND IMPLEMENTATION: The method is available at: https://github.com/nygenome/conpair CONTACT: egrabowska@gmail.com or mczody@nygenome.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computer Simulation , DNA, Neoplasm , Neoplasms , Algorithms , High-Throughput Nucleotide Sequencing , Humans , Neoplasms/pathology
17.
Nature ; 478(7370): 476-82, 2011 Oct 12.
Article in English | MEDLINE | ID: mdl-21993624

ABSTRACT

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.


Subject(s)
Evolution, Molecular , Genome, Human/genetics , Genome/genetics , Mammals/genetics , Animals , Disease , Exons/genetics , Genomics , Health , Humans , Molecular Sequence Annotation , Phylogeny , RNA/classification , RNA/genetics , Selection, Genetic/genetics , Sequence Alignment , Sequence Analysis, DNA
18.
Nat Genet ; 40(9): 1076-83, 2008 Sep.
Article in English | MEDLINE | ID: mdl-19165922

ABSTRACT

Using comparative sequencing approaches, we investigated the evolutionary history of the European-enriched 17q21.31 MAPT inversion polymorphism. We present a detailed, BAC-based sequence assembly of the inverted human H2 haplotype and compare it to the sequence structure and genetic variation of the corresponding 1.5-Mb region for the noninverted H1 human haplotype and that of chimpanzee and orangutan. We found that inversion of the MAPT region is similarly polymorphic in other great ape species, and we present evidence that the inversions occurred independently in chimpanzees and humans. In humans, the inversion breakpoints correspond to core duplications with the LRRC37 gene family. Our analysis favors the H2 configuration and sequence haplotype as the likely great ape and human ancestral state, with inversion recurrences during primate evolution. We show that the H2 architecture has evolved more extensive sequence homology, perhaps explaining its tendency to undergo microdeletion associated with mental retardation in European populations.


Subject(s)
Chromosome Inversion , Chromosomes, Human, Pair 17 , Evolution, Molecular , Polymorphism, Genetic , tau Proteins/genetics , Animals , Base Sequence , Gene Duplication , Humans , Models, Biological , Molecular Sequence Data , Pan troglodytes/genetics , Phylogeny , Pongo pygmaeus/genetics , Sequence Analysis, DNA
19.
Nat Genet ; 40(9): 1107-12, 2008 Sep.
Article in English | MEDLINE | ID: mdl-19165925

ABSTRACT

Following recent success in genome-wide association studies, a critical focus of human genetics is to understand how genetic variation at implicated loci influences cellular and disease processes. Crohn's disease (CD) is associated with SNPs around IRGM, but coding-sequence variation has been excluded as a source of this association. We identified a common, 20-kb deletion polymorphism, immediately upstream of IRGM and in perfect linkage disequilibrium (r² = 1.0) with the most strongly CD-associated SNP, that causes IRGM to segregate in the population with two distinct upstream sequences. The deletion (CD risk) and reference (CD protective) haplotypes of IRGM showed distinct expression patterns. Manipulation of IRGM expression levels modulated cellular autophagy of internalized bacteria, a process implicated in CD. These results suggest that the CD association at IRGM arises from an alteration in IRGM regulation that affects the efficacy of autophagy and identify a common deletion polymorphism as a likely causal variant.


Subject(s)
Crohn Disease/genetics , GTP-Binding Proteins/genetics , Polymorphism, Single Nucleotide , Autophagy/genetics , Cell Line , Gene Expression Regulation , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Sequence Deletion
20.
Nature ; 464(7288): 587-91, 2010 Mar 25.
Article in English | MEDLINE | ID: mdl-20220755

ABSTRACT

Domestic animals are excellent models for genetic studies of phenotypic evolution. They have evolved genetic adaptations to a new environment, the farm, and have been subjected to strong human-driven selection leading to remarkable phenotypic changes in morphology, physiology and behaviour. Identifying the genetic changes underlying these developments provides new insight into general mechanisms by which genetic variation shapes phenotypic diversity. Here we describe the use of massively parallel sequencing to identify selective sweeps of favourable alleles and candidate mutations that have had a prominent role in the domestication of chickens (Gallus gallus domesticus) and their subsequent specialization into broiler (meat-producing) and layer (egg-producing) chickens. We have generated 44.5-fold coverage of the chicken genome using pools of genomic DNA representing eight different populations of domestic chickens as well as red jungle fowl (Gallus gallus), the major wild ancestor. We report more than 7,000,000 single nucleotide polymorphisms, almost 1,300 deletions and a number of putative selective sweeps. One of the most striking selective sweeps found in all domestic chickens occurred at the locus for thyroid stimulating hormone receptor (TSHR), which has a pivotal role in metabolic regulation and photoperiod control of reproduction in vertebrates. Several of the selective sweeps detected in broilers overlapped genes associated with growth, appetite and metabolic regulation. We found little evidence that selection for loss-of-function mutations had a prominent role in chicken domestication, but we detected two deletions in coding sequences that we suggest are functionally important. This study has direct application to animal breeding and enhances the importance of the domestic chicken as a model organism for biomedical research.


Subject(s)
Chickens/genetics , Genetic Loci/genetics , Genome/genetics , Selection, Genetic/genetics , Amino Acid Sequence , Animals , Biological Evolution , Female , Male , Molecular Sequence Data , Polymorphism, Single Nucleotide , Sequence Alignment , Sequence Analysis, DNA , Sequence Deletion