Search | VHL Search Portal

1.

A cross-disorder dosage sensitivity map of the human genome.

Collins, Ryan L; Glessner, Joseph T; Porcu, Eleonora; Lepamets, Maarja; Brandon, Rhonda; Lauricella, Christopher; Han, Lide; Morley, Theodore; Niestroj, Lisa-Marie; Ulirsch, Jacob; Everett, Selin; Howrigan, Daniel P; Boone, Philip M; Fu, Jack; Karczewski, Konrad J; Kellaris, Georgios; Lowther, Chelsea; Lucente, Diane; Mohajeri, Kiana; Nõukas, Margit; Nuttle, Xander; Samocha, Kaitlin E; Trinh, Mi; Ullah, Farid; Võsa, Urmo; Hurles, Matthew E; Aradhya, Swaroop; Davis, Erica E; Finucane, Hilary; Gusella, James F; Janze, Aura; Katsanis, Nicholas; Matyakhina, Ludmila; Neale, Benjamin M; Sanders, David; Warren, Stephanie; Hodge, Jennelle C; Lal, Dennis; Ruderfer, Douglas M; Meck, Jeanne; Mägi, Reedik; Esko, Tõnu; Reymond, Alexandre; Kutalik, Zoltán; Hakonarson, Hakon; Sunyaev, Shamil; Brand, Harrison; Talkowski, Michael E.

Cell ; 185(16): 3041-3055.e25, 2022 08 04.

Article in English | MEDLINE | ID: mdl-35917817

ABSTRACT

Rare copy-number variants (rCNVs) include deletions and duplications that occur infrequently in the global human population and can confer substantial risk for disease. In this study, we aimed to quantify the properties of haploinsufficiency (i.e., deletion intolerance) and triplosensitivity (i.e., duplication intolerance) throughout the human genome. We harmonized and meta-analyzed rCNVs from nearly one million individuals to construct a genome-wide catalog of dosage sensitivity across 54 disorders, which defined 163 dosage sensitive segments associated with at least one disorder. These segments were typically gene dense and often harbored dominant dosage sensitive driver genes, which we were able to prioritize using statistical fine-mapping. Finally, we designed an ensemble machine-learning model to predict probabilities of dosage sensitivity (pHaplo & pTriplo) for all autosomal genes, which identified 2,987 haploinsufficient and 1,559 triplosensitive genes, including 648 that were uniquely triplosensitive. This dosage sensitivity resource will provide broad utility for human disease research and clinical genetics.

Subject(s)

DNA Copy Number Variations , Genome, Human , DNA Copy Number Variations/genetics , Gene Dosage , Haploinsufficiency/genetics , Humans

2.

A genomic mutational constraint map using variation in 76,156 human genomes.

Chen, Siwei; Francioli, Laurent C; Goodrich, Julia K; Collins, Ryan L; Kanai, Masahiro; Wang, Qingbo; Alföldi, Jessica; Watts, Nicholas A; Vittal, Christopher; Gauthier, Laura D; Poterba, Timothy; Wilson, Michael W; Tarasova, Yekaterina; Phu, William; Grant, Riley; Yohannes, Mary T; Koenig, Zan; Farjoun, Yossi; Banks, Eric; Donnelly, Stacey; Gabriel, Stacey; Gupta, Namrata; Ferriera, Steven; Tolonen, Charlotte; Novod, Sam; Bergelson, Louis; Roazen, David; Ruano-Rubio, Valentin; Covarrubias, Miguel; Llanwarne, Christopher; Petrillo, Nikelle; Wade, Gordon; Jeandet, Thibault; Munshi, Ruchi; Tibbetts, Kathleen; O'Donnell-Luria, Anne; Solomonson, Matthew; Seed, Cotton; Martin, Alicia R; Talkowski, Michael E; Rehm, Heidi L; Daly, Mark J; Tiao, Grace; Neale, Benjamin M; MacArthur, Daniel G; Karczewski, Konrad J.

Nature ; 625(7993): 92-100, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38057664

ABSTRACT

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders, but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.

Subject(s)

Genome, Human , Genomics , Models, Genetic , Mutation , Humans , Access to Information , Databases, Genetic , Datasets as Topic , Gene Frequency , Genome, Human/genetics , Mutation/genetics , Selection, Genetic

3.

Polygenic architecture of rare coding variation across 394,783 exomes.

Weiner, Daniel J; Nadig, Ajay; Jagadeesh, Karthik A; Dey, Kushal K; Neale, Benjamin M; Robinson, Elise B; Karczewski, Konrad J; O'Connor, Luke J.

Nature ; 614(7948): 492-499, 2023 02.

Article in English | MEDLINE | ID: mdl-36755099

ABSTRACT

Both common and rare genetic variants influence complex traits and common diseases. Genome-wide association studies have identified thousands of common-variant associations, and more recently, large-scale exome sequencing studies have identified rare-variant associations in hundreds of genes. However, rare-variant genetic architecture is not well characterized, and the relationship between common-variant and rare-variant architecture is unclear. Here we quantify the heritability explained by the gene-wise burden of rare coding variants across 22 common traits and diseases in 394,783 UK Biobank exomes. Rare coding variants (allele frequency < 1 × 10⁻³) explain 1.3% (s.e. = 0.03%) of phenotypic variance on average, much less than common variants, and most burden heritability is explained by ultrarare loss-of-function variants (allele frequency < 1 × 10⁻⁵). Common and rare variants implicate the same cell types, with similar enrichments, and they have pleiotropic effects on the same pairs of traits, with similar genetic correlations. They partially colocalize at individual genes and loci, but not to the same extent: burden heritability is strongly concentrated in significant genes, while common-variant heritability is more polygenic, and burden heritability is also more strongly concentrated in constrained genes. Finally, we find that burden heritability for schizophrenia and bipolar disorder is approximately 2%. Our results indicate that rare coding variants will implicate a tractable number of large-effect genes, that common and rare associations are mechanistically convergent, and that rare coding variants will contribute only modestly to missing heritability and population risk stratification.

Subject(s)

Exome , Gene Frequency , Genetic Variation , Multifactorial Inheritance , Humans , Exome/genetics , Genetic Variation/genetics , Genome-Wide Association Study , Multifactorial Inheritance/genetics , Risk Factors , United Kingdom , Genetic Loci/genetics , Schizophrenia/genetics , Bipolar Disorder/genetics

4.

Nuclear genetic control of mtDNA copy number and heteroplasmy in humans.

Gupta, Rahul; Kanai, Masahiro; Durham, Timothy J; Tsuo, Kristin; McCoy, Jason G; Kotrys, Anna V; Zhou, Wei; Chinnery, Patrick F; Karczewski, Konrad J; Calvo, Sarah E; Neale, Benjamin M; Mootha, Vamsi K.

Nature ; 620(7975): 839-848, 2023 Aug.

Article in English | MEDLINE | ID: mdl-37587338

ABSTRACT

Mitochondrial DNA (mtDNA) is a maternally inherited, high-copy-number genome required for oxidative phosphorylation. Heteroplasmy refers to the presence of a mixture of mtDNA alleles in an individual and has been associated with disease and ageing. Mechanisms underlying common variation in human heteroplasmy, and the influence of the nuclear genome on this variation, remain insufficiently explored. Here we quantify mtDNA copy number (mtCN) and heteroplasmy using blood-derived whole-genome sequences from 274,832 individuals and perform genome-wide association studies to identify associated nuclear loci. Following blood cell composition correction, we find that mtCN declines linearly with age and is associated with variants at 92 nuclear loci. We observe that nearly everyone harbours heteroplasmic mtDNA variants obeying two principles: (1) heteroplasmic single nucleotide variants tend to arise somatically and accumulate sharply after the age of 70 years, whereas (2) heteroplasmic indels are maternally inherited as mixtures with relative levels associated with 42 nuclear loci involved in mtDNA replication, maintenance and novel pathways. These loci may act by conferring a replicative advantage to certain mtDNA alleles. As an illustrative example, we identify a length variant carried by more than 50% of humans at position chrM:302 within a G-quadruplex previously proposed to mediate mtDNA transcription/replication switching. We find that this variant exerts cis-acting genetic control over mtDNA abundance and is itself associated in-trans with nuclear loci encoding machinery for this regulatory switch. Our study suggests that common variation in the nuclear genome can shape variation in mtCN and heteroplasmy dynamics across the human population.

Subject(s)

Cell Nucleus , DNA Copy Number Variations , DNA, Mitochondrial , Heteroplasmy , Mitochondria , Aged , Humans , DNA Copy Number Variations/genetics , DNA, Mitochondrial/genetics , Genome-Wide Association Study , Heteroplasmy/genetics , Mitochondria/genetics , Cell Nucleus/genetics , Alleles , Polymorphism, Single Nucleotide , INDEL Mutation , G-Quadruplexes

5.

Rare coding variants in ten genes confer substantial risk for schizophrenia.

Singh, Tarjinder; Poterba, Timothy; Curtis, David; Akil, Huda; Al Eissa, Mariam; Barchas, Jack D; Bass, Nicholas; Bigdeli, Tim B; Breen, Gerome; Bromet, Evelyn J; Buckley, Peter F; Bunney, William E; Bybjerg-Grauholm, Jonas; Byerley, William F; Chapman, Sinéad B; Chen, Wei J; Churchhouse, Claire; Craddock, Nicholas; Cusick, Caroline M; DeLisi, Lynn; Dodge, Sheila; Escamilla, Michael A; Eskelinen, Saana; Fanous, Ayman H; Faraone, Stephen V; Fiorentino, Alessia; Francioli, Laurent; Gabriel, Stacey B; Gage, Diane; Gagliano Taliun, Sarah A; Ganna, Andrea; Genovese, Giulio; Glahn, David C; Grove, Jakob; Hall, Mei-Hua; Hämäläinen, Eija; Heyne, Henrike O; Holi, Matti; Hougaard, David M; Howrigan, Daniel P; Huang, Hailiang; Hwu, Hai-Gwo; Kahn, René S; Kang, Hyun Min; Karczewski, Konrad J; Kirov, George; Knowles, James A; Lee, Francis S; Lehrer, Douglas S; Lescai, Francesco.

Nature ; 604(7906): 509-516, 2022 04.

Article in English | MEDLINE | ID: mdl-35396579

ABSTRACT

Rare coding variation has historically provided the most direct connections between gene function and disease pathogenesis. By meta-analysing the whole exomes of 24,248 schizophrenia cases and 97,322 controls, we implicate ultra-rare coding variants (URVs) in 10 genes as conferring substantial risk for schizophrenia (odds ratios of 3-50, P < 2.14 × 10⁻⁶) and 32 genes at a false discovery rate of <5%. These genes have the greatest expression in central nervous system neurons and have diverse molecular functions that include the formation, structure and function of the synapse. The associations of the NMDA (N-methyl-D-aspartate) receptor subunit GRIN2A and AMPA (α-amino-3-hydroxy-5-methyl-4-isoxazole propionic acid) receptor subunit GRIA3 provide support for dysfunction of the glutamatergic system as a mechanistic hypothesis in the pathogenesis of schizophrenia. We observe an overlap of rare variant risk among schizophrenia, autism spectrum disorders, epilepsy and severe neurodevelopmental disorders, although different mutation types are implicated in some shared genes. Most genes described here, however, are not implicated in neurodevelopment. We demonstrate that genes prioritized from common variant analyses of schizophrenia are enriched in rare variant risk, suggesting that common and rare genetic risk factors converge at least partially on the same underlying pathogenic biological processes. Even after excluding significantly associated genes, schizophrenia cases still carry a substantial excess of URVs, which indicates that more risk genes await discovery using this approach.

Subject(s)

Mutation , Neurodevelopmental Disorders , Schizophrenia , Case-Control Studies , Exome , Genetic Predisposition to Disease/genetics , Humans , Neurodevelopmental Disorders/genetics , Receptors, N-Methyl-D-Aspartate/genetics , Schizophrenia/genetics

6.

A harmonized public resource of deeply sequenced diverse human genomes.

Koenig, Zan; Yohannes, Mary T; Nkambule, Lethukuthula L; Zhao, Xuefang; Goodrich, Julia K; Kim, Heesu Ally; Wilson, Michael W; Tiao, Grace; Hao, Stephanie P; Sahakian, Nareh; Chao, Katherine R; Walker, Mark A; Lyu, Yunfei; Rehm, Heidi L; Neale, Benjamin M; Talkowski, Michael E; Daly, Mark J; Brand, Harrison; Karczewski, Konrad J; Atkinson, Elizabeth G; Martin, Alicia R.

Genome Res ; 34(5): 796-809, 2024 06 25.

Article in English | MEDLINE | ID: mdl-38749656

ABSTRACT

Underrepresented populations are often excluded from genomic studies owing in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high-quality set of 4094 whole genomes from 80 populations in the HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also show substantial added value from this data set compared with the prior versions of the component resources, typically combined via liftOver and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared with previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality-control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.

Subject(s)

Databases, Genetic , Genome, Human , Humans , Human Genome Project , High-Throughput Nucleotide Sequencing/methods , Genetic Variation , Genomics/methods

7.

Personal omics profiling reveals dynamic molecular and medical phenotypes.

Chen, Rui; Mias, George I; Li-Pook-Than, Jennifer; Jiang, Lihua; Lam, Hugo Y K; Chen, Rong; Miriami, Elana; Karczewski, Konrad J; Hariharan, Manoj; Dewey, Frederick E; Cheng, Yong; Clark, Michael J; Im, Hogune; Habegger, Lukas; Balasubramanian, Suganthi; O'Huallachain, Maeve; Dudley, Joel T; Hillenmeyer, Sara; Haraksingh, Rajini; Sharon, Donald; Euskirchen, Ghia; Lacroute, Phil; Bettinger, Keith; Boyle, Alan P; Kasowski, Maya; Grubert, Fabian; Seki, Scott; Garcia, Marco; Whirl-Carrillo, Michelle; Gallardo, Mercedes; Blasco, Maria A; Greenberg, Peter L; Snyder, Phyllis; Klein, Teri E; Altman, Russ B; Butte, Atul J; Ashley, Euan A; Gerstein, Mark; Nadeau, Kari C; Tang, Hua; Snyder, Michael.

Cell ; 148(6): 1293-307, 2012 Mar 16.

Article in English | MEDLINE | ID: mdl-22424236

ABSTRACT

Personalized medicine is expected to benefit from combining genomic information with regular monitoring of physiological states by multiple high-throughput methods. Here, we present an integrative personal omics profile (iPOP), an analysis that combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14 month period. Our iPOP analysis revealed various medical risks, including type 2 diabetes. It also uncovered extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. Extremely high-coverage genomic and transcriptomic data, which provide the basis of our iPOP, revealed extensive heteroallelic changes during healthy and diseased states and an unexpected RNA editing mechanism. This study demonstrates that longitudinal iPOP can be used to interpret healthy and diseased states by connecting genomic information with additional dynamic omics activity.

Subject(s)

Genome, Human , Genomics , Precision Medicine , Diabetes Mellitus, Type 2/genetics , Female , Gene Expression Profiling , Humans , Male , Metabolomics , Middle Aged , Mutation , Proteomics , Respiratory Syncytial Viruses/isolation & purification , Rhinovirus/isolation & purification

8.

CHARR efficiently estimates contamination from DNA sequencing data.

Lu, Wenhan; Gauthier, Laura D; Poterba, Timothy; Giacopuzzi, Edoardo; Goodrich, Julia K; Stevens, Christine R; King, Daniel; Daly, Mark J; Neale, Benjamin M; Karczewski, Konrad J.

Am J Hum Genet ; 110(12): 2068-2076, 2023 Dec 07.

Article in English | MEDLINE | ID: mdl-38000370

ABSTRACT

DNA sample contamination is a major issue in clinical and research applications of whole-genome and -exome sequencing. Even modest levels of contamination can substantially affect the overall quality of variant calls and lead to widespread genotyping errors. Currently, popular tools for estimating the contamination level use short-read data (BAM/CRAM files), which are expensive to store and manipulate and often not retained or shared widely. We propose a metric to estimate DNA sample contamination from variant-level whole-genome and -exome sequence data called CHARR, contamination from homozygous alternate reference reads, which leverages the infiltration of reference reads within homozygous alternate variant calls. CHARR uses a small proportion of variant-level genotype information and thus can be computed from single-sample gVCFs or callsets in VCF or BCF formats, as well as efficiently stored variant calls in Hail VariantDataset format. Our results demonstrate that CHARR accurately recapitulates results from existing tools with substantially reduced costs, improving the accuracy and efficiency of downstream analyses of ultra-large whole-genome and exome sequencing datasets.

Subject(s)

DNA , Trout , Humans , Animals , Sequence Analysis, DNA/methods , Genotype , Homozygote , High-Throughput Nucleotide Sequencing/methods , Software
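
As a rough illustration of the idea in the CHARR abstract above (reference reads "infiltrating" homozygous alternate calls), the following is a minimal sketch and not the authors' implementation. The record type `HomAltCall`, the function name `charr_like_score`, the depth filter, and the normalization by reference-allele frequency are all assumptions made for this example; the abstract itself does not specify them.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class HomAltCall:
    """One homozygous alternate genotype call (hypothetical record type)."""
    ref_reads: int    # reads supporting the reference allele at this site
    total_reads: int  # total read depth at this site
    ref_af: float     # population reference-allele frequency (assumed input)


def charr_like_score(calls: Iterable[HomAltCall], min_depth: int = 10) -> float:
    """Average fraction of reference reads seen in hom-alt calls.

    Each per-site fraction is divided by the reference-allele frequency,
    on the assumption that a contaminating sample contributes reference
    reads roughly in proportion to that frequency. Sites below `min_depth`
    or with a non-positive reference frequency are skipped.
    """
    ratios = []
    for call in calls:
        if call.total_reads < min_depth or call.ref_af <= 0.0:
            continue
        ratios.append(call.ref_reads / (call.ref_af * call.total_reads))
    if not ratios:
        raise ValueError("no usable homozygous alternate calls")
    return sum(ratios) / len(ratios)
```

Because only per-variant read counts and an allele frequency are needed, a score of this kind can be computed from variant-level files without going back to BAM/CRAM reads, which is the efficiency point the abstract makes.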

9.

Advanced variant classification framework reduces the false positive rate of predicted loss-of-function variants in population sequencing data.

Singer-Berk, Moriel; Gudmundsson, Sanna; Baxter, Samantha; Seaby, Eleanor G; England, Eleina; Wood, Jordan C; Son, Rachel G; Watts, Nicholas A; Karczewski, Konrad J; Harrison, Steven M; MacArthur, Daniel G; Rehm, Heidi L; O'Donnell-Luria, Anne.

Am J Hum Genet ; 110(9): 1496-1508, 2023 09 07.

Article in English | MEDLINE | ID: mdl-37633279

ABSTRACT

Predicted loss of function (pLoF) variants are often highly deleterious and play an important role in disease biology, but many pLoF variants may not result in loss of function (LoF). Here we present a framework that advances interpretation of pLoF variants in research and clinical settings by considering three categories of LoF evasion: (1) predicted rescue by secondary sequence properties, (2) uncertain biological relevance, and (3) potential technical artifacts. We also provide recommendations on adjustments to ACMG/AMP guidelines' PVS1 criterion. Applying this framework to all high-confidence pLoF variants in 22 genes associated with autosomal-recessive disease from the Genome Aggregation Database (gnomAD v.2.1.1) revealed predicted LoF evasion or potential artifacts in 27.3% (304/1,113) of variants. The major reasons were location in the last exon, in a homopolymer repeat, in a low proportion expressed across transcripts (pext) scored region, or the presence of cryptic in-frame splice rescues. Variants predicted to evade LoF or to be potential artifacts were enriched for ClinVar benign variants. PVS1 was downgraded in 99.4% (162/163) of pLoF variants predicted as likely not LoF/not LoF, with 17.2% (28/163) downgraded as a result of our framework, adding to previous guidelines. Variant pathogenicity was affected (mostly from likely pathogenic to VUS) in 20 (71.4%) of these 28 variants. This framework guides assessment of pLoF variants beyond standard annotation pipelines and substantially reduces false positive rates, which is key to ensure accurate LoF variant prediction in both a research and clinical setting.

Subject(s)

Inheritance Patterns , Humans , Exons , Uncertainty

10.

Discordant calls across genotype discovery approaches elucidate variants with systematic errors.

Atkinson, Elizabeth G; Artomov, Mykyta; Loboda, Alexander A; Rehm, Heidi L; MacArthur, Daniel G; Karczewski, Konrad J; Neale, Benjamin M; Daly, Mark J.

Genome Res ; 33(6): 999-1005, 2023 06.

Article in English | MEDLINE | ID: mdl-37253541

ABSTRACT

Large-scale high-throughput sequencing data sets have been transformative for informing clinical variant interpretation and for use as reference panels for statistical and population genetic efforts. Although such resources are often treated as ground truth, we find that in widely used reference data sets such as the Genome Aggregation Database (gnomAD), some variants pass gold-standard filters, yet are systematically different in their genotype calls across genotype discovery approaches. The inclusion of such discordant sites in study designs involving multiple genotype discovery strategies could bias results and lead to false-positive hits in association studies owing to technological artifacts rather than a true relationship to the phenotype. Here, we describe this phenomenon of discordant genotype calls across genotype discovery approaches, characterize the error mode of wrong calls, provide a list of discordant sites identified in gnomAD that should be treated with caution in analyses, and present a metric and machine learning classifier trained on gnomAD data to identify likely discordant variants in other data sets. We find that different genotype discovery approaches have different sets of variants at which this problem occurs, but there are characteristic variant features that can be used to predict discordant behavior. Discordant sites are largely shared across ancestry groups, although different populations are powered for the discovery of different variants. We find that the most common error mode is that of a variant being heterozygous for one approach and homozygous for the other, with heterozygous in the genomes and homozygous reference in the exomes making up the majority of miscalls.

Subject(s)

Exome , Genetics, Population , Genotype , Heterozygote , Phenotype , Polymorphism, Single Nucleotide

11.

Evaluating drug targets through human loss-of-function genetic variation.

Minikel, Eric Vallabh; Karczewski, Konrad J; Martin, Hilary C; Cummings, Beryl B; Whiffin, Nicola; Rhodes, Daniel; Alföldi, Jessica; Trembath, Richard C; van Heel, David A; Daly, Mark J; Schreiber, Stuart L; MacArthur, Daniel G.

Nature ; 581(7809): 459-464, 2020 05.

Article in English | MEDLINE | ID: mdl-32461653

ABSTRACT

Naturally occurring human genetic variants that are predicted to inactivate protein-coding genes provide an in vivo model of human gene inactivation that complements knockout studies in cells and model organisms. Here we report three key findings regarding the assessment of candidate drug targets using human loss-of-function variants. First, even essential genes, in which loss-of-function variants are not tolerated, can be highly successful as targets of inhibitory drugs. Second, in most genes, loss-of-function variants are sufficiently rare that genotype-based ascertainment of homozygous or compound heterozygous 'knockout' humans will await sample sizes that are approximately 1,000 times those presently available, unless recruitment focuses on consanguineous individuals. Third, automated variant annotation and filtering are powerful, but manual curation remains crucial for removing artefacts, and is a prerequisite for recall-by-genotype efforts. Our results provide a roadmap for human knockout studies and should guide the interpretation of loss-of-function variants in drug development.

Subject(s)

Genes, Essential/drug effects , Genes, Essential/genetics , Loss of Function Mutation/genetics , Molecular Targeted Therapy , Artifacts , Automation , Consanguinity , Exons/genetics , Gain of Function Mutation/genetics , Gene Frequency , Gene Knockdown Techniques , Heterozygote , Homozygote , Humans , Huntingtin Protein/genetics , Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/genetics , Neurodegenerative Diseases/genetics , Prion Proteins/genetics , Reproducibility of Results , Sample Size , tau Proteins/genetics

12.

Transcript expression-aware annotation improves rare variant interpretation.

Cummings, Beryl B; Karczewski, Konrad J; Kosmicki, Jack A; Seaby, Eleanor G; Watts, Nicholas A; Singer-Berk, Moriel; Mudge, Jonathan M; Karjalainen, Juha; Satterstrom, F Kyle; O'Donnell-Luria, Anne H; Poterba, Timothy; Seed, Cotton; Solomonson, Matthew; Alföldi, Jessica; Daly, Mark J; MacArthur, Daniel G.

Nature ; 581(7809): 452-458, 2020 05.

Article in English | MEDLINE | ID: mdl-32461655

ABSTRACT

The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD), we show that one explanation for this paradox involves alternative splicing of mRNA, which allows exons of a gene to be expressed at varying levels across different cell types. Currently, no existing annotation tool systematically incorporates information about exon expression into the interpretation of variants. We develop a transcript-level annotation metric known as the 'proportion expressed across transcripts', which quantifies isoform expression for variants. We calculate this metric using 11,706 tissue samples from the Genotype Tissue Expression (GTEx) project and show that it can differentiate between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.8% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder and intellectual disability or developmental disorders to show that pLoF variants in weakly expressed regions have similar effect sizes to those of synonymous variants, whereas pLoF variants in highly expressed exons are most strongly enriched among cases. Our annotation is fast, flexible and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for the genetic diagnosis of rare diseases, the analysis of rare variant burden in complex disorders, and the curation and prioritization of variants in recall-by-genotype studies.

Subject(s)

Disease/genetics , Haploinsufficiency/genetics , Loss of Function Mutation/genetics , Molecular Sequence Annotation , Transcription, Genetic , Transcriptome/genetics , Autism Spectrum Disorder/genetics , Datasets as Topic , Developmental Disabilities/genetics , Exons/genetics , Female , Genotype , Humans , Intellectual Disability/genetics , Male , Molecular Sequence Annotation/standards , Poisson Distribution , RNA, Messenger/analysis , RNA, Messenger/genetics , Rare Diseases/diagnosis , Rare Diseases/genetics , Reproducibility of Results , Exome Sequencing
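
To make the 'proportion expressed across transcripts' idea from the abstract above concrete, here is a minimal sketch under stated assumptions rather than the published method. It assumes a single tissue, a mapping from transcript IDs to expression values (for example TPM), and a set of transcripts on which the variant carries the annotation of interest; the function name `pext_like` and the transcript IDs in the usage example are hypothetical.

```python
from typing import Mapping, Set


def pext_like(transcript_expr: Mapping[str, float], annotated: Set[str]) -> float:
    """Share of a gene's expression carried by transcripts affected by a variant.

    transcript_expr: expression per transcript of one gene in one tissue.
    annotated: IDs of transcripts on which the variant has the annotation
               (for example, predicted loss of function).
    """
    total = sum(transcript_expr.values())
    if total <= 0.0:
        raise ValueError("gene has no measurable expression in this tissue")
    affected = sum(
        expr for tx, expr in transcript_expr.items() if tx in annotated
    )
    return affected / total


# Usage: the variant only hits the weakly expressed isoform T2.
expression = {"T1": 8.0, "T2": 2.0}
low_pext = pext_like(expression, {"T2"})
```

A low value flags a variant that falls only on weakly expressed isoforms, which matches the abstract's observation that pLoF variants in weakly expressed regions behave more like synonymous variants.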

13.

A structural variation reference for medical and population genetics.

Collins, Ryan L; Brand, Harrison; Karczewski, Konrad J; Zhao, Xuefang; Alföldi, Jessica; Francioli, Laurent C; Khera, Amit V; Lowther, Chelsea; Gauthier, Laura D; Wang, Harold; Watts, Nicholas A; Solomonson, Matthew; O'Donnell-Luria, Anne; Baumann, Alexander; Munshi, Ruchi; Walker, Mark; Whelan, Christopher W; Huang, Yongqing; Brookings, Ted; Sharpe, Ted; Stone, Matthew R; Valkanas, Elise; Fu, Jack; Tiao, Grace; Laricchia, Kristen M; Ruano-Rubio, Valentin; Stevens, Christine; Gupta, Namrata; Cusick, Caroline; Margolin, Lauren; Taylor, Kent D; Lin, Henry J; Rich, Stephen S; Post, Wendy S; Chen, Yii-Der Ida; Rotter, Jerome I; Nusbaum, Chad; Philippakis, Anthony; Lander, Eric; Gabriel, Stacey; Neale, Benjamin M; Kathiresan, Sekar; Daly, Mark J; Banks, Eric; MacArthur, Daniel G; Talkowski, Michael E.

Nature ; 581(7809): 444-451, 2020 05.

Article in English | MEDLINE | ID: mdl-32461652

ABSTRACT

Structural variants (SVs) rearrange large segments of DNA and can have profound consequences in evolution and human disease. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) have become integral in the interpretation of single-nucleotide variants (SNVs). However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25-29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings. This SV resource is freely distributed via the gnomAD browser and will have broad utility in population genetics, disease-association studies, and diagnostic screening.

Subject(s)

Disease/genetics , Genetic Variation , Genetics, Medical/standards , Genetics, Population/standards , Genome, Human/genetics , Female , Genetic Testing , Genotyping Techniques , Humans , Male , Middle Aged , Mutation , Polymorphism, Single Nucleotide/genetics , Racial Groups/genetics , Reference Standards , Selection, Genetic , Whole Genome Sequencing

14.

The mutational constraint spectrum quantified from variation in 141,456 humans.

Karczewski, Konrad J; Francioli, Laurent C; Tiao, Grace; Cummings, Beryl B; Alföldi, Jessica; Wang, Qingbo; Collins, Ryan L; Laricchia, Kristen M; Ganna, Andrea; Birnbaum, Daniel P; Gauthier, Laura D; Brand, Harrison; Solomonson, Matthew; Watts, Nicholas A; Rhodes, Daniel; Singer-Berk, Moriel; England, Eleina M; Seaby, Eleanor G; Kosmicki, Jack A; Walters, Raymond K; Tashman, Katherine; Farjoun, Yossi; Banks, Eric; Poterba, Timothy; Wang, Arcturus; Seed, Cotton; Whiffin, Nicola; Chong, Jessica X; Samocha, Kaitlin E; Pierce-Hoffman, Emma; Zappala, Zachary; O'Donnell-Luria, Anne H; Minikel, Eric Vallabh; Weisburd, Ben; Lek, Monkol; Ware, James S; Vittal, Christopher; Armean, Irina M; Bergelson, Louis; Cibulskis, Kristian; Connolly, Kristen M; Covarrubias, Miguel; Donnelly, Stacey; Ferriera, Steven; Gabriel, Stacey; Gentry, Jeff; Gupta, Namrata; Jeandet, Thibault; Kaplan, Diane; Llanwarne, Christopher.

Nature ; 581(7809): 434-443, 2020 05.

Article in English | MEDLINE | ID: mdl-32461654

ABSTRACT

Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes [1]. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.

Subject(s)

Exome/genetics , Genes, Essential/genetics , Genetic Variation/genetics , Genome, Human/genetics , Adult , Brain/metabolism , Cardiovascular Diseases/genetics , Cohort Studies , Databases, Genetic , Female , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Humans , Loss of Function Mutation/genetics , Male , Mutation Rate , Proprotein Convertase 9/genetics , RNA, Messenger/genetics , Reproducibility of Results , Exome Sequencing , Whole Genome Sequencing

15.

Author Correction: Nuclear genetic control of mtDNA copy number and heteroplasmy in humans.

Gupta, Rahul; Kanai, Masahiro; Durham, Timothy J; Tsuo, Kristin; McCoy, Jason G; Kotrys, Anna V; Zhou, Wei; Chinnery, Patrick F; Karczewski, Konrad J; Calvo, Sarah E; Neale, Benjamin M; Mootha, Vamsi K.

Nature ; 630(8017): E10, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38831054

16.

Author Correction: A genomic mutational constraint map using variation in 76,156 human genomes.

Chen, Siwei; Francioli, Laurent C; Goodrich, Julia K; Collins, Ryan L; Kanai, Masahiro; Wang, Qingbo; Alföldi, Jessica; Watts, Nicholas A; Vittal, Christopher; Gauthier, Laura D; Poterba, Timothy; Wilson, Michael W; Tarasova, Yekaterina; Phu, William; Grant, Riley; Yohannes, Mary T; Koenig, Zan; Farjoun, Yossi; Banks, Eric; Donnelly, Stacey; Gabriel, Stacey; Gupta, Namrata; Ferriera, Steven; Tolonen, Charlotte; Novod, Sam; Bergelson, Louis; Roazen, David; Ruano-Rubio, Valentin; Covarrubias, Miguel; Llanwarne, Christopher; Petrillo, Nikelle; Wade, Gordon; Jeandet, Thibault; Munshi, Ruchi; Tibbetts, Kathleen; O'Donnell-Luria, Anne; Solomonson, Matthew; Seed, Cotton; Martin, Alicia R; Talkowski, Michael E; Rehm, Heidi L; Daly, Mark J; Tiao, Grace; Neale, Benjamin M; MacArthur, Daniel G; Karczewski, Konrad J.

Nature ; 626(7997): E1, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38225470

17.

Exome-wide association study to identify rare variants influencing COVID-19 outcomes: Results from the Host Genetics Initiative.

Butler-Laporte, Guillaume; Povysil, Gundula; Kosmicki, Jack A; Cirulli, Elizabeth T; Drivas, Theodore; Furini, Simone; Saad, Chadi; Schmidt, Axel; Olszewski, Pawel; Korotko, Urszula; Quinodoz, Mathieu; Çelik, Elifnaz; Kundu, Kousik; Walter, Klaudia; Jung, Junghyun; Stockwell, Amy D; Sloofman, Laura G; Jordan, Daniel M; Thompson, Ryan C; Del Valle, Diane; Simons, Nicole; Cheng, Esther; Sebra, Robert; Schadt, Eric E; Kim-Schulze, Seunghee; Gnjatic, Sacha; Merad, Miriam; Buxbaum, Joseph D; Beckmann, Noam D; Charney, Alexander W; Przychodzen, Bartlomiej; Chang, Timothy; Pottinger, Tess D; Shang, Ning; Brand, Fabian; Fava, Francesca; Mari, Francesca; Chwialkowska, Karolina; Niemira, Magdalena; Pula, Szymon; Baillie, J Kenneth; Stuckey, Alex; Salas, Antonio; Bello, Xabier; Pardo-Seco, Jacobo; Gómez-Carballa, Alberto; Rivero-Calle, Irene; Martinón-Torres, Federico; Ganna, Andrea; Karczewski, Konrad J.

PLoS Genet ; 18(11): e1010367, 2022 11.

Article in English | MEDLINE | ID: mdl-36327219

ABSTRACT

Host genetics is a key determinant of COVID-19 outcomes. Previously, the COVID-19 Host Genetics Initiative genome-wide association study used common variants to identify multiple loci associated with COVID-19 outcomes. However, variants with the largest impact on COVID-19 outcomes are expected to be rare in the population. Hence, studying rare variants may provide additional insights into disease susceptibility and pathogenesis, thereby informing therapeutics development. Here, we combined whole-exome and whole-genome sequencing from 21 cohorts across 12 countries and performed rare variant exome-wide burden analyses for COVID-19 outcomes. In an analysis of 5,085 severe disease cases and 571,737 controls, we observed that carrying a rare deleterious variant in the SARS-CoV-2 sensor toll-like receptor TLR7 (on chromosome X) was associated with a 5.3-fold increase in severe disease (95% CI: 2.75-10.05, p = 5.41 × 10⁻⁷). This association was consistent across sexes. These results further support TLR7 as a genetic determinant of severe disease and suggest that larger studies on rare variants influencing COVID-19 outcomes could provide additional insights.

Subject(s)

COVID-19 , Exome , Humans , Exome/genetics , Genome-Wide Association Study , COVID-19/genetics , Genetic Predisposition to Disease , Toll-Like Receptor 7/genetics , SARS-CoV-2/genetics

18.

Non-coding region variants upstream of MEF2C cause severe developmental disorder through three distinct loss-of-function mechanisms.

Wright, Caroline F; Quaife, Nicholas M; Ramos-Hernández, Laura; Danecek, Petr; Ferla, Matteo P; Samocha, Kaitlin E; Kaplanis, Joanna; Gardner, Eugene J; Eberhardt, Ruth Y; Chao, Katherine R; Karczewski, Konrad J; Morales, Joannella; Gallone, Giuseppe; Balasubramanian, Meena; Banka, Siddharth; Gompertz, Lianne; Kerr, Bronwyn; Kirby, Amelia; Lynch, Sally A; Morton, Jenny E V; Pinz, Hailey; Sansbury, Francis H; Stewart, Helen; Zuccarelli, Britton D; Cook, Stuart A; Taylor, Jenny C; Juusola, Jane; Retterer, Kyle; Firth, Helen V; Hurles, Matthew E; Lara-Pezzi, Enrique; Barton, Paul J R; Whiffin, Nicola.

Am J Hum Genet ; 108(6): 1083-1094, 2021 06 03.

Article in English | MEDLINE | ID: mdl-34022131

ABSTRACT

Clinical genetic testing of protein-coding regions identifies a likely causative variant in only around half of developmental disorder (DD) cases. The contribution of regulatory variation in non-coding regions to rare disease, including DD, remains very poorly understood. We screened 9,858 probands from the Deciphering Developmental Disorders (DDD) study for de novo mutations in the 5' untranslated regions (5' UTRs) of genes within which variants have previously been shown to cause DD through a dominant haploinsufficient mechanism. We identified four single-nucleotide variants and two copy-number variants upstream of MEF2C in a total of ten individual probands. We developed multiple bespoke and orthogonal experimental approaches to demonstrate that these variants cause DD through three distinct loss-of-function mechanisms, disrupting transcription, translation, and/or protein function. These non-coding region variants represent 23% of likely diagnoses identified in MEF2C in the DDD cohort, but these would all be missed in standard clinical genetics approaches. Nonetheless, these variants are readily detectable in exome sequence data, with 30.7% of 5' UTR bases across all genes well covered in the DDD dataset. Our analyses show that non-coding variants upstream of genes within which coding variants are known to cause DD are an important cause of severe disease and demonstrate that analyzing 5' UTRs can increase diagnostic yield. We also show how non-coding variants can help inform both the disease-causing mechanism underlying protein-coding variants and dosage tolerance of the gene.

Subject(s)

5' Untranslated Regions , Developmental Disabilities/etiology , Genetic Predisposition to Disease , Loss of Function Mutation , Child , Cohort Studies , DNA Copy Number Variations , Developmental Disabilities/pathology , Humans , MEF2 Transcription Factors/genetics , Exome Sequencing

19.

Integrative omics for health and disease.

Karczewski, Konrad J; Snyder, Michael P.

Nat Rev Genet ; 19(5): 299-310, 2018 05.

Article in English | MEDLINE | ID: mdl-29479082

ABSTRACT

Advances in omics technologies - such as genomics, transcriptomics, proteomics and metabolomics - have begun to enable personalized medicine at an extraordinarily detailed molecular level. Individually, these technologies have contributed medical advances that have begun to enter clinical practice. However, each technology individually cannot capture the entire biological complexity of most human diseases. Integration of multiple technologies has emerged as an approach to provide a more comprehensive view of biology and disease. In this Review, we discuss the potential for combining diverse types of data and the utility of this approach in human health and disease. We provide examples of data integration to understand, diagnose and inform treatment of diseases, including rare and common diseases as well as cancer and transplant biology. Finally, we discuss technical and other challenges to clinical implementation of integrative omics.

Subject(s)

Metabolomics/methods , Precision Medicine/methods , Proteomics/methods , Humans

20.

Corrigendum: Landscape of X chromosome inactivation across human tissues.

Tukiainen, Taru; Villani, Alexandra-Chloé; Yen, Angela; Rivas, Manuel A; Marshall, Jamie L; Satija, Rahul; Aguirre, Matt; Gauthier, Laura; Fleharty, Mark; Kirby, Andrew; Cummings, Beryl B; Castel, Stephane E; Karczewski, Konrad J; Aguet, François; Byrnes, Andrea; Consortium, GTEx; Lappalainen, Tuuli; Regev, Aviv; Ardlie, Kristin G; Hacohen, Nir; MacArthur, Daniel G.

Nature ; 555(7695): 274, 2018 03 07.

Article in English | MEDLINE | ID: mdl-29517003

ABSTRACT

This corrects the article DOI: 10.1038/nature24265.
