Results 1 - 20 of 544
1.
Cell ; 180(3): 568-584.e23, 2020 02 06.
Article in English | MEDLINE | ID: mdl-31981491

ABSTRACT

We present the largest exome sequencing study of autism spectrum disorder (ASD) to date (n = 35,584 total samples, 11,986 with ASD). Using an enhanced analytical framework to integrate de novo and case-control rare variation, we identify 102 risk genes at a false discovery rate of 0.1 or less. Of these genes, 49 show higher frequencies of disruptive de novo variants in individuals ascertained to have severe neurodevelopmental delay, whereas 53 show higher frequencies in individuals ascertained to have ASD; comparing ASD cases with mutations in these groups reveals phenotypic differences. Expressed early in brain development, most risk genes have roles in regulation of gene expression or neuronal communication (i.e., mutations effect neurodevelopmental and neurophysiological changes), and 13 fall within loci recurrently hit by copy number variants. In cells from the human cortex, expression of risk genes is enriched in excitatory and inhibitory neuronal lineages, consistent with multiple paths to an excitatory-inhibitory imbalance underlying ASD.


Subject(s)
Autistic Disorder/genetics , Cerebral Cortex/growth & development , Exome Sequencing/methods , Gene Expression Regulation, Developmental , Neurobiology/methods , Case-Control Studies , Cell Lineage , Cohort Studies , Exome , Female , Gene Frequency , Genetic Predisposition to Disease , Humans , Male , Mutation, Missense , Neurons/metabolism , Phenotype , Sex Factors , Single-Cell Analysis/methods
2.
Nat Immunol ; 23(7): 1063-1075, 2022 07.
Article in English | MEDLINE | ID: mdl-35668320

ABSTRACT

Extracellular acidification occurs in inflamed tissue and the tumor microenvironment; however, a systematic study on how pH sensing contributes to tissue homeostasis is lacking. In the present study, we examine cell type-specific roles of the pH sensor G protein-coupled receptor 65 (GPR65) and its inflammatory disease-associated Ile231Leu-coding variant in inflammation control. GPR65 Ile231Leu knock-in mice are highly susceptible to both bacterial infection-induced and T cell-driven colitis. Mechanistically, GPR65 Ile231Leu elicits a cytokine imbalance through impaired helper type 17 T cell (TH17 cell) and TH22 cell differentiation and interleukin (IL)-22 production in association with altered cellular metabolism controlled through the cAMP-CREB-DGAT1 axis. In dendritic cells, GPR65 Ile231Leu elevates IL-12 and IL-23 release at acidic pH and alters endo-lysosomal fusion and degradation capacity, resulting in enhanced antigen presentation. The present study highlights GPR65 Ile231Leu as a multistep risk factor in intestinal inflammation and illuminates a mechanism by which pH sensing controls inflammatory circuits and tissue homeostasis.


Subject(s)
Colitis , Receptors, G-Protein-Coupled , Animals , Colitis/metabolism , Hydrogen-Ion Concentration , Inflammation/metabolism , Lysosomes/metabolism , Mice , Receptors, G-Protein-Coupled/genetics , Receptors, G-Protein-Coupled/metabolism , Th17 Cells/metabolism
3.
Cell ; 178(3): 714-730.e22, 2019 07 25.
Article in English | MEDLINE | ID: mdl-31348891

ABSTRACT

Genome-wide association studies (GWAS) have revealed risk alleles for ulcerative colitis (UC). To understand their cell type specificities and pathways of action, we generate an atlas of 366,650 cells from the colon mucosa of 18 UC patients and 12 healthy individuals, revealing 51 epithelial, stromal, and immune cell subsets, including BEST4+ enterocytes, microfold-like cells, and IL13RA2+IL11+ inflammatory fibroblasts, which we associate with resistance to anti-TNF treatment. Inflammatory fibroblasts, inflammatory monocytes, microfold-like cells, and T cells that co-express CD8 and IL-17 expand with disease, forming intercellular interaction hubs. Many UC risk genes are cell type specific and co-regulated within relatively few gene modules, suggesting convergence onto limited sets of cell types and pathways. Using this observation, we nominate and infer functions for specific risk genes across GWAS loci. Our work provides a framework for interrogating complex human diseases and mapping risk variants to cell types and pathways.


Subject(s)
Colitis, Ulcerative/pathology , Colon/metabolism , Adult , Aged , Antibodies, Monoclonal/therapeutic use , Bestrophins/metabolism , CD8 Antigens/metabolism , Case-Control Studies , Colitis, Ulcerative/drug therapy , Colitis, Ulcerative/metabolism , Colon/pathology , Enterocytes/cytology , Enterocytes/metabolism , Female , Genetic Loci , Genome-Wide Association Study , Humans , Interleukin-17/metabolism , Male , Middle Aged , Risk Factors , T-Lymphocytes/cytology , T-Lymphocytes/metabolism , Thrombospondins/metabolism , Tumor Necrosis Factor-alpha/immunology , Tumor Necrosis Factor-alpha/metabolism , Young Adult
4.
Cell ; 171(6): 1340-1353.e14, 2017 Nov 30.
Article in English | MEDLINE | ID: mdl-29195075

ABSTRACT

Approximately 15 genes have been directly associated with skin pigmentation variation in humans, leading to its characterization as a relatively simple trait. However, by assembling a global survey of quantitative skin pigmentation phenotypes, we demonstrate that pigmentation is more complex than previously assumed, with genetic architecture varying by latitude. We investigate polygenicity in the KhoeSan populations indigenous to southern Africa who have considerably lighter skin than equatorial Africans. We demonstrate that skin pigmentation is highly heritable, but known pigmentation loci explain only a small fraction of the variance. Rather, baseline skin pigmentation is a complex, polygenic trait in the KhoeSan. Despite this, we identify canonical and non-canonical skin pigmentation loci, including near SLC24A5, TYRP1, SMARCA2/VLDLR, and SNX13, using a genome-wide association approach complemented by targeted resequencing. By considering diverse, under-studied African populations, we show how the architecture of skin pigmentation can vary across humans subject to different local evolutionary pressures.


Subject(s)
Skin Pigmentation , Africa , Black People/genetics , Humans , Polymorphism, Single Nucleotide
5.
Nature ; 628(8008): 620-629, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38509369

ABSTRACT

Epstein-Barr virus (EBV) infection can engender severe B cell lymphoproliferative diseases [1,2]. The primary infection is often asymptomatic or causes infectious mononucleosis (IM), a self-limiting lymphoproliferative disorder [3]. Selective vulnerability to EBV has been reported in association with inherited mutations impairing T cell immunity to EBV [4]. Here we report biallelic loss-of-function variants in IL27RA that underlie an acute and severe primary EBV infection with a nevertheless favourable outcome requiring a minimal treatment. One mutant allele (rs201107107) was enriched in the Finnish population (minor allele frequency = 0.0068) and carried a high risk of severe infectious mononucleosis when homozygous. IL27RA encodes the IL-27 receptor alpha subunit [5,6]. In the absence of IL-27RA, phosphorylation of STAT1 and STAT3 by IL-27 is abolished in T cells. In in vitro studies, IL-27 exerts a synergistic effect on T-cell-receptor-dependent T cell proliferation [7] that is deficient in cells from the patients, leading to impaired expansion of potent anti-EBV effector cytotoxic CD8+ T cells. IL-27 is produced by EBV-infected B lymphocytes and an IL-27RA-IL-27 autocrine loop is required for the maintenance of EBV-transformed B cells. This potentially explains the eventual favourable outcome of the EBV-induced viral disease in patients with IL-27RA deficiency. Furthermore, we identified neutralizing anti-IL-27 autoantibodies in most individuals who developed sporadic infectious mononucleosis and chronic EBV infection. These results demonstrate the critical role of IL-27RA-IL-27 in immunity to EBV, but also the hijacking of this defence by EBV to promote the expansion of infected transformed B cells.
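To give the reported minor allele frequency some scale, the sketch below converts it into expected genotype frequencies. It assumes Hardy-Weinberg proportions, which the abstract does not state and which may not hold in a population with substructure; the variable names are illustrative only.

```python
# Illustrative only: expected genotype frequencies for a biallelic variant,
# ASSUMING Hardy-Weinberg equilibrium (an assumption not made in the abstract).
maf = 0.0068  # minor allele frequency reported for rs201107107 in Finland

hom_alt = maf ** 2                 # expected frequency of homozygotes (q^2)
het = 2 * maf * (1 - maf)          # expected frequency of heterozygous carriers (2pq)
one_in = 1 / hom_alt               # "1 in N" individuals expected to be homozygous

print(f"heterozygous carriers: {het:.4%}")
print(f"homozygotes: {hom_alt:.6%} (about 1 in {one_in:,.0f})")
```

Under that assumption, homozygotes would be expected at a rate on the order of one in twenty thousand individuals, which is why population enrichment of the allele matters for finding affected cases.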


Subject(s)
Epstein-Barr Virus Infections , Interleukin-27 , Receptors, Interleukin , Adolescent , Adult , Child , Child, Preschool , Female , Humans , Infant , Male , Young Adult , Alleles , B-Lymphocytes/pathology , B-Lymphocytes/virology , CD8-Positive T-Lymphocytes/pathology , Epstein-Barr Virus Infections/complications , Epstein-Barr Virus Infections/genetics , Epstein-Barr Virus Infections/therapy , Finland , Gene Frequency , Herpesvirus 4, Human , Homozygote , Infectious Mononucleosis/complications , Infectious Mononucleosis/genetics , Infectious Mononucleosis/therapy , Interleukin-27/immunology , Interleukin-27/metabolism , Loss of Function Mutation , Receptors, Interleukin/deficiency , Receptors, Interleukin/genetics , Receptors, Interleukin/metabolism , Treatment Outcome
6.
Nature ; 625(7993): 92-100, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38057664

ABSTRACT

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders [1-4], but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.


Subject(s)
Genome, Human , Genomics , Models, Genetic , Mutation , Humans , Access to Information , Databases, Genetic , Datasets as Topic , Gene Frequency , Genome, Human/genetics , Mutation/genetics , Selection, Genetic
7.
Nature ; 631(8019): 134-141, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38867047

ABSTRACT

Mosaic loss of the X chromosome (mLOX) is the most common clonal somatic alteration in leukocytes of female individuals [1,2], but little is known about its genetic determinants or phenotypic consequences. Here, to address this, we used data from 883,574 female participants across 8 biobanks; 12% of participants exhibited detectable mLOX in approximately 2% of leukocytes. Female participants with mLOX had an increased risk of myeloid and lymphoid leukaemias. Genetic analyses identified 56 common variants associated with mLOX, implicating genes with roles in chromosomal missegregation, cancer predisposition and autoimmune diseases. Exome-sequence analyses identified rare missense variants in FBXO10 that confer a twofold increased risk of mLOX. Only a small fraction of associations was shared with mosaic Y chromosome loss, suggesting that distinct biological processes drive formation and clonal expansion of sex chromosome missegregation. Allelic shift analyses identified X chromosome alleles that are preferentially retained in mLOX, demonstrating variation at many loci under cellular selection. A polygenic score including 44 allelic shift loci correctly inferred the retained X chromosomes in 80.7% of mLOX cases in the top decile. Our results support a model in which germline variants predispose female individuals to acquiring mLOX, with the allelic content of the X chromosome possibly shaping the magnitude of clonal expansion.


Subject(s)
Aneuploidy , Chromosomes, Human, X , Clone Cells , Leukocytes , Mosaicism , Adult , Female , Humans , Male , Middle Aged , Alleles , Autoimmune Diseases/genetics , Biological Specimen Banks , Chromosome Segregation/genetics , Chromosomes, Human, X/genetics , Chromosomes, Human, Y/genetics , Clone Cells/metabolism , Clone Cells/pathology , Exome/genetics , F-Box Proteins/genetics , Genetic Predisposition to Disease/genetics , Germ-Line Mutation , Leukemia/genetics , Leukocytes/metabolism , Models, Genetic , Multifactorial Inheritance/genetics , Mutation, Missense/genetics
8.
Nat Rev Genet ; 23(9): 533-546, 2022 09.
Article in English | MEDLINE | ID: mdl-35501396

ABSTRACT

Human genetics can inform the biology and epidemiology of coronavirus disease 2019 (COVID-19) by pinpointing causal mechanisms that explain why some individuals become more severely affected by the disease upon infection by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus. Large-scale genetic association studies, encompassing both rare and common genetic variants, have used different study designs and multiple disease phenotype definitions to identify several genomic regions associated with COVID-19. Along with a multitude of follow-up studies, these findings have increased our understanding of disease aetiology and provided routes for management of COVID-19. Important emergent opportunities include the clinical translatability of genetic risk prediction, the repurposing of existing drugs, exploration of variable host effects of different viral strains, study of inter-individual variability in vaccination response and understanding the long-term consequences of SARS-CoV-2 infection. Beyond the current pandemic, these transferrable opportunities are likely to affect the study of many infectious diseases.


Subject(s)
COVID-19 , COVID-19/epidemiology , COVID-19/genetics , Humans , Molecular Epidemiology , Pandemics , SARS-CoV-2/genetics
9.
Nature ; 603(7899): 95-102, 2022 03.
Article in English | MEDLINE | ID: mdl-35197637

ABSTRACT

Genome-wide association studies (GWAS) have identified thousands of genetic variants linked to the risk of human disease. However, GWAS have so far remained largely underpowered in relation to identifying associations in the rare and low-frequency allelic spectrum and have lacked the resolution to trace causal mechanisms to underlying genes [1]. Here we combined whole-exome sequencing in 392,814 UK Biobank participants with imputed genotypes from 260,405 FinnGen participants (653,219 total individuals) to conduct association meta-analyses for 744 disease endpoints across the protein-coding allelic frequency spectrum, bridging the gap between common and rare variant studies. We identified 975 associations, with more than one-third being previously unreported. We demonstrate population-level relevance for mutations previously ascribed to causing single-gene disorders, map GWAS associations to likely causal genes, explain disease mechanisms, and systematically relate disease associations to levels of 117 biomarkers and clinical-stage drug targets. Combining sequencing and genotyping in two population biobanks enabled us to benefit from increased power to detect and explain disease associations, validate findings through replication and propose medical actionability for rare genetic variants. Our study provides a compendium of protein-coding variant associations for future insights into disease biology and drug discovery.


Subject(s)
Genome-Wide Association Study , Proteins , Gene Frequency/genetics , Genetic Predisposition to Disease/genetics , Genotype , Humans , Polymorphism, Single Nucleotide/genetics , Proteins/genetics , Exome Sequencing
10.
Am J Hum Genet ; 111(6): 1047-1060, 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38776927

ABSTRACT

Lichen planus (LP) is a T-cell-mediated inflammatory disease affecting squamous epithelia in many parts of the body, most often the skin and oral mucosa. Cutaneous LP is usually transient and oral LP (OLP) is most often chronic, so we performed a large-scale genetic and epidemiological study of LP to address whether the oral and non-oral subgroups have shared or distinct underlying pathologies and their overlap with autoimmune disease. Using lifelong records covering diagnoses, procedures, and clinic identity from 473,580 individuals in the FinnGen study, genome-wide association analyses were conducted on carefully constructed subcategories of OLP (n = 3,323) and non-oral LP (n = 4,356) and on the combined group. We identified 15 genome-wide significant associations in FinnGen and an additional 12 when meta-analyzed with UKBB (27 independent associations at 25 distinct genomic locations), most of which are shared between oral and non-oral LP. Many associations coincide with known autoimmune disease loci, consistent with the epidemiologic enrichment of LP with hypothyroidism and other autoimmune diseases. Notably, a third of the FinnGen associations demonstrate significant differences between OLP and non-OLP. We also observed a 13.6-fold risk for tongue cancer and an elevated risk for other oral cancers in OLP, in agreement with earlier reports that connect LP with higher cancer incidence. In addition to a large-scale dissection of LP genetics and comorbidities, our study demonstrates the use of comprehensive, multidimensional health registry data to address outstanding clinical questions and reveal underlying biological mechanisms in common but understudied diseases.


Subject(s)
Autoimmune Diseases , Genome-Wide Association Study , Lichen Planus, Oral , Mouth Neoplasms , Humans , Autoimmune Diseases/genetics , Lichen Planus, Oral/genetics , Lichen Planus, Oral/pathology , Mouth Neoplasms/genetics , Mouth Neoplasms/pathology , Female , Male , Genetic Heterogeneity , Middle Aged , Lichen Planus/genetics , Lichen Planus/pathology , Genetic Predisposition to Disease , Aged , Adult , Risk Factors , Polymorphism, Single Nucleotide
11.
Genome Res ; 34(5): 796-809, 2024 06 25.
Article in English | MEDLINE | ID: mdl-38749656

ABSTRACT

Underrepresented populations are often excluded from genomic studies owing in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high-quality set of 4094 whole genomes from 80 populations in the HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also show substantial added value from this data set compared with the prior versions of the component resources, typically combined via liftOver and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared with previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality-control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.


Subject(s)
Databases, Genetic , Genome, Human , Humans , Human Genome Project , High-Throughput Nucleotide Sequencing/methods , Genetic Variation , Genomics/methods
12.
Cell ; 149(3): 525-37, 2012 Apr 27.
Article in English | MEDLINE | ID: mdl-22521361

ABSTRACT

Balanced chromosomal abnormalities (BCAs) represent a relatively untapped reservoir of single-gene disruptions in neurodevelopmental disorders (NDDs). We sequenced BCAs in patients with autism or related NDDs, revealing disruption of 33 loci in four general categories: (1) genes previously associated with abnormal neurodevelopment (e.g., AUTS2, FOXP1, and CDKL5), (2) single-gene contributors to microdeletion syndromes (MBD5, SATB2, EHMT1, and SNURF-SNRPN), (3) novel risk loci (e.g., CHD8, KIRREL3, and ZNF507), and (4) genes associated with later-onset psychiatric disorders (e.g., TCF4, ZNF804A, PDE10A, GRIN2B, and ANK3). We also discovered among neurodevelopmental cases a profoundly increased burden of copy-number variants from these 33 loci and a significant enrichment of polygenic risk alleles from genome-wide association studies of autism and schizophrenia. Our findings suggest a polygenic risk model of autism and reveal that some neurodevelopmental genes are sensitive to perturbation by multiple mutational mechanisms, leading to variable phenotypic outcomes that manifest at different life stages.


Subject(s)
Child Development Disorders, Pervasive/genetics , Chromosome Aberrations , Autistic Disorder/diagnosis , Autistic Disorder/genetics , Child , Child Development Disorders, Pervasive/diagnosis , Chromosome Breakage , Chromosome Deletion , DNA Copy Number Variations , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Nervous System/growth & development , Schizophrenia/genetics , Sequence Analysis, DNA , Signal Transduction
13.
Nature ; 593(7858): 238-243, 2021 05.
Article in English | MEDLINE | ID: mdl-33828297

ABSTRACT

Genome-wide association studies (GWAS) have identified thousands of noncoding loci that are associated with human diseases and complex traits, each of which could reveal insights into the mechanisms of disease [1]. Many of the underlying causal variants may affect enhancers [2,3], but we lack accurate maps of enhancers and their target genes to interpret such variants. We recently developed the activity-by-contact (ABC) model to predict which enhancers regulate which genes and validated the model using CRISPR perturbations in several cell types [4]. Here we apply this ABC model to create enhancer-gene maps in 131 human cell types and tissues, and use these maps to interpret the functions of GWAS variants. Across 72 diseases and complex traits, ABC links 5,036 GWAS signals to 2,249 unique genes, including a class of 577 genes that appear to influence multiple phenotypes through variants in enhancers that act in different cell types. In inflammatory bowel disease (IBD), causal variants are enriched in predicted enhancers by more than 20-fold in particular cell types such as dendritic cells, and ABC achieves higher precision than other regulatory methods at connecting noncoding variants to target genes. These variant-to-function maps reveal an enhancer that contains an IBD risk variant and that regulates the expression of PPIF to alter the membrane potential of mitochondria in macrophages. Our study reveals principles of genome regulation, identifies genes that affect IBD and provides a resource and generalizable strategy to connect risk variants of common diseases to their molecular and cellular functions.


Subject(s)
Enhancer Elements, Genetic/genetics , Genetic Predisposition to Disease , Genetic Variation/genetics , Genome, Human/genetics , Genome-Wide Association Study , Inflammatory Bowel Diseases/genetics , Cell Line , Chromosomes, Human, Pair 10/genetics , Cyclophilins/genetics , Dendritic Cells , Female , Humans , Macrophages/metabolism , Male , Mitochondria/metabolism , Organ Specificity/genetics , Phenotype
14.
Am J Hum Genet ; 110(12): 2068-2076, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38000370

ABSTRACT

DNA sample contamination is a major issue in clinical and research applications of whole-genome and -exome sequencing. Even modest levels of contamination can substantially affect the overall quality of variant calls and lead to widespread genotyping errors. Currently, popular tools for estimating the contamination level use short-read data (BAM/CRAM files), which are expensive to store and manipulate and often not retained or shared widely. We propose a metric to estimate DNA sample contamination from variant-level whole-genome and -exome sequence data called CHARR, contamination from homozygous alternate reference reads, which leverages the infiltration of reference reads within homozygous alternate variant calls. CHARR uses a small proportion of variant-level genotype information and thus can be computed from single-sample gVCFs or callsets in VCF or BCF formats, as well as efficiently stored variant calls in Hail VariantDataset format. Our results demonstrate that CHARR accurately recapitulates results from existing tools with substantially reduced costs, improving the accuracy and efficiency of downstream analyses of ultra-large whole-genome and exome sequencing datasets.


Subject(s)
DNA , Trout , Humans , Animals , Sequence Analysis, DNA/methods , Genotype , Homozygote , High-Throughput Nucleotide Sequencing/methods , Software
15.
Genome Res ; 33(6): 999-1005, 2023 06.
Article in English | MEDLINE | ID: mdl-37253541

ABSTRACT

Large-scale high-throughput sequencing data sets have been transformative for informing clinical variant interpretation and for use as reference panels for statistical and population genetic efforts. Although such resources are often treated as ground truth, we find that in widely used reference data sets such as the Genome Aggregation Database (gnomAD), some variants pass gold-standard filters, yet are systematically different in their genotype calls across genotype discovery approaches. The inclusion of such discordant sites in study designs involving multiple genotype discovery strategies could bias results and lead to false-positive hits in association studies owing to technological artifacts rather than a true relationship to the phenotype. Here, we describe this phenomenon of discordant genotype calls across genotype discovery approaches, characterize the error mode of wrong calls, provide a list of discordant sites identified in gnomAD that should be treated with caution in analyses, and present a metric and machine learning classifier trained on gnomAD data to identify likely discordant variants in other data sets. We find that different genotype discovery approaches have different sets of variants at which this problem occurs, but there are characteristic variant features that can be used to predict discordant behavior. Discordant sites are largely shared across ancestry groups, although different populations are powered for the discovery of different variants. We find that the most common error mode is that of a variant being heterozygous for one approach and homozygous for the other, with heterozygous in the genomes and homozygous reference in the exomes making up the majority of miscalls.


Subject(s)
Exome , Genetics, Population , Genotype , Heterozygote , Phenotype , Polymorphism, Single Nucleotide
16.
Blood ; 143(23): 2425-2432, 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38498041

ABSTRACT

The factor V Leiden (FVL; rs6025) and prothrombin G20210A (PTGM; rs1799963) polymorphisms are 2 of the most well-studied genetic risk factors for venous thromboembolism (VTE). However, double heterozygosity (DH) for FVL and PTGM remains poorly understood, with previous studies showing marked disagreement regarding thrombosis risk conferred by the DH genotype. Using multidimensional data from the UK Biobank (UKB) and FinnGen biorepositories, we evaluated the clinical impact of DH carrier status across 937 939 individuals. We found that 662 participants (0.07%) were DH carriers. After adjustment for age, sex, and ancestry, DH individuals experienced a markedly elevated risk of VTE compared with wild-type individuals (odds ratio [OR] = 5.24; 95% confidence interval [CI], 4.01-6.84; P = 4.8 × 10^-34), which approximated the risk conferred by FVL homozygosity. A secondary analysis restricted to UKB participants (N = 445 144) found that effect size estimates for the DH genotype remained largely unchanged (OR = 4.53; 95% CI, 3.42-5.90; P < 1 × 10^-16) after adjustment for commonly cited VTE risk factors, such as body mass index, blood type, and markers of inflammation. In contrast, the DH genotype was not associated with a significantly higher risk of any arterial thrombosis phenotype, including stroke, myocardial infarction, and peripheral artery disease. In summary, we leveraged population-scale genomic data sets to conduct, to our knowledge, the largest study to date on the DH genotype and were able to establish far more precise effect size estimates than previously possible. Our findings indicate that the DH genotype may occur as frequently as FVL homozygosity and may confer a similarly increased risk of VTE.
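The odds ratio, confidence interval, and P value reported for the primary analysis can be checked against one another. The sketch below assumes the interval is a symmetric Wald interval on the log-odds scale and that the P value comes from a two-sided normal test; neither detail is stated in the abstract, and the function name is hypothetical.

```python
import math

def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Back out the standard error, z statistic and two-sided P value
    from an odds ratio and its 95% CI (ASSUMES a Wald interval on log(OR))."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability; erfc stays accurate for very small values.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Values taken from the abstract's primary analysis.
se, z, p = wald_from_ci(5.24, 4.01, 6.84)
carrier_pct = 100 * 662 / 937939  # share of double-heterozygous carriers, in percent

print(f"SE(log OR) = {se:.4f}, z = {z:.2f}, two-sided P = {p:.1e}")
print(f"DH carriers: {carrier_pct:.2f}% of participants")
```

Under these assumptions the recovered P value falls on the order of 10^-34, in line with the reported figure, and the carrier share rounds to the stated 0.07%.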


Subject(s)
Biological Specimen Banks , Factor V , Heterozygote , Prothrombin , Humans , Prothrombin/genetics , Factor V/genetics , Female , Male , Middle Aged , United Kingdom/epidemiology , Aged , Risk Factors , Venous Thromboembolism/genetics , Venous Thromboembolism/epidemiology , Adult , Thrombosis/genetics , Thrombosis/epidemiology , Thrombosis/etiology , Genetic Predisposition to Disease , Genotype , Polymorphism, Single Nucleotide , UK Biobank
17.
Nature ; 581(7809): 459-464, 2020 05.
Article in English | MEDLINE | ID: mdl-32461653

ABSTRACT

Naturally occurring human genetic variants that are predicted to inactivate protein-coding genes provide an in vivo model of human gene inactivation that complements knockout studies in cells and model organisms. Here we report three key findings regarding the assessment of candidate drug targets using human loss-of-function variants. First, even essential genes, in which loss-of-function variants are not tolerated, can be highly successful as targets of inhibitory drugs. Second, in most genes, loss-of-function variants are sufficiently rare that genotype-based ascertainment of homozygous or compound heterozygous 'knockout' humans will await sample sizes that are approximately 1,000 times those presently available, unless recruitment focuses on consanguineous individuals. Third, automated variant annotation and filtering are powerful, but manual curation remains crucial for removing artefacts, and is a prerequisite for recall-by-genotype efforts. Our results provide a roadmap for human knockout studies and should guide the interpretation of loss-of-function variants in drug development.


Subject(s)
Genes, Essential/drug effects , Genes, Essential/genetics , Loss of Function Mutation/genetics , Molecular Targeted Therapy , Artifacts , Automation , Consanguinity , Exons/genetics , Gain of Function Mutation/genetics , Gene Frequency , Gene Knockdown Techniques , Heterozygote , Homozygote , Humans , Huntingtin Protein/genetics , Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/genetics , Neurodegenerative Diseases/genetics , Prion Proteins/genetics , Reproducibility of Results , Sample Size , tau Proteins/genetics
18.
Nature ; 581(7809): 452-458, 2020 05.
Article in English | MEDLINE | ID: mdl-32461655

ABSTRACT

The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD) [1], we show that one explanation for this paradox involves alternative splicing of mRNA, which allows exons of a gene to be expressed at varying levels across different cell types. Currently, no existing annotation tool systematically incorporates information about exon expression into the interpretation of variants. We develop a transcript-level annotation metric known as the 'proportion expressed across transcripts', which quantifies isoform expression for variants. We calculate this metric using 11,706 tissue samples from the Genotype Tissue Expression (GTEx) project [2] and show that it can differentiate between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.8% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder and intellectual disability or developmental disorders to show that pLoF variants in weakly expressed regions have similar effect sizes to those of synonymous variants, whereas pLoF variants in highly expressed exons are most strongly enriched among cases. Our annotation is fast, flexible and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for the genetic diagnosis of rare diseases, the analysis of rare variant burden in complex disorders, and the curation and prioritization of variants in recall-by-genotype studies.


Subject(s)
Disease/genetics , Haploinsufficiency/genetics , Loss of Function Mutation/genetics , Molecular Sequence Annotation , Transcription, Genetic , Transcriptome/genetics , Autism Spectrum Disorder/genetics , Datasets as Topic , Developmental Disabilities/genetics , Exons/genetics , Female , Genotype , Humans , Intellectual Disability/genetics , Male , Molecular Sequence Annotation/standards , Poisson Distribution , RNA, Messenger/analysis , RNA, Messenger/genetics , Rare Diseases/diagnosis , Rare Diseases/genetics , Reproducibility of Results , Exome Sequencing
19.
Nature ; 586(7831): 769-775, 2020 10.
Article in English | MEDLINE | ID: mdl-33057200

ABSTRACT

Myeloproliferative neoplasms (MPNs) are blood cancers that are characterized by the excessive production of mature myeloid cells and arise from the acquisition of somatic driver mutations in haematopoietic stem cells (HSCs). Epidemiological studies indicate a substantial heritable component of MPNs that is among the highest known for cancers1. However, only a limited number of genetic risk loci have been identified, and the underlying biological mechanisms that lead to the acquisition of MPNs remain unclear. Here, by conducting a large-scale genome-wide association study (3,797 cases and 1,152,977 controls), we identify 17 MPN risk loci (P < 5.0 × 10⁻⁸), 7 of which have not been previously reported. We find that there is a shared genetic architecture between MPN risk and several haematopoietic traits from distinct lineages; that there is an enrichment for MPN risk variants within accessible chromatin of HSCs; and that increased MPN risk is associated with longer telomere length in leukocytes and other clonal haematopoietic states, collectively suggesting that MPN risk is associated with the function and self-renewal of HSCs. We use gene mapping to identify modulators of HSC biology linked to MPN risk, and show through targeted variant-to-function assays that CHEK2 and GFI1B have roles in altering the function of HSCs to confer disease risk. Overall, our results reveal a previously unappreciated mechanism for inherited MPN risk through the modulation of HSC function.


Subject(s)
Genetic Predisposition to Disease/genetics , Hematopoietic Stem Cells/pathology , Myeloproliferative Disorders/genetics , Myeloproliferative Disorders/pathology , Neoplasms/genetics , Neoplasms/pathology , Cell Lineage/genetics , Cell Self Renewal , Checkpoint Kinase 2/genetics , Female , Humans , Leukocytes/pathology , Male , Proto-Oncogene Proteins/genetics , Repressor Proteins/genetics , Risk , Telomere Homeostasis
20.
Nature ; 581(7809): 444-451, 2020 05.
Article in English | MEDLINE | ID: mdl-32461652

ABSTRACT

Structural variants (SVs) rearrange large segments of DNA1 and can have profound consequences in evolution and human disease2,3. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD)4 have become integral in the interpretation of single-nucleotide variants (SNVs)5. However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25-29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage6. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings7. This SV resource is freely distributed via the gnomAD browser8 and will have broad utility in population genetics, disease-association studies, and diagnostic screening.


Subject(s)
Disease/genetics , Genetic Variation , Genetics, Medical/standards , Genetics, Population/standards , Genome, Human/genetics , Female , Genetic Testing , Genotyping Techniques , Humans , Male , Middle Aged , Mutation , Polymorphism, Single Nucleotide/genetics , Racial Groups/genetics , Reference Standards , Selection, Genetic , Whole Genome Sequencing