Results 1 - 20 of 264
1.
Cell ; 155(1): 242-56, 2013 Sep 26.
Article in English | MEDLINE | ID: mdl-24074872

ABSTRACT

The complex network of specialized cells and molecules in the immune system has evolved to defend against pathogens, but inadvertent immune system attacks on "self" result in autoimmune disease. Both genetic regulation of immune cell levels and their relationships with autoimmunity are largely undetermined. Here, we report genetic contributions to quantitative levels of 95 cell types encompassing 272 immune traits, in a cohort of 1,629 individuals from four clustered Sardinian villages. We first estimated trait heritability, showing that it can be substantial, accounting for up to 87% of the variance (mean 41%). Next, by assessing ∼8.2 million variants that we identified and confirmed in an extended set of 2,870 individuals, 23 independent variants at 13 loci associated with at least one trait. Notably, variants at three loci (HLA, IL2RA, and SH2B3/ATXN2) overlap with known autoimmune disease associations. These results connect specific cellular phenotypes to specific genetic variants, helping to explicate their involvement in disease.


Subject(s)
Flow Cytometry/methods , Genetic Predisposition to Disease , Genome-Wide Association Study , Immune System Diseases/genetics , Polymorphism, Single Nucleotide , Humans , Phenotype
2.
Nature ; 612(7939): 301-309, 2022 12.
Article in English | MEDLINE | ID: mdl-36450978

ABSTRACT

Clonal haematopoiesis involves the expansion of certain blood cell lineages and has been associated with ageing and adverse health outcomes [1-5]. Here we use exome sequence data on 628,388 individuals to identify 40,208 carriers of clonal haematopoiesis of indeterminate potential (CHIP). Using genome-wide and exome-wide association analyses, we identify 24 loci (21 of which are novel) where germline genetic variation influences predisposition to CHIP, including missense variants in the lymphocytic antigen coding gene LY75, which are associated with reduced incidence of CHIP. We also identify novel rare variant associations with clonal haematopoiesis and telomere length. Analysis of 5,041 health traits from the UK Biobank (UKB) found relationships between CHIP and severe COVID-19 outcomes, cardiovascular disease, haematologic traits, malignancy, smoking, obesity, infection and all-cause mortality. Longitudinal and Mendelian randomization analyses revealed that CHIP is associated with solid cancers, including non-melanoma skin cancer and lung cancer, and that CHIP linked to DNMT3A is associated with the subsequent development of myeloid but not lymphoid leukaemias. Additionally, contrary to previous findings from the initial 50,000 UKB exomes [6], our results in the full sample do not support a role for IL-6 inhibition in reducing the risk of cardiovascular disease among CHIP carriers. Our findings demonstrate that CHIP represents a complex set of heterogeneous phenotypes with shared and unique germline genetic causes and varied clinical implications.


Subject(s)
COVID-19 , Cardiovascular Diseases , Humans , Clonal Hematopoiesis/genetics , Cardiovascular Diseases/epidemiology , Cardiovascular Diseases/genetics
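The abstract relies in part on Mendelian randomization. As generic background only (not the authors' pipeline), a common summary-statistic estimator is the fixed-effect inverse-variance-weighted (IVW) estimate, which pools per-variant Wald ratios (outcome effect divided by exposure effect). The `Instrument` type, the function name and the toy numbers below are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Instrument:
    beta_exposure: float  # variant effect on the exposure (hypothetical values)
    beta_outcome: float   # variant effect on the outcome
    se_outcome: float     # standard error of beta_outcome


def ivw_estimate(instruments):
    """Fixed-effect IVW Mendelian randomization estimate and its standard error.

    Uses first-order weights bx^2 / se_y^2, so the estimate equals
    sum(bx * by / se_y^2) / sum(bx^2 / se_y^2).
    """
    numerator = 0.0
    weight_sum = 0.0
    for inst in instruments:
        inv_var = 1.0 / inst.se_outcome ** 2
        numerator += inst.beta_exposure * inst.beta_outcome * inv_var
        weight_sum += inst.beta_exposure ** 2 * inv_var
    estimate = numerator / weight_sum
    se = math.sqrt(1.0 / weight_sum)
    return estimate, se


# Hypothetical toy instruments whose outcome effects are exactly half the exposure effects.
toy = [Instrument(0.1, 0.05, 0.01), Instrument(0.2, 0.10, 0.02), Instrument(0.3, 0.15, 0.01)]
estimate, se = ivw_estimate(toy)
```

Real MR analyses add sensitivity estimators (for example MR-Egger or weighted median) to probe pleiotropy; this sketch shows only the pooled ratio.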
3.
Nature ; 607(7917): 97-103, 2022 07.
Article in English | MEDLINE | ID: mdl-35255492

ABSTRACT

Critical COVID-19 is caused by immune-mediated inflammatory lung injury. Host genetic variation influences the development of illness requiring critical care [1] or hospitalization [2-4] after infection with SARS-CoV-2. The GenOMICC (Genetics of Mortality in Critical Care) study enables the comparison of genomes from individuals who are critically ill with those of population controls to find underlying disease mechanisms. Here we use whole-genome sequencing in 7,491 critically ill individuals compared with 48,400 controls to discover and replicate 23 independent variants that significantly predispose to critical COVID-19. We identify 16 new independent associations, including variants within genes that are involved in interferon signalling (IL10RB and PLSCR1), leucocyte differentiation (BCL11A) and blood-type antigen secretor status (FUT2). Using transcriptome-wide association and colocalization to infer the effect of gene expression on disease severity, we find evidence that implicates multiple genes in critical disease, including reduced expression of a membrane flippase (ATP11A) and increased expression of a mucin (MUC1). Mendelian randomization provides evidence in support of causal roles for myeloid cell adhesion molecules (SELE, ICAM5 and CD209) and the coagulation factor F8, all of which are potentially druggable targets. Our results are broadly consistent with a multi-component model of COVID-19 pathophysiology, in which at least two distinct mechanisms can predispose to life-threatening disease: failure to control viral replication; or an enhanced tendency towards pulmonary inflammation and intravascular coagulation. We show that comparison between cases of critical illness and population controls is highly efficient for the detection of therapeutically relevant mechanisms of disease.


Subject(s)
COVID-19 , Critical Illness , Genome, Human , Host-Pathogen Interactions , Whole Genome Sequencing , ATP-Binding Cassette Transporters , COVID-19/genetics , COVID-19/mortality , COVID-19/pathology , COVID-19/virology , Cell Adhesion Molecules , Critical Care , Critical Illness/mortality , E-Selectin , Factor VIII , Fucosyltransferases , Genome, Human/genetics , Genome-Wide Association Study , Host-Pathogen Interactions/genetics , Humans , Interleukin-10 Receptor beta Subunit , Lectins, C-Type , Mucin-1 , Nerve Tissue Proteins , Phospholipid Transfer Proteins , Receptors, Cell Surface , Repressor Proteins , SARS-CoV-2/pathogenicity , Galactoside 2-alpha-L-fucosyltransferase
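The study design compares critically ill cases with population controls. The simplest version of such a comparison for a single variant is an allelic odds ratio with a Woolf (log-scale) 95% confidence interval from a 2×2 table of allele counts. This is a minimal sketch only: the published analysis used regression models with covariates, and the function name and counts are hypothetical.

```python
import math


def allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Allelic odds ratio (cases vs controls) with a Woolf confidence interval.

    Arguments are allele counts; z = 1.96 gives an approximate 95% interval.
    """
    if min(case_alt, case_ref, ctrl_alt, ctrl_ref) == 0:
        # The log-OR standard error is undefined with an empty cell.
        raise ValueError("zero cell; apply a continuity correction first")
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    log_se = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - z * log_se)
    upper = math.exp(math.log(odds_ratio) + z * log_se)
    return odds_ratio, lower, upper


# Hypothetical allele counts, for illustration only.
example = allelic_odds_ratio(case_alt=200, case_ref=800, ctrl_alt=100, ctrl_ref=900)
```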
4.
Nature ; 599(7886): 628-634, 2021 11.
Article in English | MEDLINE | ID: mdl-34662886

ABSTRACT

A major goal in human genetics is to use natural variation to understand the phenotypic consequences of altering each protein-coding gene in the genome. Here we used exome sequencing [1] to explore protein-altering variants and their consequences in 454,787 participants in the UK Biobank study [2]. We identified 12 million coding variants, including around 1 million loss-of-function and around 1.8 million deleterious missense variants. When these were tested for association with 3,994 health-related traits, we found 564 genes with trait associations at P ≤ 2.18 × 10⁻¹¹. Rare variant associations were enriched in loci from genome-wide association studies (GWAS), but most (91%) were independent of common variant signals. We discovered several risk-increasing associations with traits related to liver disease, eye disease and cancer, among others, as well as risk-lowering associations for hypertension (SLC9A3R2), diabetes (MAP3K15, FAM234A) and asthma (SLC27A3). Six genes were associated with brain imaging phenotypes, including two involved in neural development (GBE1, PLD1). Of the signals available and powered for replication in an independent cohort, 81% were confirmed; furthermore, association signals were generally consistent across individuals of European, Asian and African ancestry. We illustrate the ability of exome sequencing to identify gene-trait associations, elucidate gene function and pinpoint effector genes that underlie GWAS signals at scale.


Subject(s)
Biological Specimen Banks , Databases, Genetic , Exome Sequencing , Exome/genetics , Africa/ethnology , Asia/ethnology , Asthma/genetics , Diabetes Mellitus/genetics , Europe/ethnology , Eye Diseases/genetics , Female , Genetic Predisposition to Disease/genetics , Genetic Variation , Genome-Wide Association Study , Humans , Hypertension/genetics , Liver Diseases/genetics , Male , Mutation , Neoplasms/genetics , Quantitative Trait, Heritable , United Kingdom
5.
Hum Mol Genet ; 33(4): 374-385, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-37934784

ABSTRACT

Genome-wide association studies have contributed extensively to the discovery of disease-associated common variants. However, the genetic contribution to complex traits is still largely difficult to interpret. We report a genome-wide association study of 2394 cases and 2393 controls for age-related macular degeneration (AMD) via whole-genome sequencing, with 46.9 million genetic variants. Our study reveals significant single-variant association signals at four loci and independent gene-based signals in CFH, C2, C3, and NRTN. Using data from the Exome Aggregation Consortium (ExAC) for a gene-based test, we demonstrate an enrichment of predicted rare loss-of-function variants in CFH, CFI, and an as-yet unreported gene in AMD, ORMDL2. Our method of using a large variant list without individual-level genotypes as an external reference provides a flexible and convenient approach to leverage the publicly available variant datasets to augment the search for rare variant associations, which can explain additional disease risk in AMD.


Subject(s)
Genome-Wide Association Study , Macular Degeneration , Humans , Genome-Wide Association Study/methods , Macular Degeneration/genetics , Genotype , Genetic Testing , Whole Genome Sequencing , Polymorphism, Single Nucleotide/genetics , Genetic Predisposition to Disease , Complement Factor H/genetics
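The abstract describes testing for an excess of rare loss-of-function alleles in cases against an external variant list (ExAC) without individual-level genotypes. Assuming only aggregate allele counts are available for both groups, one simple option is a one-sided Fisher's exact test on a 2×2 allele-count table. This is a hypothetical sketch, not the authors' gene-based test, and the reference counts in the usage line are invented.

```python
import math


def fisher_enrichment_p(case_alt, case_total_alleles, ref_alt, ref_total_alleles):
    """One-sided Fisher's exact p-value for an excess of qualifying alleles in cases.

    Computes P(X >= case_alt) under the hypergeometric distribution with the
    table margins fixed. Exact integer arithmetic via math.comb.
    """
    n_case = case_total_alleles
    n_ref = ref_total_alleles
    k = case_alt + ref_alt   # qualifying alleles across both groups
    n = n_case + n_ref       # all alleles across both groups
    denominator = math.comb(n, k)
    upper = min(k, n_case)
    tail = sum(
        math.comb(n_case, x) * math.comb(n_ref, k - x)
        for x in range(case_alt, upper + 1)
    )
    return tail / denominator


# 2 x 2394 case alleles (from the abstract); all other counts are hypothetical.
toy_p = fisher_enrichment_p(case_alt=12, case_total_alleles=4788,
                            ref_alt=40, ref_total_alleles=120000)
```

Comparing against an external panel assumes comparable sequencing coverage and ancestry between cases and the reference, which is a major caveat for this kind of design.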
6.
Nature ; 586(7831): 749-756, 2020 10.
Article in English | MEDLINE | ID: mdl-33087929

ABSTRACT

The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world [1]. Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.


Subject(s)
Databases, Genetic , Exome Sequencing , Exome/genetics , Loss of Function Mutation/genetics , Phenotype , Aged , Bone Density/genetics , Collagen Type VI/genetics , Demography , Female , Genes, BRCA1 , Genes, BRCA2 , Genotype , Humans , Ion Channels/genetics , Male , Middle Aged , Neoplasms/genetics , Penetrance , Peptide Fragments/genetics , United Kingdom , Varicose Veins/genetics , ras GTPase-Activating Proteins/genetics
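The carrier percentages quoted above (more than 97% of genes with at least one LOF carrier, more than 69% with at least ten) are simple aggregates over a carrier table. A minimal sketch, assuming a hypothetical input of (gene, sample) carrier pairs and a separate list of all genes considered; the data are toy values.

```python
from collections import Counter


def carrier_summary(lof_carriers, all_genes, thresholds=(1, 10)):
    """Fraction of genes with at least t distinct LOF carriers, for each threshold t.

    lof_carriers: iterable of (gene, sample_id) pairs (hypothetical format).
    all_genes: every gene considered, including genes with no carriers.
    """
    carriers_per_gene = Counter()
    seen = set()
    for gene, sample in lof_carriers:
        # Count each sample once per gene, even if it carries several LOF variants.
        if (gene, sample) not in seen:
            seen.add((gene, sample))
            carriers_per_gene[gene] += 1
    n_genes = len(all_genes)
    return {
        t: sum(1 for g in all_genes if carriers_per_gene[g] >= t) / n_genes
        for t in thresholds
    }


# Toy data: gene A has 10 carriers, B one carrier (listed twice), C two, D none.
toy_pairs = [("A", f"s{i}") for i in range(10)] + [("B", "s1"), ("B", "s1"),
                                                   ("C", "s2"), ("C", "s3")]
toy_summary = carrier_summary(toy_pairs, ["A", "B", "C", "D"])
```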
7.
Genet Epidemiol ; 2024 Oct 09.
Article in English | MEDLINE | ID: mdl-39385445

ABSTRACT

Persistent opioid use after surgery is a common morbidity outcome associated with subsequent opioid use disorder, overdose, and death. While phenotypic associations have been described, genetic associations remain unidentified. Here, we conducted the largest genetic study of persistent opioid use after surgery, comprising ~40,000 non-Hispanic, European-ancestry Michigan Genomics Initiative participants (3198 cases and 36,321 surgically exposed controls). Our study primarily focused on the reproducibility and reliability of 72 genetic studies of opioid use disorder phenotypes. Nominal associations (p < 0.05) occurred at 12 of 80 unique (r² < 0.8) signals from these studies. Six occurred in OPRM1 (most significant: rs79704991-T, OR = 1.17, p = 8.7 × 10⁻⁵), with two surviving multiple testing correction. Other associations were rs640561-LRRIQ3 (p = 0.015), rs4680-COMT (p = 0.016), rs9478495 (p = 0.017, intergenic), rs10886472-GRK5 (p = 0.028), rs9291211-SLC30A9/BEND4 (p = 0.043), and rs112068658-KCNN1 (p = 0.048). Two highly referenced genes, OPRD1 and DRD2/ANKK1, had no signals in MGI. Associations at previously identified OPRM1 variants suggest common biology between persistent opioid use and opioid use disorder, further demonstrating connections between opioid dependence and addiction phenotypes. Lack of significant associations at other variants challenges previous studies' reliability.
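The abstract does not state which multiple-testing correction was applied to the 80 signals. Assuming a Bonferroni correction over those 80 tests (an assumption, not the authors' stated method), the threshold is 0.05 / 80 = 6.25 × 10⁻⁴. The sketch uses only p-values quoted in the abstract, so the second surviving OPRM1 signal, whose p-value is not given, is absent.

```python
def bonferroni_survivors(pvalues, n_tests, alpha=0.05):
    """Return the Bonferroni threshold and the names whose p-value falls below it."""
    threshold = alpha / n_tests
    return threshold, [name for name, p in pvalues.items() if p < threshold]


# P-values as reported in the abstract (only those it lists explicitly).
reported = {
    "rs79704991 (OPRM1)": 8.7e-5,
    "rs640561 (LRRIQ3)": 0.015,
    "rs4680 (COMT)": 0.016,
    "rs9478495 (intergenic)": 0.017,
    "rs10886472 (GRK5)": 0.028,
    "rs9291211 (SLC30A9/BEND4)": 0.043,
    "rs112068658 (KCNN1)": 0.048,
}
threshold, survivors = bonferroni_survivors(reported, n_tests=80)
```

Under this assumed correction only the lead OPRM1 variant clears the threshold among the listed values; the other listed associations remain nominal.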

8.
N Engl J Med ; 387(4): 332-344, 2022 07 28.
Article in English | MEDLINE | ID: mdl-35939579

ABSTRACT

BACKGROUND: Exome sequencing in hundreds of thousands of persons may enable the identification of rare protein-coding genetic variants associated with protection from human diseases like liver cirrhosis, providing a strategy for the discovery of new therapeutic targets. METHODS: We performed a multistage exome sequencing and genetic association analysis to identify genes in which rare protein-coding variants were associated with liver phenotypes. We conducted in vitro experiments to further characterize associations. RESULTS: The multistage analysis involved 542,904 persons with available data on liver aminotransferase levels, 24,944 patients with various types of liver disease, and 490,636 controls without liver disease. We found that rare coding variants in APOB, ABCB4, SLC30A10, and TM6SF2 were associated with increased aminotransferase levels and an increased risk of liver disease. We also found that variants in CIDEB, which encodes a structural protein found in hepatic lipid droplets, had a protective effect. The burden of rare predicted loss-of-function variants plus missense variants in CIDEB (combined carrier frequency, 0.7%) was associated with decreased alanine aminotransferase levels (beta per allele, -1.24 U per liter; 95% confidence interval [CI], -1.66 to -0.83; P = 4.8 × 10⁻⁹) and with 33% lower odds of liver disease of any cause (odds ratio per allele, 0.67; 95% CI, 0.57 to 0.79; P = 9.9 × 10⁻⁷). Rare coding variants in CIDEB were associated with a decreased risk of liver disease across different underlying causes and different degrees of severity, including cirrhosis of any cause (odds ratio per allele, 0.50; 95% CI, 0.36 to 0.70). Among 3599 patients who had undergone bariatric surgery, rare coding variants in CIDEB were associated with a decreased nonalcoholic fatty liver disease activity score (beta per allele in score units, -0.98; 95% CI, -1.54 to -0.41 [scores range from 0 to 8, with higher scores indicating more severe disease]). In human hepatoma cell lines challenged with oleate, CIDEB small interfering RNA knockdown prevented the buildup of large lipid droplets. CONCLUSIONS: Rare germline mutations in CIDEB conferred substantial protection from liver disease. (Funded by Regeneron Pharmaceuticals.).


Subject(s)
Apoptosis Regulatory Proteins , Germ-Line Mutation , Liver Diseases , Apoptosis Regulatory Proteins/genetics , Apoptosis Regulatory Proteins/metabolism , Genetic Predisposition to Disease/genetics , Genetic Predisposition to Disease/prevention & control , Humans , Liver/metabolism , Liver Diseases/genetics , Liver Diseases/metabolism , Liver Diseases/prevention & control , Transaminases/genetics , Exome Sequencing
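A gene burden score of the kind described above counts, for each person, the qualifying rare alleles (here pLOF plus missense) in one gene; its effect "per allele" is then the regression slope of the trait on that count, and an odds ratio of 0.67 corresponds to (0.67 - 1) × 100 = -33%, that is, 33% lower odds. A minimal sketch with no covariates: the data layout, variant IDs and mask are hypothetical, and the -1.24 slope is used only to build toy data, not to reproduce the study's regression models.

```python
def burden_scores(genotypes, qualifying):
    """Per-person count of qualifying rare alleles in one gene.

    genotypes: dict sample -> dict variant_id -> alt allele count (0, 1 or 2)
    qualifying: set of variant IDs passing a (hypothetical) pLOF + missense mask
    """
    return {
        sample: sum(count for variant, count in calls.items() if variant in qualifying)
        for sample, calls in genotypes.items()
    }


def ols_slope(x, y):
    """Least-squares slope of y on x, i.e. the effect per qualifying allele."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def percent_change_in_odds(odds_ratio):
    """Convert an odds ratio into a percent change in odds (negative = lower odds)."""
    return (odds_ratio - 1.0) * 100.0


# Toy data: v1 and v2 pass the mask, v3 does not.
toy_genotypes = {"s1": {"v1": 1, "v3": 1}, "s2": {}, "s3": {"v1": 1, "v2": 1}, "s4": {"v3": 2}}
scores = burden_scores(toy_genotypes, {"v1", "v2"})
samples = sorted(scores)
toy_alt = {s: 30.0 - 1.24 * scores[s] for s in samples}  # constructed ALT values, U per liter
slope = ols_slope([scores[s] for s in samples], [toy_alt[s] for s in samples])
```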
9.
Genet Epidemiol ; 47(3): 231-248, 2023 04.
Article in English | MEDLINE | ID: mdl-36739617

ABSTRACT

Linkage analysis, a class of methods for detecting co-segregation of genomic segments and traits in families, was used to map disease-causing genes for decades before genotyping arrays and dense SNP genotyping enabled genome-wide association studies in population samples. Population samples often contain related individuals, but the segregation of alleles within families is rarely used because traditional linkage methods are computationally inefficient for larger datasets. Here, we describe Population Linkage, a novel application of Haseman-Elston regression as a method of moments estimator of variance components and their standard errors. We achieve additional computational efficiency by using modern methods for detection of IBD segments and variance component estimation, efficient preprocessing of input data, and minimizing redundant numerical calculations. We also refined variance component models to account for the biases in population-scale methods for IBD segment detection. We ran Population Linkage on four blood lipid traits in over 70,000 individuals from the HUNT and SardiNIA studies, successfully detecting 25 known genetic signals. One notable linkage signal that appeared in both was for low-density lipoprotein (LDL) cholesterol levels in the region near the gene APOE (LOD = 29.3, variance explained = 4.1%). This is the region of the missense variants rs7412 and rs429358, which together make up the ε2, ε3, and ε4 alleles and account for 2.4% and 0.8%, respectively, of the variation in circulating LDL cholesterol. Our results show the potential for linkage analysis and other large-scale applications of method of moments variance components estimation.


Subject(s)
Genome-Wide Association Study , Models, Genetic , Humans , Phenotype , Cholesterol, LDL/genetics , Genetic Linkage , Apolipoproteins E/genetics
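Population Linkage builds on Haseman-Elston regression. For orientation only, the original formulation regresses the squared trait difference of each relative pair on the pair's IBD sharing at a locus; under a purely additive model the expected slope is -2 times the trait variance attributable to that locus. The paper's method-of-moments variance-component version is more elaborate, so this is background rather than its implementation; the pair data are constructed toy values.

```python
import math


def haseman_elston(pairs):
    """Original Haseman-Elston regression for one locus.

    pairs: list of (trait_i, trait_j, ibd_proportion) for relative pairs.
    Regresses (trait_i - trait_j)^2 on IBD sharing. Assuming additive effects
    only, slope = -2 * (variance attributable to the locus).
    """
    x = [pi for _, _, pi in pairs]
    y = [(yi - yj) ** 2 for yi, yj, _ in pairs]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    locus_variance = -slope / 2.0
    return intercept, slope, locus_variance


# Toy pairs built so that squared differences fall exactly on 2 - 0.2 * IBD.
toy_pairs = [(math.sqrt(2.0), 0.0, 0.0), (math.sqrt(1.9), 0.0, 0.5), (math.sqrt(1.8), 0.0, 1.0)]
intercept, slope, locus_variance = haseman_elston(toy_pairs)
```

Pairs sharing more IBD at a trait-relevant locus tend to have more similar trait values, which is why the slope is negative when the locus contributes variance.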
10.
Hum Mol Genet ; 31(3): 347-361, 2022 02 03.
Article in English | MEDLINE | ID: mdl-34553764

ABSTRACT

Platelets play a key role in thrombosis and hemostasis. Platelet count (PLT) and mean platelet volume (MPV) are highly heritable quantitative traits, with hundreds of genetic signals previously identified, mostly in European ancestry populations. We here utilize whole genome sequencing (WGS) from NHLBI's Trans-Omics for Precision Medicine initiative (TOPMed) in a large multi-ethnic sample to further explore common and rare variation contributing to PLT (n = 61 200) and MPV (n = 23 485). We identified and replicated secondary signals at MPL (rs532784633) and PECAM1 (rs73345162), both more common in African ancestry populations. We also observed rare variation in Mendelian platelet-related disorder genes influencing variation in platelet traits in TOPMed cohorts (not enriched for blood disorders). For example, association of GP9 with lower PLT and higher MPV was partly driven by a pathogenic Bernard-Soulier syndrome variant (rs5030764, p.Asn61Ser), and the signals at TUBB1 and CD36 were partly driven by loss of function variants not annotated as pathogenic in ClinVar (rs199948010 and rs571975065). However, residual signal remained for these gene-based signals after adjusting for lead variants, suggesting that additional variants in Mendelian genes with impacts in general population cohorts remain to be identified. Gene-based signals were also identified at several genome-wide association study identified loci for genes not annotated for Mendelian platelet disorders (PTPRH, TET2, CHEK2), with somatic variation driving the result at TET2. These results highlight the value of WGS in populations of diverse genetic ancestry to identify novel regulatory and coding signals, even for well-studied traits like platelet traits.


Subject(s)
Genome-Wide Association Study , Precision Medicine , Blood Platelets , Humans , National Heart, Lung, and Blood Institute (U.S.) , Phenotype , Polymorphism, Single Nucleotide , Precision Medicine/methods , United States
11.
Am J Hum Genet ; 108(5): 874-893, 2021 05 06.
Article in English | MEDLINE | ID: mdl-33887194

ABSTRACT

Whole-genome sequencing (WGS), a powerful tool for detecting novel coding and non-coding disease-causing variants, has largely been applied to clinical diagnosis of inherited disorders. Here we leveraged WGS data in up to 62,653 ethnically diverse participants from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program and assessed statistical association of variants with seven red blood cell (RBC) quantitative traits. We discovered 14 single variant-RBC trait associations at 12 genomic loci, which have not been reported previously. Several of the RBC trait-variant associations (RPN1, ELL2, MIDN, HBB, HBA1, PIEZO1, and G6PD) were replicated in independent GWAS datasets imputed to the TOPMed reference panel. Most of these discovered variants are rare/low frequency, and several are observed disproportionately among non-European Ancestry (African, Hispanic/Latino, or East Asian) populations. We identified a 3 bp indel p.Lys2169del (g.88717175_88717177TCT[4]) (common only in the Ashkenazi Jewish population) of PIEZO1, a gene responsible for the Mendelian red cell disorder hereditary xerocytosis (MIM: 194380), associated with higher mean corpuscular hemoglobin concentration (MCHC). In stepwise conditional analysis and in gene-based rare variant aggregated association analysis, we identified several of the variants in HBB, HBA1, TMPRSS6, and G6PD that represent the carrier state for known coding, promoter, or splice site loss-of-function variants that cause inherited RBC disorders. Finally, we applied base and nuclease editing to demonstrate that the sentinel variant rs112097551 (nearest gene RPN1) acts through a cis-regulatory element that exerts long-range control of the gene RUVBL1 which is essential for hematopoiesis. Together, these results demonstrate the utility of WGS in ethnically diverse population-based samples and gene editing for expanding knowledge of the genetic architecture of quantitative hematologic traits and suggest a continuum between complex trait and Mendelian red cell disorders.


Subject(s)
Erythrocytes/metabolism , Erythrocytes/pathology , Genome-Wide Association Study , National Heart, Lung, and Blood Institute (U.S.)/organization & administration , Phenotype , Adult , Aged , Chromosomes, Human, Pair 16/genetics , Datasets as Topic , Female , Gene Editing , Genetic Variation/genetics , HEK293 Cells , Humans , Male , Middle Aged , Quality Control , Reproducibility of Results , United States
12.
Am J Hum Genet ; 108(10): 1836-1851, 2021 10 07.
Article in English | MEDLINE | ID: mdl-34582791

ABSTRACT

Many common and rare variants associated with hematologic traits have been discovered through imputation on large-scale reference panels. However, the majority of genome-wide association studies (GWASs) have been conducted in Europeans, and determining causal variants has proved challenging. We performed a GWAS of total leukocyte, neutrophil, lymphocyte, monocyte, eosinophil, and basophil counts generated from 109,563,748 variants in the autosomes and the X chromosome in the Trans-Omics for Precision Medicine (TOPMed) program, which included data from 61,802 individuals of diverse ancestry. We discovered and replicated 7 leukocyte trait associations, including (1) the association between a chromosome X, pseudo-autosomal region (PAR), noncoding variant located between cytokine receptor genes (CSF2RA and CRLF2) and lower eosinophil count; and (2) associations between single variants found predominantly among African Americans at the S1PR3 (9q22.1) and HBB (11p15.4) loci and monocyte and lymphocyte counts, respectively. We further provide evidence indicating that the newly discovered eosinophil-lowering chromosome X PAR variant might be associated with reduced susceptibility to common allergic diseases such as atopic dermatitis and asthma. Additionally, we found a burden of very rare FLT3 (13q12.2) variants associated with monocyte counts. Together, these results emphasize the utility of whole-genome sequencing in diverse samples in identifying associations missed by European-ancestry-driven GWASs.


Subject(s)
Asthma/epidemiology , Biomarkers/metabolism , Dermatitis, Atopic/epidemiology , Leukocytes/pathology , Polymorphism, Single Nucleotide , Pulmonary Disease, Chronic Obstructive/epidemiology , Quantitative Trait Loci , Asthma/genetics , Asthma/metabolism , Asthma/pathology , Dermatitis, Atopic/genetics , Dermatitis, Atopic/metabolism , Dermatitis, Atopic/pathology , Genetic Predisposition to Disease , Genome, Human , Genome-Wide Association Study , Humans , National Heart, Lung, and Blood Institute (U.S.) , Phenotype , Prognosis , Proteome/analysis , Proteome/metabolism , Pulmonary Disease, Chronic Obstructive/genetics , Pulmonary Disease, Chronic Obstructive/metabolism , Pulmonary Disease, Chronic Obstructive/pathology , United Kingdom/epidemiology , United States/epidemiology , Whole Genome Sequencing
14.
Genome Res ; 30(2): 185-194, 2020 02.
Article in English | MEDLINE | ID: mdl-31980570

ABSTRACT

Detecting and estimating DNA sample contamination are important steps to ensure high-quality genotype calls and reliable downstream analysis. Existing methods rely on population allele frequency information for accurate estimation of contamination rates. Correctly specifying population allele frequencies for each individual in early stage of sequence analysis is impractical or even impossible for large-scale sequencing centers that simultaneously process samples from multiple studies across diverse populations. On the other hand, incorrectly specified allele frequencies may result in substantial bias in estimated contamination rates. For example, we observed that existing methods often fail to identify 10% contaminated samples at a typical 3% contamination exclusion threshold when genetic ancestry is misspecified. Such an incomplete screening of contaminated samples substantially inflates the estimated rate of genotyping errors even in deeply sequenced genomes and exomes. We propose a robust statistical method that accurately estimates DNA contamination and is agnostic to genetic ancestry of the intended or contaminating sample. Our method integrates the estimation of genetic ancestry and DNA contamination in a unified likelihood framework by leveraging individual-specific allele frequencies projected from reference genotypes onto principal component coordinates. Our method can also be used for estimating genetic ancestries, similar to LASER or TRACE, but simultaneously accounting for potential contamination. We demonstrate that our method robustly estimates contamination rates and genetic ancestries across populations and contamination scenarios. We further demonstrate that, in the presence of contamination, genetic ancestry inference can be substantially biased with existing methods that ignore contamination, while our method corrects for such biases.


Subject(s)
DNA Contamination , DNA/genetics , Genotype , Genotyping Techniques/standards , Alleles , Exome/genetics , Gene Frequency/genetics , Genetics, Population , Humans , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA
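The core of contamination estimation is a mixture likelihood: at each site, the expected fraction of alternate reads is (1 - α) × g / 2 + α × p, where α is the contamination rate, g the intended sample's alternate allele count and p the contaminant's alternate allele frequency. The sketch below is a heavily simplified illustration, not the method described above: it assumes the intended sample's genotypes are known and takes p as given, rather than projecting individual-specific allele frequencies from principal component coordinates, which is precisely the step the abstract argues matters. Function names and site data are hypothetical.

```python
import math


def contamination_loglik(alpha, sites, eps=1e-6):
    """Binomial log-likelihood (up to a constant) of alt-read counts under a two-sample mixture.

    sites: list of (alt_reads, depth, genotype, contaminant_alt_freq), where
    genotype is the intended sample's alt allele count, assumed known here.
    """
    total = 0.0
    for alt, depth, g, p in sites:
        frac = (1.0 - alpha) * g / 2.0 + alpha * p
        frac = min(max(frac, eps), 1.0 - eps)  # avoid log(0) at the boundaries
        total += alt * math.log(frac) + (depth - alt) * math.log(1.0 - frac)
    return total


def estimate_contamination(sites, step=0.001, max_alpha=0.5):
    """Grid-search maximum-likelihood estimate of the contamination rate."""
    grid = [i * step for i in range(int(round(max_alpha / step)) + 1)]
    return max(grid, key=lambda a: contamination_loglik(a, sites))


# Toy sites with read counts set to their expected values at 10% contamination.
toy_sites = [(50, 1000, 0, 0.5), (950, 1000, 2, 0.5), (470, 1000, 1, 0.2), (30, 1000, 0, 0.3)]
alpha_hat = estimate_contamination(toy_sites)
```

Homozygous-reference sites carry most of the signal here, because any alternate reads they show must come from the contaminant (or from sequencing error, which this sketch ignores).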
15.
PLoS Genet ; 16(11): e1009077, 2020 11.
Article in English | MEDLINE | ID: mdl-33175840

ABSTRACT

Phenotypes extracted from Electronic Health Records (EHRs) are increasingly prevalent in genetic studies. EHRs contain hundreds of distinct clinical laboratory test results, providing a trove of health data beyond diagnoses. Such lab data is complex and lacks a ubiquitous coding scheme, making it more challenging than diagnosis data. Here we describe the first large-scale cross-health system genome-wide association study (GWAS) of EHR-based quantitative laboratory-derived phenotypes. We meta-analyzed 70 lab traits matched between the BioVU cohort from the Vanderbilt University Health System and the Michigan Genomics Initiative (MGI) cohort from Michigan Medicine. We show high replication of known associations for these traits, validating EHR-based measurements as high-quality phenotypes for genetic analysis. Notably, our analysis provides the first replication for 699 previous GWAS associations across 46 different traits. We discovered 31 novel associations at genome-wide significance for 22 distinct traits, including the first reported associations for two lab-based traits. We replicated 22 of these novel associations in an independent tranche of BioVU samples. The summary statistics for all association tests are freely available to benefit other researchers. Finally, we performed mirrored analyses in BioVU and MGI to assess competing analytic practices for EHR lab traits. We find that using the mean of all available lab measurements provides a robust summary value, but alternate summarizations can improve power in certain circumstances. This study provides a proof-of-principle for cross health system GWAS and is a framework for future studies of quantitative EHR lab traits.


Subject(s)
Electronic Health Records/statistics & numerical data , Genetic Association Studies/methods , Genome-Wide Association Study/methods , Biological Specimen Banks , Cohort Studies , Electronic Health Records/trends , Genomics , Humans , Michigan , Phenotype , Polymorphism, Single Nucleotide/genetics , Quantitative Trait, Heritable
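The finding that the mean of all available measurements is a robust per-person summary can be made concrete: repeated lab values are collapsed to one number per individual before association testing. A minimal sketch, assuming a hypothetical input of (person, value) pairs for a single lab test; the alternative summaries offered here (median, min, max) are illustrative choices, since the abstract does not list which alternatives were compared.

```python
from statistics import mean, median


def summarize_labs(measurements, how="mean"):
    """Collapse repeated lab values into one phenotype value per person.

    measurements: iterable of (person_id, value) pairs for one lab test (hypothetical format).
    how: "mean" (the summary reported as robust above) or one of the
         illustrative alternatives "median", "min", "max".
    """
    reducers = {"mean": mean, "median": median, "min": min, "max": max}
    reduce_values = reducers[how]
    by_person = {}
    for person, value in measurements:
        by_person.setdefault(person, []).append(value)
    return {person: reduce_values(values) for person, values in by_person.items()}


# Toy measurements: person "a" has three results, person "b" has one.
toy_labs = [("a", 1.0), ("a", 3.0), ("a", 8.0), ("b", 5.0)]
per_person_mean = summarize_labs(toy_labs)
```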
16.
PLoS Genet ; 16(6): e1008725, 2020 06.
Article in English | MEDLINE | ID: mdl-32603359

ABSTRACT

Risk factors that contribute to inter-individual differences in the age-of-onset of allergic diseases are poorly understood. The aim of this study was to identify genetic risk variants associated with the age at which symptoms of allergic disease first develop, considering information from asthma, hay fever and eczema. Self-reported age-of-onset information was available for 117,130 genotyped individuals of European ancestry from the UK Biobank study. For each individual, we identified the earliest age at which asthma, hay fever and/or eczema was first diagnosed and performed a genome-wide association study (GWAS) of this combined age-of-onset phenotype. We identified 50 variants with a significant independent association (P < 3 × 10⁻⁸) with age-of-onset. Forty-five variants had comparable effects on the onset of the three individual diseases and 38 were also associated with allergic disease case-control status in an independent study (n = 222,484). We observed a strong negative genetic correlation between age-of-onset and case-control status of allergic disease (rg = -0.63, P = 4.5 × 10⁻⁶¹), indicating that cases with early disease onset have a greater burden of allergy risk alleles than those with late disease onset. Subsequently, a multivariate GWAS of age-of-onset and case-control status identified a further 26 associations that were missed by the univariate analyses of age-of-onset or case-control status only. Collectively, of the 76 variants identified, 18 represent novel associations for allergic disease. We identified 81 likely target genes of the 76 associated variants based on information from expression quantitative trait loci (eQTL) and non-synonymous variants, of which we highlight ADAM15, FOSL2, TRIM8, BMPR2, CD200R1, PRKCQ, NOD2, SMAD4, ABCA7 and UBE2L3. Our results support the notion that early and late onset allergic disease have partly distinct genetic architectures, potentially explaining known differences in pathophysiology between individuals.


Subject(s)
Asthma/genetics , Eczema/genetics , Polymorphism, Single Nucleotide , Rhinitis, Allergic, Seasonal/genetics , Adolescent , Adult , Age of Onset , Aged , Asthma/pathology , Child , Eczema/pathology , Female , Genetic Loci , Genome-Wide Association Study/methods , Humans , Male , Middle Aged , Rhinitis, Allergic, Seasonal/pathology
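The combined phenotype described above is the earliest reported age of diagnosis across asthma, hay fever and eczema. A minimal sketch of that derivation, assuming missing reports are encoded as `None` (a hypothetical encoding); an explicit `is not None` check keeps a valid onset age of 0 from being treated as missing.

```python
from typing import Optional


def combined_age_of_onset(asthma: Optional[float],
                          hay_fever: Optional[float],
                          eczema: Optional[float]) -> Optional[float]:
    """Earliest reported age at diagnosis across the three allergic diseases.

    None means the condition was not reported; returns None if none were.
    """
    ages = [age for age in (asthma, hay_fever, eczema) if age is not None]
    return min(ages) if ages else None


# Toy participant diagnosed with asthma at 12 and hay fever at 5, no eczema.
toy_onset = combined_age_of_onset(12, 5, None)
```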
17.
Hum Mol Genet ; 29(12): 2022-2034, 2020 07 29.
Article in English | MEDLINE | ID: mdl-32246154

ABSTRACT

Genome-wide association studies (GWAS) have identified 52 independent variants at 34 genetic loci that are associated with age-related macular degeneration (AMD), the most common cause of incurable vision loss in the elderly worldwide. However, causal genes at the majority of these loci remain unknown. In this study, we performed whole exome sequencing of 264 individuals from 63 multiplex families with AMD and analyzed the data for rare protein-altering variants in candidate target genes at AMD-associated loci. Rare coding variants were identified in the CFH, PUS7, RXFP2, PHF12 and TACC2 genes in three or more families. In addition, we detected rare coding variants in the C9, SPEF2 and BCAR1 genes, which were previously suggested as likely causative genes at respective AMD susceptibility loci. Identification of rare variants in the CFH and C9 genes in our study validated previous reports of rare variants in complement pathway genes in AMD. We then extended our exome-wide analysis and identified rare protein-altering variants in 13 genes outside the AMD-GWAS loci in three or more families. Two of these genes, SCN10A and KIR2DL4, are of interest because variants in these genes also showed association with AMD in case-control cohorts, albeit not at the level of genome-wide significance. Our study presents the first large-scale, exome-wide analysis of rare variants in AMD. Further independent replications and molecular investigation of candidate target genes, reported here, would assist in gaining novel insights into mechanisms underlying AMD pathogenesis.


Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Macular Degeneration/genetics , NAV1.8 Voltage-Gated Sodium Channel/genetics , Receptors, KIR2DL4/genetics , Aged , Aged, 80 and over , Exome/genetics , Humans , Macular Degeneration/pathology , Male , Middle Aged , Polymorphism, Single Nucleotide/genetics , Exome Sequencing
18.
Am J Hum Genet ; 105(1): 65-77, 2019 07 03.
Article in English | MEDLINE | ID: mdl-31204010

ABSTRACT

The Genes for Good study uses social media to engage a large, diverse participant pool in genetics research and education. Health history and daily tracking surveys are administered through a Facebook application, and participants who complete a minimum number of surveys are mailed a saliva sample kit ("spit kit") to collect DNA for genotyping. As of March 2019, we engaged >80,000 individuals, sent spit kits to >32,000 individuals who met minimum participation requirements, and collected >27,000 spit kits. Participants come from all 50 states and include a diversity of ancestral backgrounds. Rates of important chronic health indicators are consistent with those estimated for the general U.S. population using more traditional study designs. However, our sample is younger and contains a greater percentage of females than the general population. As one means of verifying data quality, we have replicated genome-wide association studies (GWASs) for exemplar traits, such as asthma, diabetes, body mass index (BMI), and pigmentation. The flexible framework of the web application makes it relatively simple to add new questionnaires and for other researchers to collaborate. We anticipate that the study sample will continue to grow and that future analyses may further capitalize on the strengths of the longitudinal data in combination with genetic information.


Subject(s)
Genes/genetics , Genetic Markers , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Research Design , Social Media , Adolescent , Adult , Diabetes Mellitus/diagnosis , Diabetes Mellitus/genetics , Female , Humans , Hypertension/diagnosis , Hypertension/genetics , Male , Middle Aged , Public Health , Surveys and Questionnaires , Young Adult
19.
Gastroenterology ; 160(4): 1164-1178.e6, 2021 03.
Article in English | MEDLINE | ID: mdl-33058866

ABSTRACT

BACKGROUND AND AIMS: Susceptibility genes and the underlying mechanisms for the majority of risk loci identified by genome-wide association studies (GWAS) for colorectal cancer (CRC) risk remain largely unknown. We conducted a transcriptome-wide association study (TWAS) to identify putative susceptibility genes. METHODS: Gene-expression prediction models were built using transcriptome and genetic data from 284 normal transverse colon tissues from individuals of European descent in the Genotype-Tissue Expression (GTEx) project, and model performance was evaluated using data from The Cancer Genome Atlas (n = 355). We applied the gene-expression prediction models and GWAS data to evaluate associations of genetically predicted gene expression with CRC risk in 58,131 CRC cases and 67,347 controls of European ancestry. Dual-luciferase reporter assays and knockdown experiments in CRC cells and tumor xenografts were conducted. RESULTS: We identified 25 genes associated with CRC risk at a Bonferroni-corrected threshold of P < 9.1 × 10⁻⁶, including genes in 4 novel loci: PYGL (14q22.1), RPL28 (19q13.42), CAPN12 (19q13.2), MYH7B (20q11.22), and MAP1L3CA (20q11.22). In 9 known GWAS-identified loci, we uncovered 9 genes that have not been reported previously, whereas 4 genes remained statistically significant after adjusting for the lead risk variant of the locus. Through colocalization analysis in GWAS loci, we additionally identified 12 putative susceptibility genes that were supported by TWAS analysis at P < .01. We showed that the risk allele of the lead risk variant rs1741640 affected the promoter activity of CABLES2. Knockdown experiments confirmed that CABLES2 plays a vital role in colorectal carcinogenesis. CONCLUSIONS: Our study reveals new putative susceptibility genes and provides new insight into the biological mechanisms underlying CRC development.


Subject(s)
Biomarkers, Tumor/genetics , Colorectal Neoplasms/genetics , Genetic Predisposition to Disease , Models, Genetic , Alleles , Carcinogenesis/genetics , Case-Control Studies , Cohort Studies , Colorectal Neoplasms/epidemiology , Gene Knockdown Techniques , Genome-Wide Association Study , Humans , Polymorphism, Single Nucleotide , Promoter Regions, Genetic/genetics , RNA-Seq , Risk Factors , Xenograft Model Antitumor Assays
20.
Bioinformatics ; 37(18): 3017-3018, 2021 09 29.
Article in English | MEDLINE | ID: mdl-33734315

ABSTRACT

SUMMARY: LocusZoom.js is a JavaScript library for creating interactive web-based visualizations of genetic association study results. It can display one or more traits in the context of relevant biological data (such as gene models and other genomic annotation), and allows interactive refinement of analysis models (by selecting linkage disequilibrium reference panels, identifying sets of likely causal variants, or comparisons to the GWAS catalog). It can be embedded in web pages to enable data sharing and exploration. Views can be customized and extended to display other data types such as phenome-wide association study (PheWAS) results, chromatin co-accessibility, or eQTL measurements. A new web upload service harmonizes datasets, adds annotations, and makes it easy to explore user-provided result sets. AVAILABILITY AND IMPLEMENTATION: LocusZoom.js is open-source software under a permissive MIT license. Code and documentation are available at: https://github.com/statgen/locuszoom/. Installable packages for all versions are also distributed via NPM. Additional features are provided as standalone libraries to promote reuse. Use with your own GWAS results at https://my.locuszoom.org/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics , Software , Genome , Genetic Association Studies , Documentation