1.
Nature ; 2024 May 20.
Article in English | MEDLINE | ID: mdl-38768635

ABSTRACT

Rare coding variants that significantly impact function provide insights into the biology of a gene [1-3]. However, ascertaining their frequency requires large sample sizes [4-8]. Here, we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. 23% of the Regeneron Genetics Center Million Exome data (RGC-ME) comes from non-European individuals of African, East Asian, Indigenous American, Middle Eastern, and South Asian ancestry. This catalogue includes over 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss-of-function, we identify 3,988 loss-of-function intolerant genes, including 86 that were previously assessed as tolerant and 1,153 lacking established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions depleted of missense variants despite being tolerant to pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this important resource of coding variation from the RGC-ME accessible via a public variant allele frequency browser.

2.
Nature ; 622(7984): 784-793, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37821707

ABSTRACT

The Mexico City Prospective Study is a prospective cohort of more than 150,000 adults recruited two decades ago from the urban districts of Coyoacán and Iztapalapa in Mexico City [1]. Here we generated genotype and exome-sequencing data for all individuals and whole-genome sequencing data for 9,950 selected individuals. We describe high levels of relatedness and substantial heterogeneity in ancestry composition across individuals. Most sequenced individuals had admixed Indigenous American, European and African ancestry, with extensive admixture from Indigenous populations in central, southern and southeastern Mexico. Indigenous Mexican segments of the genome had lower levels of coding variation but an excess of homozygous loss-of-function variants compared with segments of African and European origin. We estimated ancestry-specific allele frequencies at 142 million genomic variants, with an effective sample size of 91,856 for Indigenous Mexican ancestry at exome variants, all available through a public browser. Using whole-genome sequencing, we developed an imputation reference panel that outperforms existing panels at common variants in individuals with high proportions of central, southern and southeastern Indigenous Mexican ancestry. Our work illustrates the value of genetic studies in diverse populations and provides foundational imputation and allele frequency resources for future genetic studies in Mexico and in the United States, where the Hispanic/Latino population is predominantly of Mexican descent.


Subject(s)
Exome Sequencing , Genome, Human , Genotype , Hispanic or Latino , Adult , Humans , Africa/ethnology , Americas/ethnology , Europe/ethnology , Gene Frequency/genetics , Genetics, Population , Genome, Human/genetics , Genotyping Techniques , Hispanic or Latino/genetics , Homozygote , Loss of Function Mutation/genetics , Mexico , Prospective Studies
3.
Cell ; 155(1): 242-56, 2013 Sep 26.
Article in English | MEDLINE | ID: mdl-24074872

ABSTRACT

The complex network of specialized cells and molecules in the immune system has evolved to defend against pathogens, but inadvertent immune system attacks on "self" result in autoimmune disease. Both genetic regulation of immune cell levels and their relationships with autoimmunity are largely undetermined. Here, we report genetic contributions to quantitative levels of 95 cell types encompassing 272 immune traits, in a cohort of 1,629 individuals from four clustered Sardinian villages. We first estimated trait heritability, showing that it can be substantial, accounting for up to 87% of the variance (mean 41%). Next, by assessing ∼8.2 million variants that we identified and confirmed in an extended set of 2,870 individuals, we found 23 independent variants at 13 loci associated with at least one trait. Notably, variants at three loci (HLA, IL2RA, and SH2B3/ATXN2) overlap with known autoimmune disease associations. These results connect specific cellular phenotypes to specific genetic variants, helping to explicate their involvement in disease.


Subject(s)
Flow Cytometry/methods , Genetic Predisposition to Disease , Genome-Wide Association Study , Immune System Diseases/genetics , Polymorphism, Single Nucleotide , Humans , Phenotype
4.
Nature ; 607(7917): 97-103, 2022 07.
Article in English | MEDLINE | ID: mdl-35255492

ABSTRACT

Critical COVID-19 is caused by immune-mediated inflammatory lung injury. Host genetic variation influences the development of illness requiring critical care [1] or hospitalization [2-4] after infection with SARS-CoV-2. The GenOMICC (Genetics of Mortality in Critical Care) study enables the comparison of genomes from individuals who are critically ill with those of population controls to find underlying disease mechanisms. Here we use whole-genome sequencing in 7,491 critically ill individuals compared with 48,400 controls to discover and replicate 23 independent variants that significantly predispose to critical COVID-19. We identify 16 new independent associations, including variants within genes that are involved in interferon signalling (IL10RB and PLSCR1), leucocyte differentiation (BCL11A) and blood-type antigen secretor status (FUT2). Using transcriptome-wide association and colocalization to infer the effect of gene expression on disease severity, we find evidence that implicates multiple genes, including reduced expression of a membrane flippase (ATP11A) and increased expression of a mucin (MUC1), in critical disease. Mendelian randomization provides evidence in support of causal roles for myeloid cell adhesion molecules (SELE, ICAM5 and CD209) and the coagulation factor F8, all of which are potentially druggable targets. Our results are broadly consistent with a multi-component model of COVID-19 pathophysiology, in which at least two distinct mechanisms can predispose to life-threatening disease: failure to control viral replication; or an enhanced tendency towards pulmonary inflammation and intravascular coagulation. We show that comparison between cases of critical illness and population controls is highly efficient for the detection of therapeutically relevant mechanisms of disease.


Subject(s)
COVID-19 , Critical Illness , Genome, Human , Host-Pathogen Interactions , Whole Genome Sequencing , ATP-Binding Cassette Transporters , COVID-19/genetics , COVID-19/mortality , COVID-19/pathology , COVID-19/virology , Cell Adhesion Molecules , Critical Care , Critical Illness/mortality , E-Selectin , Factor VIII , Fucosyltransferases , Genome, Human/genetics , Genome-Wide Association Study , Host-Pathogen Interactions/genetics , Humans , Interleukin-10 Receptor beta Subunit , Lectins, C-Type , Mucin-1 , Nerve Tissue Proteins , Phospholipid Transfer Proteins , Receptors, Cell Surface , Repressor Proteins , SARS-CoV-2/pathogenicity , Galactoside 2-alpha-L-fucosyltransferase
5.
Nature ; 612(7939): 301-309, 2022 12.
Article in English | MEDLINE | ID: mdl-36450978

ABSTRACT

Clonal haematopoiesis involves the expansion of certain blood cell lineages and has been associated with ageing and adverse health outcomes [1-5]. Here we use exome sequence data on 628,388 individuals to identify 40,208 carriers of clonal haematopoiesis of indeterminate potential (CHIP). Using genome-wide and exome-wide association analyses, we identify 24 loci (21 of which are novel) where germline genetic variation influences predisposition to CHIP, including missense variants in the lymphocytic antigen coding gene LY75, which are associated with reduced incidence of CHIP. We also identify novel rare variant associations with clonal haematopoiesis and telomere length. Analysis of 5,041 health traits from the UK Biobank (UKB) found relationships between CHIP and severe COVID-19 outcomes, cardiovascular disease, haematologic traits, malignancy, smoking, obesity, infection and all-cause mortality. Longitudinal and Mendelian randomization analyses revealed that CHIP is associated with solid cancers, including non-melanoma skin cancer and lung cancer, and that CHIP linked to DNMT3A is associated with the subsequent development of myeloid but not lymphoid leukaemias. Additionally, contrary to previous findings from the initial 50,000 UKB exomes [6], our results in the full sample do not support a role for IL-6 inhibition in reducing the risk of cardiovascular disease among CHIP carriers. Our findings demonstrate that CHIP represents a complex set of heterogeneous phenotypes with shared and unique germline genetic causes and varied clinical implications.


Subject(s)
COVID-19 , Cardiovascular Diseases , Humans , Clonal Hematopoiesis/genetics , Cardiovascular Diseases/epidemiology , Cardiovascular Diseases/genetics
6.
Nature ; 599(7886): 628-634, 2021 11.
Article in English | MEDLINE | ID: mdl-34662886

ABSTRACT

A major goal in human genetics is to use natural variation to understand the phenotypic consequences of altering each protein-coding gene in the genome. Here we used exome sequencing [1] to explore protein-altering variants and their consequences in 454,787 participants in the UK Biobank study [2]. We identified 12 million coding variants, including around 1 million loss-of-function and around 1.8 million deleterious missense variants. When these were tested for association with 3,994 health-related traits, we found 564 genes with trait associations at P ≤ 2.18 × 10⁻¹¹. Rare variant associations were enriched in loci from genome-wide association studies (GWAS), but most (91%) were independent of common variant signals. We discovered several risk-increasing associations with traits related to liver disease, eye disease and cancer, among others, as well as risk-lowering associations for hypertension (SLC9A3R2), diabetes (MAP3K15, FAM234A) and asthma (SLC27A3). Six genes were associated with brain imaging phenotypes, including two involved in neural development (GBE1, PLD1). Of the signals available and powered for replication in an independent cohort, 81% were confirmed; furthermore, association signals were generally consistent across individuals of European, Asian and African ancestry. We illustrate the ability of exome sequencing to identify gene-trait associations, elucidate gene function and pinpoint effector genes that underlie GWAS signals at scale.


Subject(s)
Biological Specimen Banks , Databases, Genetic , Exome Sequencing , Exome/genetics , Africa/ethnology , Asia/ethnology , Asthma/genetics , Diabetes Mellitus/genetics , Europe/ethnology , Eye Diseases/genetics , Female , Genetic Predisposition to Disease/genetics , Genetic Variation , Genome-Wide Association Study , Humans , Hypertension/genetics , Liver Diseases/genetics , Male , Mutation , Neoplasms/genetics , Quantitative Trait, Heritable , United Kingdom
7.
Hum Mol Genet ; 33(4): 374-385, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-37934784

ABSTRACT

Genome-wide association studies have contributed extensively to the discovery of disease-associated common variants. However, the genetic contribution to complex traits is still largely difficult to interpret. We report a genome-wide association study of 2394 cases and 2393 controls for age-related macular degeneration (AMD) via whole-genome sequencing, with 46.9 million genetic variants. Our study reveals significant single-variant association signals at four loci and independent gene-based signals in CFH, C2, C3, and NRTN. Using data from the Exome Aggregation Consortium (ExAC) for a gene-based test, we demonstrate an enrichment of predicted rare loss-of-function variants in CFH, CFI, and an as-yet unreported gene in AMD, ORMDL2. Our method of using a large variant list without individual-level genotypes as an external reference provides a flexible and convenient approach to leverage the publicly available variant datasets to augment the search for rare variant associations, which can explain additional disease risk in AMD.


Subject(s)
Genome-Wide Association Study , Macular Degeneration , Humans , Genome-Wide Association Study/methods , Macular Degeneration/genetics , Genotype , Genetic Testing , Whole Genome Sequencing , Polymorphism, Single Nucleotide/genetics , Genetic Predisposition to Disease , Complement Factor H/genetics
8.
Nature ; 586(7831): 749-756, 2020 10.
Article in English | MEDLINE | ID: mdl-33087929

ABSTRACT

The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world [1]. Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.


Subject(s)
Databases, Genetic , Exome Sequencing , Exome/genetics , Loss of Function Mutation/genetics , Phenotype , Aged , Bone Density/genetics , Collagen Type VI/genetics , Demography , Female , Genes, BRCA1 , Genes, BRCA2 , Genotype , Humans , Ion Channels/genetics , Male , Middle Aged , Neoplasms/genetics , Penetrance , Peptide Fragments/genetics , United Kingdom , Varicose Veins/genetics , ras GTPase-Activating Proteins/genetics
9.
Am J Hum Genet ; 109(6): 1007-1015, 2022 06 02.
Article in English | MEDLINE | ID: mdl-35508176

ABSTRACT

Genotype imputation is an integral tool in genome-wide association studies, in which it facilitates meta-analysis, increases power, and enables fine-mapping. With the increasing availability of whole-genome-sequence datasets, investigators have access to a multitude of reference-panel choices for genotype imputation. In principle, combining all sequenced whole genomes into a single large panel would provide the best imputation performance, but this is often cumbersome or impossible due to privacy restrictions. Here, we describe meta-imputation, a method that allows imputation results generated using different reference panels to be combined into a consensus imputed dataset. Our meta-imputation method requires small changes to the output of existing imputation tools to produce necessary inputs, which are then combined using dynamically estimated weights that are tailored to each individual and genome segment. In the scenarios we examined, the method consistently outperforms imputation using a single reference panel and achieves accuracy comparable to imputation using a combined reference panel.
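
The combination step described in this abstract can be illustrated with a minimal sketch. It assumes the per-panel weights for one individual and one genome segment have already been estimated; how those weights are estimated is not given in the abstract and is not attempted here. The function name `combine_dosages` and its arguments are hypothetical and are not taken from the published tool.

```python
def combine_dosages(panel_dosages, weights):
    """Weighted consensus of imputed dosages for one sample and one segment.

    panel_dosages: list of lists; panel_dosages[k][m] is the dosage (0..2)
                   imputed at marker m using reference panel k.
    weights:       one non-negative weight per panel (assumed to be given;
                   the estimation procedure is outside this sketch).
    """
    if len(panel_dosages) != len(weights):
        raise ValueError("need exactly one weight per reference panel")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_markers = len(panel_dosages[0])
    if any(len(d) != n_markers for d in panel_dosages):
        raise ValueError("all panels must cover the same markers")

    # Normalise so the consensus stays on the same 0..2 dosage scale.
    norm = [w / total for w in weights]
    return [
        sum(w * d[m] for w, d in zip(norm, panel_dosages))
        for m in range(n_markers)
    ]


if __name__ == "__main__":
    # Two panels, two markers; the first panel is trusted three times as much.
    print(combine_dosages([[0.0, 2.0], [1.0, 1.0]], [3, 1]))
```

Normalising the weights inside the function means callers may pass unscaled scores, at the cost of hiding the case where every weight is zero, which is rejected explicitly.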


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genome , Genome-Wide Association Study/methods , Genotype , Humans , Polymorphism, Single Nucleotide/genetics , Research Design
10.
N Engl J Med ; 387(4): 332-344, 2022 07 28.
Article in English | MEDLINE | ID: mdl-35939579

ABSTRACT

BACKGROUND: Exome sequencing in hundreds of thousands of persons may enable the identification of rare protein-coding genetic variants associated with protection from human diseases like liver cirrhosis, providing a strategy for the discovery of new therapeutic targets. METHODS: We performed a multistage exome sequencing and genetic association analysis to identify genes in which rare protein-coding variants were associated with liver phenotypes. We conducted in vitro experiments to further characterize associations. RESULTS: The multistage analysis involved 542,904 persons with available data on liver aminotransferase levels, 24,944 patients with various types of liver disease, and 490,636 controls without liver disease. We found that rare coding variants in APOB, ABCB4, SLC30A10, and TM6SF2 were associated with increased aminotransferase levels and an increased risk of liver disease. We also found that variants in CIDEB, which encodes a structural protein found in hepatic lipid droplets, had a protective effect. The burden of rare predicted loss-of-function variants plus missense variants in CIDEB (combined carrier frequency, 0.7%) was associated with decreased alanine aminotransferase levels (beta per allele, -1.24 U per liter; 95% confidence interval [CI], -1.66 to -0.83; P = 4.8×10⁻⁹) and with 33% lower odds of liver disease of any cause (odds ratio per allele, 0.67; 95% CI, 0.57 to 0.79; P = 9.9×10⁻⁷). Rare coding variants in CIDEB were associated with a decreased risk of liver disease across different underlying causes and different degrees of severity, including cirrhosis of any cause (odds ratio per allele, 0.50; 95% CI, 0.36 to 0.70). Among 3599 patients who had undergone bariatric surgery, rare coding variants in CIDEB were associated with a decreased nonalcoholic fatty liver disease activity score (beta per allele in score units, -0.98; 95% CI, -1.54 to -0.41 [scores range from 0 to 8, with higher scores indicating more severe disease]). In human hepatoma cell lines challenged with oleate, CIDEB small interfering RNA knockdown prevented the buildup of large lipid droplets. CONCLUSIONS: Rare germline mutations in CIDEB conferred substantial protection from liver disease. (Funded by Regeneron Pharmaceuticals.)


Subject(s)
Apoptosis Regulatory Proteins , Germ-Line Mutation , Liver Diseases , Apoptosis Regulatory Proteins/genetics , Apoptosis Regulatory Proteins/metabolism , Genetic Predisposition to Disease/genetics , Genetic Predisposition to Disease/prevention & control , Humans , Liver/metabolism , Liver Diseases/genetics , Liver Diseases/metabolism , Liver Diseases/prevention & control , Transaminases/genetics , Exome Sequencing
12.
Genet Epidemiol ; 47(3): 231-248, 2023 04.
Article in English | MEDLINE | ID: mdl-36739617

ABSTRACT

Linkage analysis, a class of methods for detecting co-segregation of genomic segments and traits in families, was used to map disease-causing genes for decades before genotyping arrays and dense SNP genotyping enabled genome-wide association studies in population samples. Population samples often contain related individuals, but the segregation of alleles within families is rarely used because traditional linkage methods are computationally inefficient for larger datasets. Here, we describe Population Linkage, a novel application of Haseman-Elston regression as a method of moments estimator of variance components and their standard errors. We achieve additional computational efficiency by using modern methods for detection of IBD segments and variance component estimation, efficient preprocessing of input data, and minimizing redundant numerical calculations. We also refined variance component models to account for the biases in population-scale methods for IBD segment detection. We ran Population Linkage on four blood lipid traits in over 70,000 individuals from the HUNT and SardiNIA studies, successfully detecting 25 known genetic signals. One notable linkage signal that appeared in both was for low-density lipoprotein (LDL) cholesterol levels in the region near the gene APOE (LOD = 29.3, variance explained = 4.1%). This is the region where the missense variants rs7412 and rs429358, which together make up the ε2, ε3, and ε4 alleles, account for 2.4% and 0.8% of variation in circulating LDL cholesterol, respectively. Our results show the potential for linkage analysis and other large-scale applications of method of moments variance components estimation.
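
For readers unfamiliar with Haseman-Elston regression, the following is a minimal sketch of its textbook cross-product form: for each pair of individuals, the product of their mean-centred trait values is regressed on the proportion of the locus they share identical by descent (IBD), and the slope serves as a method of moments estimate of the variance attributable to that locus. This is not the Population Linkage implementation; the function name, the input layout, and the use of plain least squares are assumptions made for illustration.

```python
def haseman_elston(traits, ibd):
    """Cross-product Haseman-Elston regression at a single locus.

    traits: dict mapping sample ID -> quantitative trait value.
    ibd:    dict mapping (sample_a, sample_b) -> proportion of the locus
            shared IBD by that pair (0.0 to 1.0).
    Returns (intercept, slope) of the least-squares line; the slope is the
    moment estimate of the locus-specific variance component.
    """
    mean = sum(traits.values()) / len(traits)

    xs, ys = [], []
    for (a, b), sharing in ibd.items():
        xs.append(sharing)
        # Cross product of mean-centred trait values for the pair.
        ys.append((traits[a] - mean) * (traits[b] - mean))

    n = len(xs)
    if n < 2:
        raise ValueError("need at least two pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("IBD sharing does not vary across pairs")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


if __name__ == "__main__":
    # Toy data: pairs sharing the locus IBD have concordant trait values.
    toy_traits = {"a": 1.0, "b": -1.0, "c": 1.0, "d": -1.0}
    toy_ibd = {("a", "c"): 1.0, ("b", "d"): 1.0,
               ("a", "b"): 0.0, ("c", "d"): 0.0}
    print(haseman_elston(toy_traits, toy_ibd))
```

A positive slope indicates that pairs sharing more of the locus IBD have more similar trait values. The sketch omits standard errors and any correction for bias in IBD detection, both of which the abstract says the method addresses.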


Subject(s)
Genome-Wide Association Study , Models, Genetic , Humans , Phenotype , Cholesterol, LDL/genetics , Genetic Linkage , Apolipoproteins E/genetics
13.
Hum Mol Genet ; 31(3): 347-361, 2022 02 03.
Article in English | MEDLINE | ID: mdl-34553764

ABSTRACT

Platelets play a key role in thrombosis and hemostasis. Platelet count (PLT) and mean platelet volume (MPV) are highly heritable quantitative traits, with hundreds of genetic signals previously identified, mostly in European ancestry populations. We here utilize whole genome sequencing (WGS) from NHLBI's Trans-Omics for Precision Medicine initiative (TOPMed) in a large multi-ethnic sample to further explore common and rare variation contributing to PLT (n = 61 200) and MPV (n = 23 485). We identified and replicated secondary signals at MPL (rs532784633) and PECAM1 (rs73345162), both more common in African ancestry populations. We also observed rare variation in Mendelian platelet-related disorder genes influencing variation in platelet traits in TOPMed cohorts (not enriched for blood disorders). For example, association of GP9 with lower PLT and higher MPV was partly driven by a pathogenic Bernard-Soulier syndrome variant (rs5030764, p.Asn61Ser), and the signals at TUBB1 and CD36 were partly driven by loss of function variants not annotated as pathogenic in ClinVar (rs199948010 and rs571975065). However, residual signal remained for these gene-based signals after adjusting for lead variants, suggesting that additional variants in Mendelian genes with impacts in general population cohorts remain to be identified. Gene-based signals were also identified at several genome-wide association study identified loci for genes not annotated for Mendelian platelet disorders (PTPRH, TET2, CHEK2), with somatic variation driving the result at TET2. These results highlight the value of WGS in populations of diverse genetic ancestry to identify novel regulatory and coding signals, even for well-studied traits like platelet traits.


Subject(s)
Genome-Wide Association Study , Precision Medicine , Blood Platelets , Humans , National Heart, Lung, and Blood Institute (U.S.) , Phenotype , Polymorphism, Single Nucleotide , Precision Medicine/methods , United States
14.
Am J Hum Genet ; 108(10): 1836-1851, 2021 10 07.
Article in English | MEDLINE | ID: mdl-34582791

ABSTRACT

Many common and rare variants associated with hematologic traits have been discovered through imputation on large-scale reference panels. However, the majority of genome-wide association studies (GWASs) have been conducted in Europeans, and determining causal variants has proved challenging. We performed a GWAS of total leukocyte, neutrophil, lymphocyte, monocyte, eosinophil, and basophil counts generated from 109,563,748 variants in the autosomes and the X chromosome in the Trans-Omics for Precision Medicine (TOPMed) program, which included data from 61,802 individuals of diverse ancestry. We discovered and replicated 7 leukocyte trait associations, including (1) the association between a chromosome X, pseudo-autosomal region (PAR), noncoding variant located between cytokine receptor genes (CSF2RA and CRLF2) and lower eosinophil count; and (2) associations between single variants found predominantly among African Americans at the S1PR3 (9q22.1) and HBB (11p15.4) loci and monocyte and lymphocyte counts, respectively. We further provide evidence indicating that the newly discovered eosinophil-lowering chromosome X PAR variant might be associated with reduced susceptibility to common allergic diseases such as atopic dermatitis and asthma. Additionally, we found a burden of very rare FLT3 (13q12.2) variants associated with monocyte counts. Together, these results emphasize the utility of whole-genome sequencing in diverse samples in identifying associations missed by European-ancestry-driven GWASs.


Subject(s)
Asthma/epidemiology , Biomarkers/metabolism , Dermatitis, Atopic/epidemiology , Leukocytes/pathology , Polymorphism, Single Nucleotide , Pulmonary Disease, Chronic Obstructive/epidemiology , Quantitative Trait Loci , Asthma/genetics , Asthma/metabolism , Asthma/pathology , Dermatitis, Atopic/genetics , Dermatitis, Atopic/metabolism , Dermatitis, Atopic/pathology , Genetic Predisposition to Disease , Genome, Human , Genome-Wide Association Study , Humans , National Heart, Lung, and Blood Institute (U.S.) , Phenotype , Prognosis , Proteome/analysis , Proteome/metabolism , Pulmonary Disease, Chronic Obstructive/genetics , Pulmonary Disease, Chronic Obstructive/metabolism , Pulmonary Disease, Chronic Obstructive/pathology , United Kingdom/epidemiology , United States/epidemiology , Whole Genome Sequencing
15.
Am J Hum Genet ; 108(5): 874-893, 2021 05 06.
Article in English | MEDLINE | ID: mdl-33887194

ABSTRACT

Whole-genome sequencing (WGS), a powerful tool for detecting novel coding and non-coding disease-causing variants, has largely been applied to clinical diagnosis of inherited disorders. Here we leveraged WGS data in up to 62,653 ethnically diverse participants from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program and assessed statistical association of variants with seven red blood cell (RBC) quantitative traits. We discovered 14 single variant-RBC trait associations at 12 genomic loci, which have not been reported previously. Several of the RBC trait-variant associations (RPN1, ELL2, MIDN, HBB, HBA1, PIEZO1, and G6PD) were replicated in independent GWAS datasets imputed to the TOPMed reference panel. Most of these discovered variants are rare/low frequency, and several are observed disproportionately among non-European ancestry (African, Hispanic/Latino, or East Asian) populations. We identified a 3 bp indel p.Lys2169del (g.88717175_88717177TCT[4]) (common only in the Ashkenazi Jewish population) of PIEZO1, a gene responsible for the Mendelian red cell disorder hereditary xerocytosis (MIM: 194380), associated with higher mean corpuscular hemoglobin concentration (MCHC). In stepwise conditional analysis and in gene-based rare variant aggregated association analysis, we identified several of the variants in HBB, HBA1, TMPRSS6, and G6PD that represent the carrier state for known coding, promoter, or splice site loss-of-function variants that cause inherited RBC disorders. Finally, we applied base and nuclease editing to demonstrate that the sentinel variant rs112097551 (nearest gene RPN1) acts through a cis-regulatory element that exerts long-range control of the gene RUVBL1, which is essential for hematopoiesis. Together, these results demonstrate the utility of WGS in ethnically diverse population-based samples and gene editing for expanding knowledge of the genetic architecture of quantitative hematologic traits and suggest a continuum between complex trait and Mendelian red cell disorders.


Subject(s)
Erythrocytes/metabolism , Erythrocytes/pathology , Genome-Wide Association Study , National Heart, Lung, and Blood Institute (U.S.)/organization & administration , Phenotype , Adult , Aged , Chromosomes, Human, Pair 16/genetics , Datasets as Topic , Female , Gene Editing , Genetic Variation/genetics , HEK293 Cells , Humans , Male , Middle Aged , Quality Control , Reproducibility of Results , United States
17.
Genome Res ; 30(2): 185-194, 2020 02.
Article in English | MEDLINE | ID: mdl-31980570

ABSTRACT

Detecting and estimating DNA sample contamination are important steps to ensure high-quality genotype calls and reliable downstream analysis. Existing methods rely on population allele frequency information for accurate estimation of contamination rates. Correctly specifying population allele frequencies for each individual in the early stages of sequence analysis is impractical or even impossible for large-scale sequencing centers that simultaneously process samples from multiple studies across diverse populations. On the other hand, incorrectly specified allele frequencies may result in substantial bias in estimated contamination rates. For example, we observed that existing methods often fail to identify 10% contaminated samples at a typical 3% contamination exclusion threshold when genetic ancestry is misspecified. Such incomplete screening of contaminated samples substantially inflates the estimated rate of genotyping errors even in deeply sequenced genomes and exomes. We propose a robust statistical method that accurately estimates DNA contamination and is agnostic to genetic ancestry of the intended or contaminating sample. Our method integrates the estimation of genetic ancestry and DNA contamination in a unified likelihood framework by leveraging individual-specific allele frequencies projected from reference genotypes onto principal component coordinates. Our method can also be used for estimating genetic ancestries, similar to LASER or TRACE, but simultaneously accounting for potential contamination. We demonstrate that our method robustly estimates contamination rates and genetic ancestries across populations and contamination scenarios. We further demonstrate that, in the presence of contamination, genetic ancestry inference can be substantially biased with existing methods that ignore contamination, while our method corrects for such biases.


Subject(s)
DNA Contamination , DNA/genetics , Genotype , Genotyping Techniques/standards , Alleles , Exome/genetics , Gene Frequency/genetics , Genetics, Population , Humans , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA
18.
Bioinformatics ; 38(2): 559-561, 2022 01 03.
Article in English | MEDLINE | ID: mdl-34459872

ABSTRACT

SUMMARY: Expression quantitative trait loci (eQTLs) characterize the associations between genetic variation and gene expression to provide insights into tissue-specific gene regulation. Interactive visualization of tissue-specific eQTLs or splice QTLs (sQTLs) can facilitate our understanding of functional variants relevant to disease-related traits. However, combining the multi-dimensional nature of eQTLs/sQTLs into a concise and informative visualization is challenging. Existing QTL visualization tools provide useful ways to summarize the unprecedented scale of transcriptomic data but are not necessarily tailored to answer questions about the functional interpretations of trait-associated variants or other variants of interest. We developed FIVEx, an interactive eQTL/sQTL browser with an intuitive interface tailored to the functional interpretation of associated variants. It features the ability to navigate seamlessly between different data views while providing relevant tissue- and locus-specific information to offer users a better understanding of population-scale multi-tissue transcriptomic profiles. Our implementation of the FIVEx browser on the EBI eQTL catalogue, encompassing 16 publicly available RNA-seq studies, provides important insights for understanding potential tissue-specific regulatory mechanisms underlying trait-associated signals. AVAILABILITY AND IMPLEMENTATION: A FIVEx instance visualizing EBI eQTL catalogue data can be found at https://fivex.sph.umich.edu. Its source code is open source under an MIT license at https://github.com/statgen/fivex. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study , Quantitative Trait Loci , Genome-Wide Association Study/methods , Gene Expression Profiling/methods , Software , Transcriptome
19.
Nature ; 542(7640): 186-190, 2017 02 09.
Article in English | MEDLINE | ID: mdl-28146470

ABSTRACT

Height is a highly heritable, classic polygenic trait with approximately 700 common associated variants identified through genome-wide association studies so far. Here, we report 83 height-associated coding variants with lower minor-allele frequencies (in the range of 0.1-4.8%) and effects of up to 2 centimetres per allele (such as those in IHH, STC2, AR and CRISPLD2), greater than ten times the average effect of common variants. In functional follow-up studies, rare height-increasing alleles of STC2 (giving an increase of 1-2 centimetres per allele) compromised proteolytic inhibition of PAPP-A and increased cleavage of IGFBP-4 in vitro, resulting in higher bioavailability of insulin-like growth factors. These 83 height-associated variants overlap genes that are mutated in monogenic growth disorders and highlight new biological candidates (such as ADAMTS3, IL11RA and NOX4) and pathways (such as proteoglycan and glycosaminoglycan synthesis) involved in growth. Our results demonstrate that sufficiently large sample sizes can uncover rare and low-frequency variants of moderate-to-large effect associated with polygenic human phenotypes, and that these variants implicate relevant genes and pathways.


Subject(s)
Body Height/genetics , Gene Frequency/genetics , Genetic Variation/genetics , ADAMTS Proteins/genetics , Adult , Alleles , Cell Adhesion Molecules/genetics , Female , Genome, Human/genetics , Glycoproteins/genetics , Glycoproteins/metabolism , Glycosaminoglycans/biosynthesis , Hedgehog Proteins/genetics , Humans , Intercellular Signaling Peptides and Proteins/genetics , Intercellular Signaling Peptides and Proteins/metabolism , Interferon Regulatory Factors/genetics , Interleukin-11 Receptor alpha Subunit/genetics , Male , Multifactorial Inheritance/genetics , NADPH Oxidase 4 , NADPH Oxidases/genetics , Phenotype , Pregnancy-Associated Plasma Protein-A/metabolism , Procollagen N-Endopeptidase/genetics , Proteoglycans/biosynthesis , Proteolysis , Receptors, Androgen/genetics , Somatomedins/metabolism
20.
PLoS Genet ; 16(12): e1009060, 2020 12.
Article in English | MEDLINE | ID: mdl-33320851

ABSTRACT

Gene-based association tests aggregate genotypes across multiple variants for each gene, providing an interpretable gene-level analysis framework for genome-wide association studies (GWAS). Early gene-based test applications often focused on rare coding variants; a more recent wave of gene-based methods, e.g. TWAS, use eQTLs to interrogate regulatory associations. Regulatory variants are expected to be particularly valuable for gene-based analysis, since most GWAS associations to date are non-coding. However, identifying causal genes from regulatory associations remains challenging and contentious. Here, we present a statistical framework and computational tool to integrate heterogeneous annotations with GWAS summary statistics for gene-based analysis, applied with comprehensive coding and tissue-specific regulatory annotations. We compare power and accuracy identifying causal genes across single-annotation, omnibus, and annotation-agnostic gene-based tests in simulation studies and an analysis of 128 traits from the UK Biobank, and find that incorporating heterogeneous annotations in gene-based association analysis increases power and performance identifying causal genes.


Subject(s)
Genome-Wide Association Study/methods , Molecular Sequence Annotation/methods , Algorithms , Genome-Wide Association Study/standards , Humans , Molecular Sequence Annotation/standards , Polymorphism, Genetic , Quantitative Trait Loci , Reproducibility of Results