ABSTRACT
The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes)1. In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
Subjects
Genetic Variation/genetics; Genome, Human/genetics; Genomics; National Heart, Lung, and Blood Institute (U.S.); Precision Medicine; Cytochrome P-450 CYP2D6/genetics; Haplotypes/genetics; Heterozygote; Humans; INDEL Mutation; Loss of Function Mutation; Mutagenesis; Phenotype; Polymorphism, Single Nucleotide; Population Density; Precision Medicine/standards; Quality Control; Sample Size; United States; Whole Genome Sequencing/standards
ABSTRACT
INTRODUCTION: Genome-wide association studies (GWAS) have identified loci associated with Alzheimer's disease (AD) but did not identify specific causal genes or variants within those loci. Analysis of whole genome sequence (WGS) data, which interrogates the entire genome and captures rare variations, may identify causal variants within GWAS loci. METHODS: We performed single common variant association analysis and rare variant aggregate analyses in the pooled population (N cases = 2184, N controls = 2383) and targeted analyses in subpopulations using WGS data from the Alzheimer's Disease Sequencing Project (ADSP). The analyses were restricted to variants within 100 kb of 83 previously identified GWAS lead variants. RESULTS: Seventeen variants were significantly associated with AD within five genomic regions implicating the genes OARD1/NFYA/TREML1, JAZF1, FERMT2, and SLC24A4. KAT8 was implicated by both single variant and rare variant aggregate analyses. DISCUSSION: This study demonstrates the utility of leveraging WGS to gain insights into AD loci identified via GWAS.
Subjects
Alzheimer Disease; Genome-Wide Association Study; Whole Genome Sequencing; Humans; Alzheimer Disease/genetics; Female; Male; Genetic Predisposition to Disease/genetics; Aged; Polymorphism, Single Nucleotide/genetics; Genetic Variation/genetics
ABSTRACT
INTRODUCTION: Alzheimer's disease (AD) is a common disorder of the elderly that is both highly heritable and genetically heterogeneous. METHODS: We investigated the association of AD with both common variants and aggregates of rare coding and non-coding variants in 13,371 individuals of diverse ancestry with whole genome sequencing (WGS) data. RESULTS: Pooled-population analyses of all individuals identified genetic variants at apolipoprotein E (APOE) and BIN1 associated with AD (p < 5 × 10^-8). Subgroup-specific analyses identified a haplotype on chromosome 14 including PSEN1 associated with AD in Hispanics, further supported by aggregate testing of rare coding and non-coding variants in the region. Common variants in LINC00320 were associated with AD in Black individuals (p = 1.9 × 10^-9). Finally, we observed rare non-coding variants in the promoter of TOMM40, distinct from APOE, in pooled-population analyses (p = 7.2 × 10^-8). DISCUSSION: We observed that complementary pooled-population and subgroup-specific analyses offered unique insights into the genetic architecture of AD. HIGHLIGHTS: We determine the association of genetic variants with Alzheimer's disease (AD) using 13,371 individuals of diverse ancestry with whole genome sequencing (WGS) data. We identified genetic variants at apolipoprotein E (APOE), BIN1, PSEN1, and LINC00320 associated with AD. We observed rare non-coding variants in the promoter of TOMM40, distinct from APOE.
ABSTRACT
Background: Prior studies using the ADSP data examined variants within the presenilin-2 (PSEN2), presenilin-1 (PSEN1), and amyloid precursor protein (APP) genes. However, previously reported clinically relevant variants and other predicted damaging missense (DM) variants have not been characterized in a newer release of the Alzheimer's Disease Sequencing Project (ADSP). Objective: To characterize previously reported clinically relevant variants and DM variants in PSEN2, PSEN1, and APP among participants in the ADSP. Methods: We identified rare variants (MAF < 1%) previously reported in PSEN2, PSEN1, and APP in the available ADSP sample of 14,641 individuals with whole genome sequencing and 16,849 individuals with whole exome sequencing available for research use (N total = 31,490). We additionally curated variants in these three genes from ClinVar, OMIM, and Alzforum and report carriers of variants in clinical databases as well as predicted DM variants in these genes. Results: We detected 31 previously reported clinically relevant variants with alternate alleles observed within the ADSP: 4 variants in PSEN2, 25 in PSEN1, and 2 in APP. The overall variant carrier rate for the 31 clinically relevant variants in the ADSP was 0.3%. We observed that 79.5% of the variant carriers were cases, whereas 3.9% were controls. In those with AD, the mean age of onset of AD among carriers of these clinically relevant variants was 19.6 ± 1.4 years earlier than among non-carriers (p-value = 7.8 × 10^-57). Conclusion: A small proportion of individuals in the ADSP are carriers of a previously reported clinically relevant variant allele for AD, and these participants have a significantly earlier age of AD onset than non-carriers.
ABSTRACT
We rigorously assessed a comprehensive association testing framework for heteroplasmy, employing both simulated and real-world data. This framework employed a variant allele fraction (VAF) threshold and harnessed multiple gene-based tests for robust identification and association testing of heteroplasmy. Our simulation studies demonstrated that gene-based tests maintained an appropriate type I error rate at α = 0.001. Notably, when 5% or more heteroplasmic variants within a target region were linked to an outcome, burden-extension tests (including the adaptive burden test, variable threshold burden test, and z-score weighting burden test) outperformed the sequence kernel association test (SKAT) and the original burden test. Applying this framework, we conducted association analyses on whole-blood derived heteroplasmy in 17,507 individuals of African and European ancestries (31% of African ancestry, mean age of 62, with 58% women) with whole genome sequencing data. We performed both cohort- and ancestry-specific association analyses, followed by meta-analysis on both pooled samples and within each ancestry group. Our results suggest that mtDNA-encoded genes/regions are likely to exhibit varying rates in somatic aging, with notably strong associations observed between heteroplasmy in the RNR1 and RNR2 genes (p < 0.001) and advancing age by the original burden test. In contrast, SKAT identified significant associations (p < 0.001) between diabetes and the aggregated effects of heteroplasmy in several protein-coding genes. Further research is warranted to validate these findings. In summary, our proposed statistical framework represents a valuable tool for facilitating association testing of heteroplasmy with disease traits in large human populations.
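As a rough illustration of the collapsing step behind a burden-style test, the sketch below counts, for one individual and one gene region, the sites whose variant allele fraction falls inside a heteroplasmy window. The function names and the 3%/97% cut-offs are assumptions chosen for illustration; the abstract does not state the VAF threshold it used, and this is not the authors' implementation.

```python
def call_heteroplasmies(vafs, lower=0.03, upper=0.97):
    """Return indices of sites whose VAF lies within [lower, upper].

    The default cut-offs are hypothetical; the abstract does not give them.
    """
    return [i for i, vaf in enumerate(vafs) if lower <= vaf <= upper]


def burden_score(vafs, lower=0.03, upper=0.97):
    """Collapse one individual's sites in a region into a single count."""
    return len(call_heteroplasmies(vafs, lower, upper))


# Hypothetical VAFs at five sites of one region for one individual.
example_vafs = [0.0, 0.05, 0.5, 0.99, 1.0]
score = burden_score(example_vafs)
print(score)
```

In a burden test, a per-individual score of this kind would then be regressed against the outcome; SKAT instead tests the variants jointly without collapsing them into one count.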
ABSTRACT
OBJECTIVE: To identify genetic risk factors for incident cardiovascular disease (CVD) among people with type 2 diabetes (T2D). RESEARCH DESIGN AND METHODS: We conducted a multiancestry time-to-event genome-wide association study for incident CVD among people with T2D. We also tested 204 known coronary artery disease (CAD) variants for association with incident CVD. RESULTS: Among 49,230 participants with T2D, 8,956 had incident CVD events (event rate 18.2%). We identified three novel genetic loci for incident CVD: rs147138607 (near CACNA1E/ZNF648, hazard ratio [HR] 1.23, P = 3.6 × 10^-9), rs77142250 (near HS3ST1, HR 1.89, P = 9.9 × 10^-9), and rs335407 (near TFB1M/NOX3, HR 1.25, P = 1.5 × 10^-8). Among 204 known CAD loci, 5 were associated with incident CVD in T2D (multiple comparison-adjusted P < 0.00024, 0.05/204). A standardized polygenic score of these 204 variants was associated with incident CVD with HR 1.14 (P = 1.0 × 10^-16). CONCLUSIONS: The data point to novel and known genomic regions associated with incident CVD among individuals with T2D.
Subjects
Cardiovascular Diseases; Diabetes Mellitus, Type 2; Genome-Wide Association Study; Humans; Diabetes Mellitus, Type 2/genetics; Diabetes Mellitus, Type 2/epidemiology; Diabetes Mellitus, Type 2/complications; Cardiovascular Diseases/genetics; Cardiovascular Diseases/epidemiology; Female; Male; Middle Aged; Aged; Polymorphism, Single Nucleotide
ABSTRACT
Alzheimer's disease (AD) is a common disorder of the elderly that is both highly heritable and genetically heterogeneous. Here, we investigated the association between AD and both common variants and aggregates of rare coding and noncoding variants in 13,371 individuals of diverse ancestry with whole genome sequence (WGS) data. Pooled-population analyses identified genetic variants in or near APOE, BIN1, and LINC00320 significantly associated with AD (p < 5 × 10^-8). Population-specific analyses identified a haplotype on chromosome 14 including PSEN1 associated with AD in Hispanics, further supported by aggregate testing of rare coding and noncoding variants in this region. Finally, we observed suggestive associations (p < 5 × 10^-5) of aggregates of rare coding variants in ABCA7 among non-Hispanic Whites (p = 5.4 × 10^-6), and of rare noncoding variants in the promoter of TOMM40, distinct from APOE, in pooled-population analyses (p = 7.2 × 10^-8). Complementary pooled-population and population-specific analyses offered unique insights into the genetic architecture of AD.
ABSTRACT
Expression quantitative trait methylation (eQTM) analysis identifies DNA CpG sites at which methylation is associated with gene expression. The present study describes an eQTM resource of CpG-transcript pairs derived from whole blood DNA methylation and RNA sequencing gene expression data in 2115 Framingham Heart Study participants. We identified 70,047 significant cis CpG-transcript pairs at p < 1E-7, where the most significant eGenes (i.e., gene transcripts associated with a CpG) were enriched in biological pathways related to cell signaling, and for 1208 clinical traits (enrichment false discovery rate [FDR] ≤ 0.05). We also identified 246,667 significant trans CpG-transcript pairs at p < 1E-14, where the most significant eGenes were enriched in biological pathways related to activation of the immune response, and for 1191 clinical traits (enrichment FDR ≤ 0.05). Independent and external replication of the top 1000 significant cis and trans CpG-transcript pairs was completed in the Women's Health Initiative and Jackson Heart Study cohorts. Using significant cis CpG-transcript pairs, we identified significant mediation of the association between CpG sites and cardiometabolic traits through gene expression and identified shared genetic regulation between CpGs and transcripts associated with cardiometabolic traits. In conclusion, we developed a robust and powerful resource of whole blood eQTM CpG-transcript pairs that can help inform future functional studies that seek to understand the molecular basis of disease.
Subjects
Cardiovascular Diseases; DNA Methylation; Humans; Female; Quantitative Trait Loci; Gene Expression Regulation; Longitudinal Studies; Cardiovascular Diseases/genetics; CpG Islands/genetics; Genome-Wide Association Study
ABSTRACT
Background The relationship between mitochondrial DNA copy number (mtDNA CN) and cardiovascular disease remains elusive. Methods and Results We performed cross-sectional and prospective association analyses of blood-derived mtDNA CN and cardiovascular disease outcomes in 27 316 participants in 8 cohorts of multiple racial and ethnic groups with whole-genome sequencing. We also performed Mendelian randomization to explore causal relationships of mtDNA CN with coronary heart disease (CHD) and cardiometabolic risk factors (obesity, diabetes, hypertension, and hyperlipidemia). P<0.01 was used for significance. We validated most of the previously reported associations between mtDNA CN and cardiovascular disease outcomes. For example, 1-SD unit lower level of mtDNA CN was associated with 1.08 (95% CI, 1.04-1.12; P<0.001) times the hazard for developing incident CHD, adjusting for covariates. Mendelian randomization analyses showed no causal effect from a lower level of mtDNA CN to a higher CHD risk (β=0.091; P=0.11) or in the reverse direction (β=-0.012; P=0.076). Additional bidirectional Mendelian randomization analyses revealed that low-density lipoprotein cholesterol had a causal effect on mtDNA CN (β=-0.084; P<0.001), but the reverse direction was not significant (P=0.059). No causal associations were observed between mtDNA CN and obesity, diabetes, and hypertension, in either direction. Multivariable Mendelian randomization analyses showed no causal effect of CHD on mtDNA CN, controlling for low-density lipoprotein cholesterol level (P=0.52), whereas there was a strong direct causal effect of higher low-density lipoprotein cholesterol on lower mtDNA CN, adjusting for CHD status (β=-0.092; P<0.001). Conclusions Our findings indicate that high low-density lipoprotein cholesterol may underlie the complex relationships between mtDNA CN and vascular atherosclerosis.
Subjects
Cardiovascular Diseases; Coronary Disease; Diabetes Mellitus; Hypertension; Humans; DNA, Mitochondrial/genetics; Risk Factors; Cardiovascular Diseases/epidemiology; Cardiovascular Diseases/genetics; Cholesterol, LDL; DNA Copy Number Variations; Cross-Sectional Studies; Coronary Disease/genetics; Cholesterol, HDL; Hypertension/epidemiology; Hypertension/genetics; Obesity
ABSTRACT
BACKGROUND AND OBJECTIVES: Previous studies suggest that lower mitochondrial DNA (mtDNA) copy number (CN) is associated with neurodegenerative diseases. However, whether mtDNA CN in whole blood is related to endophenotypes of Alzheimer disease (AD) and AD-related dementia (AD/ADRD) needs further investigation. We assessed the association of mtDNA CN with cognitive function and MRI measures in community-based samples of middle-aged to older adults. METHODS: We included dementia-free participants from 9 diverse community-based cohorts with whole-genome sequencing in the Trans-Omics for Precision Medicine (TOPMed) program. Circulating mtDNA CN was estimated as twice the ratio of the average coverage of mtDNA to nuclear DNA. Brain MRI markers included total brain, hippocampal, and white matter hyperintensity volumes. General cognitive function was derived from distinct cognitive domains. We performed cohort-specific association analyses of mtDNA CN with AD/ADRD endophenotypes assessed within ±5 years (i.e., cross-sectional analyses) or 5-20 years after blood draw (i.e., prospective analyses) adjusting for potential confounders. We further explored associations stratified by sex and age (<60 vs ≥60 years). Fixed-effects or sample size-weighted meta-analyses were performed to combine results. Finally, we performed mendelian randomization (MR) analyses to assess causality. RESULTS: We included up to 19,152 participants (mean age 59 years, 57% women). Higher mtDNA CN was cross-sectionally associated with better general cognitive function (β = 0.04; 95% CI 0.02-0.06) independent of age, sex, batch effects, race/ethnicity, time between blood draw and cognitive evaluation, cohort-specific variables, and education. Additional adjustment for blood cell counts or cardiometabolic traits led to slightly attenuated results. We observed similar significant associations with cognition in prospective analyses, although of reduced magnitude. We found no significant associations between mtDNA CN and brain MRI measures in meta-analyses. MR analyses did not reveal a causal relation between mtDNA CN in blood and cognition. DISCUSSION: Higher mtDNA CN in blood is associated with better current and future general cognitive function in large and diverse communities across the United States. Although MR analyses did not support a causal role, additional research is needed to assess causality. Circulating mtDNA CN could serve nevertheless as a biomarker of current and future cognitive function in the community.
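The copy-number estimate described in the methods (twice the ratio of average mtDNA coverage to average nuclear DNA coverage) can be sketched in a few lines. The function name and the coverage values are hypothetical and serve only to show the arithmetic; the factor of two reflects the two copies of the nuclear genome per diploid cell.

```python
def mtdna_copy_number(mt_coverage, nuclear_coverage):
    """Estimate mtDNA copy number per cell as 2 * (mtDNA / nuclear coverage)."""
    if nuclear_coverage <= 0:
        raise ValueError("nuclear coverage must be positive")
    return 2.0 * mt_coverage / nuclear_coverage


# Hypothetical sequencing depths, not taken from the study.
estimate = mtdna_copy_number(mt_coverage=3000.0, nuclear_coverage=30.0)
print(estimate)
```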
Subjects
Alzheimer Disease; DNA, Mitochondrial; Middle Aged; Humans; Female; Aged; Male; DNA, Mitochondrial/genetics; DNA Copy Number Variations; Prospective Studies; Cross-Sectional Studies; Magnetic Resonance Imaging; Cognition; Brain
ABSTRACT
BACKGROUND: Type 2 diabetes mellitus (T2D) confers a two- to three-fold increased risk of cardiovascular disease (CVD). However, the mechanisms underlying increased CVD risk among people with T2D are only partially understood. We hypothesized that a genetic association study among people with T2D at risk for developing incident cardiovascular complications could provide insights into molecular genetic aspects underlying CVD. METHODS: From 16 studies of the Cohorts for Heart & Aging Research in Genomic Epidemiology (CHARGE) Consortium, we conducted a multi-ancestry time-to-event genome-wide association study (GWAS) for incident CVD among people with T2D using Cox proportional hazards models. Incident CVD was defined based on a composite of coronary artery disease (CAD), stroke, and cardiovascular death that occurred at least one year after the diagnosis of T2D. Cohort-level estimated effect sizes were combined using inverse variance weighted fixed effects meta-analysis. We also tested 204 known CAD variants for association with incident CVD among patients with T2D. RESULTS: A total of 49,230 participants with T2D were included in the analyses (31,118 European ancestries and 18,112 non-European ancestries) which consisted of 8,956 incident CVD cases over a range of mean follow-up duration between 3.2 and 33.7 years (event rate 18.2%). We identified three novel, distinct genetic loci for incident CVD among individuals with T2D that reached the threshold for genome-wide significance (P < 5.0 × 10^-8): rs147138607 (intergenic variant between CACNA1E and ZNF648) with a hazard ratio (HR) 1.23, 95% confidence interval (CI) 1.15-1.32, P = 3.6 × 10^-9; rs11444867 (intergenic variant near HS3ST1) with HR 1.89, 95% CI 1.52-2.35, P = 9.9 × 10^-9; and rs335407 (intergenic variant between TFB1M and NOX3) with HR 1.25, 95% CI 1.16-1.35, P = 1.5 × 10^-8. Among 204 known CAD loci, 32 were associated with incident CVD in people with T2D with P < 0.05, and 5 were significant after Bonferroni correction (P < 0.00024, 0.05/204). A polygenic score of these 204 variants was significantly associated with incident CVD with HR 1.14 (95% CI 1.12-1.16) per 1 standard deviation increase (P = 1.0 × 10^-16). CONCLUSIONS: The data point to novel and known genomic regions associated with incident CVD among individuals with T2D.
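Two pieces of arithmetic in this abstract can be made concrete: the inverse-variance-weighted fixed-effects combination of cohort-level estimates, and the Bonferroni threshold for 204 tests. The sketch below uses the standard textbook formulas; the cohort estimates are hypothetical placeholders, not values from the study, and the function names are chosen for illustration. The threshold 0.05/204 is approximately 0.000245, which the abstract reports as 0.00024.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Pool per-cohort effects (e.g., log hazard ratios) by inverse-variance weighting."""
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Hypothetical cohort-level log hazard ratios and standard errors.
cohort_betas = [0.20, 0.25, 0.18]
cohort_ses = [0.05, 0.08, 0.10]
beta, se = ivw_fixed_effects(cohort_betas, cohort_ses)
hazard_ratio = math.exp(beta)
ci_95 = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))

# Figures stated in the abstract: 204 tested loci, 8,956 events among 49,230 participants.
threshold = bonferroni_threshold(0.05, 204)
event_rate_percent = 100.0 * 8956 / 49230

print(f"pooled HR {hazard_ratio:.2f}, 95% CI {ci_95[0]:.2f}-{ci_95[1]:.2f}")
print(f"Bonferroni threshold {threshold:.6f}, event rate {event_rate_percent:.1f}%")
```

Cohorts with smaller standard errors receive larger weights, so the pooled estimate is pulled toward the most precisely estimated cohorts.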
ABSTRACT
DNA methylation commonly occurs at cytosine-phosphate-guanine sites (CpGs) that can serve as biomarkers for many diseases. We analyzed whole genome sequencing data to identify DNA methylation quantitative trait loci (mQTLs) in 4126 Framingham Heart Study participants. Our mQTL mapping identified 94,362,817 cis-mQTL variant-CpG pairs (for 210,156 unique autosomal CpGs) at P < 1e-7 and 33,572,145 trans-mQTL variant-CpG pairs (for 213,606 unique autosomal CpGs) at P < 1e-14. Using cis-mQTL variants for 1258 CpGs associated with seven cardiovascular disease (CVD) risk factors, we found 104 unique CpGs that colocalized with at least one CVD trait. For example, cg11554650 (PPP1R18) colocalized with type 2 diabetes, and was driven by a single nucleotide polymorphism (rs2516396). We performed Mendelian randomization (MR) analysis and demonstrated 58 putatively causal relations of CVD risk factor-associated CpGs to one or more risk factors (e.g., cg05337441 [APOB] with LDL; MR P = 1.2e-99) and 17 causal associations with coronary artery disease (e.g., cg08129017 [SREBF1] with coronary artery disease; MR P = 5e-13). We also showed that three CpGs, e.g., cg14893161 (PM20D1), are putatively causally associated with COVID-19 severity. To assist in future analyses of the role of DNA methylation in disease pathogenesis, we have posted a comprehensive summary data set in the National Heart, Lung, and Blood Institute's BioData Catalyst.
Subjects
COVID-19; Coronary Artery Disease; Diabetes Mellitus, Type 2; Humans; DNA Methylation; Diabetes Mellitus, Type 2/genetics; Coronary Artery Disease/genetics; Quantitative Trait Loci; Polymorphism, Single Nucleotide; Cytosine; CpG Islands/genetics; Genome-Wide Association Study
ABSTRACT
To create a scientific resource of expression quantitative trait loci (eQTL), we conducted a genome-wide association study (GWAS) using genotypes obtained from whole genome sequencing (WGS) of DNA and gene expression levels from RNA sequencing (RNA-seq) of whole blood in 2622 participants in the Framingham Heart Study. We identified 6,778,286 cis-eQTL variant-gene transcript (eGene) pairs at p < 5 × 10^-8 (2,855,111 unique cis-eQTL variants and 15,982 unique eGenes) and 1,469,754 trans-eQTL variant-eGene pairs at p < 1e-12 (526,056 unique trans-eQTL variants and 7,233 unique eGenes). In addition, 442,379 cis-eQTL variants were associated with expression of 1518 long non-protein coding RNAs (lncRNAs). Gene Ontology (GO) analyses revealed that the top GO terms for cis-eGenes are enriched for immune functions (FDR < 0.05). The cis-eQTL variants are enriched for SNPs reported to be associated with 815 traits in prior GWAS, including cardiovascular disease risk factors. As proof of concept, we used this eQTL resource in conjunction with genetic variants from public GWAS databases in causal inference testing (e.g., COVID-19 severity). After Bonferroni correction, Mendelian randomization analyses identified putative causal associations of 60 eGenes with systolic blood pressure, 13 genes with coronary artery disease, and seven genes with COVID-19 severity. This study created a comprehensive eQTL resource via BioData Catalyst that will be made available to the scientific community. This will advance understanding of the genetic architecture of gene expression underlying a wide range of diseases.
Subjects
Genetic Predisposition to Disease; Genome-Wide Association Study; Quantitative Trait Loci; Humans; DNA; Gene Expression; Quantitative Trait Loci/genetics; Sequence Analysis, RNA
ABSTRACT
Recognizing that family data provide a unique advantage in identifying rare risk variants in genetic association studies, many cohorts with related samples have gone through whole genome sequencing in large initiatives such as the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. Analyzing rare variants poses challenges for binary traits in that some genotype categories may have few or no observed events, causing bias and inflation in commonly used methods. Several methods have recently been proposed to better handle rare variants while accounting for family relationship, but their performance has not been thoroughly evaluated together. Here we compare several existing approaches including SAIGE but not limited to related samples using simulations based on the Framingham Heart Study samples and genotype data from Illumina HumanExome BeadChip where rare variants are the majority. We found that logistic regression with likelihood ratio test applied to related samples was the only approach that did not have inflated type I error rates in both the single variant test (SVT) and gene-based tests, followed by Firth logistic regression, which had inflation in its direction-insensitive gene-based test at prevalence 0.01 only, applied to either related or unrelated samples, though theoretically logistic regression and Firth logistic regression do not account for relatedness in samples. SAIGE had inflation in SVT at prevalence 0.1 or lower, and the inflation was eliminated with a minor allele count filter of 5. As for power, no approach consistently outperformed the others across all single variant tests and gene-based tests.
Subjects
Genome, Human; Models, Genetic; Multifactorial Inheritance; Polymorphism, Single Nucleotide; Software; Alleles; Computer Simulation; Gene Frequency; Genome-Wide Association Study; Genotype; Humans; Logistic Models; Longitudinal Studies; Precision Medicine/methods; Precision Medicine/statistics & numerical data
ABSTRACT
We investigated the concordance of mitochondrial DNA heteroplasmic mutations (heteroplasmies) in 6745 maternal pairs of European (EA, n = 4718 pairs) and African (AA, n = 2027 pairs) Americans in whole blood. Mother-offspring pairs displayed the highest concordance rate, followed by sibling-sibling and more distantly related maternal pairs. The allele fractions of concordant heteroplasmies exhibited high correlation (R^2 = 0.8) between paired individuals. Discordant heteroplasmies were more likely to be in coding regions and to be nonsynonymous or nonsynonymous-deleterious (p < 0.001). The number of deleterious heteroplasmies was significantly correlated with advancing age (20-44, 45-64, and ≥65 years, p-trend = 0.01). One standard deviation increase in heteroplasmic burden (i.e., the number of heteroplasmies carried by an individual) was associated with a 0.17 to 0.26 (p < 1e-23) standard deviation decrease in mtDNA copy number, independent of age. White blood cell count and differential count jointly explained 0.5% to 1.3% (p ≤ 0.001) of the variance in heteroplasmic burden. A genome-wide association and meta-analysis identified a region at 11p11.12 (top signal rs779031139, p = 2.0e-18, minor allele frequency = 0.38) associated with the heteroplasmic burden. However, the 11p11.12 region is adjacent to a nuclear mitochondrial DNA (NUMT) corresponding to a 542 bp area of the D-loop. This region was no longer significant after excluding heteroplasmies within the 542 bp from the heteroplasmic burden. The discovery that blood mtDNA heteroplasmies were of both inherited and somatic origin, and that an increase in heteroplasmic burden was strongly associated with a decrease in average mtDNA copy number in blood, are important findings to be considered in association studies of mtDNA with disease traits.
Subjects
Black People/genetics; DNA, Mitochondrial/genetics; Heteroplasmy; Mitochondria/genetics; White People/genetics; Genome-Wide Association Study; Humans; Mutation; Whole Genome Sequencing
ABSTRACT
Platelet aggregation at the site of atherosclerotic vascular injury is the underlying pathophysiology of myocardial infarction and stroke. To build upon prior GWAS, here we report on 16 loci identified through a whole genome sequencing (WGS) approach in 3,855 NHLBI Trans-Omics for Precision Medicine (TOPMed) participants deeply phenotyped for platelet aggregation. We identify the RGS18 locus, which encodes a myeloerythroid lineage-specific regulator of G-protein signaling that co-localizes with expression quantitative trait loci (eQTL) signatures for RGS18 expression in platelets. Gene-based approaches implicate the SVEP1 gene, a known contributor to coronary artery disease risk. Sentinel variants at RGS18 and PEAR1 are associated with thrombosis risk and increased gastrointestinal bleeding risk, respectively. Our WGS findings add to previously identified GWAS loci, provide insights regarding the mechanism(s) by which genetics may influence cardiovascular disease risk, and underscore the importance of rare variant and regulatory approaches to identifying loci contributing to complex phenotypes.