RESUMO
The Database of Genotypes and Phenotypes (dbGap, http://www.ncbi.nlm.nih.gov/gap) is a National Institutes of Health-sponsored repository charged to archive, curate and distribute information produced by studies investigating the interaction of genotype and phenotype. Information in dbGaP is organized as a hierarchical structure and includes the accessioned objects, phenotypes (as variables and datasets), various molecular assay data (SNP and Expression Array data, Sequence and Epigenomic marks), analyses and documents. Publicly accessible metadata about submitted studies, summary level data, and documents related to studies can be accessed freely on the dbGaP website. Individual-level data are accessible via Controlled Access application to scientists across the globe.
Assuntos
Bases de Dados Genéticas , Genótipo , Fenótipo , Humanos , Internet , National Library of Medicine (U.S.) , Estados UnidosRESUMO
Expression quantitative trait methylation (eQTM) analysis identifies DNA CpG sites at which methylation is associated with gene expression. The present study describes an eQTM resource of CpG-transcript pairs derived from whole blood DNA methylation and RNA sequencing gene expression data in 2115 Framingham Heart Study participants. We identified 70,047 significant cis CpG-transcript pairs at p < 1E-7 where the top most significant eGenes (i.e., gene transcripts associated with a CpG) were enriched in biological pathways related to cell signaling, and for 1208 clinical traits (enrichment false discovery rate [FDR] ≤ 0.05). We also identified 246,667 significant trans CpG-transcript pairs at p < 1E-14 where the top most significant eGenes were enriched in biological pathways related to activation of the immune response, and for 1191 clinical traits (enrichment FDR ≤ 0.05). Independent and external replication of the top 1000 significant cis and trans CpG-transcript pairs was completed in the Women's Health Initiative and Jackson Heart Study cohorts. Using significant cis CpG-transcript pairs, we identified significant mediation of the association between CpG sites and cardiometabolic traits through gene expression and identified shared genetic regulation between CpGs and transcripts associated with cardiometabolic traits. In conclusion, we developed a robust and powerful resource of whole blood eQTM CpG-transcript pairs that can help inform future functional studies that seek to understand the molecular basis of disease.
Assuntos
Doenças Cardiovasculares , Metilação de DNA , Humanos , Feminino , Locos de Características Quantitativas , Regulação da Expressão Gênica , Estudos Longitudinais , Doenças Cardiovasculares/genética , Ilhas de CpG/genética , Estudo de Associação Genômica AmplaRESUMO
DNA methylation commonly occurs at cytosine-phosphate-guanine sites (CpGs) that can serve as biomarkers for many diseases. We analyzed whole genome sequencing data to identify DNA methylation quantitative trait loci (mQTLs) in 4126 Framingham Heart Study participants. Our mQTL mapping identified 94,362,817 cis-mQTLvariant-CpG pairs (for 210,156 unique autosomal CpGs) at P < 1e-7 and 33,572,145 trans-mQTL variant-CpG pairs (for 213,606 unique autosomal CpGs) at P < 1e-14. Using cis-mQTL variants for 1258 CpGs associated with seven cardiovascular disease (CVD) risk factors, we found 104 unique CpGs that colocalized with at least one CVD trait. For example, cg11554650 (PPP1R18) colocalized with type 2 diabetes, and was driven by a single nucleotide polymorphism (rs2516396). We performed Mendelian randomization (MR) analysis and demonstrated 58 putatively causal relations of CVD risk factor-associated CpGs to one or more risk factors (e.g., cg05337441 [APOB] with LDL; MR P = 1.2e-99, and 17 causal associations with coronary artery disease (e.g. cg08129017 [SREBF1] with coronary artery disease; MR P = 5e-13). We also showed that three CpGs, e.g., cg14893161 (PM20D1), are putatively causally associated with COVID-19 severity. To assist in future analyses of the role of DNA methylation in disease pathogenesis, we have posted a comprehensive summary data set in the National Heart, Lung, and Blood Institute's BioData Catalyst.
Assuntos
COVID-19 , Doença da Artéria Coronariana , Diabetes Mellitus Tipo 2 , Humanos , Metilação de DNA , Diabetes Mellitus Tipo 2/genética , Doença da Artéria Coronariana/genética , Locos de Características Quantitativas , Polimorfismo de Nucleotídeo Único , Citosina , Ilhas de CpG/genética , Estudo de Associação Genômica AmplaRESUMO
To create a scientific resource of expression quantitative trail loci (eQTL), we conducted a genome-wide association study (GWAS) using genotypes obtained from whole genome sequencing (WGS) of DNA and gene expression levels from RNA sequencing (RNA-seq) of whole blood in 2622 participants in Framingham Heart Study. We identified 6,778,286 cis -eQTL variant-gene transcript (eGene) pairs at p < 5x10 - 8 (2,855,111 unique cis -eQTL variants and 15,982 unique eGenes) and 1,469,754 trans -eQTL variant-eGene pairs at p < 1e-12 (526,056 unique trans -eQTL variants and 7,233 unique eGenes). In addition, 442,379 cis -eQTL variants were associated with expression of 1518 long non-protein coding RNAs (lncRNAs). Gene Ontology (GO) analyses revealed that the top GO terms for cis- eGenes are enriched for immune functions (FDR < 0.05). The cis -eQTL variants are enriched for SNPs reported to be associated with 815 traits in prior GWAS, including cardiovascular disease risk factors. As proof of concept, we used this eQTL resource in conjunction with genetic variants from public GWAS databases in causal inference testing (e.g., COVID-19 severity). After Bonferroni correction, Mendelian randomization analyses identified putative causal associations of 60 eGenes with systolic blood pressure, 13 genes with coronary artery disease, and seven genes with COVID-19 severity. This study created a comprehensive eQTL resource via BioData Catalyst that will be made available to the scientific community. This will advance understanding of the genetic architecture of gene expression underlying a wide range of diseases.
RESUMO
To create a scientific resource of expression quantitative trail loci (eQTL), we conducted a genome-wide association study (GWAS) using genotypes obtained from whole genome sequencing (WGS) of DNA and gene expression levels from RNA sequencing (RNA-seq) of whole blood in 2622 participants in Framingham Heart Study. We identified 6,778,286 cis -eQTL variant-gene transcript (eGene) pairs at p <5×10 -8 (2,855,111 unique cis -eQTL variants and 15,982 unique eGenes) and 1,469,754 trans -eQTL variant-eGene pairs at p <1e-12 (526,056 unique trans -eQTL variants and 7,233 unique eGenes). In addition, 442,379 cis -eQTL variants were associated with expression of 1518 long non-protein coding RNAs (lncRNAs). Gene Ontology (GO) analyses revealed that the top GO terms for cis- eGenes are enriched for immune functions (FDR <0.05). The cis -eQTL variants are enriched for SNPs reported to be associated with 815 traits in prior GWAS, including cardiovascular disease risk factors. As proof of concept, we used this eQTL resource in conjunction with genetic variants from public GWAS databases in causal inference testing (e.g., COVID-19 severity). After Bonferroni correction, Mendelian randomization analyses identified putative causal associations of 60 eGenes with systolic blood pressure, 13 genes with coronary artery disease, and seven genes with COVID-19 severity. This study created a comprehensive eQTL resource via BioData Catalyst that will be made available to the scientific community. This will advance understanding of the genetic architecture of gene expression underlying a wide range of diseases.
RESUMO
To create a scientific resource of expression quantitative trail loci (eQTL), we conducted a genome-wide association study (GWAS) using genotypes obtained from whole genome sequencing (WGS) of DNA and gene expression levels from RNA sequencing (RNA-seq) of whole blood in 2622 participants in Framingham Heart Study. We identified 6,778,286 cis-eQTL variant-gene transcript (eGene) pairs at p < 5 × 10-8 (2,855,111 unique cis-eQTL variants and 15,982 unique eGenes) and 1,469,754 trans-eQTL variant-eGene pairs at p < 1e-12 (526,056 unique trans-eQTL variants and 7233 unique eGenes). In addition, 442,379 cis-eQTL variants were associated with expression of 1518 long non-protein coding RNAs (lncRNAs). Gene Ontology (GO) analyses revealed that the top GO terms for cis-eGenes are enriched for immune functions (FDR < 0.05). The cis-eQTL variants are enriched for SNPs reported to be associated with 815 traits in prior GWAS, including cardiovascular disease risk factors. As proof of concept, we used this eQTL resource in conjunction with genetic variants from public GWAS databases in causal inference testing (e.g., COVID-19 severity). After Bonferroni correction, Mendelian randomization analyses identified putative causal associations of 60 eGenes with systolic blood pressure, 13 genes with coronary artery disease, and seven genes with COVID-19 severity. This study created a comprehensive eQTL resource via BioData Catalyst that will be made available to the scientific community. This will advance understanding of the genetic architecture of gene expression underlying a wide range of diseases.
Assuntos
Predisposição Genética para Doença , Estudo de Associação Genômica Ampla , Locos de Características Quantitativas , Humanos , DNA , Expressão Gênica , Locos de Características Quantitativas/genética , Análise de Sequência de RNARESUMO
BACKGROUND: Identification of single nucleotide polymorphisms (SNPs) associated with gene expression levels, known as expression quantitative trait loci (eQTLs), may improve understanding of the functional role of phenotype-associated SNPs in genome-wide association studies (GWAS). The small sample sizes of some previous eQTL studies have limited their statistical power. We conducted an eQTL investigation of microarray-based gene and exon expression levels in whole blood in a cohort of 5257 individuals, exceeding the single cohort size of previous studies by more than a factor of 2. RESULTS: We detected over 19,000 independent lead cis-eQTLs and over 6000 independent lead trans-eQTLs, targeting over 10,000 gene targets (eGenes), with a false discovery rate (FDR) < 5%. Of previously published significant GWAS SNPs, 48% are identified to be significant eQTLs in our study. Some trans-eQTLs point toward novel mechanistic explanations for the association of the SNP with the GWAS-related phenotype. We also identify 59 distinct blocks or clusters of trans-eQTLs, each targeting the expression of sets of six to 229 distinct trans-eGenes. Ten of these sets of target genes are significantly enriched for microRNA targets (FDR < 5%). Many of these clusters are associated in GWAS with multiple phenotypes. CONCLUSIONS: These findings provide insights into the molecular regulatory patterns involved in human physiology and pathophysiology. We illustrate the value of our eQTL database in the context of a recent GWAS meta-analysis of coronary artery disease and provide a list of targeted eGenes for 21 of 58 GWAS loci.