ABSTRACT
Increased blood lipid levels are heritable risk factors of cardiovascular disease with varied prevalence worldwide owing to different dietary patterns and medication use1. Despite advances in prevention and treatment, in particular through reducing low-density lipoprotein cholesterol levels2, heart disease remains the leading cause of death worldwide3. Genome-wide association studies (GWAS) of blood lipid levels have led to important biological and clinical insights, as well as new drug targets, for cardiovascular disease. However, most previous GWAS4-23 have been conducted in European ancestry populations and may have missed genetic variants that contribute to lipid-level variation in other ancestry groups. These include differences in allele frequencies, effect sizes and linkage-disequilibrium patterns24. Here we conduct a multi-ancestry, genome-wide genetic discovery meta-analysis of lipid levels in approximately 1.65 million individuals, including 350,000 of non-European ancestries. We quantify the gain in studying non-European ancestries and provide evidence to support the expansion of recruitment of additional ancestries, even with relatively small sample sizes. We find that increasing diversity rather than studying additional individuals of European ancestry results in substantial improvements in fine-mapping functional variants and portability of polygenic prediction (evaluated in approximately 295,000 individuals from 7 ancestry groupings). Modest gains in the number of discovered loci and ancestry-specific variants were also achieved. As GWAS expand emphasis beyond the identification of genes and fundamental biology towards the use of genetic variants for preventive and precision medicine25, we anticipate that increased diversity of participants will lead to more accurate and equitable26 application of polygenic scores in clinical practice.
Subject(s)
Cardiovascular Diseases , Genome-Wide Association Study , Cardiovascular Diseases/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Humans , Linkage Disequilibrium , Multifactorial Inheritance , Polymorphism, Single Nucleotide/genetics , Population Groups
ABSTRACT
Assumptions are made about the genetic model of single nucleotide polymorphisms (SNPs) when choosing a traditional genetic encoding: additive, dominant, and recessive. Furthermore, SNPs across the genome are unlikely to demonstrate identical genetic models. However, running SNP-SNP interaction analyses with every combination of encodings raises the multiple testing burden. Here, we present a novel and flexible encoding for genetic interactions, the elastic data-driven genetic encoding (EDGE), in which SNPs are assigned a heterozygous value based on the genetic model they demonstrate in a dataset prior to interaction testing. We assessed the power of EDGE to detect genetic interactions using 29 combinations of simulated genetic models and found it outperformed the traditional encoding methods across 10%, 30%, and 50% minor allele frequencies (MAFs). Further, EDGE maintained a low false-positive rate, while additive and dominant encodings demonstrated inflation. We evaluated EDGE and the traditional encodings with genetic data from the Electronic Medical Records and Genomics (eMERGE) Network for five phenotypes: age-related macular degeneration (AMD), age-related cataract, glaucoma, type 2 diabetes (T2D), and resistant hypertension. A multi-encoding genome-wide association study (GWAS) for each phenotype was performed using the traditional encodings, and the top results of the multi-encoding GWAS were considered for SNP-SNP interaction using the traditional encodings and EDGE. EDGE identified a novel SNP-SNP interaction for age-related cataract that no other method identified: rs7787286 (MAF: 0.041; intergenic region of chromosome 7)-rs4695885 (MAF: 0.34; intergenic region of chromosome 4) with a Bonferroni LRT p of 0.018. A SNP-SNP interaction was found in data from the UK Biobank within 25 kb of these SNPs using the recessive encoding: rs60374751 (MAF: 0.030) and rs6843594 (MAF: 0.34) (Bonferroni LRT p: 0.026). 
We recommend using EDGE to flexibly detect interactions between SNPs exhibiting diverse action.
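The encoding idea described above can be illustrated with a minimal sketch. It assumes genotypes are stored as minor-allele counts (0, 1, 2) and that codes are scaled so the two homozygotes map to 0 and 1; under that scaling the traditional encodings are special cases of a single heterozygous value. The function names and the scaling are illustrative assumptions, and the heterozygous value is taken as an input here because the abstract does not state how EDGE estimates it from a dataset.

```python
# Hypothetical helper names; the 0-to-1 scaling is an assumption for illustration.
TRADITIONAL_HET_VALUES = {"recessive": 0.0, "additive": 0.5, "dominant": 1.0}


def encode_genotype(minor_allele_count, het_value):
    """Map a 0/1/2 minor-allele count to a numeric code.

    Homozygous reference -> 0, heterozygous -> het_value,
    homozygous alternate -> 1.
    """
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1, or 2")
    return (0.0, float(het_value), 1.0)[minor_allele_count]


def encode_snp(minor_allele_counts, het_value):
    """Encode every genotype of one SNP with the same heterozygous value."""
    return [encode_genotype(count, het_value) for count in minor_allele_counts]
```

With this layout, a flexible encoding only has to supply one number per SNP (for example 0.8 for a SNP that behaves close to dominant), after which each SNP pair can be tested for interaction using its own codes.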
Subject(s)
Models, Genetic , Cataract/genetics , Datasets as Topic , Diabetes Mellitus, Type 2/genetics , Gene Frequency , Genome-Wide Association Study , Glaucoma/genetics , Humans , Hypertension/genetics , Macular Degeneration/genetics , Phenotype , Polymorphism, Single Nucleotide
ABSTRACT
Carotid artery atherosclerotic disease (CAAD) is a risk factor for stroke. We used a genome-wide association study (GWAS) approach to discover genetic variants associated with CAAD in participants in the electronic Medical Records and Genomics (eMERGE) Network. We identified adult CAAD cases with unilateral or bilateral carotid artery stenosis and controls without evidence of stenosis from electronic health records at eight eMERGE sites. We performed GWAS with a model adjusting for age, sex, study site, and genetic principal components of ancestry. In eMERGE we found 1793 CAAD cases and 17,958 controls. Two loci reached genome-wide significance, on chr6 in LPA (rs10455872, odds ratio [OR] (95% confidence interval [CI]) = 1.50 (1.30-1.73), p = 2.1 × 10⁻⁸) and on chr7, an intergenic single nucleotide variant (SNV; rs6952610, OR (95% CI) = 1.25 (1.16-1.36), p = 4.3 × 10⁻⁸). The chr7 association remained significant in the presence of the LPA SNV as a covariate. The LPA SNV was also associated with coronary heart disease (CHD; 4199 cases and 11,679 controls) in this study (OR (95% CI) = 1.27 (1.13-1.43), p = 5 × 10⁻⁵) but the chr7 SNV was not (OR (95% CI) = 1.03 (0.97-1.09), p = .37). Both variants replicated in UK Biobank. Elevated lipoprotein(a) concentrations ([Lp(a)]) and LPA variants associated with elevated [Lp(a)] have previously been associated with CAAD and CHD, including rs10455872. With electronic health record phenotypes in eMERGE and UKB, we replicated a previously known association and identified a novel locus associated with CAAD.
Subject(s)
Carotid Stenosis , Genome-Wide Association Study , Electronic Health Records , Genetic Predisposition to Disease , Genomics , Humans , Lipoprotein(a)/genetics , Models, Genetic , Polymorphism, Single Nucleotide
ABSTRACT
Uterine fibroids (UF) are common, heritable pelvic tumors in women, and genome-wide association studies (GWAS) have identified ~30 loci associated with increased risk of UF. Using summary statistics from a previously published UF GWAS performed in a non-Hispanic European Ancestry (NHW) female subset from the Electronic Medical Records and Genomics (eMERGE) Network, we constructed a polygenic risk score (PRS) for UF. UF-PRS was developed using PRSice and optimized in the separate clinical population of BioVU. PRS was validated using parallel methods of 10-fold cross-validation logistic regression and phenome-wide association study (PheWAS) in a separate subset of eMERGE NHW females (validation set), excluding samples used in GWAS. PRSice determined pt < 0.001 and, after linkage disequilibrium pruning (r² < 0.2), 4458 variants were in the PRS, which was significant (pseudo-R² = 0.0018, p = 0.041). Ten-fold cross-validation logistic regression modeling of the validation set revealed the model had an area under the receiver operating characteristic (ROC) curve (AUC) of 0.60 (95% confidence interval [CI] 0.58-0.62). PheWAS identified six phecodes associated with the PRS with the most significant phenotypes being 218 'benign neoplasm of uterus' and 218.1 'uterine leiomyoma' (p = 1.94 × 10⁻²³, OR 1.31 [95% CI 1.26-1.37] and p = 3.50 × 10⁻²³, OR 1.32 [95% CI 1.26-1.37]). We have developed and validated the first PRS for UF. We find our PRS has predictive ability for UF and captures genetic architecture of increased risk for UF that can be used in further studies.
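As a rough illustration of how a thresholded score of this kind is computed, the sketch below sums effect-size-weighted allele dosages over variants whose GWAS p-value falls below a chosen threshold. It is not PRSice itself: it assumes linkage disequilibrium pruning has already been applied, and the function name, dictionary layout, and variant identifiers are hypothetical.

```python
def polygenic_score(dosages, effect_sizes, p_values, p_threshold=0.001):
    """Weighted sum of allele dosages over variants passing the p-value threshold.

    dosages:      {variant_id: effect-allele dosage (0..2) for one individual}
    effect_sizes: {variant_id: per-allele effect (e.g. log odds ratio) from the GWAS}
    p_values:     {variant_id: GWAS p-value}
    Assumes the variant set has already been LD-pruned.
    """
    score = 0.0
    for variant_id, beta in effect_sizes.items():
        passes_threshold = p_values[variant_id] < p_threshold
        if passes_threshold and variant_id in dosages:
            score += beta * dosages[variant_id]
    return score


# Toy inputs invented for illustration only (not data from the study).
toy_effects = {"variant_a": 0.2, "variant_b": -0.1, "variant_c": 0.5}
toy_p_values = {"variant_a": 1e-4, "variant_b": 5e-4, "variant_c": 0.2}
toy_dosages = {"variant_a": 2, "variant_b": 1, "variant_c": 2}
toy_score = polygenic_score(toy_dosages, toy_effects, toy_p_values)
```

In the toy call, `variant_c` is excluded because its p-value exceeds the threshold, so only the first two variants contribute to the score.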
Subject(s)
Genome-Wide Association Study , Leiomyoma , Female , Genetic Predisposition to Disease , Genomics , Humans , Leiomyoma/genetics , Linkage Disequilibrium , Risk Factors
ABSTRACT
Phenome-wide association studies (PheWASs) have been a useful tool for testing associations between genetic variations and multiple complex traits or diagnoses. Linking PheWAS-based associations between phenotypes and a variant or a genomic region into a network provides a new way to investigate cross-phenotype associations, and it might broaden the understanding of genetic architecture that exists between diagnoses, genes, and pleiotropy. We created a network of associations from one of the largest PheWASs on electronic health record (EHR)-derived phenotypes across 38,682 unrelated samples from the Geisinger's biobank; the samples were genotyped through the DiscovEHR project. We computed associations between 632,574 common variants and 541 diagnosis codes. Using these associations, we constructed a "disease-disease" network (DDN) wherein pairs of diseases were connected on the basis of shared associations with a given genetic variant. The DDN provides a landscape of intra-connections within the same disease classes, as well as inter-connections across disease classes. We identified clusters of diseases with known biological connections, such as autoimmune disorders (type 1 diabetes, rheumatoid arthritis, and multiple sclerosis) and cardiovascular disorders. Previously unreported relationships between multiple diseases were identified on the basis of genetic associations as well. The network approach applied in this study can be used to uncover interactions between diseases as a result of their shared, potentially pleiotropic SNPs. Additionally, this approach might advance clinical research and even clinical practice by accelerating our understanding of disease mechanisms on the basis of similar underlying genetic associations.
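The network construction described above can be sketched in a few lines, assuming the PheWAS output has already been filtered to significant (variant, diagnosis) pairs. The function name and the toy input are illustrative assumptions; the abstract does not give the significance threshold or data format used.

```python
from collections import defaultdict
from itertools import combinations


def build_disease_network(associations):
    """Connect two diseases when they share at least one associated variant.

    associations: iterable of (variant_id, disease) pairs, assumed to be
                  already filtered to significant PheWAS results.
    Returns {(disease_a, disease_b): set of shared variant ids}.
    """
    diseases_by_variant = defaultdict(set)
    for variant_id, disease in associations:
        diseases_by_variant[variant_id].add(disease)

    edges = defaultdict(set)
    for variant_id, diseases in diseases_by_variant.items():
        # Sorting gives each undirected edge one canonical key.
        for disease_a, disease_b in combinations(sorted(diseases), 2):
            edges[(disease_a, disease_b)].add(variant_id)
    return dict(edges)


# Toy associations invented for illustration only (not results from the study).
toy_associations = [
    ("v1", "type 1 diabetes"),
    ("v1", "rheumatoid arthritis"),
    ("v1", "multiple sclerosis"),
    ("v2", "rheumatoid arthritis"),
    ("v2", "multiple sclerosis"),
    ("v3", "glaucoma"),
]
toy_network = build_disease_network(toy_associations)
```

Keeping the set of shared variants on each edge, rather than a bare count, makes it possible to trace any connection back to the variants that produced it.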
Subject(s)
Disease/genetics , Electronic Health Records , Genetic Association Studies , Phenotype , Polymorphism, Single Nucleotide/genetics , Autoimmune Diseases/genetics , Cardiovascular Diseases/genetics , Epigenomics , Humans
ABSTRACT
Most phenome-wide association studies (PheWASs) to date have used a small to moderate number of SNPs for association with phenotypic data. We performed a large-scale single-cohort PheWAS, using electronic health record (EHR)-derived case-control status for 541 diagnoses using International Classification of Disease version 9 (ICD-9) codes and 25 median clinical laboratory measures. We calculated associations between these diagnoses and traits with ~630,000 common frequency SNPs with minor allele frequency > 0.01 for 38,662 individuals. In this landscape PheWAS, we explored results within diseases and traits, comparing results to those previously reported in genome-wide association studies (GWASs), as well as previously published PheWASs. We further leveraged the context of functional impact from protein-coding to regulatory regions, providing a deeper interpretation of these associations. The comprehensive nature of this PheWAS allows for novel hypothesis generation, the identification of phenotypes for further study for future phenotypic algorithm development, and identification of cross-phenotype associations.
Subject(s)
Clinical Laboratory Techniques , Electronic Health Records , Genome-Wide Association Study , International Classification of Diseases , Chromatin/genetics , DNA, Intergenic/genetics , Gene Expression Regulation , Genome, Human , Haplotypes/genetics , Humans , Molecular Sequence Annotation , Open Reading Frames/genetics , Phenotype , Reproducibility of Results , Sequence Analysis, RNA
ABSTRACT
Recent technological advances have expanded the breadth of available omic data, from whole-genome sequencing data, to extensive transcriptomic, methylomic and metabolomic data. A key goal of analyses of these data is the identification of effective models that predict phenotypic traits and outcomes, elucidating important biomarkers and generating important insights into the genetic underpinnings of the heritability of complex traits. There is still a need for powerful and advanced analysis strategies to fully harness the utility of these comprehensive high-throughput data, identifying true associations and reducing the number of false associations. In this Review, we explore the emerging approaches for data integration - including meta-dimensional and multi-staged analyses - which aim to deepen our understanding of the role of genetics and genomics in complex outcomes. With the use and further development of these approaches, an improved understanding of the relationship between genomic variation and human phenotypes may be revealed.
Subject(s)
Data Interpretation, Statistical , Genetic Variation , Genotype , Inheritance Patterns/physiology , Models, Biological , Phenotype , Systems Biology/methods , Humans , Meta-Analysis as Topic
ABSTRACT
The Electronic Medical Records and Genomics (eMERGE) network is a network of medical centers with electronic medical records linked to existing biorepository samples for genomic discovery and genomic medicine research. The network sought to unify the genetic results from 78 Illumina and Affymetrix genotype array batches from 12 contributing medical centers for joint association analysis of 83,717 human participants. In this report, we describe the imputation of eMERGE results and methods to create the unified imputed merged set of genome-wide variant genotype data. We imputed the data using the Michigan Imputation Server, which provides a missing single-nucleotide variant genotype imputation service using the minimac3 imputation algorithm with the Haplotype Reference Consortium genotype reference set. We describe the quality control and filtering steps used in the generation of this data set and suggest generalizable quality thresholds for imputation and phenotype association studies. To test the merged imputed genotype set, we replicated a previously reported chromosome 6 HLA-B herpes zoster (shingles) association and discovered a novel zoster-associated locus in an epigenetic binding site near the terminus of chromosome 3 (3p29).
Subject(s)
Electronic Health Records , Genetic Predisposition to Disease , Genome-Wide Association Study , Herpes Zoster/genetics , Algorithms , Black People/genetics , Chromosomes, Human/genetics , Female , Haplotypes/genetics , Homozygote , Humans , Male , Phenotype , Polymorphism, Single Nucleotide/genetics , Principal Component Analysis , White People/genetics
ABSTRACT
With the urgency to treat patients more effectively for opioid use disorder in the midst of the opioid epidemic, a key area for precision medicine is to improve individualized medication-assisted treatment for opioid use disorder. The expansion of medication-assisted treatment is a key to reducing illicit opioid use, preventing opioid overdose deaths, and reducing the comorbidities and societal impacts of opioid use disorder. The most common medication for opioid use disorder will soon be buprenorphine. Research to date shows the successful impact of buprenorphine treatment, including the pharmacogenomics of buprenorphine response and treatment efficacy. Buprenorphine is also a promising treatment for depression and anxiety, and neonatal opioid withdrawal syndrome (NOWS). However, the rates of success with medication-assisted treatment for opioid use disorder, particularly at the beginning of treatment, still show many individuals relapsing to illicit opioid use. With the scope of the opioid crisis, there is an urgent need for expansion of buprenorphine treatment research to provide critical information for improving outcomes of opioid use disorder. Implementing the best strategies for opioid use disorder treatment is of dire urgency and will save lives.
Subject(s)
Buprenorphine/pharmacology , Buprenorphine/therapeutic use , Opioid-Related Disorders/drug therapy , Analgesics, Opioid/therapeutic use , Humans , Narcotic Antagonists , Opioid Epidemic/trends , Substance Withdrawal Syndrome/drug therapy , Treatment Outcome
ABSTRACT
BACKGROUND: Polycystic ovary syndrome is the most common endocrine disorder affecting women of reproductive age. A number of criteria have been developed for clinical diagnosis of polycystic ovary syndrome, with the Rotterdam criteria being the most inclusive. Evidence suggests that polycystic ovary syndrome is significantly heritable, and previous studies have identified genetic variants associated with polycystic ovary syndrome diagnosed using different criteria. The widely adopted electronic health record system provides an opportunity to identify patients with polycystic ovary syndrome using the Rotterdam criteria for genetic studies. OBJECTIVE: To identify novel associated genetic variants under the same phenotype definition, we extracted polycystic ovary syndrome cases and unaffected controls based on the Rotterdam criteria from the electronic health records and performed a discovery-validation genome-wide association study. STUDY DESIGN: We developed a polycystic ovary syndrome phenotyping algorithm on the basis of the Rotterdam criteria and applied it to 3 electronic health record-linked biobanks to identify cases and controls for genetic study. In the discovery phase, we performed an individual genome-wide association study using the Geisinger MyCode and the Electronic Medical Records and Genomics cohorts, which were then meta-analyzed. We attempted validation of the significant association loci (P<1×10⁻⁶) in the BioVU cohort. All association analyses used logistic regression, assuming an additive genetic model, and adjusted for principal components to control for population stratification. An inverse-variance fixed-effect model was adopted for meta-analysis. In addition, we examined the top variants to evaluate their associations with each criterion in the phenotyping algorithm. We used the STRING database to characterize the protein-protein interaction network.
RESULTS: Using the same algorithm based on the Rotterdam criteria, we identified 2995 patients with polycystic ovary syndrome and 53,599 population controls in total (2742 cases and 51,438 controls from the discovery phase; 253 cases and 2161 controls in the validation phase). We identified 1 novel genome-wide significant variant rs17186366 (odds ratio [OR]=1.37 [1.23, 1.54], P=2.8×10⁻⁸) located near SOD2. In addition, 2 loci with suggestive association were also identified: rs113168128 (OR=1.72 [1.42, 2.10], P=5.2×10⁻⁸), an intronic variant of ERBB4 that is independent from the previously published variants, and rs144248326 (OR=2.13 [1.52, 2.86], P=8.45×10⁻⁷), a novel intronic variant in WWTR1. In the further association tests of the top 3 single-nucleotide polymorphisms with each criterion in the polycystic ovary syndrome algorithm, we found that rs17186366 (SOD2) was associated with polycystic ovaries and hyperandrogenism, whereas rs113168128 (ERBB4) and rs144248326 (WWTR1) were mainly associated with oligomenorrhea or infertility. We also validated the previously reported association with DENND1A1. Using the STRING database to characterize protein-protein interactions, we found both ERBB4 and WWTR1 can interact with YAP1, which has been previously associated with polycystic ovary syndrome. CONCLUSION: Through a discovery-validation genome-wide association study on polycystic ovary syndrome identified from electronic health records using an algorithm based on Rotterdam criteria, we identified and validated a novel genome-wide significant association with a variant near SOD2. We also identified a novel independent variant within ERBB4 and a suggestive association with WWTR1. With previously identified polycystic ovary syndrome gene YAP1, the ERBB4-YAP1-WWTR1 network suggests involvement of the epidermal growth factor receptor and the Hippo pathway in the multifactorial etiology of polycystic ovary syndrome.
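The study design names an inverse-variance fixed-effect model for combining the cohort-level results. A minimal sketch of that standard calculation follows: each cohort's effect estimate (for example a log odds ratio) is weighted by the inverse of its squared standard error. The function name and the toy inputs are illustrative assumptions and are not values from the study.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance fixed-effect pooling of per-cohort effect estimates.

    betas:           per-cohort effect estimates (e.g. log odds ratios)
    standard_errors: matching per-cohort standard errors
    Returns (pooled_beta, pooled_standard_error).
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Toy inputs invented for illustration only.
toy_beta, toy_se = fixed_effect_meta([0.2, 0.4], [0.1, 0.2])
toy_z = toy_beta / toy_se  # Wald z statistic for the pooled estimate
```

Because the weights are inverse variances, the more precisely estimated cohort dominates the pooled estimate, and the pooled standard error is never larger than the smallest cohort-level standard error.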
Subject(s)
Polycystic Ovary Syndrome/genetics , Receptor, ErbB-4/genetics , Trans-Activators/genetics , Adaptor Proteins, Signal Transducing/metabolism , Adult , Case-Control Studies , Electronic Health Records , Female , Genome-Wide Association Study , Humans , Hyperandrogenism/genetics , Infertility, Female/genetics , Middle Aged , Oligomenorrhea/genetics , Ovarian Cysts/genetics , Polycystic Ovary Syndrome/diagnosis , Polycystic Ovary Syndrome/physiopathology , Polymorphism, Single Nucleotide , Superoxide Dismutase/genetics , Transcription Factors/metabolism , Transcriptional Coactivator with PDZ-Binding Motif Proteins , YAP-Signaling Proteins
ABSTRACT
Resting-state white blood cell (WBC) count is a marker of inflammation and immune system health. There is evidence that WBC count is not fixed over time and there is heterogeneity in WBC trajectory that is associated with morbidity and mortality. Latent class mixed modeling (LCMM) is a method that can identify unobserved heterogeneity in longitudinal data and attempts to classify individuals into groups based on a linear model of repeated measurements. We applied LCMM to repeated WBC count measures derived from electronic medical records of participants of the National Human Genome Research Institute (NHGRI) electronic MEdical Record and GEnomics (eMERGE) network study, revealing two WBC count trajectory phenotypes. Advancing these phenotypes to GWAS, we found genetic associations between trajectory class membership and regions on chromosome 1p34.3 and chromosome 11q13.4. The chromosome 1 region contains CSF3R, which encodes the granulocyte colony-stimulating factor receptor. This protein is a major factor in neutrophil stimulation and proliferation. The associated region on chromosome 11 contains the genes RNF169 and XRRA1, both of which are involved in the regulation of double-strand break DNA repair.
Subject(s)
Leukocyte Count/methods , Leukocytes/classification , Adult , Aged , Databases, Genetic , Electronic Health Records , Female , Genome-Wide Association Study , Humans , Latent Class Analysis , Male , Middle Aged , Phenotype , Polymorphism, Single Nucleotide/genetics , Proteins/genetics , Receptors, Colony-Stimulating Factor/genetics , Ubiquitin-Protein Ligases/genetics
ABSTRACT
BACKGROUND: The development of sequencing techniques and statistical methods provides great opportunities for identifying the impact of rare genetic variation on complex traits. However, there is a lack of knowledge on the impact of sample size, case numbers, and the balance of cases vs controls for both burden- and dispersion-based rare variant association methods. For example, Phenome-Wide Association Studies may have a wide range of case and control sample sizes across hundreds of diagnoses and traits, and with the application of statistical methods to rare variants, it is important to understand the strengths and limitations of the analyses. RESULTS: We conducted a large-scale simulation of randomly selected low-frequency protein-coding regions using twelve different balanced samples with an equal number of cases and controls as well as twenty-one unbalanced sample scenarios. We further explored statistical performance of different minor allele frequency thresholds and a range of genetic effect sizes. Our simulation results demonstrate that using an unbalanced study design has an overall higher type I error rate for both burden and dispersion tests compared with a balanced study design. Regression has an overall higher type I error with balanced cases and controls, while SKAT has higher type I error for unbalanced case-control scenarios. We also found that both type I error and power were driven by the number of cases in addition to the case to control ratio under large control group scenarios. Based on our power simulations, we observed that a SKAT analysis with case numbers larger than 200 for unbalanced case-control models yielded over 90% power with relatively well controlled type I error. To achieve similar power in regression, over 500 cases are needed. Moreover, SKAT showed higher power to detect associations in unbalanced case-control scenarios than regression.
CONCLUSIONS: Our results provide important insights into rare variant association study designs by providing a landscape of type I error and statistical power for a wide range of sample sizes. These results can serve as a benchmark for making decisions about study design for rare variant analyses.
Subject(s)
Computer Simulation/standards , Genetic Association Studies/methods , Sample Size , Humans , Models, Genetic , Research Design
ABSTRACT
BACKGROUND: The alpha-adrenergic agonist phenylephrine is often used to treat hypotension during anesthesia. In clinical situations, low blood pressure may require prompt intervention by intravenous bolus or infusion. Differences in responsiveness to phenylephrine treatment are commonly observed in clinical practice. Candidate gene studies indicate genetic variants may contribute to this variable response. METHODS: Pharmacological and physiological data were retrospectively extracted from routine clinical anesthetic records. Response to phenylephrine boluses could not be reliably assessed, so infusion rates were used for analysis. Unsupervised k-means clustering was conducted on clean data containing 4130 patients based on phenylephrine infusion rate and blood pressure parameters, to identify potential phenotypic subtypes. Genome-wide association studies (GWAS) were performed against average infusion rates in two cohorts: phase I (n = 1205) and phase II (n = 329). Top genetic variants identified from the meta-analysis were further examined to see if they could differentiate subgroups identified by k-means clustering. RESULTS: Three subgroups of patients with different response to phenylephrine were clustered and characterized: resistant (high infusion rate yet low mean systolic blood pressure (SBP)), intermediate (low infusion rate and low SBP), and sensitive (low infusion rate with high SBP). Differences among clusters were tabulated to assess for possible confounding influences. Comorbidity hierarchical clustering showed the resistant group had a higher prevalence of confounding factors than the intermediate and sensitive groups although overall prevalence is below 6%. Three loci with P < 1 × 10⁻⁶ were associated with phenylephrine infusion rate.
Only rs11572377 with P = 6.09 × 10⁻⁷, a 3'UTR variant of EDN2, encoding a secretory vasoconstricting peptide, could significantly differentiate resistant from sensitive groups (P = 0.015 and 0.018 for phase I and phase II) or resistant from pooled sensitive and intermediate groups (P = 0.047 and 0.018). CONCLUSIONS: Retrospective analysis of electronic anesthetic records data coupled with the genetic data identified genetic variants contributing to variable sensitivity to phenylephrine infusion during anesthesia. Although the identified top gene, EDN2, has robust biological relevance to vasoconstriction by binding to endothelin type A (ETA) receptors on arterial smooth muscle cells, further functional as well as replication studies are necessary to confirm this association.
Subject(s)
Adrenergic alpha-1 Receptor Agonists/administration & dosage , Anesthesia/adverse effects , Hypotension/chemically induced , Hypotension/genetics , Phenylephrine/administration & dosage , Adult , Blood Pressure/drug effects , Female , Genome-Wide Association Study , Humans , Infusions, Intravenous , Pregnancy , Retrospective Studies
ABSTRACT
BACKGROUND: Phenome-wide association studies (PheWAS) are a high-throughput approach to evaluate comprehensive associations between genetic variants and a wide range of phenotypic measures. PheWAS has varying sample sizes for quantitative traits, and variable numbers of cases and controls for binary traits across the many phenotypes of interest, which can affect the statistical power to detect associations. The motivation of this study is to investigate the various parameters which affect the estimation of statistical power in PheWAS, including sample size, case-control ratio, minor allele frequency, and disease penetrance. RESULTS: We performed a PheWAS simulation study, where we investigated variations in statistical power based on different parameters, such as overall sample size, number of cases, case-control ratio, minor allele frequency, and disease penetrance. The simulation was performed on both binary and quantitative phenotypic measures. Our simulation on binary traits suggests that the number of cases has more impact on statistical power than the case to control ratio; also, we found that a sample size of 200 cases or more maintains the statistical power to identify associations for common variants. For quantitative traits, a sample size of 1000 or more individuals performed best in the power calculations. We focused on common genetic variants (MAF > 0.01) in this study; however, in future studies, we will be extending this effort to perform similar simulations on rare variants. CONCLUSIONS: This study provides a series of PheWAS simulation analyses that can be used to estimate statistical power for some potential scenarios. These results can be used to provide guidelines for appropriate study design for future PheWAS analyses.
Subject(s)
Computer Simulation , Disease/genetics , Genetic Association Studies , Genome-Wide Association Study , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Algorithms , Humans
ABSTRACT
The impact of a single genetic locus on multiple phenotypes, or pleiotropy, is an important area of research. Biological systems are dynamic complex networks, and these networks exist within and between cells. In humans, the consideration of multiple phenotypes such as physiological traits, clinical outcomes and drug response, in the context of genetic variation, can provide ways of developing a more complete understanding of the complex relationships between genetic architecture and how biological systems function in health and disease. In this article, we describe recent studies exploring the relationships between genetic loci and more than one phenotype. We also cover methodological developments incorporating pleiotropy applied to model organisms as well as humans, and discuss how stepping beyond the analysis of a single phenotype leads to a deeper understanding of complex genetic architecture.
Subject(s)
Genetic Pleiotropy , Animals , Caenorhabditis elegans/genetics , Computational Biology , Drosophila/genetics , Epistasis, Genetic , Genome-Wide Association Study , Humans , Mice , Models, Genetic , Phenotype , Quantitative Trait Loci
ABSTRACT
BACKGROUND: High-throughput approaches are increasingly being used to identify genetic associations across multiple phenotypes simultaneously. Here, we describe a pilot analysis that considered multiple on-treatment laboratory phenotypes from antiretroviral therapy-naive patients who were randomized to initiate antiretroviral regimens in a prospective clinical trial, AIDS Clinical Trials Group protocol A5202. PARTICIPANTS AND METHODS: From among 5 9545 294 polymorphisms imputed genome-wide, we analyzed 2544, including 2124 annotated in the PharmGKB, and 420 previously associated with traits in the GWAS Catalog. We derived 774 phenotypes on the basis of context from six variables: plasma atazanavir (ATV) pharmacokinetics, plasma efavirenz (EFV) pharmacokinetics, change in the CD4+ T-cell count, HIV-1 RNA suppression, fasting low-density lipoprotein-cholesterol, and fasting triglycerides. Permutation testing assessed the likelihood of associations being by chance alone. Pleiotropy was assessed for polymorphisms with the lowest P-values. RESULTS: This analysis included 1181 patients. At P less than 1.5×10, most associations were not by chance alone. Polymorphisms with the lowest P-values for EFV pharmacokinetics (CYP2B6 rs3745274), low-density lipoprotein-cholesterol (APOE rs7412), and triglyceride (APOA5 rs651821) phenotypes had previously been associated with those traits. The association between triglycerides and rs651821 was present with ATV-containing regimens, but not with EFV-containing regimens. Polymorphisms with the lowest P-values for ATV pharmacokinetics, CD4 T-cell count, and HIV-1 RNA phenotypes had not previously been reported to be associated with those traits. CONCLUSION: Using data from a prospective HIV clinical trial, we identified expected genetic associations, potentially novel associations, and at least one context-dependent association.
This study supports high-throughput strategies that simultaneously explore multiple phenotypes from clinical trials' datasets for genetic associations.
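The permutation testing mentioned in this abstract can be illustrated with a minimal sketch. It is not the study's actual procedure: the test statistic (absolute Pearson correlation between allele counts and a quantitative phenotype), the function names, and the default number of permutations are all assumptions chosen for illustration.

```python
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(genotypes, phenotype, n_permutations=1000, seed=0):
    """Empirical P-value for a genotype-phenotype association.

    Assumption: genotypes are minor-allele counts (0, 1, 2) and the
    statistic is |Pearson r|; a real analysis would typically use a
    covariate-adjusted regression statistic instead.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(genotypes, phenotype))
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_permutations):
        # Shuffling the phenotype breaks any true genotype-phenotype link
        # while preserving both marginal distributions.
        rng.shuffle(shuffled)
        if abs(pearson_r(genotypes, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

An empirical P-value computed this way is the fraction of shuffled datasets whose statistic is at least as extreme as the observed one, which is what "by chance alone" refers to.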
Subject(s)
Acquired Immunodeficiency Syndrome/drug therapy , Anti-Retroviral Agents/administration & dosage , Apolipoprotein A-V/genetics , Apolipoproteins E/genetics , Cytochrome P-450 CYP2B6/genetics , Polymorphism, Single Nucleotide , Acquired Immunodeficiency Syndrome/genetics , Adult , Anti-Retroviral Agents/pharmacokinetics , CD4-Positive T-Lymphocytes/cytology , Female , Humans , Lymphocyte Count , Male , Middle Aged , Pharmacogenomic Variants , Phenotype , Pilot Projects , Prospective Studies
ABSTRACT
MOTIVATION: We present an update to the pathway enrichment analysis tool 'Pathway Analysis by Randomization Incorporating Structure (PARIS)' that determines aggregated association signals generated from genome-wide association study results. Pathway-based analyses highlight biological pathways associated with phenotypes. PARIS uses a unique permutation strategy to evaluate the genomic structure of interrogated pathways, through permutation testing of genomic features, thus eliminating many of the over-testing concerns that arise with other pathway analysis approaches. RESULTS: We have updated PARIS to incorporate expanded pathway definitions through new expert knowledge from multiple database sources and customized user-provided pathways, along with other improvements in user flexibility and functionality. AVAILABILITY AND IMPLEMENTATION: PARIS is freely available to all users at https://ritchielab.psu.edu/software/paris-download CONTACT: jnc43@case.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Databases, Factual , Genome-Wide Association Study , Genomics , Humans , Software
ABSTRACT
PURPOSE: Arrhythmogenic right ventricular cardiomyopathy (ARVC) is an inherited heart disease. Clinical follow-up of incidental findings in ARVC-associated genes is recommended. We aimed to determine the prevalence of disease thus ascertained. METHODS: Individuals (n = 30,716) underwent exome sequencing. Variants in PKP2, DSG2, DSC2, DSP, JUP, TMEM43, or TGFβ3 that were database-listed as pathogenic or likely pathogenic were identified and evidence-reviewed. For subjects with putative loss-of-function (pLOF) variants or variants of uncertain significance (VUS), electronic health records (EHR) were reviewed for ARVC diagnosis, diagnostic criteria, and International Classification of Diseases (ICD-9) codes. RESULTS: Eighteen subjects had pLOF variants; none of these had an EHR diagnosis of ARVC. Of 14 patients with an electrocardiogram, one had a minor diagnostic criterion; the rest were normal. A total of 184 subjects had VUS, none of whom had an ARVC diagnosis. The proportion of subjects with VUS with major (4%) or minor (13%) electrocardiogram diagnostic criteria did not differ from that of variant-negative controls. ICD-9 codes showed no difference in defibrillator use, electrophysiologic abnormalities or nonischemic cardiomyopathies in patients with pLOF or VUSs compared with controls. CONCLUSION: pLOF variants in an unselected cohort were not associated with ARVC phenotypes based on EHR review. The negative predictive value of EHR review remains uncertain.
Subject(s)
Arrhythmogenic Right Ventricular Dysplasia/genetics , Exome , Genetic Variation , Sequence Analysis, DNA , Adult , Arrhythmogenic Right Ventricular Dysplasia/epidemiology , Cohort Studies , Electronic Health Records , Female , Genetic Association Studies , Genotype , Humans , Male , Middle Aged , Phenotype , Prevalence
ABSTRACT
We performed a phenome-wide association study (PheWAS) utilizing diverse genotypic and phenotypic data existing across multiple populations in the National Health and Nutrition Examination Surveys (NHANES), conducted by the Centers for Disease Control and Prevention (CDC), and accessed by the Epidemiological Architecture for Genes Linked to Environment (EAGLE) study. We calculated comprehensive tests of association in Genetic NHANES using 80 SNPs and 1,008 phenotypes (grouped into 184 phenotype classes), stratified by race-ethnicity. Genetic NHANES includes three surveys (NHANES III, 1999-2000, and 2001-2002) and three race-ethnicities: non-Hispanic whites (n = 6,634), non-Hispanic blacks (n = 3,458), and Mexican Americans (n = 3,950). We identified 69 PheWAS associations replicating across surveys for the same SNP, phenotype-class, direction of effect, and race-ethnicity at p<0.01, allele frequency >0.01, and sample size >200. Of these 69 PheWAS associations, 39 replicated previously reported SNP-phenotype associations, 9 were related to previously reported associations, and 21 were novel associations. Fourteen results had the same direction of effect across more than one race-ethnicity: one result was novel, 11 replicated previously reported associations, and two were related to previously reported results. Thirteen SNPs showed evidence of pleiotropy. We further explored results with gene-based biological networks, contrasting the direction of effect for pleiotropic associations across phenotypes. One PheWAS result was ABCG2 missense SNP rs2231142, associated with uric acid levels in both non-Hispanic whites and Mexican Americans, protoporphyrin levels in non-Hispanic whites and Mexican Americans, and blood pressure levels in Mexican Americans.
Another example was SNP rs1800588 near LIPC, which was significantly associated with the novel phenotypes of folate levels (Mexican Americans), vitamin E levels (non-Hispanic whites), and triglyceride levels (non-Hispanic whites), and which replicated a previously reported association with cholesterol levels. The results of this PheWAS show the utility of this approach for exposing more of the complex genetic architecture underlying multiple traits by generating novel hypotheses for future research.
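The replication criteria stated in this abstract (same SNP, phenotype class, direction of effect, and race-ethnicity across surveys, at p<0.01, allele frequency >0.01, and sample size >200) can be sketched as a simple filter. This is an illustrative sketch only: the `Result` record, its field names, and the reading of "replicating across surveys" as "passing in at least two surveys" are assumptions, not the authors' pipeline.

```python
from collections import defaultdict
from typing import NamedTuple


class Result(NamedTuple):
    """One SNP-phenotype test in one survey (hypothetical record layout)."""
    snp: str
    phenotype_class: str
    race_ethnicity: str
    survey: str
    beta: float          # effect estimate; only its sign is used here
    p_value: float
    allele_freq: float
    n: int


def passes_thresholds(r):
    # Thresholds as quoted in the abstract.
    return r.p_value < 0.01 and r.allele_freq > 0.01 and r.n > 200


def replicated_associations(results, min_surveys=2):
    """Return (snp, phenotype_class, race_ethnicity, direction) keys that
    pass the thresholds in at least `min_surveys` distinct surveys.

    Assumption: min_surveys=2 is one interpretation of "replicating
    across surveys".
    """
    surveys_by_key = defaultdict(set)
    for r in results:
        if not passes_thresholds(r) or r.beta == 0:
            continue
        direction = 1 if r.beta > 0 else -1
        key = (r.snp, r.phenotype_class, r.race_ethnicity, direction)
        surveys_by_key[key].add(r.survey)
    return sorted(k for k, s in surveys_by_key.items() if len(s) >= min_surveys)
```

Including the direction of effect in the grouping key means that results with opposite signs in different surveys never count toward the same replication.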