ABSTRACT
Polygenic risk scores (PRSs) summarize the genetic predisposition of a complex human trait or disease and may become a valuable tool for advancing precision medicine. However, PRSs that are developed in populations of predominantly European genetic ancestries can increase health disparities due to poor predictive performance in individuals of diverse and complex genetic ancestries. We describe genetic and modifiable risk factors that limit the transferability of PRSs across populations and review the strengths and weaknesses of existing PRS construction methods for diverse ancestries. Developing PRSs that benefit global populations in research and clinical settings provides an opportunity for innovation and is essential for health equity.
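As a minimal illustration of what a PRS summarizes, the sketch below computes a score as a weighted sum of effect-allele dosages, which is the common additive form. The function name, weights, and genotype are hypothetical values chosen for illustration; they are not taken from any of the studies listed here.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted sum of effect-allele dosages (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-variant weights (for example, log odds ratios from a GWAS).
example_weights = [0.10, -0.05, 0.20]
# Hypothetical effect-allele counts for one individual at the same three variants.
example_dosages = [2, 1, 0]

example_score = polygenic_risk_score(example_dosages, example_weights)
```

Because the weights are estimated in a discovery population, the same function applied to individuals of a different genetic ancestry can rank risk less accurately, which is the transferability problem the abstract describes.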
Subject(s)
Genetic Predisposition to Disease , Humans , Risk Factors , Multifactorial Inheritance , Precision Medicine , Genome-Wide Association Study
ABSTRACT
Polygenic risk scores (PRSs), which often aggregate results from genome-wide association studies, can bridge the gap between initial discovery efforts and clinical applications for the estimation of disease risk using genetics. However, there is notable heterogeneity in the application and reporting of these risk scores, which hinders the translation of PRSs into clinical care. Here, in a collaboration between the Clinical Genome Resource (ClinGen) Complex Disease Working Group and the Polygenic Score (PGS) Catalog, we present the Polygenic Risk Score Reporting Standards (PRS-RS), in which we update the Genetic Risk Prediction Studies (GRIPS) Statement to reflect the present state of the field. Drawing on the input of experts in epidemiology, statistics, disease-specific applications, implementation and policy, this comprehensive reporting framework defines the minimal information that is needed to interpret and evaluate PRSs, especially with respect to downstream clinical applications. Items span detailed descriptions of study populations, statistical methods for the development and validation of PRSs and considerations for the potential limitations of these scores. In addition, we emphasize the need for data availability and transparency, and we encourage researchers to deposit and share PRSs through the PGS Catalog to facilitate reproducibility and comparative benchmarking. By providing these criteria in a structured format that builds on existing standards and ontologies, the use of this framework in publishing PRSs will facilitate translation into clinical care and progress towards defining best practice.
Subject(s)
Genetic Predisposition to Disease , Genetics, Medical/standards , Multifactorial Inheritance/genetics , Humans , Reproducibility of Results , Risk Assessment/standards
ABSTRACT
Genome-wide association studies (GWASs) have been performed to identify host genetic factors for a range of phenotypes, including infectious diseases. Population-based common control subjects from biobanks and extensive consortia are a valuable resource for increasing sample sizes in the identification of associated loci with minimal additional expense. Non-differential misclassification of the outcome has been reported when the control subjects are not well characterized, which often attenuates the true effect size. However, for infectious diseases the comparison of affected subjects to population-based common control subjects regardless of pathogen exposure can also result in selection bias. Through simulated comparisons of pathogen-exposed cases and population-based common control subjects, we demonstrate that not accounting for pathogen exposure can result in biased effect estimates and spurious genome-wide significant signals. Further, the observed association can be distorted depending upon the strength of the association between a locus and pathogen exposure and the prevalence of pathogen exposure. We also used a real data example from the hepatitis C virus (HCV) genetic consortium comparing HCV spontaneous clearance to persistent infection with both well-characterized control subjects and population-based common control subjects from the UK Biobank. We find biased effect estimates for known HCV clearance-associated loci and potentially spurious HCV clearance associations. These findings suggest that the choice of control subjects is especially important for infectious diseases or outcomes that are conditional upon environmental exposures.
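The selection-bias mechanism can be sketched with a small deterministic calculation. This is a simplified illustration, not the authors' simulation design: it assumes a binary carrier status, an outcome that is independent of genotype once exposure is fixed, and hypothetical values for the carrier frequency and exposure probabilities.

```python
def carrier_odds(p: float) -> float:
    """Convert a carrier frequency into odds."""
    return p / (1.0 - p)


def carrier_freq_among_exposed(g: float, e1: float, e0: float) -> float:
    """Carrier frequency among exposed people, by Bayes' rule.

    g  : carrier frequency in the whole population
    e1 : probability of pathogen exposure for carriers
    e0 : probability of pathogen exposure for non-carriers
    """
    return g * e1 / (g * e1 + (1.0 - g) * e0)


def odds_ratio(p_cases: float, p_controls: float) -> float:
    """Carrier odds ratio comparing cases with controls."""
    return carrier_odds(p_cases) / carrier_odds(p_controls)


# Hypothetical inputs: the locus affects exposure but not the outcome.
g, e1, e0 = 0.30, 0.30, 0.20
p_exposed = carrier_freq_among_exposed(g, e1, e0)

# Cases are exposed by definition; with no genotype effect on the outcome,
# cases and exposed controls share the same carrier frequency.
or_vs_exposed_controls = odds_ratio(p_exposed, p_exposed)
# Population controls mix exposed and unexposed people (carrier frequency g).
or_vs_population_controls = odds_ratio(p_exposed, g)
```

Under these assumptions the odds ratio against exposed controls is 1, while the odds ratio against population controls equals e1 / e0 (1.5 here), so the apparent signal reflects the locus-exposure association rather than any effect on the outcome.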
Subject(s)
Communicable Diseases , Hepatitis C , Humans , Genome-Wide Association Study , Communicable Diseases/genetics , Phenotype , Hepatitis C/genetics , Hepacivirus
ABSTRACT
Mendelian randomization (MR) analysis is increasingly popular for testing the causal effect of exposures on disease outcomes using data from genome-wide association studies (GWAS). In some settings, the underlying exposure, such as systemic inflammation, may not be directly observable, but measurements can be available on multiple biomarkers or other types of traits that are co-regulated by the exposure. We propose a method for MR analysis on latent exposures (MRLE), which tests the significance for, and the direction of, the effect of a latent exposure by leveraging information from multiple related traits. The method is developed by constructing a set of estimating functions based on the second-order moments of GWAS summary association statistics for the observable traits, under a structural equation model where genetic variants are assumed to have indirect effects through the latent exposure and potentially direct effects on the traits. Simulation studies show that MRLE has well-controlled type I error rates and enhanced power compared to single-trait MR tests under various types of pleiotropy. Applications of MRLE using genetic association statistics across five inflammatory biomarkers (CRP, IL-6, IL-8, TNF-α, and MCP-1) provide evidence for potential causal effects of inflammation on increasing the risk of coronary artery disease, colorectal cancer, and rheumatoid arthritis, while standard MR analysis for individual biomarkers fails to detect consistent evidence for such effects.
Subject(s)
Biomarkers , Genome-Wide Association Study , Mendelian Randomization Analysis , Mendelian Randomization Analysis/methods , Humans , Biomarkers/blood , Genome-Wide Association Study/methods , Inflammation/genetics , Models, Statistical
ABSTRACT
Studies of asthma and allergy are generating increasing volumes of omics data for analysis and interpretation. The National Institute of Allergy and Infectious Diseases (NIAID) assembled a workshop comprising investigators studying asthma and allergic diseases using omics approaches, omics investigators from outside the field, and NIAID medical and scientific officers to discuss the following areas in asthma and allergy research: genomics, epigenomics, transcriptomics, microbiomics, metabolomics, proteomics, lipidomics, integrative omics, systems biology, and causal inference. Current states of the art, present challenges, novel and emerging strategies, and priorities for progress were presented and discussed for each area. This workshop report summarizes the major points and conclusions from this NIAID workshop. As a group, the investigators underscored the imperatives for rigorous analytic frameworks, integration of different omics data types, cross-disciplinary interaction, strategies for overcoming current limitations, and the overarching goal to improve scientific understanding and care of asthma and allergic diseases.
Subject(s)
Asthma , Hypersensitivity , United States , Humans , National Institute of Allergy and Infectious Diseases (U.S.) , Hypersensitivity/genetics , Asthma/etiology , Genomics , Proteomics , Metabolomics
ABSTRACT
Disease heterogeneity is ubiquitous in biomedical and clinical studies. In genetic studies, researchers are increasingly interested in understanding the distinct genetic underpinning of subtypes of diseases. However, existing set-based analysis methods for genome-wide association studies are either inadequate or inefficient for handling such multicategorical outcomes. In this paper, we proposed a novel set-based association analysis method, SKAT-MC, the sequence kernel association test (SKAT) for multicategorical outcomes (nominal or ordinal), which jointly evaluates the relationship between a set of variants (common and rare) and disease subtypes. Through comprehensive simulation studies, we showed that SKAT-MC effectively preserves the nominal type I error rate while substantially increasing the statistical power compared to existing methods under various scenarios. We applied SKAT-MC to the Polish Breast Cancer Study (PBCS) and found that the gene FGFR2 was significantly associated with estrogen receptor (ER)+ and ER- breast cancer subtypes. We also investigated educational attainment using UK Biobank data (N = 127,127) with SKAT-MC, and identified 21 significant genes in the genome. Consequently, SKAT-MC is a powerful and efficient analysis tool for genetic association studies with multicategorical outcomes. A freely distributed R package SKAT-MC can be accessed at https://github.com/Zhiwen-Owen-Jiang/SKATMC.
Subject(s)
Breast Neoplasms , Genome-Wide Association Study , Humans , Female , Genetic Variation , Models, Genetic , Computer Simulation , Breast Neoplasms/genetics
ABSTRACT
Polygenic risk scores (PRSs) are rapidly emerging as a way to measure disease risk by aggregating multiple genetic variants. Understanding the interplay of the PRS with environmental factors is critical for interpreting and applying PRSs in a wide variety of settings. We develop an efficient method for simultaneously modeling gene-environment correlations and interactions using the PRS in case-control studies. We use a logistic-normal regression modeling framework to specify the disease risk and PRS distribution in the underlying population and propose joint inference across the 2 models using the retrospective likelihood of the case-control data. Extensive simulation studies demonstrate the flexibility of the method in trading-off bias and efficiency for the estimation of various model parameters compared with standard logistic regression or a case-only analysis for gene-environment interactions, or a control-only analysis, for gene-environment correlations. Finally, using simulated case-control data sets within the UK Biobank study, we demonstrate the power of our method for its ability to recover results from the full prospective cohort for the detection of an interaction between long-term oral contraceptive use and the PRS on the risk of breast cancer. This method is computationally efficient and implemented in a user-friendly R package.
Subject(s)
Gene-Environment Interaction , Multifactorial Inheritance , Humans , Case-Control Studies , Multifactorial Inheritance/genetics , Breast Neoplasms/genetics , Female , Logistic Models , Genetic Predisposition to Disease , Computer Simulation , Risk Factors , Models, Genetic , Genetic Risk Score
ABSTRACT
BACKGROUND: Multiple novel protein biomarkers have been shown to be associated with prostate cancer risk using genetic instruments. This study aimed to externally validate the associations of 30 genetically predicted candidate proteins with prostate cancer risk using aptamer-based levels in US Black and White men in the Atherosclerosis Risk in Communities (ARIC) study. Plasma protein levels were previously measured by SomaScan® using blood collected in 1990-1992. METHODS: Among 4864 eligible participants, we ascertained 667 first primary prostate cancer cases through 2015. Hazard ratios (HRs) of prostate cancer and 95% confidence intervals (CIs) were estimated using Cox proportional hazards regression for tertiles of each protein. We adjusted for age, race, and other risk factors. RESULTS: Of the 30 proteins and considering a nominal p trend < 0.05, two were positively associated with prostate cancer risk: RF1ML (tertile 3 vs. 1: HR = 1.23; 95% CI 1.02-1.48; p trend = 0.037) and TPST1 (HR = 1.28, 95% CI 1.06-1.55; p trend = 0.0087). Two were inversely associated: ATF6A (HR = 0.80, 95% CI 0.65-0.98; p trend = 0.028) and SPINT2 (HR = 0.74, 95% CI 0.61-0.90; p trend = 0.0025). One protein, KDEL2, which was nonlinearly associated (test-for-linearity: p < 0.01), showed a statistically significant lower risk in the second tertile (HR = 0.79, 95% CI 0.65-0.95). Of these five, four proteins (ATF6A, KDEL2, RF1ML, and TPST1) were consistent in the direction of association with the discovery studies. CONCLUSION: This study validated some pre-diagnostic protein biomarkers of the risk of prostate cancer.
ABSTRACT
Physical activity (PA) is an important risk factor for a wide range of diseases. Previous genome-wide association studies (GWAS), based on self-reported data or a small number of phenotypes derived from accelerometry, have identified a limited number of genetic loci associated with habitual PA and provided evidence for involvement of the central nervous system in mediating genetic effects. In this study, we derived 27 PA phenotypes from wrist accelerometry data obtained from 88,411 UK Biobank study participants. Single-variant association analysis based on mixed-effects models and transcriptome-wide association studies (TWAS) together identified 5 novel loci that were not detected by previous studies of PA, sleep duration and self-reported chronotype. For both novel and previously known loci, we discovered associations with novel phenotypes including active-to-sedentary transition probability, light-intensity PA, activity during different times of the day and proxy phenotypes for sleep and circadian patterns. Follow-up studies including TWAS, colocalization, tissue-specific heritability enrichment, gene-set enrichment and genetic correlation analyses indicated a role of the blood and immune system in modulating the genetic effects and a secondary role of the digestive and endocrine systems. Our findings provide important insights into the genetic architecture of PA and its underlying mechanisms.
Subject(s)
Genome-Wide Association Study , Models, Genetic , Accelerometry , Exercise/physiology , Genetic Loci , Genetic Predisposition to Disease , Humans
ABSTRACT
Genetically predicted proteins have previously been associated with pancreatic cancer risk. We aimed to externally validate the associations of 53 candidate proteins with pancreatic cancer risk using directly measured, prediagnostic levels. We conducted a prospective cohort study of 10 355 US Black and White men and women in the Atherosclerosis Risk in Communities (ARIC) study. Aptamer-based plasma proteomic profiling was previously performed using blood collected in 1993 to 1995, from which the proteins were selected. By 2015 (median: 20 years), 93 incident pancreatic cancer cases were ascertained. Cox regression was used to estimate hazard ratios (HRs) and 95% confidence intervals (CIs) for protein tertiles, adjusting for age, race, and known risk factors. Of the 53 proteins, three were statistically significantly, positively associated with risk: GLCE (tertile 3 vs 1: HR = 1.88, 95% CI: 1.12-3.13; P-trend = 0.01), GOLM1 (aptamer 1: HR = 1.98, 95% CI: 1.16-3.37; P-trend = 0.01; aptamer 2: HR = 1.86, 95% CI: 1.07-3.24; P-trend = 0.05), and QSOX2 (HR = 1.96, 95% CI: 1.09-3.58; P-trend = 0.05). Two were inversely associated: F177A (HR = 0.59, 95% CI: 0.35-1.00; P-trend = 0.05) and LIFsR (HR = 0.55, 95% CI: 0.32-0.93; P-trend = 0.03). One, endoglin, showed a statistically significant lower risk in the middle tertile (HR = 0.50, 95% CI: 0.29-0.86). By chance, we expected significant associations for 2.65 proteins. FAM3D, IP10, sTie-1 (positive); SEM6A and JAG1 (inverse) were suggestively associated with risk. Of these 11, 10 proteins (endoglin, FAM3D, F177A, GLCE, GOLM1, JAG1, LIFsR, QSOX2, SEM6A and sTie-1) were consistent in direction of association with the discovery studies. This prospective study validated or supports 10 proteins as associated with pancreatic cancer risk.
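The figure of 2.65 proteins expected by chance follows from multiplying the number of tests by the nominal significance level, assuming each of the 53 proteins is tested once at a level of 0.05. A short sketch of that arithmetic, with a hypothetical helper name:

```python
def expected_chance_findings(n_tests: int, alpha: float) -> float:
    """Expected number of nominally significant results if every null hypothesis is true."""
    return n_tests * alpha


# 53 candidate proteins, each tested at a nominal level of 0.05.
expected_by_chance = expected_chance_findings(53, 0.05)
```

The six significant proteins reported above exceed this expectation, which is the comparison the abstract invites.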
Subject(s)
Atherosclerosis , Pancreatic Neoplasms , Male , Humans , Female , Prospective Studies , Endoglin , Proteomics , Risk Factors , Atherosclerosis/epidemiology , Atherosclerosis/genetics , Pancreatic Neoplasms/epidemiology , Pancreatic Neoplasms/genetics , Biomarkers , Incidence , Proportional Hazards Models , Oxidoreductases Acting on Sulfur Group Donors , Membrane Proteins , Pancreatic Neoplasms
ABSTRACT
While genome-wide association studies have identified susceptibility variants for numerous traits, their combined utility for predicting broad measures of health, such as mortality, remains poorly understood. We used data from the UK Biobank to combine polygenic risk scores (PRS) for 13 diseases and 12 mortality risk factors into sex-specific composite PRS (cPRS). These cPRS were moderately associated with all-cause mortality in independent data within the UK Biobank: the estimated hazard ratios per standard deviation were 1.10 (95% confidence interval: 1.05, 1.16) and 1.15 (1.10, 1.19) for women and men, respectively. Differences in life expectancy between the top and bottom 5% of the cPRS were estimated to be 4.79 (1.76, 7.81) years and 6.75 (4.16, 9.35) years for women and men, respectively. These associations were substantially attenuated after adjusting for non-genetic mortality risk factors measured at study entry (i.e., middle age for most participants). The cPRS may be useful in counseling younger individuals at higher genetic risk of mortality on modification of non-genetic factors.
Subject(s)
Genetic Diseases, Inborn/mortality , Genetic Predisposition to Disease , Multifactorial Inheritance/genetics , Risk Assessment/statistics & numerical data , Biological Specimen Banks , Female , Genetic Diseases, Inborn/genetics , Genetic Diseases, Inborn/pathology , Genome-Wide Association Study , Humans , Male , Middle Aged , Phenotype , Polymorphism, Single Nucleotide/genetics , Proportional Hazards Models , Risk Factors , United Kingdom
ABSTRACT
BACKGROUND: Molecular mechanisms underlying the benefits of healthy dietary patterns are poorly understood. Identifying protein biomarkers of dietary patterns can contribute to characterizing biological pathways influenced by food intake. OBJECTIVES: This study aimed to identify protein biomarkers associated with four indexes of healthy dietary patterns: Healthy Eating Index-2015 (HEI-2015); Alternative Healthy Eating Index-2010 (AHEI-2010); DASH diet; and alternate Mediterranean Diet (aMED). METHODS: Analyses were conducted on 10,490 Black and White men and women aged 49-73 y from the ARIC study at visit 3 (1993-1995). Dietary intake data were collected using a food frequency questionnaire, and plasma proteins were quantified using an aptamer-based proteomics assay. Multivariable linear regression models were used to examine the association between 4955 proteins and dietary patterns. We performed pathway overrepresentation analysis for diet-related proteins. An independent study population from the Framingham Heart Study was used for replication analyses. RESULTS: In the multivariable-adjusted models, 282 out of 4955 proteins (5.7%) were significantly associated with at least one dietary pattern (HEI-2015: 137; AHEI-2010: 72; DASH: 254; aMED: 35; P value < 0.05/4955 = 1.01 × 10⁻⁵). There were 148 proteins that were associated with only one dietary pattern (HEI-2015: 22; AHEI-2010: 5; DASH: 121; aMED: 0), and 20 proteins were associated with all four dietary patterns. Five unique biological pathways were significantly enriched by diet-related proteins. Seven out of 20 proteins associated with all dietary patterns in the ARIC study were available for replication analyses, and 6 out of these 7 proteins were consistent in direction and significantly associated with at least 1 dietary pattern in the Framingham Heart Study (HEI-2015: 2; AHEI-2010: 4; DASH: 6; aMED: 4; P value < 0.05/7 = 7.14 × 10⁻³).
CONCLUSIONS: A large-scale proteomic analysis identified plasma protein biomarkers that are representative of healthy dietary patterns in a middle-aged and older US adult population. These protein biomarkers may be useful objective indicators of healthy dietary patterns.
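Both significance thresholds quoted in the results are Bonferroni corrections, obtained by dividing the nominal level of 0.05 by the number of proteins tested. A brief sketch of the calculation, using a hypothetical helper name:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Discovery analysis: 4955 proteins tested in ARIC.
discovery_threshold = bonferroni_threshold(0.05, 4955)
# Replication analysis: 7 proteins available in the Framingham Heart Study.
replication_threshold = bonferroni_threshold(0.05, 7)

# Scientific notation with two decimals, matching the abstract's rounding.
discovery_text = f"{discovery_threshold:.2e}"
replication_text = f"{replication_threshold:.2e}"
```

The much looser replication threshold reflects the smaller number of proteins carried forward, not a weaker standard of evidence per family of tests.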
Subject(s)
Atherosclerosis , Diet, Mediterranean , Male , Adult , Middle Aged , Humans , Female , Aged , Proteomics , Diet , Longitudinal Studies , Biomarkers , Blood Proteins , Atherosclerosis/epidemiology
ABSTRACT
Two-phase designs can reduce the cost of epidemiological studies by limiting the ascertainment of expensive covariates and/or exposures to an efficiently selected subset (phase-II) of a larger (phase-I) study. Efficient analysis of the resulting data set combining disparate information from phase-I and phase-II, however, can be complex. Most of the existing methods, including the semiparametric maximum-likelihood estimator, require the information in phase-I to be summarized into a fixed number of strata. In this paper, we describe a novel method for the analysis of two-phase studies where information from phase-I is summarized by parameters associated with a reduced logistic regression model of the disease outcome on available covariates. We then set up estimating equations for parameters associated with the desired extended logistic regression model, based on information on the reduced model parameters from phase-I and complete data available at phase-II after accounting for the nonrandom sampling design. We use the generalized method of moments to solve over-identified estimating equations and develop the resulting asymptotic theory for the proposed estimator. Simulation studies show that the use of reduced parametric models, as opposed to summarizing data into strata, can lead to more efficient utilization of phase-I data. An application of the proposed method is illustrated using the data from the U.S. National Wilms Tumor Study.
Subject(s)
Kidney Neoplasms , Wilms Tumor , Humans , Logistic Models , Computer Simulation , Research Design , Models, Statistical
ABSTRACT
There is increasing evidence that pleiotropy, the association of multiple traits with the same genetic variants/loci, is a very common phenomenon. Cross-phenotype association tests are often used to jointly analyze multiple traits from a genome-wide association study (GWAS). The underlying methods, however, are often designed to test the global null hypothesis that there is no association of a genetic variant with any of the traits, the rejection of which does not implicate pleiotropy. In this article, we propose a new statistical approach, PLACO, for specifically detecting pleiotropic loci between two traits by considering an underlying composite null hypothesis that a variant is associated with none or only one of the traits. We propose testing the null hypothesis based on the product of the Z-statistics of the genetic variants across two studies and derive a null distribution of the test statistic in the form of a mixture distribution that allows for fractions of variants to be associated with none or only one of the traits. We borrow approaches from the statistical literature on mediation analysis that allow asymptotic approximation of the null distribution avoiding estimation of nuisance parameters related to mixture proportions and variance components. Simulation studies demonstrate that the proposed method can maintain type I error and can achieve major power gain over alternative simpler methods that are typically used for testing pleiotropy. PLACO allows correlation in summary statistics between studies that may arise due to sharing of controls between disease traits. Application of PLACO to publicly available summary data from two large case-control GWAS of Type 2 Diabetes and of Prostate Cancer implicated a number of novel shared genetic regions: 3q23 (ZBTB38), 6q25.3 (RGS17), 9p22.1 (HAUS6), 9p13.3 (UBAP2), 11p11.2 (RAPSN), 14q12 (AKAP6), 15q15 (KNL1) and 18q23 (ZNF236).
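To make the product-of-Z-statistics idea concrete, the sketch below computes the product and its two-sided tail probability in the simplest special case, where both Z-statistics are independent standard normal. This is an assumption made for illustration only: it corresponds to the global null, not to the composite-null mixture distribution that PLACO derives, and it ignores correlation between studies. The function names are hypothetical.

```python
import math


def product_statistic(z1: float, z2: float) -> float:
    """Product of the Z-statistics for one variant across two traits."""
    return z1 * z2


def product_null_tail(t: float, upper: float = 10.0, steps: int = 20000) -> float:
    """Approximate P(|Z1 * Z2| >= |t|) for independent standard normal Z1 and Z2.

    Conditions on |Z1| = z and integrates over z with the midpoint rule:
    P = integral over z > 0 of f(z) * P(|Z2| >= |t| / z) dz,
    where f is the density of |Z1|. The integral is truncated at `upper`.
    """
    t = abs(t)
    h = upper / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h
        density = 2.0 * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        tail = math.erfc(t / (z * math.sqrt(2.0)))  # P(|Z2| >= t / z)
        total += density * tail * h
    return total


example_product = product_statistic(2.0, -3.0)
example_tail = product_null_tail(example_product)
```

A large product requires both Z-statistics to be away from zero, which is why a product-based statistic targets variants associated with both traits rather than with either one alone.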
Subject(s)
Diabetes Mellitus, Type 2/genetics , Genetic Pleiotropy , Genome-Wide Association Study/methods , Prostatic Neoplasms/genetics , Quantitative Trait Loci , Humans , Male , Models, Genetic
ABSTRACT
BACKGROUND: Recent clinical guidelines support intensive blood pressure treatment targets. However, observational data suggest that excessive diastolic blood pressure (DBP) lowering might increase the risk of myocardial infarction (MI), reflecting a J- or U-shaped relationship. METHODS: We analyzed 47 407 participants from 5 cohorts (median age, 60 years). First, to corroborate previous observational analyses, we used traditional statistical methods to test the shape of the association between DBP and cardiovascular disease (CVD). Second, we created polygenic risk scores of DBP and systolic blood pressure and generated linear Mendelian randomization (MR) estimates for the effect of DBP on CVD. Third, using novel nonlinear MR approaches, we tested for nonlinearity in the genetic relationship between DBP and CVD events. Comprehensive MR interrogation of DBP required us to also model systolic blood pressure, given that the 2 are strongly correlated. RESULTS: Traditional observational analysis of our cohorts suggested a J-shaped association between DBP and MI. By contrast, linear MR analyses demonstrated an adverse effect of increasing DBP increments on CVD outcomes, including MI (MI hazard ratio, 1.07 per unit mm Hg increase in DBP; P<0.001). Furthermore, nonlinear MR analyses found no evidence for a J-shaped relationship, instead confirming that MI risk decreases consistently per unit decrease in DBP, even among individuals with low values of baseline DBP. CONCLUSIONS: In this analysis of the genetic effect of DBP, we found no evidence for a nonlinear J- or U-shaped relationship between DBP and adverse CVD outcomes, including MI.
Subject(s)
Blood Pressure/physiology , Cardiovascular Diseases/pathology , Aged , Cardiovascular Diseases/genetics , Databases, Factual , Female , Genotype , Humans , Male , Mendelian Randomization Analysis , Middle Aged , Odds Ratio , Proportional Hazards Models , Risk Factors
ABSTRACT
Investigations into the causal underpinnings of disease processes can be aided by the incorporation of genetic information. Genetic studies require populations varied in both ancestry and prevalent disease in order to optimize discovery and ensure generalizability of findings to the global population. Here, we report the genetic determinants of the serum proteome in 466 African Americans with chronic kidney disease attributed to hypertension from the richly phenotyped African American Study of Kidney Disease and Hypertension (AASK). Using the largest aptamer-based protein profiling platform to date (6,790 proteins or protein complexes), we identified 969 genetic associations with 900 unique proteins, including 52 novel cis (local) associations and 379 novel trans (distant) associations. The genetic effects of previously published cis-protein quantitative trait loci (pQTLs) were found to be highly reproducible, and we found evidence that our novel genetic signals colocalize with gene expression and disease processes. Many trans-pQTLs were found to reflect associations mediated by the circulating cis protein, and the common trans-pQTLs are enriched for processes involving extracellular vesicles, highlighting a plausible mechanism for distal regulation of the levels of secreted proteins. Thus, our study generates a valuable resource of genetic associations linking variants to protein levels and disease in an understudied patient population to inform future studies of drug targets and physiology.
Subject(s)
Hypertension , Kidney Diseases , Humans , Quantitative Trait Loci , Black or African American/genetics , Proteome , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Hypertension/genetics , Kidney Diseases/genetics , Genetic Predisposition to Disease
ABSTRACT
Metabolomics genome-wide association studies (GWAS) help outline the genetic contribution to human metabolism. However, studies to date have focused on relatively healthy, population-based samples of White individuals. Here, we conducted a GWAS of 537 blood metabolites measured in the Chronic Renal Insufficiency Cohort (CRIC) Study, with separate analyses in 822 White and 687 Black study participants. Trans-ethnic meta-analysis was then applied to improve fine-mapping of potential causal variants. Mean estimated glomerular filtration rate was 44.4 and 41.5 mL/min/1.73 m² in the White and Black participants, respectively. There were 45 significant metabolite associations at 19 loci, including novel associations at PYROXD2, PHYHD1, FADS1-3, ACOT2, MYRF, FAAH, and LIPC. The strength of associations was unchanged in models additionally adjusted for estimated glomerular filtration rate and proteinuria, consistent with a direct biochemical effect of gene products on associated metabolites. At several loci, trans-ethnic meta-analysis, which leverages differences in linkage disequilibrium across populations, reduced the number of potentially causal single nucleotide polymorphisms and/or the genomic interval they span compared to fine-mapping in the White participant cohort alone. Across all validated associations, we found strong concordance in effect sizes of the potentially causal single nucleotide polymorphisms between White and Black study participants. Thus, our study identifies novel genetic determinants of blood metabolites in chronic kidney disease, demonstrates the value of diverse cohorts to improve causal inference in metabolomics GWAS, and underscores the shared genetic basis of metabolism across race.
Subject(s)
Genome-Wide Association Study , Renal Insufficiency, Chronic , Cohort Studies , Ethnicity , Female , Humans , Linkage Disequilibrium , Male , Polymorphism, Single Nucleotide , Renal Insufficiency, Chronic/genetics
ABSTRACT
Cancers are routinely classified into subtypes according to various features, including histopathological characteristics and molecular markers. Previous genome-wide association studies have reported heterogeneous associations between loci and cancer subtypes. However, the optimal modeling strategy for handling correlated tumor features, missing data, and increased degrees of freedom in the underlying tests of association is not evident. We propose to test for genetic associations using a mixed-effect two-stage polytomous model score test (MTOP). In the first stage, a standard polytomous model is used to specify all possible subtypes defined by the cross-classification of the tumor characteristics. In the second stage, the subtype-specific case-control odds ratios are specified using a more parsimonious model based on the case-control odds ratio for a baseline subtype and the case-case parameters associated with tumor markers. Further, to reduce the degrees of freedom, we specify case-case parameters for additional exploratory markers using a random-effect model. We use the Expectation-Maximization algorithm to account for missing data on tumor markers. Through simulations across a range of realistic scenarios and data from the Polish Breast Cancer Study (PBCS), we show that MTOP outperforms alternative methods for identifying heterogeneous associations between risk loci and tumor subtypes. The proposed methods have been implemented in a user-friendly and high-speed R statistical package called TOP (https://github.com/andrewhaoyu/TOP).
Subject(s)
Breast Neoplasms , Genome-Wide Association Study , Breast Neoplasms/genetics , Case-Control Studies , Female , Humans , Odds Ratio , Risk Factors
ABSTRACT
Knowledge of genetics and its implications for human health is rapidly evolving in accordance with recent events, such as discoveries of large numbers of disease susceptibility loci from genome-wide association studies, the US Supreme Court ruling of the non-patentability of human genes, and the development of a regulatory framework for commercial genetic tests. In anticipation of the increasing relevance of genetic testing for the assessment of disease risks, this Review provides a summary of the methodologies used for building, evaluating and applying risk prediction models that include information from genetic testing and environmental risk factors. Potential applications of models for primary and secondary disease prevention are illustrated through several case studies, and future challenges and opportunities are discussed.