Search | Nursing VHL Search Portal

1.

Significance tests for R² of out-of-sample prediction using polygenic scores.

Momin, Md Moksedul; Lee, Soohyun; Wray, Naomi R; Lee, S Hong.

Am J Hum Genet ; 110(2): 349-358, 2023 02 02.

Article in English | MEDLINE | ID: mdl-36702127

ABSTRACT

The coefficient of determination (R2) is a well-established measure to indicate the predictive ability of polygenic scores (PGSs). However, the sampling variance of R2 is rarely considered so that 95% confidence intervals (CI) are not usually reported. Moreover, when comparisons are made between PGSs based on different discovery samples, the sampling covariance of R2 is required to test the difference between them. Here, we show how to estimate the variance and covariance of R2 values to assess the 95% CI and p value of the R2 difference. We apply this approach to real data calculating PGSs in 28,880 European participants derived from UK Biobank (UKBB) and Biobank Japan (BBJ) GWAS summary statistics for cholesterol and BMI. We quantify the significantly higher predictive ability of UKBB PGSs compared to BBJ PGSs (p value 7.6e-31 for cholesterol and 1.4e-50 for BMI). A joint model of UKBB and BBJ PGSs significantly improves the predictive ability, compared to a model of UKBB PGS only (p value 3.5e-05 for cholesterol and 1.3e-28 for BMI). We also show that the predictive ability of regulatory SNPs is significantly enriched over non-regulatory SNPs for cholesterol (p value 8.9e-26 for UKBB and 3.8e-17 for BBJ). We suggest that the proposed approach (available in R package r2redux) should be used to test the statistical significance of difference between pairs of PGSs, which may help to draw a correct conclusion about the comparative predictive ability of PGSs.
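As a rough illustration of why the sampling variance of R2 matters, the sketch below computes an approximate 95% CI for a single R2 value. It assumes the textbook large-sample approximation var(R2) ≈ 4·R2·(1 − R2)² / N for one predictor; this is not necessarily the exact expression implemented in r2redux, and the function name and the example R2 of 0.10 are hypothetical. Only the sample size (28,880) comes from the abstract.

```python
import math


def r2_confidence_interval(r2, n, z=1.96):
    """Approximate CI for R2 using a large-sample variance approximation.

    Assumption (not taken from the abstract): var(R2) ~ 4 * R2 * (1 - R2)^2 / n,
    which is a standard approximation for a single predictor and large n.
    Returns (lower, upper, standard_error).
    """
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r2 must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    variance = 4.0 * r2 * (1.0 - r2) ** 2 / n
    se = math.sqrt(variance)
    # Clip to the valid range of R2.
    lower = max(0.0, r2 - z * se)
    upper = min(1.0, r2 + z * se)
    return lower, upper, se


# Hypothetical R2 of 0.10 in a target sample of 28,880 individuals.
lower, upper, se = r2_confidence_interval(0.10, 28880)
print(f"R2 = 0.10, SE = {se:.5f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

Testing the difference between two R2 values from the same sample additionally requires their sampling covariance, which this sketch does not cover.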

Subject(s)

Multifactorial Inheritance , Polymorphism, Single Nucleotide , Humans , Genome-Wide Association Study

2.

Mitigating type 1 error inflation and power loss in GxE PRS: Genotype-environment interaction in polygenic risk score models.

Jayasinghe, Dovini; Momin, Md Moksedul; Beckmann, Kerri; Hyppönen, Elina; Benyamin, Beben; Lee, S Hong.

Genet Epidemiol ; 48(2): 85-100, 2024 03.

Article in English | MEDLINE | ID: mdl-38303123

ABSTRACT

The use of polygenic risk score (PRS) models has transformed the field of genetics by enabling the prediction of complex traits and diseases based on an individual's genetic profile. However, the impact of genotype-environment interaction (GxE) on the performance and applicability of PRS models remains a crucial aspect to be explored. Currently, existing genotype-environment interaction polygenic risk score (GxE PRS) models are often inappropriately used, which can result in inflated type 1 error rates and compromised results. In this study, we propose novel GxE PRS models that jointly incorporate additive and interaction genetic effects while also including an additional quadratic term for nongenetic covariates, enhancing their robustness against model misspecification. Through extensive simulations, we demonstrate that our proposed models outperform existing models in terms of controlling type 1 error rates and enhancing statistical power. Furthermore, we apply the proposed models to real data, and report significant GxE effects. Specifically, we highlight the impact of our models on both quantitative and binary traits. For quantitative traits, we uncover the GxE modulation of genetic effects on body mass index by alcohol intake frequency. In the case of binary traits, we identify the GxE modulation of genetic effects on hypertension by waist-to-hip ratio. These findings underscore the importance of employing a robust model that effectively controls type 1 error rates, thus preventing the occurrence of spurious GxE signals. To facilitate the implementation of our approach, we have developed an innovative R software package called GxEprs, specifically designed to detect and estimate GxE effects. Overall, our study highlights the importance of accurate GxE modeling and its implications for genetic risk prediction, while providing a practical tool to support further research in this area.

Subject(s)

Gene-Environment Interaction , Genetic Risk Score , Humans , Models, Genetic , Phenotype , Risk Factors

3.

Genetic correlations of polygenic disease traits: from theory to practice.

van Rheenen, Wouter; Peyrot, Wouter J; Schork, Andrew J; Lee, S Hong; Wray, Naomi R.

Nat Rev Genet ; 20(10): 567-581, 2019 10.

Article in English | MEDLINE | ID: mdl-31171865

ABSTRACT

The genetic correlation describes the genetic relationship between two traits and can contribute to a better understanding of the shared biological pathways and/or the causality relationships between them. The rarity of large family cohorts with recorded instances of two traits, particularly disease traits, has made it difficult to estimate genetic correlations using traditional epidemiological approaches. However, advances in genomic methodologies, such as genome-wide association studies, and widespread sharing of data now allow genetic correlations to be estimated for virtually any trait pair. Here, we review the definition, estimation, interpretation and uses of genetic correlations, with a focus on applications to human disease.

Subject(s)

Disease , Genome-Wide Association Study , Humans , Models, Genetic , Multifactorial Inheritance , Phenotype

4.

Phenotypic variance partitioning by transcriptomic gene expression levels and environmental variables for anthropometric traits using GTEx data.

Jullian Fabres, Pastor; Lee, S Hong.

Genet Epidemiol ; 47(7): 465-474, 2023 10.

Article in English | MEDLINE | ID: mdl-37318147

ABSTRACT

Phenotypic variation in humans is the result of genetic variation and environmental influences. Understanding the contribution of genetic and environmental components to phenotypic variation is of great interest. The variance explained by genome-wide single nucleotide polymorphisms (SNPs) typically represents a small proportion of the phenotypic variance for complex traits, which may be because the genome is only a part of the whole biological process to shape the phenotypes. In this study, we propose to partition the phenotypic variance of three anthropometric traits, using gene expression levels and environmental variables from GTEx data. We use the gene expression of four tissues that are deemed relevant for the anthropometric traits (two adipose tissues, skeletal muscle tissue and blood tissue). Additionally, we estimate the transcriptome-environment correlation that partly underlies the phenotypes of the anthropometric traits. We found that genetic factors play a significant role in determining body mass index (BMI), with the proportion of phenotypic variance explained by gene expression levels of visceral adipose tissue being 0.68 (SE = 0.06). However, we also observed that environmental factors such as age, sex, ancestry, smoking status, and drinking alcohol status have a small but significant impact (0.005, SE = 0.001). Interestingly, we found a significant negative correlation between the transcriptomic and environmental effects on BMI (transcriptome-environment correlation = -0.54, SE = 0.14), suggesting an antagonistic relationship. This implies that individuals with lower genetic profiles may be more susceptible to the effects of environmental factors on BMI, while those with higher genetic profiles may be less susceptible. We also show that the estimated transcriptomic variance varies across tissues, e.g., the gene expression levels of whole blood tissue and environmental variables explain a lower proportion of BMI phenotypic variance (0.16, SE = 0.05 and 0.04, SE = 0.004 respectively). We observed a significant positive correlation between transcriptomic and environmental effects (1.21, SE = 0.23) for this tissue. In conclusion, phenotypic variance partitioning can be done using gene expression and environmental data even with a small sample size (n = 838 from GTEx data), which can provide insights into how the transcriptomic and environmental effects contribute to the phenotypes of the anthropometric traits.

Subject(s)

Genome , Transcriptome , Humans , Phenotype , Body Mass Index , Multifactorial Inheritance , Polymorphism, Single Nucleotide

5.

R2ROC: an efficient method of comparing two or more correlated AUC from out-of-sample prediction using polygenic scores.

Momin, Md Moksedul; Wray, Naomi R; Lee, S Hong.

Hum Genet ; 2024 Jun 20.

Article in English | MEDLINE | ID: mdl-38902498

ABSTRACT

Polygenic risk scores (PRSs) enable early prediction of disease risk. Evaluating PRS performance for binary traits commonly relies on the area under the receiver operating characteristic curve (AUC). However, the widely used DeLong's method for comparative significance tests suffers from limitations, including computational time and the lack of a one-to-one mapping between test statistics based on AUC and R2. To overcome these limitations, we propose a novel approach that leverages the Delta method to derive the variance and covariance of AUC values, enabling a comprehensive and efficient comparative significance test. Our approach offers notable advantages over DeLong's method, including reduced computation time (up to 150-fold), making it suitable for large-scale analyses and ideal for integration into machine learning frameworks. Furthermore, our method allows for a direct one-to-one mapping between AUC and R2 values for comparative significance tests, providing enhanced insights into the relationship between these measures and facilitating their interpretation. We validated our proposed approach through simulations and applied it to real data comparing PRSs for diabetes and coronary artery disease (CAD) prediction in a cohort of 28,880 European individuals. The PRSs were derived using genome-wide association study summary statistics from two distinct sources. Our approach enabled a comprehensive and informative comparison of the PRSs, shedding light on their respective predictive abilities for diabetes and CAD. This advancement contributes to the assessment of genetic risk factors and personalized disease prediction, supporting better healthcare decision-making.

6.

Unraveling phenotypic variance in metabolic syndrome through multi-omics.

Amente, Lamessa Dube; Mills, Natalie T; Le, Thuc Duy; Hyppönen, Elina; Lee, S Hong.

Hum Genet ; 143(1): 35-47, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38095720

ABSTRACT

Complex multi-omics effects drive the clustering of cardiometabolic risk factors, underscoring the imperative to comprehend how individual and combined omics shape phenotypic variation. Our study partitions phenotypic variance in metabolic syndrome (MetS), blood glucose (GLU), triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), and blood pressure through genome, transcriptome, metabolome, and exposome (i.e., lifestyle exposome) analyses. Our analysis included a cohort of 62,822 unrelated individuals with white British ancestry, sourced from the UK biobank. We employed linear mixed models to partition phenotypic variance using the restricted maximum likelihood (REML) method, implemented in MTG2 (v2.22). We initiated the analysis by individually modeling omics, followed by subsequent integration of pairwise omics in a joint model that also accounted for the covariance and interaction between omics layers. Finally, we estimated the correlations of various omics effects between the phenotypes using bivariate REML. Significant proportions of the MetS variance were attributed to distinct data sources: genome (9.47%), transcriptome (4.24%), metabolome (14.34%), and exposome (3.77%). The phenotypic variances explained by the genome, transcriptome, metabolome, and exposome ranged from 3.28% for GLU to 25.35% for HDL-C, 0% for GLU to 19.34% for HDL-C, 4.29% for systolic blood pressure (SBP) to 35.75% for TG, and 0.89% for GLU to 10.17% for HDL-C, respectively. Significant correlations were found between genomic and transcriptomic effects for TG and HDL-C. Furthermore, significant interaction effects between omics data were detected for both MetS and its components. Interestingly, significant correlation of omics effect between the phenotypes was found. This study underscores omics' roles, interaction effects, and random-effects covariance in unveiling phenotypic variation in multi-omics domains.

Subject(s)

Metabolic Syndrome , Humans , Metabolic Syndrome/genetics , Multiomics , Phenotype , Triglycerides/genetics , Cholesterol, HDL

7.

Cross-ancestry genetic architecture and prediction for cholesterol traits.

Momin, Md Moksedul; Zhou, Xuan; Hyppönen, Elina; Benyamin, Beben; Lee, S Hong.

Hum Genet ; 143(5): 635-648, 2024 May.

Article in English | MEDLINE | ID: mdl-38536467

ABSTRACT

While cholesterol is essential, a high level of cholesterol is associated with the risk of cardiovascular diseases. Genome-wide association studies (GWASs) have proven successful in identifying genetic variants that are linked to cholesterol levels, predominantly in white European populations. However, the extent to which genetic effects on cholesterol vary across different ancestries remains largely unexplored. Here, we estimate cross-ancestry genetic correlation to address questions on how genetic effects are shared across ancestries. We find significant genetic heterogeneity between ancestries for cholesterol traits. Furthermore, we demonstrate that single nucleotide polymorphisms (SNPs) with concordant effects across ancestries for cholesterol are more frequently found in regulatory regions compared to other genomic regions. Indeed, the positive genetic covariance between ancestries is mostly driven by the effects of the concordant SNPs, whereas the genetic heterogeneity is attributed to the discordant SNPs. We also show that the predictive ability of the concordant SNPs is significantly higher than the discordant SNPs in the cross-ancestry polygenic prediction. The list of concordant SNPs for cholesterol is available in GWAS Catalog. These findings have relevance for the understanding of shared genetic architecture across ancestries, contributing to the development of clinical strategies for polygenic prediction of cholesterol in cross-ancestral settings.

Subject(s)

Cholesterol , Genome-Wide Association Study , Multifactorial Inheritance , Polymorphism, Single Nucleotide , Humans , Cholesterol/blood , Cholesterol/genetics , Multifactorial Inheritance/genetics , White People/genetics

8.

Host genetic determinants of COVID-19 susceptibility and severity: A systematic review and meta-analysis.

Eshetie, Setegn; Jullian, Pastor; Benyamin, Beben; Lee, S Hong.

Rev Med Virol ; 33(5): e2466, 2023 09.

Article in English | MEDLINE | ID: mdl-37303119

ABSTRACT

Genome-wide association studies (GWASs) have identified single nucleotide polymorphisms (SNPs) associated with susceptibility and severity of coronavirus disease 2019 (COVID-19). However, identified SNPs are inconsistent across studies, and there is no compelling consensus that COVID-19 status is determined by genetic factors. Here, we conducted a systematic review and meta-analysis to determine the effect of genetic factors on COVID-19. A random-effect meta-analysis was performed to estimate pooled odds ratios (ORs) of SNP effects, and SNP-based heritability (SNP-h2) of COVID-19. The analyses were performed using meta-R package, and Stata version 17. The meta-analysis included a total of 96,817 COVID-19 cases and 6,414,916 negative controls. The meta-analysis showed that a cluster of 9 highly correlated SNPs (R2 > 0.9) at the 3p21.31 gene locus covering the LZTFL1 and SLC6A20 genes was significantly associated with COVID-19 severity, with a pooled OR of 1.8 [1.5-2.0]. Meanwhile, another 3 SNPs (rs2531743-G, rs2271616-T, and rs73062389-A) within the locus were associated with COVID-19 susceptibility, with pooled estimates of 0.95 [0.93-0.96], 1.23 [1.19-1.27] and 1.15 [1.13-1.17], respectively. Interestingly, SNPs associated with susceptibility and SNPs associated with severity in this locus are in linkage equilibrium (R2 < 0.026). The SNP-h2 on the liability scale for severity and susceptibility was estimated at 7.6% (Se = 3.2%) and 4.6% (Se = 1.5%), respectively. Genetic factors contribute to COVID-19 susceptibility and severity. In the 3p21.31 locus, SNPs that are associated with susceptibility are not in linkage disequilibrium (LD) with SNPs that are associated with severity, indicating within-locus heterogeneity.

Subject(s)

COVID-19 , Genetic Predisposition to Disease , Humans , Genome-Wide Association Study , COVID-19/genetics , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Membrane Transport Proteins/genetics

9.

Integrative analysis of genomic and exposomic influences on youth mental health.

Choi, Karmel W; Wilson, Marina; Ge, Tian; Kandola, Aaron; Patel, Chirag J; Lee, S Hong; Smoller, Jordan W.

J Child Psychol Psychiatry ; 63(10): 1196-1205, 2022 10.

Article in English | MEDLINE | ID: mdl-35946823

ABSTRACT

BACKGROUND: Understanding complex influences on mental health problems in young people is needed to inform early prevention strategies. Both genetic and environmental factors are known to influence youth mental health, but a more comprehensive picture of their interplay, including wide-ranging environmental exposures - that is, the exposome - is needed. We perform an integrative analysis of genomic and exposomic data in relation to internalizing and externalizing symptoms in a cohort of 4,314 unrelated youth from the Adolescent Brain and Cognitive Development (ABCD) Study. METHODS: Using novel GREML-based approaches, we model the variance in internalizing and externalizing symptoms explained by additive and interactive influences from the genome (G) and modeled exposome (E) consisting of up to 133 variables at the family, peer, school, neighborhood, life event, and broader environmental levels, including genome-by-exposome (G × E) and exposome-by-exposome (E × E) effects. RESULTS: A best-fitting integrative model with G, E, and G × E components explained 35% and 63% of variance in youth internalizing and externalizing symptoms, respectively. Youth in the top quintile of model-predicted risk accounted for the majority of individuals with clinically elevated symptoms at follow-up (60% for internalizing; 72% for externalizing). Of note, different domains of environmental exposures were most impactful for internalizing (life events) and externalizing (contextual including family, school, and peer-level factors) symptoms. In addition, variance explained by G × E contributions was substantially larger for externalizing (33%) than internalizing (13%) symptoms. CONCLUSIONS: Advanced statistical genetic methods in a longitudinal cohort of youth can be leveraged to address fundamental questions about the role of 'nature and nurture' in developmental psychopathology.

Subject(s)

Mental Health , Psychopathology , Adolescent , Genomics , Humans , Schools

10.

Estimation of Genetic Correlation via Linkage Disequilibrium Score Regression and Genomic Restricted Maximum Likelihood.

Ni, Guiyan; Moser, Gerhard; Wray, Naomi R; Lee, S Hong.

Am J Hum Genet ; 102(6): 1185-1194, 2018 06 07.

Article in English | MEDLINE | ID: mdl-29754766

ABSTRACT

Genetic correlation is a key population parameter that describes the shared genetic architecture of complex traits and diseases. It can be estimated by current state-of-art methods, i.e., linkage disequilibrium score regression (LDSC) and genomic restricted maximum likelihood (GREML). The massively reduced computing burden of LDSC compared to GREML makes it an attractive tool, although the accuracy (i.e., magnitude of standard errors) of LDSC estimates has not been thoroughly studied. In simulation, we show that the accuracy of GREML is generally higher than that of LDSC. When there is genetic heterogeneity between the actual sample and reference data from which LD scores are estimated, the accuracy of LDSC decreases further. In real data analyses estimating the genetic correlation between schizophrenia (SCZ) and body mass index, we show that GREML estimates based on ~150,000 individuals give a higher accuracy than LDSC estimates based on ~400,000 individuals (from combined meta-data). A GREML genomic partitioning analysis reveals that the genetic correlation between SCZ and height is significantly negative for regulatory regions, which whole genome or LDSC approach has less power to detect. We conclude that LDSC estimates should be carefully interpreted as there can be uncertainty about homogeneity among combined meta-datasets. We suggest that any interesting findings from massive LDSC analysis for a large number of complex traits should be followed up, where possible, with more detailed analyses with GREML methods, even if sample sizes are lesser.

Subject(s)

Genome, Human , Linkage Disequilibrium/genetics , Adult , Body Height/genetics , Computer Simulation , Databases, Genetic , Genotype , Haplotypes/genetics , Humans , Inheritance Patterns/genetics , Likelihood Functions , Phenotype , Polymorphism, Single Nucleotide/genetics , Regression Analysis , Schizophrenia/genetics

11.

Adiposity and cancer: a Mendelian randomization analysis in the UK biobank.

Ahmed, Muktar; Mulugeta, Anwar; Lee, S Hong; Mäkinen, Ville-Petteri; Boyle, Terry; Hyppönen, Elina.

Int J Obes (Lond) ; 45(12): 2657-2665, 2021 12.

Article in English | MEDLINE | ID: mdl-34453097

ABSTRACT

BACKGROUND: Observational and Mendelian randomization (MR) studies link obesity and cancer, but it remains unclear whether these depend upon related metabolic abnormalities. METHODS: We used information from 321,472 participants in the UK biobank, including 30,561 cases of obesity-related cancer. We constructed three genetic instruments reflecting higher adiposity together with either "unfavourable" (82 SNPs), "favourable" (24 SNPs) or "neutral" metabolic profile (25 SNPs). We looked at associations with 14 types of cancer, previously suggested to be associated with obesity. RESULTS: All genetic instruments had a strong association with BMI (p < 1 × 10^-300 for all). The instrument reflecting unfavourable adiposity was also associated with higher CRP, HbA1c and adverse lipid profile, while instrument reflecting metabolically favourable adiposity was associated with lower HbA1c and a favourable lipid profile. In MR-inverse-variance weighted analysis unfavourable adiposity was associated with an increased risk of non-hormonal cancers (OR = 1.22, 95% confidence interval [CI]: 1.08, 1.38), but a lower risk of hormonal cancers (OR = 0.80, 95% CI: 0.72, 0.89). From individual cancers, MR analyses suggested causal increases in the risk of multiple myeloma (OR = 1.36, 95% CI: 1.09, 1.70) and endometrial cancer (OR = 1.77, 95% CI: 1.16, 2.68) by greater genetically instrumented unfavourable adiposity but lower risks of breast and prostate cancer (OR = 0.72, 95% CI: 0.61, 0.83 and OR = 0.81, 95% CI: 0.68, 0.97, respectively). Favourable or neutral adiposity were not associated with the odds of any individual cancer. CONCLUSIONS: Higher adiposity associated with a higher risk of non-hormonal cancer but a lower risk of some hormone related cancers. Presence of metabolic abnormalities might aggravate the adverse effects of higher adiposity on cancer. Further studies are warranted to investigate whether interventions on adverse metabolic health may help to alleviate obesity-related cancer risk.
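The inverse-variance weighted (IVW) analysis mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the standard fixed-effect IVW estimator computed from per-SNP summary statistics, not the authors' actual pipeline; the function names and the three-SNP input values are hypothetical and chosen only to show the arithmetic.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW Mendelian randomization estimate.

    Each argument is a list with one entry per SNP: the SNP-exposure effect,
    the SNP-outcome effect (e.g. a log odds ratio), and the standard error of
    the SNP-outcome effect. Returns (causal_estimate, standard_error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if not beta_exposure:
        raise ValueError("at least one SNP is required")
    numerator = sum(bx * by / (se * se)
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx * bx / (se * se)
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)


def odds_ratio_with_ci(log_or, se, z=1.96):
    """Convert a log odds ratio and its SE to (OR, lower, upper)."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical summary statistics for three SNPs (illustration only).
bx = [0.10, 0.20, 0.05]   # SNP effects on the exposure
by = [0.02, 0.04, 0.01]   # SNP effects on the outcome (log odds)
se = [0.01, 0.01, 0.01]   # standard errors of the outcome effects

estimate, estimate_se = ivw_estimate(bx, by, se)
print(odds_ratio_with_ci(estimate, estimate_se))
```

Real analyses typically also check instrument strength and pleiotropy, which this sketch leaves out.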

Subject(s)

Neoplasms/diagnosis , Overweight/diagnosis , Adolescent , Adult , Cohort Studies , Female , Humans , Male , Mendelian Randomization Analysis/methods , Middle Aged , Neoplasms/epidemiology , Overweight/epidemiology , Retrospective Studies , United Kingdom/epidemiology

12.

RICOPILI: Rapid Imputation for COnsortias PIpeLIne.

Lam, Max; Awasthi, Swapnil; Watson, Hunna J; Goldstein, Jackie; Panagiotaropoulou, Georgia; Trubetskoy, Vassily; Karlsson, Robert; Frei, Oleksander; Fan, Chun-Chieh; De Witte, Ward; Mota, Nina R; Mullins, Niamh; Brügger, Kim; Lee, S Hong; Wray, Naomi R; Skarabis, Nora; Huang, Hailiang; Neale, Benjamin; Daly, Mark J; Mattheisen, Manuel; Walters, Raymond; Ripke, Stephan.

Bioinformatics ; 36(3): 930-933, 2020 02 01.

Article in English | MEDLINE | ID: mdl-31393554

ABSTRACT

SUMMARY: Genome-wide association study (GWAS) analyses, at sufficient sample sizes and power, have successfully revealed biological insights for several complex traits. RICOPILI, an open-sourced Perl-based pipeline was developed to address the challenges of rapidly processing large-scale multi-cohort GWAS studies including quality control (QC), imputation and downstream analyses. The pipeline is computationally efficient with portability to a wide range of high-performance computing environments. RICOPILI was created as the Psychiatric Genomics Consortium pipeline for GWAS and adopted by other users. The pipeline features (i) technical and genomic QC in case-control and trio cohorts, (ii) genome-wide phasing and imputation, (iv) association analysis, (v) meta-analysis, (vi) polygenic risk scoring and (vii) replication analysis. Notably, a major differentiator from other GWAS pipelines, RICOPILI leverages on automated parallelization and cluster job management approaches for rapid production of imputed genome-wide data. A comprehensive meta-analysis of simulated GWAS data has been incorporated demonstrating each step of the pipeline. This includes all the associated visualization plots, to allow ease of data interpretation and manuscript preparation. Simulated GWAS datasets are also packaged with the pipeline for user training tutorials and developer work. AVAILABILITY AND IMPLEMENTATION: RICOPILI has a flexible architecture to allow for ongoing development and incorporation of newer available algorithms and is adaptable to various HPC environments (QSUB, BSUB, SLURM and others). Specific links for genomic resources are either directly provided in this paper or via tutorials and external links. The central location hosting scripts and tutorials is found at this URL: https://sites.google.com/a/broadinstitute.org/RICOPILI/home. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Genome-Wide Association Study , Software , Algorithms , Genome , Genomics

13.

Joint analysis of psychiatric disorders increases accuracy of risk prediction for schizophrenia, bipolar disorder, and major depressive disorder.

Maier, Robert; Moser, Gerhard; Chen, Guo-Bo; Ripke, Stephan; Coryell, William; Potash, James B; Scheftner, William A; Shi, Jianxin; Weissman, Myrna M; Hultman, Christina M; Landén, Mikael; Levinson, Douglas F; Kendler, Kenneth S; Smoller, Jordan W; Wray, Naomi R; Lee, S Hong.

Am J Hum Genet ; 96(2): 283-94, 2015 Feb 05.

Article in English | MEDLINE | ID: mdl-25640677

ABSTRACT

Genetic risk prediction has several potential applications in medical research and clinical practice and could be used, for example, to stratify a heterogeneous population of patients by their predicted genetic risk. However, for polygenic traits, such as psychiatric disorders, the accuracy of risk prediction is low. Here we use a multivariate linear mixed model and apply multi-trait genomic best linear unbiased prediction for genetic risk prediction. This method exploits correlations between disorders and simultaneously evaluates individual risk for each disorder. We show that the multivariate approach significantly increases the prediction accuracy for schizophrenia, bipolar disorder, and major depressive disorder in the discovery as well as in independent validation datasets. By grouping SNPs based on genome annotation and fitting multiple random effects, we show that the prediction accuracy could be further improved. The gain in prediction accuracy of the multivariate approach is equivalent to an increase in sample size of 34% for schizophrenia, 68% for bipolar disorder, and 76% for major depressive disorders using single trait models. Because our approach can be readily applied to any number of GWAS datasets of correlated traits, it is a flexible and powerful tool to maximize prediction accuracy. With current sample size, risk predictors are not useful in a clinical setting but already are a valuable research tool, for example in experimental designs comparing cases with high and low polygenic risk.

Subject(s)

Genetics, Medical/methods , Mental Disorders/genetics , Multifactorial Inheritance/genetics , Risk Assessment/methods , Bipolar Disorder/genetics , Depressive Disorder, Major/genetics , Genetic Testing/methods , Humans , Linear Models , Multivariate Analysis , Polymorphism, Single Nucleotide/genetics , Schizophrenia/genetics

14.

Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases.

Gusev, Alexander; Lee, S Hong; Trynka, Gosia; Finucane, Hilary; Vilhjálmsson, Bjarni J; Xu, Han; Zang, Chongzhi; Ripke, Stephan; Bulik-Sullivan, Brendan; Stahl, Eli; Kähler, Anna K; Hultman, Christina M; Purcell, Shaun M; McCarroll, Steven A; Daly, Mark; Pasaniuc, Bogdan; Sullivan, Patrick F; Neale, Benjamin M; Wray, Naomi R; Raychaudhuri, Soumya; Price, Alkes L.

Am J Hum Genet ; 95(5): 535-52, 2014 Nov 06.

Article in English | MEDLINE | ID: mdl-25439723

ABSTRACT

Regulatory and coding variants are known to be enriched with associations identified by genome-wide association studies (GWASs) of complex disease, but their contributions to trait heritability are currently unknown. We applied variance-component methods to imputed genotype data for 11 common diseases to partition the heritability explained by genotyped SNPs (hg2) across functional categories (while accounting for shared variance due to linkage disequilibrium). Extensive simulations showed that in contrast to current estimates from GWAS summary statistics, the variance-component approach partitions heritability accurately under a wide range of complex-disease architectures. Across the 11 diseases DNaseI hypersensitivity sites (DHSs) from 217 cell types spanned 16% of imputed SNPs (and 24% of genotyped SNPs) but explained an average of 79% (SE = 8%) of hg2 from imputed SNPs (5.1× enrichment; p = 3.7 × 10^-17) and 38% (SE = 4%) of hg2 from genotyped SNPs (1.6× enrichment, p = 1.0 × 10^-4). Further enrichment was observed at enhancer DHSs and cell-type-specific DHSs. In contrast, coding variants, which span 1% of the genome, explained <10% of hg2 despite having the highest enrichment. We replicated these findings but found no significant contribution from rare coding variants in independent schizophrenia cohorts genotyped on GWAS and exome chips. Our results highlight the value of analyzing components of heritability to unravel the functional architecture of common disease.

Subject(s)

Genetic Diseases, Inborn/genetics , Genetic Variation/genetics , Genome-Wide Association Study/methods , Inheritance Patterns/genetics , Open Reading Frames/genetics , Regulatory Elements, Transcriptional/genetics , Computer Simulation , Humans , Models, Genetic

15.

Additive genetic variation in schizophrenia risk is shared by populations of African and European descent.

de Candia, Teresa R; Lee, S Hong; Yang, Jian; Browning, Brian L; Gejman, Pablo V; Levinson, Douglas F; Mowry, Bryan J; Hewitt, John K; Goddard, Michael E; O'Donovan, Michael C; Purcell, Shaun M; Posthuma, Danielle; Visscher, Peter M; Wray, Naomi R; Keller, Matthew C.

Am J Hum Genet ; 93(3): 463-70, 2013 Sep 05.

Article in English | MEDLINE | ID: mdl-23954163

ABSTRACT

To investigate the extent to which the proportion of schizophrenia's additive genetic variation tagged by SNPs is shared by populations of European and African descent, we analyzed the largest combined African descent (AD [n = 2,142]) and European descent (ED [n = 4,990]) schizophrenia case-control genome-wide association study (GWAS) data set available, the Molecular Genetics of Schizophrenia (MGS) data set. We show how a method that uses genomic similarities at measured SNPs to estimate the additive genetic correlation (SNP correlation [SNP-rg]) between traits can be extended to estimate SNP-rg for the same trait between ethnicities. We estimated SNP-rg for schizophrenia between the MGS ED and MGS AD samples to be 0.66 (SE = 0.23), which is significantly different from 0 (p(SNP-rg = 0) = 0.0003), but not 1 (p(SNP-rg = 1) = 0.26). We re-estimated SNP-rg between an independent ED data set (n = 6,665) and the MGS AD sample to be 0.61 (SE = 0.21, p(SNP-rg = 0) = 0.0003, p(SNP-rg = 1) = 0.16). These results suggest that many schizophrenia risk alleles are shared across ethnic groups and predate African-European divergence.

Subject(s)

Black People/genetics , Genealogy and Heraldry , Genetic Predisposition to Disease , Genetic Variation , Genetics, Population , Schizophrenia/genetics , White People/genetics , Africa/ethnology , Cohort Studies , Europe/ethnology , Gene Frequency/genetics , Humans , Inheritance Patterns/genetics , Models, Genetic , Polymorphism, Single Nucleotide/genetics , Recombination, Genetic/genetics , Risk Factors

16.

Partitioning the heritability of Tourette syndrome and obsessive compulsive disorder reveals differences in genetic architecture.

Davis, Lea K; Yu, Dongmei; Keenan, Clare L; Gamazon, Eric R; Konkashbaev, Anuar I; Derks, Eske M; Neale, Benjamin M; Yang, Jian; Lee, S Hong; Evans, Patrick; Barr, Cathy L; Bellodi, Laura; Benarroch, Fortu; Berrio, Gabriel Bedoya; Bienvenu, Oscar J; Bloch, Michael H; Blom, Rianne M; Bruun, Ruth D; Budman, Cathy L; Camarena, Beatriz; Campbell, Desmond; Cappi, Carolina; Cardona Silgado, Julio C; Cath, Danielle C; Cavallini, Maria C; Chavira, Denise A; Chouinard, Sylvain; Conti, David V; Cook, Edwin H; Coric, Vladimir; Cullen, Bernadette A; Deforce, Dieter; Delorme, Richard; Dion, Yves; Edlund, Christopher K; Egberts, Karin; Falkai, Peter; Fernandez, Thomas V; Gallagher, Patience J; Garrido, Helena; Geller, Daniel; Girard, Simon L; Grabe, Hans J; Grados, Marco A; Greenberg, Benjamin D; Gross-Tsur, Varda; Haddad, Stephen; Heiman, Gary A; Hemmings, Sian M J; Hounie, Ana G.

PLoS Genet ; 9(10): e1003864, 2013 Oct.

Article in English | MEDLINE | ID: mdl-24204291

ABSTRACT

The direct estimation of heritability from genome-wide common variant data as implemented in the program Genome-wide Complex Trait Analysis (GCTA) has provided a means to quantify heritability attributable to all interrogated variants. We have quantified the variance in liability to disease explained by all SNPs for two phenotypically-related neurobehavioral disorders, obsessive-compulsive disorder (OCD) and Tourette Syndrome (TS), using GCTA. Our analysis yielded a heritability point estimate of 0.58 (se = 0.09, p = 5.64e-12) for TS, and 0.37 (se = 0.07, p = 1.5e-07) for OCD. In addition, we conducted multiple genomic partitioning analyses to identify genomic elements that concentrate this heritability. We examined genomic architectures of TS and OCD by chromosome, MAF bin, and functional annotations. In addition, we assessed heritability for early onset and adult onset OCD. Among other notable results, we found that SNPs with a minor allele frequency of less than 5% accounted for 21% of the TS heritability and 0% of the OCD heritability. Additionally, we identified a significant contribution to TS and OCD heritability by variants significantly associated with gene expression in two regions of the brain (parietal cortex and cerebellum) for which we had available expression quantitative trait loci (eQTLs). Finally we analyzed the genetic correlation between TS and OCD, revealing a genetic correlation of 0.41 (se = 0.15, p = 0.002). These results are very close to previous heritability estimates for TS and OCD based on twin and family studies, suggesting that very little, if any, heritability is truly missing (i.e., unassayed) from TS and OCD GWAS studies of common variation. The results also indicate that there is some genetic overlap between these two phenotypically-related neuropsychiatric disorders, but suggest that the two disorders have distinct genetic architectures.

Subject(s)

Obsessive-Compulsive Disorder/genetics , Quantitative Trait, Heritable , Tourette Syndrome/genetics , Gene Frequency , Genome-Wide Association Study , Humans , Obsessive-Compulsive Disorder/pathology , Phenotype , Polymorphism, Single Nucleotide , Tourette Syndrome/pathology

17.

Estimation and partitioning of polygenic variation captured by common SNPs for Alzheimer's disease, multiple sclerosis and endometriosis.

Lee, S Hong; Harold, Denise; Nyholt, Dale R; Goddard, Michael E; Zondervan, Krina T; Williams, Julie; Montgomery, Grant W; Wray, Naomi R; Visscher, Peter M.

Hum Mol Genet ; 22(4): 832-41, 2013 Feb 15.

Article in English | MEDLINE | ID: mdl-23193196

ABSTRACT

Common diseases such as endometriosis (ED), Alzheimer's disease (AD) and multiple sclerosis (MS) account for a significant proportion of the health care burden in many countries. Genome-wide association studies (GWASs) for these diseases have identified a number of individual genetic variants contributing to the risk of those diseases. However, the effect size for most variants is small and collectively the known variants explain only a small proportion of the estimated heritability. We used a linear mixed model to fit all single nucleotide polymorphisms (SNPs) simultaneously, and estimated genetic variances on the liability scale using SNPs from GWASs in unrelated individuals for these three diseases. For each of the three diseases, case and control samples were not all genotyped in the same laboratory. We demonstrate that a careful analysis can obtain robust estimates, but also that insufficient quality control (QC) of SNPs can lead to spurious results and that too stringent QC is likely to remove real genetic signals. Our estimates show that common SNPs on commercially available genotyping chips capture significant variation contributing to liability for all three diseases. The estimated proportion of total variation tagged by all SNPs was 0.26 (SE 0.04) for ED, 0.24 (SE 0.03) for AD and 0.30 (SE 0.03) for MS. Further, we partitioned the genetic variance explained into five categories by a minor allele frequency (MAF), by chromosomes and gene annotation. We provide strong evidence that a substantial proportion of variation in liability is explained by common SNPs, and thereby give insights into the genetic architecture of the diseases.

Subject(s)

Alzheimer Disease/genetics , Endometriosis/genetics , Models, Genetic , Multiple Sclerosis/genetics , Polymorphism, Single Nucleotide , Case-Control Studies , Chromosomes, Human , Female , Gene Frequency , Genetic Variation , Genotype , Humans , Male , Molecular Sequence Annotation , Multifactorial Inheritance

18.

Heterogeneity of genetic architecture of body size traits in a free-living population.

Bérénos, Camillo; Ellis, Philip A; Pilkington, Jill G; Lee, S Hong; Gratten, Jake; Pemberton, Josephine M.

Mol Ecol ; 24(8): 1810-30, 2015 Apr.

Article in English | MEDLINE | ID: mdl-25753777

ABSTRACT

Knowledge of the underlying genetic architecture of quantitative traits could aid in understanding how they evolve. In wild populations, it is still largely unknown whether complex traits are polygenic or influenced by few loci with major effect, due to often small sample sizes and low resolution of marker panels. Here, we examine the genetic architecture of five adult body size traits in a free-living population of Soay sheep on St Kilda using 37 037 polymorphic SNPs. Two traits (jaw and weight) show classical signs of a polygenic trait: the proportion of variance explained by a chromosome was proportional to its length, multiple chromosomes and genomic regions explained significant amounts of phenotypic variance, but no SNPs were associated with trait variance when using GWAS. In comparison, genetic variance for leg length traits (foreleg, hindleg and metacarpal) was disproportionately explained by two SNPs on chromosomes 16 (s23172.1) and 19 (s74894.1), which each explained >10% of the additive genetic variance. After controlling for environmental differences, females heterozygous for s74894.1 produced more lambs and recruits during their lifetime than females homozygous for the common allele conferring long legs. We also demonstrate that alleles conferring shorter legs have likely entered the population through a historic admixture event with the Dunface sheep. In summary, we show that different proxies for body size can have very different genetic architecture and that dense SNP helps in understanding both the mode of selection and the evolutionary history at loci underlying quantitative traits in natural populations.

Subject(s)

Body Size/genetics , Multifactorial Inheritance , Quantitative Trait, Heritable , Sheep/genetics , Alleles , Animals , Chromosomes/genetics , Female , Genetic Association Studies , Genotype , Haplotypes , Likelihood Functions , Linkage Disequilibrium , Phenotype , Polymorphism, Single Nucleotide

19.

GCTA: a tool for genome-wide complex trait analysis.

Yang, Jian; Lee, S Hong; Goddard, Michael E; Visscher, Peter M.

Am J Hum Genet ; 88(1): 76-82, 2011 Jan 07.

Article in English | MEDLINE | ID: mdl-21167468

ABSTRACT

For most human complex diseases and traits, SNPs identified by genome-wide association studies (GWAS) explain only a small fraction of the heritability. Here we report a user-friendly software tool called genome-wide complex trait analysis (GCTA), which was developed based on a method we recently developed to address the "missing heritability" problem. GCTA estimates the variance explained by all the SNPs on a chromosome or on the whole genome for a complex trait rather than testing the association of any particular SNP to the trait. We introduce GCTA's five main functions: data management, estimation of the genetic relationships from SNPs, mixed linear model analysis of variance explained by the SNPs, estimation of the linkage disequilibrium structure, and GWAS simulation. We focus on the function of estimating the variance explained by all the SNPs on the X chromosome and testing the hypotheses of dosage compensation. The GCTA software is a versatile tool to estimate and partition complex trait variation with large GWAS data sets.
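As an illustration of the second function listed above (estimating genetic relationships from SNPs), the following is a minimal Python sketch, not GCTA's own code. It assumes the commonly cited estimator in which each SNP's allele counts are centred by twice the allele frequency and scaled by 2p(1 - p), then averaged over SNPs. The function name, the toy genotype matrix, and the choice to apply the same formula to the diagonal are assumptions made for illustration; GCTA itself treats the diagonal somewhat differently.

```python
import numpy as np


def genetic_relationship_matrix(genotypes):
    """Sketch of a SNP-based genetic relationship matrix (GRM).

    genotypes: (n_individuals, n_snps) array of allele counts coded 0/1/2.
    Returns an (n_individuals, n_individuals) matrix.
    """
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0                 # sample allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)             # drop SNPs monomorphic in the sample
    x, p = x[:, keep], p[keep]
    # Centre by the expected count 2p and scale by the expected SD sqrt(2p(1-p)).
    z = (x - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # Average the per-SNP cross-products over all retained SNPs.
    return z @ z.T / z.shape[1]


# Hypothetical toy data: 20 unrelated individuals, 500 independent SNPs.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.9, size=500)
toy_genotypes = rng.binomial(2, freqs, size=(20, 500))
grm = genetic_relationship_matrix(toy_genotypes)
```

For unrelated simulated individuals such as these, the off-diagonal entries should scatter around zero and the diagonal around one; a matrix of this kind is what the mixed linear model step would take as input.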

Subject(s)

Chromosomes, Human, X/genetics , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Software , Computer Simulation , Dosage Compensation, Genetic , Female , Genetic Linkage , Humans , Male , Models, Genetic

20.

GCTA-GREML accounts for linkage disequilibrium when estimating genetic variance from genome-wide SNPs.

Yang, Jian; Lee, S Hong; Wray, Naomi R; Goddard, Michael E; Visscher, Peter M.

Proc Natl Acad Sci U S A ; 113(32): E4579-80, 2016 08 09.

Article in English | MEDLINE | ID: mdl-27457963

Subject(s)

Linkage Disequilibrium , Polymorphism, Single Nucleotide , Genetic Variation , Genome , Genome-Wide Association Study , Humans
