ABSTRACT
Fine-mapping and gene-prioritisation techniques applied to the latest Genome-Wide Association Study (GWAS) results have prioritised hundreds of genes as causally associated with disease. Here we leverage these recently compiled lists of high-confidence causal genes to interrogate where in the body disease genes operate. Specifically, we combine GWAS summary statistics, gene prioritisation results and gene expression RNA-seq data from 46 tissues and 204 cell types in relation to 16 major diseases (including 8 cancers). In tissues and cell types with well-established relevance to the disease, the prioritised genes typically have higher absolute and relative (i.e. tissue/cell specific) expression compared to non-prioritised 'control' genes. Examples include brain tissues in psychiatric disorders (P-value < 1×10⁻⁷), microglia cells in Alzheimer's Disease (P-value = 9.8×10⁻³) and colon mucosa in colorectal cancer (P-value < 1×10⁻³). We also observe significantly higher expression for disease genes in multiple tissues and cell types with no established links to the corresponding disease. While some of these results may be explained by cell types that span multiple tissues, such as macrophages in brain, blood, lung and spleen in relation to Alzheimer's disease (P-values < 1×10⁻³), the cause for others is unclear and motivates further investigation that may provide novel insights into disease etiology. Examples include mammary tissue in Type 2 Diabetes (P-value < 1×10⁻⁷); reproductive tissues such as breast, uterus, vagina, and prostate in Coronary Artery Disease (P-value < 1×10⁻⁴); and motor neurons in psychiatric disorders (P-value < 3×10⁻⁴). In the GTEx dataset, tissue type is the major predictor of gene expression but the contribution of each predictor (tissue, sample, subject, batch) varies widely among disease-associated genes. Finally, we highlight genes with the highest levels of gene expression in relevant tissues to guide functional follow-up studies.
Our results could offer novel insights into the tissues and cells involved in disease initiation, inform drug target and delivery strategies, highlight potential off-target effects, and exemplify the relative performance of different statistical tests for linking disease genes with tissue and cell type gene expression.
ABSTRACT
Social isolation has been linked to a range of psychiatric issues, but the behavioral component that drives it is not well understood. Here, a genome-wide association study (GWAS) was carried out to identify genetic variants that contribute specifically to social isolation behavior (SIB) in up to 449,609 participants from the UK Biobank. Seventeen loci were identified at genome-wide significance, contributing to a 4% SNP-based heritability estimate. Using the SIB GWAS, polygenic risk scores (PRS) were derived in ALSPAC, an independent developmental cohort, and tested for association with self-reported friendship scores, comprising items related to friendship quality and quantity, at ages 12 and 18 to determine whether genetic predisposition manifests during childhood development. At age 18, friendship scores were associated with the SIB PRS, demonstrating that the genetic factors can predict related social traits in late adolescence. Linkage disequilibrium (LD) score correlation using the SIB GWAS demonstrated genetic correlations with autism spectrum disorder (ASD), schizophrenia, major depressive disorder (MDD), educational attainment, extraversion, and loneliness. However, no evidence of causality was found using a conservative Mendelian randomization approach between SIB and any of the traits in either direction. Genomic Structural Equation Modeling (SEM) revealed a common factor contributing to SIB, neuroticism, loneliness, MDD, and ASD, weakly correlated with a second common factor that contributes to psychiatric and psychotic traits. Our results show that SIB has a small heritable component, which is associated genetically with other social traits such as friendship as well as with psychiatric disorders.
ABSTRACT
Human endogenous retroviruses (HERVs) are repetitive elements previously implicated in major psychiatric conditions, but their role in aetiology remains unclear. Here, we perform specialised transcriptome-wide association studies that consider HERV expression quantified to precise genomic locations, using RNA sequencing and genetic data from 792 post-mortem brain samples. In Europeans, we identify 1238 HERVs with expression regulated in cis, of which 26 represent expression signals associated with psychiatric disorders, with ten being conditionally independent from neighbouring expression signals. Of these, five are additionally significant in fine-mapping analyses and thus are considered high confidence risk HERVs. These include two HERV expression signatures specific to schizophrenia risk, one shared between schizophrenia and bipolar disorder, and one specific to major depressive disorder. No robust signatures are identified for autism spectrum conditions or attention deficit hyperactivity disorder in Europeans, or for any psychiatric trait in other ancestries, although this is likely a result of relatively limited statistical power. Ultimately, our study highlights extensive HERV expression and regulation in the adult cortex, including in association with psychiatric disorder risk, therefore providing a rationale for exploring neurological HERV expression in complex neuropsychiatric traits.
Subjects
Bipolar Disorder , Major Depressive Disorder , Endogenous Retroviruses , Genome-Wide Association Study , Schizophrenia , Transcriptome , Humans , Endogenous Retroviruses/genetics , Schizophrenia/genetics , Schizophrenia/virology , Bipolar Disorder/genetics , Risk Factors , Major Depressive Disorder/genetics , Major Depressive Disorder/virology , Mental Disorders/genetics , Brain/metabolism , Brain/virology , Female , Male , Genetic Predisposition to Disease , Attention Deficit Disorder with Hyperactivity/genetics , Adult
ABSTRACT
Here we present BridgePRS, a novel Bayesian polygenic risk score (PRS) method that leverages shared genetic effects across ancestries to increase PRS portability. We evaluate BridgePRS via simulations and real UK Biobank data across 19 traits in individuals of African, South Asian and East Asian ancestry, using both UK Biobank and Biobank Japan genome-wide association study summary statistics; out-of-cohort validation is performed in the Mount Sinai (New York) BioMe biobank. BridgePRS is compared with the leading alternative, PRS-CSx, and two other PRS methods. Simulations suggest that the performance of BridgePRS relative to PRS-CSx increases as uncertainty increases: with lower trait heritability, higher polygenicity and greater between-population genetic diversity; and when causal variants are not present in the data. In real data, BridgePRS has a 61% larger average R² than PRS-CSx in out-of-cohort prediction of African ancestry samples in BioMe (P = 6 × 10⁻⁵). BridgePRS is a computationally efficient, user-friendly and powerful approach for PRS analyses in non-European ancestries.
Subjects
Genetic Predisposition to Disease , Genetic Risk Score , Humans , Risk Factors , Genome-Wide Association Study , Bayes Theorem , Polymorphism, Single Nucleotide/genetics , Multifactorial Inheritance/genetics
ABSTRACT
Recent advances in genome-wide association and sequencing studies have shown that the genetic architecture of complex traits and diseases involves a combination of rare and common genetic variants distributed throughout the genome. One way to better understand this architecture is to visualize genetic associations across a wide range of allele frequencies. However, there is currently no standardized or consistent graphical representation for effectively illustrating these results. Here we propose a standardized approach for visualizing the effect size of risk variants across the allele frequency spectrum. The proposed plots have a distinctive trumpet shape, with the majority of variants having high frequency and small effects, and a small number of variants having lower frequency and larger effects. To demonstrate the utility of trumpet plots in illustrating the relationship between the number of variants, their frequency, and the magnitude of their effects in shaping the genetic architecture of complex traits and diseases, we generated trumpet plots for more than one hundred traits in the UK Biobank. To facilitate their broader use, we developed an R package, 'TrumpetPlots' (available at the Comprehensive R Archive Network), and an R Shiny application, 'Shiny Trumpets' (available at https://juditgg.shinyapps.io/shinytrumpets/), that allows users to explore these results and submit their own data.
ABSTRACT
Polygenic risk scores (PRSs) have been among the leading advances in biomedicine in recent years. As a proxy of genetic liability, PRSs are utilised across multiple fields and applications. While numerous statistical and machine learning methods have been developed to optimise their predictive accuracy, these typically distil genetic liability to a single number based on aggregation of an individual's genome-wide risk alleles. This results in a key loss of information about an individual's genetic profile, which could be critical given the functional sub-structure of the genome and the heterogeneity of complex disease. In this manuscript, we introduce a 'pathway polygenic' paradigm of disease risk, in which multiple genetic liabilities underlie complex diseases, rather than a single genome-wide liability. We describe a method and accompanying software, PRSet, for computing and analysing pathway-based PRSs, in which polygenic scores are calculated across genomic pathways for each individual. We evaluate the potential of pathway PRSs in two distinct ways, creating two major sections. (1) In the first section, we benchmark PRSet as a pathway enrichment tool, evaluating its capacity to capture GWAS signal in pathways. We find that for target sample sizes of >10,000 individuals, pathway PRSs have similar power for evaluating pathway enrichment as leading methods MAGMA and LD score regression, with the distinct advantage of providing individual-level estimates of genetic liability for each pathway, opening up a range of pathway-based PRS applications. (2) In the second section, we evaluate the performance of pathway PRSs for disease stratification. We show that using a supervised disease stratification approach, pathway PRSs (computed by PRSet) outperform two standard genome-wide PRSs (computed by C+T and lassosum) for classifying disease subtypes in 20 of 21 scenarios tested.
As the definition and functional annotation of pathways becomes increasingly refined, we expect pathway PRSs to offer key insights into the heterogeneity of complex disease and treatment response, to generate biologically tractable therapeutic targets from polygenic signal, and, ultimately, to provide a powerful path to precision medicine.
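The idea of computing one score per pathway rather than a single genome-wide score can be illustrated with a minimal sketch. This is not PRSet itself: it omits clumping, thresholding and any pathway-specific processing, and the function name `pathway_prs`, the SNP identifiers and the pathway names are hypothetical, chosen only for illustration. Each score is assumed to be a plain sum of effect size times allele dosage over the SNPs assigned to a pathway.

```python
import numpy as np

def pathway_prs(dosages, betas, snp_ids, pathways):
    """Return {pathway name: per-individual score}.

    Each score is the sum of beta * dosage over the SNPs in that pathway
    (a simplification; real tools also handle LD, thresholds, etc.).
    """
    index = {snp: j for j, snp in enumerate(snp_ids)}
    scores = {}
    for name, members in pathways.items():
        # Keep only pathway SNPs that are present in the genotype data.
        cols = np.array([index[s] for s in members if s in index], dtype=int)
        scores[name] = dosages[:, cols] @ betas[cols]
    return scores

# Toy data (hypothetical): 3 individuals, 4 SNPs, 2 non-overlapping pathways.
dosages = np.array([[0, 1, 2, 0],
                    [1, 1, 0, 2],
                    [2, 0, 1, 1]], dtype=float)
betas = np.array([0.10, -0.20, 0.05, 0.30])
snp_ids = ["rs1", "rs2", "rs3", "rs4"]
pathways = {"pathway_A": ["rs1", "rs2"], "pathway_B": ["rs3", "rs4"]}

scores = pathway_prs(dosages, betas, snp_ids, pathways)
genome_wide = dosages @ betas  # the usual single-number score
```

In this toy case the pathways partition the SNPs, so the pathway scores sum to the genome-wide score; the individual-level pathway profile is the extra information that the single number discards.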
Subjects
Genomics , Multifactorial Inheritance , Humans , Risk Factors , Multifactorial Inheritance/genetics , Genome-Wide Association Study , Software , Genetic Predisposition to Disease
ABSTRACT
Polygenic risk scores (PRSs) aggregate the effects of genetic variants across the genome and are used to predict risk of complex diseases, such as obesity. Current PRSs only include common variants (minor allele frequency (MAF) ≥1%), whereas the contribution of rare variants in PRSs to predict disease remains unknown. Here, we examine whether augmenting the standard common variant PRS (PRScommon) with a rare variant PRS (PRSrare) improves prediction of obesity. We used genome-wide genotyped and imputed data on 451,145 European-ancestry participants of the UK Biobank, as well as whole exome sequencing (WES) data on 184,385 participants. We performed single variant analyses (for both common and rare variants) and gene-based analyses (for rare variants) for association with BMI (kg/m²), obesity (BMI ≥ 30 kg/m²), and extreme obesity (BMI ≥ 40 kg/m²). We built multiple PRScommon and PRSrare using a range of methods (Clumping+Thresholding [C+T], PRS-CS, lassosum, gene-burden test). We selected the best-performing PRSs and assessed their performance in 36,757 European-ancestry unrelated participants with whole genome sequencing (WGS) data from the Trans-Omics for Precision Medicine (TOPMed) program. The best-performing PRScommon explained 10.1% of variation in BMI, and 18.3% and 22.5% of the susceptibility to obesity and extreme obesity, respectively, whereas the best-performing PRSrare explained 1.49%, and 2.97% and 3.68%, respectively. The PRSrare was associated with an increased risk of obesity and extreme obesity (ORobesity = 1.37 per SD of the PRS, Pobesity = 1.7×10⁻⁸⁵; ORextremeobesity = 1.55 per SD of the PRS, Pextremeobesity = 3.8×10⁻⁴⁰), which was attenuated after adjusting for PRScommon (ORobesity = 1.08 per SD of the PRS, Pobesity = 9.8×10⁻⁶; ORextremeobesity = 1.09 per SD of the PRS, Pextremeobesity = 0.02). When PRSrare and PRScommon are combined, the increase in explained variance attributed to PRSrare was small (incremental Nagelkerke R² = 0.24% for obesity and 0.51% for extreme obesity).
Consistently, combining PRSrare with PRScommon provided little improvement to the prediction of obesity (PRSrare AUC = 0.591; PRScommon AUC = 0.708; PRScombined AUC = 0.710). In summary, while rare variants show convincing association with BMI, obesity and extreme obesity, the PRSrare provides limited improvement over PRScommon in the prediction of obesity risk, based on these large populations.
Subjects
Genome-Wide Association Study , Obesity , Gene Frequency , Genetic Variation , Humans , Obesity/epidemiology , Obesity/genetics , Whole Genome Sequencing
ABSTRACT
BACKGROUND: Greater maternal adiposity before or during pregnancy is associated with greater offspring adiposity throughout childhood, but the extent to which this is due to causal intrauterine or periconceptional mechanisms remains unclear. Here, we use Mendelian randomisation (MR) with polygenic risk scores (PRS) to investigate whether associations between maternal pre-/early pregnancy body mass index (BMI) and offspring adiposity from birth to adolescence are causal. METHODS: We undertook confounder adjusted multivariable (MV) regression and MR using mother-offspring pairs from two UK cohorts: Avon Longitudinal Study of Parents and Children (ALSPAC) and Born in Bradford (BiB). In ALSPAC and BiB, the outcomes were birthweight (BW; N = 9339) and BMI at age 1 and 4 years (N = 8659 to 7575). In ALSPAC only we investigated BMI at 10 and 15 years (N = 4476 to 4112) and dual-energy X-ray absorptiometry (DXA) determined fat mass index (FMI) from age 10-18 years (N = 2659 to 3855). We compared MR results from several PRS, calculated from maternal non-transmitted alleles at between 29 and 80,939 single nucleotide polymorphisms (SNPs). RESULTS: MV and MR consistently showed a positive association between maternal BMI and BW, supporting a moderate causal effect. For adiposity at most older ages, although MV estimates indicated a strong positive association, MR estimates did not support a causal effect. For the PRS with few SNPs, MR estimates were statistically consistent with the null, but had wide confidence intervals so were often also statistically consistent with the MV estimates. In contrast, the largest PRS yielded MR estimates with narrower confidence intervals, providing strong evidence that the true causal effect on adolescent adiposity is smaller than the MV estimates (P for difference = 0.001 for 15-year BMI). This suggests that the MV estimates are affected by residual confounding and therefore do not provide an accurate indication of the causal effect size.
CONCLUSIONS: Our results suggest that higher maternal pre-/early-pregnancy BMI is not a key driver of higher adiposity in the next generation. Thus, they support interventions that target the whole population for reducing overweight and obesity, rather than a specific focus on women of reproductive age.
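The contrast between a confounded regression estimate and an MR estimate can be sketched with a simulated Wald-ratio instrumental-variable calculation. This is an illustrative toy, not the study's analysis: the simulation parameters, the variable names and the helper functions `wald_ratio` and `ols_slope` are assumptions, and the sketch ignores the maternal non-transmitted allele design described above.

```python
import numpy as np

def wald_ratio(z, x, y):
    """Instrumental-variable estimate of the effect of x on y,
    using z as the instrument: cov(z, y) / cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

def ols_slope(x, y):
    """Unadjusted regression slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Simulated data (all values hypothetical).
rng = np.random.default_rng(0)
n = 20_000
prs = rng.normal(size=n)          # instrument: polygenic score for exposure
confounder = rng.normal(size=n)   # unmeasured factor affecting both traits
maternal_bmi = 0.5 * prs + confounder + rng.normal(size=n)
true_effect = 0.1
offspring_bmi = true_effect * maternal_bmi + confounder + rng.normal(size=n)

observational = ols_slope(maternal_bmi, offspring_bmi)   # inflated by confounding
mr_estimate = wald_ratio(prs, maternal_bmi, offspring_bmi)  # close to true_effect
```

Under these assumptions the observational slope is several times larger than the simulated causal effect, while the instrument-based estimate recovers it approximately, which mirrors the pattern of MV versus MR estimates reported in the abstract.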
Subjects
Adiposity/genetics , Obesity/genetics , Adolescent , Alleles , Body Mass Index , Child , Child, Preschool , Cohort Studies , Female , Humans , Infant , Longitudinal Studies , Obesity/etiology , Pregnancy , Risk Factors , United Kingdom
ABSTRACT
The low portability of polygenic scores (PGSs) across global populations is a major concern that must be addressed before PGSs can be used for everyone in the clinic. Indeed, prediction accuracy has been shown to decay as a function of the genetic distance between the training and test cohorts. However, such cohorts differ not only in their genetic distance but also in their geographical distance and their data collection and assaying, conflating multiple factors. In this study, we examine the extent to which PGSs are transferable between ancestries by deriving polygenic scores for 245 curated traits from the UK Biobank data and applying them in nine ancestry groups from the same cohort. By restricting both training and testing to the UK Biobank data, we reduce the risk of environmental and genotyping confounding from using different cohorts. We define the nine ancestry groups at a sub-continental level, based on a simple, robust, and effective method that we introduce here. We then apply two different predictive methods to derive polygenic scores for all 245 phenotypes and show a systematic and dramatic reduction in portability of PGSs trained using Northwestern European individuals and applied to nine ancestry groups. These analyses demonstrate that prediction already drops off within European ancestries and reduces globally in proportion to genetic distance. Altogether, our study provides unique and robust insights into the PGS portability problem.
Subjects
Genetic Association Studies/methods , Genetic Predisposition to Disease , Genetics, Population/methods , Multifactorial Inheritance , Algorithms , Alleles , Biological Specimen Banks , Genetic Variation , Genome-Wide Association Study , Genotype , Humans , Models, Genetic , Phenotype , Reproducibility of Results , United Kingdom
ABSTRACT
BACKGROUND: Polygenic risk score (PRS) analyses are now routinely applied across biomedical research. However, as PRS studies grow in size, there is an increased risk of sample overlap between the genome-wide association study (GWAS) from which the PRS is derived and the "target sample," in which PRSs are computed and hypotheses are tested. Despite the wide recognition of the sample overlap problem, its potential impact on the results from PRS studies has not yet been quantified, and no analytical solution has been provided. FINDINGS: Here, we first conduct a comprehensive investigation into the scale of the sample overlap problem, finding that PRS results can be substantially inflated even in the presence of minimal overlap. Next, we introduce a method and software, EraSOR (Erase Sample Overlap and Relatedness), which eliminates the inflation caused by sample overlap (and close relatedness) in almost all settings tested here. CONCLUSIONS: EraSOR could be useful in PRS studies (with target sample >1,000) similar to those investigated here, either (i) to mitigate the potential effects of known or unknown intercohort overlap and close relatedness or (ii) as a sensitivity tool to highlight the possible presence of sample overlap before its direct removal, when possible, or else to provide a lower bound on PRS analysis results after accounting for potential sample overlap.
Subjects
Genome-Wide Association Study , Multifactorial Inheritance , Humans , Software , Risk Assessment/methods , Risk Factors , Genetic Predisposition to Disease
ABSTRACT
BACKGROUND: Epidemiological studies report increased comorbidity between depression and autoimmune diseases. The role of shared genetic influences in the observed comorbidity is unclear. We investigated the evidence for pleiotropy between these traits in the UK Biobank (UKB). METHODS: We defined autoimmune and depression cases using hospital episode statistics, self-reported conditions and medications, and mental health questionnaires. Pairwise comparisons of depression prevalence between autoimmune cases and controls, and vice versa, were performed. Cross-trait polygenic risk score (PRS) analyses tested for pleiotropy, i.e., whether PRSs for depression could predict autoimmune disease status, and vice versa. RESULTS: We identified 28,479 cases of autoimmune diseases (pooling across 14 traits) and 324,074 autoimmune controls, and 65,075 cases of depression and 232,552 depression controls. The prevalence of depression was significantly higher in autoimmune cases than in controls, and similarly, the prevalence of autoimmune disease was higher in depression cases than in controls. PRSs for myasthenia gravis and psoriasis were significantly higher in depression cases than in controls (p < 5.2 × 10⁻⁵, R² ≤ 0.04%). PRSs for depression were significantly higher in inflammatory bowel disease, psoriasis, psoriatic arthritis, rheumatoid arthritis, and type 1 diabetes cases than in controls (p < 5.8 × 10⁻⁵, R² range = 0.06%-0.27%), and lower in celiac disease cases than in controls (p < 5.4 × 10⁻⁷, R² range = 0.11%-0.15%). CONCLUSIONS: Consistent with the literature, depression was more common in individuals with autoimmune diseases than in controls, and vice versa. PRSs showed some evidence for involvement of shared genetic factors, but the modest R² values suggest that shared genetic architecture accounts for a small proportion of the increased risk across traits.
ABSTRACT
Here we report how four major forms of Alzheimer's disease (AD) genetic risk (APOE-ε4, APOE-ε2, polygenic risk and familial risk) are associated with 273 traits in ~500,000 individuals in the UK Biobank. The traits cover blood biochemistry and cell traits, metabolic and general health, psychosocial health, and cognitive function. The difference in the profile of traits associated with the different forms of AD risk is striking and may contribute to heterogeneous presentation of the disease. However, we also identify traits significantly associated with multiple forms of AD genetic risk, as well as traits showing significant changes across ages in those at high risk of AD, which may point to their potential roles in AD etiology. Finally, we highlight how survivor effects, in particular those relating to shared risks of cardiovascular disease and AD, can generate associations that may mislead interpretation in epidemiological AD studies. The UK Biobank provides a unique opportunity to powerfully compare the effects of different forms of AD genetic risk on the phenome in the same cohort.
Subjects
Alzheimer Disease , Alzheimer Disease/genetics , Apolipoproteins E/genetics , Genotype , Humans , Multifactorial Inheritance , Phenotype , Risk Factors
ABSTRACT
Associations between exposures and outcomes reported in epidemiological studies are typically unadjusted for genetic confounding. We propose a two-stage approach for estimating the degree to which such observed associations can be explained by genetic confounding. First, we assess attenuation of exposure effects in regressions controlling for increasingly powerful polygenic scores. Second, we use structural equation models to estimate genetic confounding using heritability estimates derived from both SNP-based and twin-based studies. We examine associations between maternal education and three developmental outcomes: child educational achievement, Body Mass Index, and Attention Deficit Hyperactivity Disorder. Polygenic scores explain between 14.3% and 23.0% of the original associations, while analyses under SNP- and twin-based heritability scenarios indicate that observed associations could be almost entirely explained by genetic confounding. Thus, caution is needed when interpreting associations from non-genetically informed epidemiology studies. Our approach, akin to a genetically informed sensitivity analysis, can be applied widely.
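The first stage described above, attenuation of an exposure effect once a polygenic score is controlled for, can be sketched on simulated data. This is an assumption-laden toy rather than the study's model: the data-generating values, the variable names and the helper `exposure_coef` are hypothetical, and the polygenic score is simulated as a noisy proxy for a single shared genetic factor.

```python
import numpy as np

def exposure_coef(exposure, outcome, covariates=None):
    """OLS coefficient on the exposure, optionally adjusting for covariates."""
    cols = [np.ones_like(exposure), exposure]
    if covariates is not None:
        cols.extend(covariates)
    design = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return beta[1]  # position 0 is the intercept, 1 is the exposure

# Simulated data (all values hypothetical).
rng = np.random.default_rng(1)
n = 20_000
genetic = rng.normal(size=n)                  # shared genetic factor
pgs = genetic + rng.normal(size=n)            # noisy polygenic score
exposure = 0.5 * genetic + rng.normal(size=n)
outcome = 0.2 * exposure + 0.5 * genetic + rng.normal(size=n)

b_unadjusted = exposure_coef(exposure, outcome)
b_adjusted = exposure_coef(exposure, outcome, [pgs])

# Share of the original association accounted for by the polygenic score.
attenuation = (b_unadjusted - b_adjusted) / b_unadjusted
```

Because the simulated score measures the genetic factor with error, the adjusted coefficient still exceeds the simulated direct effect of 0.2; this is the motivation for the second, heritability-based stage.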
Subjects
Confounding Factors, Epidemiologic , Adult , Attention Deficit Disorder with Hyperactivity/genetics , Body Mass Index , Child , Child Development , Educational Status , Female , Genome-Wide Association Study , Humans , Male , Polymorphism, Single Nucleotide , Risk Factors
ABSTRACT
Traditional models of future alcohol use in adolescents have used variable-centered approaches, predicting alcohol use from a set of variables across entire samples or populations. Following the proposition that predictive factors may vary in adolescents as a function of family history, we used a two-pronged approach by first defining clusters of familial risk, followed by prediction analyses within each cluster. Thus, for the first time in adolescents, we tested whether adolescents with a family history of drug abuse exhibit a set of predictors different from adolescents without a family history. We apply this approach to a genetic risk score and individual differences in personality, cognition, behavior (risk-taking and discounting), substance use behavior at age 14, life events, and functional brain imaging, to predict scores on the alcohol use disorders identification test (AUDIT) at age 14 and 16 in a sample of adolescents (N = 1659 at baseline, N = 1327 at follow-up) from the IMAGEN cohort, a longitudinal community-based cohort of adolescents. In the absence of familial risk (n = 616), individual differences in baseline drinking, personality measures (extraversion, negative thinking), discounting behaviors, life events, and ventral striatal activation during reward anticipation were significantly associated with future AUDIT scores, while the overall model explained 22% of the variance in future AUDIT. In the presence of familial risk (n = 711), drinking behavior at age 14, personality measures (extraversion, impulsivity), behavioral risk-taking, and life events were significantly associated with future AUDIT scores, explaining 20.1% of the overall variance. Results suggest that individual differences in personality, cognition, life events, brain function, and drinking behavior contribute differentially to the prediction of future alcohol misuse. This approach may inform more individualized preventive interventions.
Subjects
Alcoholism , Adolescent , Alcohol Drinking/genetics , Alcoholism/genetics , Genetic Predisposition to Disease , Humans , Impulsive Behavior , Reward , Risk Factors
ABSTRACT
BACKGROUND: Many studies have reported an increased risk of autism spectrum disorder (ASD) associated with some maternal diagnoses in pregnancy. However, such associations have not been studied systematically, accounting for comorbidity between maternal disorders. Therefore our aim was to comprehensively test the associations between maternal diagnoses around pregnancy and ASD risk in offspring. METHODS: This exploratory case-cohort study included children born in Israel from 1997 to 2008, and followed up until 2015. We used information on all ICD-9 codes received by their mothers during pregnancy and the preceding year. ASD risk associated with each of those conditions was calculated using Cox proportional hazards regression, adjusted for the confounders (birth year, maternal age, socioeconomic status and number of ICD-9 diagnoses during the exposure period). RESULTS: The analytic sample consisted of 80 187 individuals (1132 cases, 79 055 controls), with 822 unique ICD-9 codes recorded in their mothers. After extensive quality control, 22 maternal diagnoses were nominally significantly associated with offspring ASD, with 16 of those surviving subsequent filtering steps (permutation testing, multiple testing correction, multiple regression). Among those, we recorded an increased risk of ASD associated with metabolic [e.g. hypertension; HR = 2.74 (1.92-3.90), p = 2.43 × 10⁻⁸], genitourinary [e.g. non-inflammatory disorders of cervix; HR = 1.88 (1.38-2.57), p = 7.06 × 10⁻⁵] and psychiatric [depressive disorder; HR = 2.11 (1.32-3.35), p = 1.70 × 10⁻³] diagnoses. Meanwhile, mothers of children with ASD were less likely to attend prenatal care appointments [HR = 0.62 (0.54-0.71), p = 1.80 × 10⁻¹¹]. CONCLUSIONS: Sixteen maternal diagnoses were associated with ASD in the offspring, after rigorous filtering of potential false-positive associations.
Replication in other cohorts and further research to understand the mechanisms underlying the observed associations with ASD are warranted.
ABSTRACT
BACKGROUND: The UK Biobank contains data with varying degrees of reliability and completeness for assessing depression. A third of participants completed a Mental Health Questionnaire (MHQ) containing the gold-standard Composite International Diagnostic Interview (CIDI) criteria for assessing mental health disorders. AIMS: To investigate whether multiple observations of depression from sources other than the MHQ can enhance the validity of major depressive disorder (MDD). METHOD: In participants who did not complete the MHQ, we calculated the number of other depression measures endorsed, for example from hospital episode statistics and interview data. We compared cases defined this way with CIDI-defined cases for several estimates: the variance explained by polygenic risk scores (PRS), area under the curve attributable to PRS, single nucleotide polymorphism (SNP)-based heritability and genetic correlations with summary statistics from the Psychiatric Genomics Consortium MDD genome-wide association study. RESULTS: The strength of the genetic contribution increased with the number of measures endorsed. For example, SNP-based heritability increased from 7% in participants who endorsed only one measure of depression, to 21% in those who endorsed four or five measures of depression. The strength of the genetic contribution to cases defined by at least two measures approximated that for CIDI-defined cases. Most genetic correlations between UK Biobank and the Psychiatric Genomics Consortium MDD study exceeded 0.7, but there was variability between pairwise comparisons. CONCLUSIONS: Multiple measures of depression can serve as a reliable approximation for case status where the CIDI measure is not available, indicating sample size can be optimised using the entire suite of UK Biobank data.
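The case definition described in the METHOD and RESULTS above, counting how many depression measures a participant endorses and applying a minimum threshold, can be sketched as follows. The measure names, the participant records and the helper functions are hypothetical placeholders, not UK Biobank field names; only the threshold of at least two measures is taken from the abstract.

```python
# Hypothetical names for five non-MHQ depression measures.
MEASURES = [
    "hospital_episode",
    "self_reported_diagnosis",
    "antidepressant_use",
    "interview_report",
    "help_seeking",
]

def n_endorsed(record):
    """Count how many of the listed measures are endorsed in a record."""
    return sum(1 for m in MEASURES if record.get(m))

def is_case(record, min_measures=2):
    """Classify as a depression case if enough measures are endorsed."""
    return n_endorsed(record) >= min_measures

# Toy participants (hypothetical data).
participants = {
    "p1": {"hospital_episode": True},
    "p2": {"hospital_episode": True, "antidepressant_use": True},
    "p3": {m: True for m in MEASURES},
    "p4": {},
}

counts = {pid: n_endorsed(rec) for pid, rec in participants.items()}
cases = sorted(pid for pid, rec in participants.items() if is_case(rec))
```

Raising `min_measures` trades sample size for a case group with a stronger expected genetic contribution, which is the trade-off the abstract quantifies.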
ABSTRACT
To characterise the trait-effects of increased genetic risk for schizophrenia, and highlight potential risk mediators, we test the association between schizophrenia polygenic risk scores (PRSs) and 529 behavioural traits (personality, psychological, lifestyle, nutritional) in the UK Biobank. Our primary analysis is performed on individuals aged 38-71 with no history of schizophrenia or related disorders, allowing us to report the effects of schizophrenia genetic risk in the sub-clinical general population. Higher schizophrenia PRSs were associated with a range of traits, including lower verbal-numerical reasoning (P = 6 × 10⁻⁶¹), higher nervous feelings (P = 1 × 10⁻⁴⁶) and higher self-reported risk-taking (P = 3 × 10⁻³⁸). We follow up the risk-taking association, hypothesising that it may be due to a genetic propensity for risk-taking leading to greater migration, urbanicity or drug-taking, which are reported environmental risk factors for schizophrenia and are all positively associated with risk-taking in these data. Next, to identify potential disorder or medication effects, we compare the PRS-trait associations in the general population to the trait values in 599 medicated and non-medicated individuals diagnosed with schizophrenia in the biobank. This analysis highlights, for example, levels of BMI, physical activity and risk-taking in cases that are in the opposite direction to that expected from the PRS-trait associations in the general population. Our analyses offer simple yet potentially revealing insights into the possible causes of observed trait-disorder associations, which can complement approaches such as Mendelian Randomisation. While we urge caution in causal interpretations in PRS cross-trait studies that are highly powered to detect weak horizontal pleiotropy or population structure, we propose that well-designed polygenic score analyses have the potential to highlight modifiable risk factors that lie on the path between genetic risk and disorder.