ABSTRACT
Understanding the genetic basis of routinely-acquired blood tests can provide insights into several aspects of human physiology. We report a genome-wide association study of 42 quantitative blood test traits defined using Electronic Healthcare Records (EHRs) of ~50,000 British Bangladeshi and British Pakistani adults. We demonstrate a causal variant within the PIEZO1 locus which was associated with alterations in red cell traits and glycated haemoglobin. Conditional analysis and within-ancestry fine mapping confirmed that this signal is driven by a missense variant - chr16-88716656-G-T - which is common in South Asian ancestries (MAF 3.9%) but ultra-rare in other ancestries. Carriers of the T allele had lower mean HbA1c values, lower HbA1c values for a given level of random or fasting glucose, and delayed diagnosis of Type 2 Diabetes Mellitus. Our results shed light on the genetic basis of clinically-relevant traits in an under-represented population, and emphasise the importance of ancestral diversity in genetic studies.
Subjects
Asian People, Type 2 Diabetes Mellitus, Genome-Wide Association Study, Glycated Hemoglobin, Ion Channels, Humans, Bangladesh, Female, Pakistan, Male, Glycated Hemoglobin/metabolism, Glycated Hemoglobin/genetics, Type 2 Diabetes Mellitus/genetics, Type 2 Diabetes Mellitus/blood, Asian People/genetics, Adult, Ion Channels/genetics, United Kingdom, Middle Aged, Single Nucleotide Polymorphism, Hematologic Tests, Cohort Studies, Alleles, Electronic Health Records, Blood Glucose/metabolism, Erythrocytes/metabolism
ABSTRACT
Polygenic scores (PGSs) offer the ability to predict genetic risk for complex diseases across the life course; a key benefit over short-term prediction models. To produce risk estimates relevant to clinical and public health decision-making, it is important to account for varying effects due to age and sex. Here, we develop a novel framework to estimate country-, age-, and sex-specific estimates of cumulative incidence stratified by PGS for 18 high-burden diseases. We integrate PGS associations from seven studies in four countries (N = 1,197,129) with disease incidences from the Global Burden of Disease. PGS has a significant sex-specific effect for asthma, hip osteoarthritis, gout, coronary heart disease and type 2 diabetes (T2D), with all but T2D exhibiting a larger effect in men. PGS has a larger effect in younger individuals for 13 diseases, with effects decreasing linearly with age. We show for breast cancer that, relative to individuals in the bottom 20% of polygenic risk, the top 5% attain an absolute risk for screening eligibility 16.3 years earlier. Our framework increases the generalizability of results from biobank studies and the accuracy of absolute risk estimates by appropriately accounting for age- and sex-specific PGS effects. Our results highlight the potential of PGS as a screening tool which may assist in the early prevention of common diseases.
Subjects
Genetic Predisposition to Disease, Multifactorial Inheritance, Humans, Male, Female, Multifactorial Inheritance/genetics, Incidence, Middle Aged, Adult, Aged, Type 2 Diabetes Mellitus/genetics, Type 2 Diabetes Mellitus/epidemiology, Risk Factors, Risk Assessment/methods, Global Burden of Disease, Sex Factors, Age Factors
ABSTRACT
Methods of estimating polygenic scores (PGSs) from genome-wide association studies are increasingly utilized. However, independent method evaluation is lacking, and method comparisons are often limited. Here, we evaluate polygenic scores derived via seven methods in five biobank studies (totaling about 1.2 million participants) across 16 diseases and quantitative traits, building on a reference-standardized framework. We conducted meta-analyses to quantify the effects of method choice, hyperparameter tuning, method ensembling, and the target biobank on PGS performance. We found that no single method consistently outperformed all others. PGS effect sizes were more variable between biobanks than between methods within biobanks when methods were well tuned. Differences between methods were largest for the two investigated autoimmune diseases, seropositive rheumatoid arthritis and type 1 diabetes. For most methods, cross-validation was more reliable for tuning hyperparameters than automatic tuning (without the use of target data). For a given target phenotype, elastic net models combining PGS across methods (ensemble PGS) tuned in the UK Biobank provided consistent, high, and cross-biobank transferable performance, increasing PGS effect sizes (β coefficients) by a median of 5.0% relative to LDpred2 and MegaPRS (the two best-performing single methods when tuned with cross-validation). Our interactively browsable online results and open-source workflow prspipe provide a rich resource and reference for the analysis of polygenic scoring methods across biobanks.
Subjects
Biological Specimen Banks, Genome-Wide Association Study, Multifactorial Inheritance, Humans, Multifactorial Inheritance/genetics, Phenotype, Type 1 Diabetes Mellitus/genetics, Single Nucleotide Polymorphism, Machine Learning
ABSTRACT
Autoimmune and inflammatory diseases are polygenic disorders of the immune system. Many genomic loci harbor risk alleles for several diseases, but the limited resolution of genetic mapping prevents determining whether the same allele is responsible, indicating a shared underlying mechanism. Here, using a collection of 129,058 cases and controls across 6 diseases, we show that ~40% of overlapping associations are due to the same allele. We improve fine-mapping resolution for shared alleles twofold by combining cases and controls across diseases, allowing us to identify more expression quantitative trait loci driven by the shared alleles. The patterns indicate widespread sharing of pathogenic mechanisms but not a single global autoimmune mechanism. Our approach can be applied to any set of traits and is particularly valuable as sample collections become depleted.
Subjects
Alleles, Autoimmune Diseases, Chromosome Mapping, Genetic Predisposition to Disease, Quantitative Trait Loci, Humans, Autoimmune Diseases/genetics, Single Nucleotide Polymorphism, Genome-Wide Association Study, Case-Control Studies, Multifactorial Inheritance/genetics
ABSTRACT
Genome-wide association studies (GWASs) may help inform treatments for infertility, whose causes remain unknown in many cases. Here we present GWAS meta-analyses across six cohorts for male and female infertility in up to 41,200 cases and 687,005 controls. We identified 21 genetic risk loci for infertility (P≤5E-08), of which 12 have not been reported for any reproductive condition. We found positive genetic correlations between endometriosis and all-cause female infertility (rg=0.585, P=8.98E-14), and between polycystic ovary syndrome and anovulatory infertility (rg=0.403, P=2.16E-03). The evolutionary persistence of female infertility-risk alleles in EBAG9 may be explained by recent directional selection. We additionally identified up to 269 genetic loci associated with follicle-stimulating hormone (FSH), luteinising hormone, oestradiol, and testosterone through sex-specific GWAS meta-analyses (N=6,095-246,862). While hormone-associated variants near FSHB and ARL14EP colocalised with signals for anovulatory infertility, we found no rg between female infertility and reproductive hormones (P>0.05). Exome sequencing analyses in the UK Biobank (N=197,340) revealed that women carrying testosterone-lowering rare variants in GPC2 were at higher risk of infertility (OR=2.63, P=1.25E-03). Taken together, our results suggest that while individual genes associated with hormone regulation may be relevant for fertility, there is limited genetic evidence for correlation between reproductive hormones and infertility at the population level. We provide the first comprehensive view of the genetic architecture of infertility across multiple diagnostic criteria in men and women, and characterise its relationship to other health conditions.
ABSTRACT
Polygenic risk scores (PRSs) are an emerging tool to predict the clinical phenotypes and outcomes of individuals. We propose PRSmix, a framework that leverages the PRS corpus of a target trait to improve prediction accuracy, and PRSmix+, which incorporates genetically correlated traits to better capture the human genetic architecture for 47 and 32 diseases/traits in European and South Asian ancestries, respectively. PRSmix demonstrated a mean prediction accuracy improvement of 1.20-fold (95% confidence interval [CI], [1.10; 1.3]; p = 9.17 × 10⁻⁵) and 1.19-fold (95% CI, [1.11; 1.27]; p = 1.92 × 10⁻⁶), and PRSmix+ improved the prediction accuracy by 1.72-fold (95% CI, [1.40; 2.04]; p = 7.58 × 10⁻⁶) and 1.42-fold (95% CI, [1.25; 1.59]; p = 8.01 × 10⁻⁷) in European and South Asian ancestries, respectively. Compared to previously established cross-trait-combination methods with scores from pre-defined correlated traits, we demonstrated that our method improved prediction accuracy for coronary artery disease up to 3.27-fold (95% CI, [2.1; 4.44]; p value after false discovery rate (FDR) correction = 2.6 × 10⁻⁴). Our method provides a comprehensive framework to benchmark and leverage the combined power of PRS for maximal performance in a desired target population.
Subjects
Coronary Artery Disease, Bone Diseases, Humans, Multifactorial Inheritance/genetics, Genetic Risk Score, Benchmarking, Coronary Artery Disease/diagnosis
ABSTRACT
Most genome-wide association studies (GWAS) of major depression (MD) have been conducted in samples of European ancestry. Here we report a multi-ancestry GWAS of MD, adding data from 21 cohorts with 88,316 MD cases and 902,757 controls to previously reported data. This analysis used a range of measures to define MD and included samples of African (36% of effective sample size), East Asian (26%) and South Asian (6%) ancestry and Hispanic/Latin American participants (32%). The multi-ancestry GWAS identified 53 significantly associated novel loci. For loci from GWAS in European ancestry samples, fewer than expected were transferable to other ancestry groups. Fine mapping benefited from additional sample diversity. A transcriptome-wide association study identified 205 significantly associated novel genes. These findings suggest that, for MD, increasing ancestral and global diversity in genetic studies may be particularly important to ensure discovery of core genes and inform about transferability of findings.
Subjects
Major Depressive Disorder, Genome-Wide Association Study, Humans, Genetic Predisposition to Disease, Major Depressive Disorder/genetics, Depression, Chromosome Mapping, Single Nucleotide Polymorphism/genetics
ABSTRACT
Thyroid hormones play a critical role in regulation of multiple physiological functions and thyroid dysfunction is associated with substantial morbidity. Here, we use electronic health records to undertake a genome-wide association study of thyroid-stimulating hormone (TSH) levels, with a total sample size of 247,107. We identify 158 novel genetic associations, more than doubling the number of known associations with TSH, and implicate 112 putative causal genes, of which 76 are not previously implicated. A polygenic score for TSH is associated with TSH levels in African, South Asian, East Asian, Middle Eastern and admixed American ancestries, and associated with hypothyroidism and other thyroid disease in South Asians. In Europeans, the TSH polygenic score is associated with thyroid disease, including thyroid cancer and age-of-onset of hypothyroidism and hyperthyroidism. We develop pathway-specific genetic risk scores for TSH levels and use these in phenome-wide association studies to identify potential consequences of pathway perturbation. Together, these findings demonstrate the potential utility of genetic associations to inform future therapeutics and risk prediction for thyroid diseases.
Subjects
Hyperthyroidism, Hypothyroidism, Thyroid Diseases, Humans, Thyrotropin/genetics, Genome-Wide Association Study, Thyroid Diseases/genetics, Hypothyroidism/genetics, Hyperthyroidism/genetics, Thyroxine
ABSTRACT
Background: Cytochrome P450 family 2 subfamily C member 19 (CYP2C19) is a hepatic enzyme involved in the metabolism of clopidogrel from a prodrug to its active metabolite. Prior studies of genetic polymorphisms in CYP2C19 and their relationship with clinical efficacy have not included South Asian populations. Objectives: The objective of this study was to assess prevalence of common CYP2C19 genotype polymorphisms in a British-South Asian population and correlate these with recurrent myocardial infarction risk in participants prescribed clopidogrel. Methods: The Genes & Health cohort of British Bangladeshi and Pakistani ancestry participants were studied. CYP2C19 diplotypes were assessed using array data. Multivariable logistic regression was used to test for association between genetically inferred CYP2C19 metabolizer status and recurrent myocardial infarction, controlling for known cardiovascular disease risk factors, percutaneous coronary intervention, age, sex, and population stratification. Results: Genes & Health cohort participants (N = 44,396) have a high prevalence (57%) of intermediate or poor CYP2C19 metabolizers, with at least 1 loss-of-function CYP2C19 allele. The prevalence of poor metabolizers carrying 2 CYP2C19 loss-of-function alleles is 13%, which is higher than that in previously studied European (2.4%) and Central/South Asian populations (8.2%). Sixty-nine percent of the cohort who were diagnosed with an acute myocardial infarction were prescribed clopidogrel. Poor metabolizers were significantly more likely to have a recurrent myocardial infarction (OR: 3.1; P = 0.019). Conclusions: A pharmacogenomic-driven approach to clopidogrel prescribing has the potential to impact significantly on clinical management and outcomes in individuals of Bangladeshi and Pakistani ancestry.
ABSTRACT
Autozygosity is associated with rare Mendelian disorders and clinically relevant quantitative traits. We investigated associations between the fraction of the genome in runs of homozygosity (FROH) and common diseases in Genes & Health (n = 23,978 British South Asians), UK Biobank (n = 397,184), and 23andMe. We show that restricting analysis to offspring of first cousins is an effective way of reducing confounding due to social/environmental correlates of FROH. Within this group in G&H+UK Biobank, we found experiment-wide significant associations between FROH and twelve common diseases. We replicated associations with type 2 diabetes (T2D) and post-traumatic stress disorder via within-sibling analysis in 23andMe (median n = 480,282). We estimated that autozygosity due to consanguinity accounts for 5%-18% of T2D cases among British Pakistanis. Our work highlights the possibility of widespread non-additive genetic effects on common diseases and has important implications for global populations with high rates of consanguinity.
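As a minimal illustration of the FROH quantity described above: FROH is the total length of an individual's runs of homozygosity divided by the length of the genome surveyed. The sketch below assumes the runs have already been called by an upstream tool; the function name, the segment lengths and the autosome length constant are hypothetical and are not taken from the study. For orientation, the expected value for offspring of first cousins is 1/16 (about 0.0625).

```python
def froh(roh_lengths_bp, genome_length_bp):
    """Return the fraction of the surveyed genome covered by runs of homozygosity.

    roh_lengths_bp: lengths (in base pairs) of already-called ROH segments.
    genome_length_bp: length (in base pairs) of the genome that was surveyed.
    """
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    return sum(roh_lengths_bp) / genome_length_bp


# Hypothetical individual with three ROH segments (illustrative values only).
segments_bp = [12_000_000, 8_500_000, 30_000_000]
AUTOSOME_BP = 2_880_000_000  # assumed autosomal length, for illustration

example_froh = froh(segments_bp, AUTOSOME_BP)
print(f"FROH = {example_froh:.4f}")
```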
Subjects
Consanguinity, Type 2 Diabetes Mellitus, Humans, Type 2 Diabetes Mellitus/genetics, Homozygote, Phenotype, Single Nucleotide Polymorphism, Biological Specimen Banks, Human Genome, Genetic Predisposition to Disease, United Kingdom
ABSTRACT
Background: Blood platelets are mediators of atherothrombotic disease and are regulated by complex sets of genes. Association studies in European ancestry populations have already detected informative platelet regulatory loci. Studies in other ancestries can potentially reveal new associations because of different allele frequencies, linkage structures, and variant effects. Objectives: To reveal new regulatory genes for platelet count (PLT). Methods: Genome-wide association studies (GWAS) were performed in 20,218 Bangladeshi and 9198 Pakistani individuals from the Genes & Health study. Loci significantly associated with PLT underwent fine-mapping to identify candidate genes. Results: Of 1588 significantly associated variants (P < 5 × 10⁻⁸) at 20 loci in the Bangladeshi analysis, most replicated findings in prior transancestry GWAS and in the Pakistani analysis. However, the Bangladeshi locus defined by rs946528 (chr1:46019890) did not associate with PLT in the Pakistani analysis but was in the same linkage disequilibrium block (r² ≥ 0.5) as PLT-associated variants in prior East Asian GWAS. The single independent association signal was refined to a 95% credible set of 343 variants spanning 8 coding genes. Functional annotation, mapping to megakaryocyte regulatory regions, and colocalization with blood expression quantitative trait loci identified the likely mediator of the PLT phenotype to be PIK3R3 encoding a regulator of phosphoinositol 3-kinase (PI3K). Conclusion: Abnormal PI3K activity in the vessel wall is already implicated in the pathogenesis of atherothrombosis. Our identification of a new association between PIK3R3 and PLT provides further mechanistic insights into the contribution of the PI3K pathway to platelet biology.
ABSTRACT
Identification of individuals at highest risk of coronary artery disease (CAD), ideally before onset, remains an important public health need. Prior studies have developed genome-wide polygenic scores to enable risk stratification, reflecting the substantial inherited component to CAD risk. Here we develop a new and significantly improved polygenic score for CAD, termed GPSMult, that incorporates genome-wide association data across five ancestries for CAD (>269,000 cases and >1,178,000 controls) and ten CAD risk factors. GPSMult strongly associated with prevalent CAD (odds ratio per standard deviation 2.14, 95% confidence interval 2.10-2.19, P < 0.001) in UK Biobank participants of European ancestry, identifying 20.0% of the population with 3-fold increased risk and conversely 13.9% with 3-fold decreased risk as compared with those in the middle quintile. GPSMult was also associated with incident CAD events (hazard ratio per standard deviation 1.73, 95% confidence interval 1.70-1.76, P < 0.001), identifying 3% of healthy individuals with risk of future CAD events equivalent to those with existing disease and significantly improving risk discrimination and reclassification. Across multiethnic, external validation datasets inclusive of 33,096, 124,467, 16,433 and 16,874 participants of African, European, Hispanic and South Asian ancestry, respectively, GPSMult demonstrated increased strength of associations across all ancestries and outperformed all available previously published CAD polygenic scores. These data contribute a new GPSMult for CAD to the field and provide a generalizable framework for how large-scale integration of genetic association data for CAD and related traits from diverse populations can meaningfully improve polygenic risk prediction.
Subjects
Coronary Artery Disease, Humans, Coronary Artery Disease/epidemiology, Coronary Artery Disease/genetics, Genome-Wide Association Study, Genetic Predisposition to Disease/genetics, Risk Factors, Phenotype
ABSTRACT
AIMS: CYP2C19 is a hepatic enzyme involved in the metabolism of antidepressants associated with increased gastrointestinal bleed (GIB) risk. The aim of our study was to explore a possible association between loss-of-function CYP2C19 genotypes and GIB in South Asian ancestry participants prescribed antidepressants. METHODS: Genes & Health participants with a record in Barts Health NHS Trust (N = 22 753) were studied using a cross-sectional approach. CYP2C19 diplotypes were assessed and metabolizer type inferred from consortia guidance. Fisher's exact test was used to compare the prevalence of GIB in different metabolizer categories. Multivariable regression was used to test for association between antidepressant prescriptions and GIB, and between CYP2C19 metabolizer state and GIB in the subcohort prescribed antidepressants. RESULTS: Antidepressants were frequently prescribed (47%, N = 10 612). A total of 864 participants (4%) had a GIB; 534 (62%) had been prescribed a CYP2C19-metabolized antidepressant. There was an independent association between antidepressant prescriptions and GIB events (odds ratio 1.8, confidence interval 1.5-2.0, P < 0.0001). There was no relationship between CYP2C19 inferred poor (P = 0.56) or intermediate (P = 0.53) metabolizer status and GIB in those prescribed an antidepressant in unadjusted analysis. A multivariable logistic regression model did not show an independent association between poor (P = 0.54) or intermediate (P = 0.62) CYP2C19 metabolizers and GIB in the subcohort prescribed antidepressants. CONCLUSIONS: CYP2C19-dependent antidepressants are associated with increased GIB prevalence. GIB appeared independent of CYP2C19 metabolizer genotype in individuals who had been prescribed antidepressants. Precision dosing based on CYP2C19 genetic information alone is unlikely to reduce GIB prevalence.
Subjects
Antidepressive Agents, Cytochrome P-450 CYP2C19, Gastrointestinal Hemorrhage, Humans, Alleles, Antidepressive Agents/adverse effects, Antidepressive Agents/metabolism, Aryl Hydrocarbon Hydroxylases/genetics, Aryl Hydrocarbon Hydroxylases/metabolism, Cytochrome P-450 CYP2C19/genetics, Genotype, Prevalence, Loss of Function Mutation, Gastrointestinal Hemorrhage/chemically induced, Gastrointestinal Hemorrhage/ethnology, Gastrointestinal Hemorrhage/genetics, South Asian People/genetics, Southern Asia/ethnology, United Kingdom
ABSTRACT
Preeclampsia and gestational hypertension are common pregnancy complications associated with adverse maternal and child outcomes. Current tools for prediction, prevention and treatment are limited. Here we tested the association of maternal DNA sequence variants with preeclampsia in 20,064 cases and 703,117 control individuals and with gestational hypertension in 11,027 cases and 412,788 control individuals across discovery and follow-up cohorts using multi-ancestry meta-analysis. Altogether, we identified 18 independent loci associated with preeclampsia/eclampsia and/or gestational hypertension, 12 of which are new (for example, MTHFR-CLCN6, WNT3A, NPR3, PGR and RGL3), including two loci (PLCE1 and FURIN) identified in the multitrait analysis. Identified loci highlight the role of natriuretic peptide signaling, angiogenesis, renal glomerular function, trophoblast development and immune dysregulation. We derived genome-wide polygenic risk scores that predicted preeclampsia/eclampsia and gestational hypertension in external cohorts, independent of clinical risk factors, and reclassified eligibility for low-dose aspirin to prevent preeclampsia. Collectively, these findings provide mechanistic insights into the hypertensive disorders of pregnancy and have the potential to advance pregnancy risk stratification.
Subjects
Eclampsia, Pregnancy-Induced Hypertension, Hypertension, Pre-Eclampsia, Pregnancy, Female, Child, Humans, Pregnancy-Induced Hypertension/genetics, Pre-Eclampsia/genetics, Pre-Eclampsia/prevention & control, Aspirin, Risk Factors
ABSTRACT
BACKGROUND: Reported association between statin use and cataract risk is controversial. The SLCO1B1 gene encodes a transport protein responsible for statin clearance. The aim of this study was to investigate a possible association between the SLCO1B1*5 reduced function variant and cataract risk in statin users of South Asian ethnicity. METHODS: The Genes & Health cohort consists of British-Bangladeshi and British-Pakistani participants from East London, Manchester and Bradford, UK. SLCO1B1*5 genotype was assessed with the Illumina GSAMD-24v3-0-EA chip. Medication data from primary care health record linkage was used to compare those who had regularly used statins compared to those who had not. Multivariable logistic regression was used to test for association between statin use and cataracts, adjusting for population characteristics and potential confounders in 36,513 participants. Multivariable logistic regression was used to test association between SLCO1B1*5 heterozygotes or homozygotes and cataracts, in subgroups having been regularly prescribed statins versus not. RESULTS: Statins were prescribed to 35% (12,704) of participants (average age 41 years old, 45% male). Non-senile cataract was diagnosed in 5% (1686) of participants. An apparent association between statins and non-senile cataract (12% in statin users and 0.8% in non-statin users) was negated by inclusion of confounders. In those prescribed a statin, presence of the SLCO1B1*5 genotype was independently associated with a decreased risk of non-senile cataract (OR 0.7 (CI 0.5-0.9, p 0.007)). CONCLUSIONS: Our findings suggest that there is no independent association between statin use and non-senile cataract risk after adjusting for confounders. Among statin users, the SLCO1B1*5 genotype is associated with a 30% risk reduction of non-senile cataracts. 
Stratification of on-drug cohorts by validated pharmacogenomic variants is a useful tool to support or repudiate adverse drug events in observational cohorts.
Subjects
Cataract, Hydroxymethylglutaryl-CoA Reductase Inhibitors, Humans, Male, Adult, Female, Hydroxymethylglutaryl-CoA Reductase Inhibitors/adverse effects, Genotype, Cataract/chemically induced, Cataract/epidemiology, Cataract/genetics, Liver-Specific Organic Anion Transporter 1/genetics
ABSTRACT
This study assessed the contribution of five genes previously known to be involved in cholestatic liver disease in British Bangladeshi and Pakistani people. Five genes (ABCB4, ABCB11, ATP8B1, NR1H4, TJP2) were interrogated using exome sequencing data from 5236 volunteers. Non-synonymous or loss-of-function (LoF) variants with a minor allele frequency < 5% were included. Variants were filtered and annotated for rare variant burden analysis and for in silico protein structure and modelling analysis. Out of 314 non-synonymous variants, 180 fulfilled the inclusion criteria and were mostly heterozygous unless specified. 90 were novel and of those variants, 22 were considered likely pathogenic and 9 pathogenic. We identified variants in volunteers with gallstone disease (n = 31), intrahepatic cholestasis of pregnancy (ICP, n = 16), cholangiocarcinoma and cirrhosis (n = 2). Fourteen novel LoF variants were identified: 7 frameshift variants, 5 variants introducing a premature stop codon and 2 splice acceptor variants. The rare variant burden was significantly increased in ABCB11. Protein modelling demonstrated variants that appeared likely to cause significant structural alterations. This study highlights the significant genetic burden contributing to cholestatic liver disease. Novel likely pathogenic and pathogenic variants were identified, addressing the underrepresentation of diverse ancestry groups in genomic research.
Subjects
Cholelithiasis, Intrahepatic Cholestasis, Cholestasis, Female, Pregnancy, Humans, Mutation, Cholestasis/genetics, Intrahepatic Cholestasis/genetics, Intrahepatic Cholestasis/metabolism, United Kingdom/epidemiology
ABSTRACT
Polygenic risk scores aggregate an individual's burden of risk alleles to estimate the overall genetic risk for a specific trait or disease. Polygenic risk scores derived from genome-wide association studies of European populations perform poorly for other ancestral groups. Given the potential for future clinical utility, underperformance of polygenic risk scores in South Asian populations has the potential to reinforce health inequalities. To determine whether European-derived polygenic risk scores underperform at multiple sclerosis prediction in a South Asian-ancestry population compared with a European-ancestry cohort, we used data from two longitudinal genetic cohort studies: Genes & Health (2015-present), a study of ~50 000 British-Bangladeshi and British-Pakistani individuals, and UK Biobank (2006-present), which comprises ~500 000 predominantly White British individuals. We compared individuals with and without multiple sclerosis in both studies (Genes & Health: N Cases = 42, N Control = 40 490; UK Biobank: N Cases = 2091, N Control = 374 866). Polygenic risk scores were calculated using clumping and thresholding with risk allele effect sizes obtained from the largest multiple sclerosis genome-wide association study to date. Scores were calculated with and without the major histocompatibility complex region, the most influential locus in determining multiple sclerosis risk. Polygenic risk score prediction was evaluated using Nagelkerke's pseudo-R² metric adjusted for case ascertainment, age, sex and the first four genetic principal components. We found that, as expected, European-derived polygenic risk scores perform poorly in the Genes & Health cohort, explaining 1.1% (including the major histocompatibility complex) and 1.5% (excluding the major histocompatibility complex) of disease risk.
In contrast, multiple sclerosis polygenic risk scores explained 4.8% (including the major histocompatibility complex) and 2.8% (excluding the major histocompatibility complex) of disease risk in European-ancestry UK Biobank participants. These findings suggest that polygenic risk score prediction of multiple sclerosis based on European genome-wide association study results is less accurate in a South Asian population. Genetic studies of ancestrally diverse populations are required to ensure that polygenic risk scores can be useful across ancestries.
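The scoring step mentioned in the abstract above can be sketched as a weighted sum of risk-allele dosages, restricted to variants that pass a P-value threshold. This is only a minimal sketch: it assumes clumping (pruning of correlated variants) has already been done upstream, and the variant identifiers, effect sizes, P values and function name are hypothetical rather than taken from the study.

```python
def polygenic_score(dosages, gwas_effects, p_threshold):
    """Sum effect-size-weighted allele dosages for variants passing the threshold.

    dosages: variant id -> number of effect alleles carried (0, 1 or 2).
    gwas_effects: variant id -> (effect size, GWAS P value); assumed already clumped.
    p_threshold: variants with P value above this are ignored.
    """
    score = 0.0
    for variant_id, (beta, p_value) in gwas_effects.items():
        if p_value <= p_threshold and variant_id in dosages:
            score += beta * dosages[variant_id]
    return score


# Hypothetical, already-clumped summary statistics (illustrative values only).
gwas = {"rsA": (0.20, 1e-9), "rsB": (-0.10, 3e-6), "rsC": (0.05, 0.2)}
person = {"rsA": 2, "rsB": 1, "rsC": 0}

strict_score = polygenic_score(person, gwas, 5e-8)   # uses rsA only
lenient_score = polygenic_score(person, gwas, 1e-5)  # uses rsA and rsB
print(strict_score, lenient_score)
```

In practice several thresholds are usually tried and the resulting scores compared, which is why the threshold is left as a parameter here.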
ABSTRACT
Polygenic risk scores (PRS) are an emerging tool to predict the clinical phenotypes and outcomes of individuals. Validation and transferability of existing PRS across independent datasets and diverse ancestries are limited, which hinders the practical utility and exacerbates health disparities. We propose PRSmix, a framework that evaluates and leverages the PRS corpus of a target trait to improve prediction accuracy, and PRSmix+, which incorporates genetically correlated traits to better capture the human genetic architecture. We applied PRSmix to 47 and 32 diseases/traits in European and South Asian ancestries, respectively. PRSmix demonstrated a mean prediction accuracy improvement of 1.20-fold (95% CI: [1.10; 1.3]; P-value = 9.17 × 10⁻⁵) and 1.19-fold (95% CI: [1.11; 1.27]; P-value = 1.92 × 10⁻⁶), and PRSmix+ improved the prediction accuracy by 1.72-fold (95% CI: [1.40; 2.04]; P-value = 7.58 × 10⁻⁶) and 1.42-fold (95% CI: [1.25; 1.59]; P-value = 8.01 × 10⁻⁷) in European and South Asian ancestries, respectively. Compared to the previously established cross-trait-combination method with scores from pre-defined correlated traits, we demonstrated that our method can improve prediction accuracy for coronary artery disease up to 3.27-fold (95% CI: [2.1; 4.44]; P-value after FDR correction = 2.6 × 10⁻⁴). Our method provides a comprehensive framework to benchmark and leverage the combined power of PRS for maximal performance in a desired target population.
ABSTRACT
Individuals with South Asian ancestry have a higher risk of heart disease than other groups but have been largely excluded from genetic research. Using data from 22,000 British Pakistani and Bangladeshi individuals with linked electronic health records from the Genes & Health cohort, we conducted genome-wide association studies of coronary artery disease and its key risk factors. Using power-adjusted transferability ratios, we found evidence for transferability for the majority of cardiometabolic loci powered to replicate. The performance of polygenic scores was high for lipids and blood pressure, but lower for BMI and coronary artery disease. Adding a polygenic score for coronary artery disease to clinical risk factors showed significant improvement in reclassification. In Mendelian randomisation using transferable loci as instruments, our findings were consistent with results in European-ancestry individuals. Taken together, trait-specific transferability of trait loci between populations is an important consideration with implications for risk prediction and causal inference.
Subjects
Coronary Artery Disease, Genome-Wide Association Study, Asian People/genetics, Coronary Artery Disease/epidemiology, Coronary Artery Disease/genetics, Genetic Loci, Humans, Pakistan, Single Nucleotide Polymorphism
ABSTRACT
BACKGROUND: Type 2 diabetes (T2D) is highly prevalent in British South Asians, yet they are underrepresented in research. Genes & Health (G&H) is a large, population study of British Pakistanis and Bangladeshis (BPB) comprising genomic and routine health data. We assessed the extent to which genetic risk for T2D is shared between BPB and European populations (EUR). We then investigated whether the integration of a polygenic risk score (PRS) for T2D with an existing risk tool (QDiabetes) could improve prediction of incident disease and the characterisation of disease subtypes. METHODS AND FINDINGS: In this observational cohort study, we assessed whether common genetic loci associated with T2D in EUR individuals were replicated in 22,490 BPB individuals in G&H. We replicated fewer loci in G&H (n = 76/338, 22%) than would be expected given power if all EUR-ascertained loci were transferable (n = 101, 30%; p = 0.001). Of the 27 transferable loci that were powered to interrogate this, only 9 showed evidence of shared causal variants. We constructed a T2D PRS and combined it with a clinical risk instrument (QDiabetes) in a novel, integrated risk tool (IRT) to assess risk of incident diabetes. To assess model performance, we compared categorical net reclassification index (NRI) versus QDiabetes alone. In 13,648 patients free from T2D followed up for 10 years, NRI was 3.2% for IRT versus QDiabetes (95% confidence interval (CI): 2.0% to 4.4%). IRT performed best in reclassification of individuals aged less than 40 years deemed low risk by QDiabetes alone (NRI 5.6%, 95% CI 3.6% to 7.6%), who tended to be free from comorbidities and slim. After adjustment for QDiabetes score, PRS was independently associated with progression to T2D after gestational diabetes (hazard ratio (HR) per SD of PRS 1.23, 95% CI 1.05 to 1.42, p = 0.028). 
Using cluster analysis of clinical features at diabetes diagnosis, we replicated previously reported disease subgroups, including Mild Age-Related, Mild Obesity-related, and Insulin-Resistant Diabetes, and showed that PRS distribution differs between subgroups (p = 0.002). Integrating PRS in this cluster analysis revealed a Probable Severe Insulin Deficient Diabetes (pSIDD) subgroup, despite the absence of clinical measures of insulin secretion or resistance. We also observed differences in rates of progression to micro- and macrovascular complications between subgroups after adjustment for confounders. Study limitations include the absence of an external replication cohort and the potential biases arising from missing or incorrect routine health data. CONCLUSIONS: Our analysis of the transferability of T2D loci between EUR and BPB indicates the need for larger, multiancestry studies to better characterise the genetic contribution to disease and its varied aetiology. We show that a T2D PRS optimised for this high-risk BPB population has potential clinical application in BPB, improving the identification of T2D risk (especially in the young) on top of an established clinical risk algorithm and aiding identification of subgroups at diagnosis, which may help future efforts to stratify care and treatment of the disease.
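For readers unfamiliar with the categorical net reclassification index (NRI) reported above, the following is a minimal sketch of the standard two-category form: the net proportion of people with the event who are moved to the higher-risk category, plus the net proportion of people without the event who are moved to the lower-risk category. The risk categories used in the study are not given in the abstract, so the binary high/low split, the function name and the toy data are all assumptions for illustration.

```python
def categorical_nri(old_high, new_high, had_event):
    """Two-category net reclassification index.

    old_high / new_high: booleans, True if the old / new model puts the
    person in the high-risk category.
    had_event: booleans, True if the person developed the outcome.
    """
    up_event = down_event = n_event = 0
    up_non = down_non = n_non = 0
    for old, new, event in zip(old_high, new_high, had_event):
        moved_up = new and not old
        moved_down = old and not new
        if event:
            n_event += 1
            up_event += moved_up
            down_event += moved_down
        else:
            n_non += 1
            up_non += moved_up
            down_non += moved_down
    if n_event == 0 or n_non == 0:
        raise ValueError("need at least one event and one non-event")
    # Events should move up; non-events should move down.
    return (up_event - down_event) / n_event + (down_non - up_non) / n_non


# Toy data: four people with the event, then four without (illustrative only).
old_high = [False, False, True, True, True, False, False, True]
new_high = [True, False, True, True, False, False, False, True]
had_event = [True, True, True, True, False, False, False, False]

nri = categorical_nri(old_high, new_high, had_event)
print(f"NRI = {nri:.2f}")
```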