ABSTRACT
BACKGROUND: Hospital-based biobanks are being increasingly considered as a resource for translating polygenic risk scores (PRS) into clinical practice. However, since these biobanks originate from patient populations, there is a possibility of bias in polygenic risk estimation due to overrepresentation of patients with higher frequency of healthcare interactions. METHODS: PRS for schizophrenia, bipolar disorder, and depression were calculated using summary statistics from the largest available genomic studies for a sample of 24 153 European ancestry participants in the Mass General Brigham (MGB) Biobank. To correct for selection bias, we fitted logistic regression models with inverse probability (IP) weights, which were estimated using 1839 sociodemographic, clinical, and healthcare utilization features extracted from electronic health records of 1 546 440 non-Hispanic White patients eligible to participate in the Biobank study at their first visit to the MGB-affiliated hospitals. RESULTS: Case prevalence of bipolar disorder among participants in the top decile of bipolar disorder PRS was 10.0% (95% CI 8.8-11.2%) in the unweighted analysis but only 6.2% (5.0-7.5%) when selection bias was accounted for using IP weights. Similarly, case prevalence of depression among those in the top decile of depression PRS was reduced from 33.5% (31.7-35.4%) to 28.9% (25.8-31.9%) after IP weighting. CONCLUSIONS: Non-random selection of participants into volunteer biobanks may induce clinically relevant selection bias that could impact implementation of PRS in research and clinical settings. As efforts to integrate PRS in medical practice expand, recognition and mitigation of these biases should be considered and may need to be optimized in a context-specific manner.
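The effect of inverse probability (IP) weighting on a prevalence estimate can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which fitted IP-weighted logistic regression models with weights estimated from 1839 EHR features. Here the selection probabilities are simply assumed to be known, and the function name `weighted_prevalence` and the toy data are hypothetical.

```python
def weighted_prevalence(is_case, selection_prob):
    """Inverse-probability-weighted case prevalence.

    is_case: 0/1 case indicators for sampled participants.
    selection_prob: assumed probability that each participant
        was selected into the sample (hypothetical values here).
    """
    weights = [1.0 / p for p in selection_prob]
    weighted_cases = sum(w * y for w, y in zip(weights, is_case))
    return weighted_cases / sum(weights)


# Toy data: cases are assumed to be twice as likely to enroll as non-cases.
is_case = [1, 1, 0, 0, 0]
selection_prob = [0.5, 0.5, 0.25, 0.25, 0.25]

unweighted = sum(is_case) / len(is_case)
weighted = weighted_prevalence(is_case, selection_prob)
print(f"unweighted prevalence: {unweighted:.2f}")  # 0.40
print(f"IP-weighted prevalence: {weighted:.2f}")   # 0.25
```

In this toy example, over-selected cases inflate the unweighted prevalence, and weighting each participant by the inverse of the selection probability pulls the estimate back down. That is the same direction of correction the abstract reports.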
Subjects
Bipolar Disorder; Humans; Genetic Predisposition to Disease; Selection Bias; Genome-Wide Association Study; Bipolar Disorder/epidemiology; Bipolar Disorder/genetics; Multifactorial Inheritance; Risk Factors
ABSTRACT
AIMS: To investigate whether a metabolic signature composed of multiple plasma metabolites can be used to characterize adherence and metabolic response to the Mediterranean diet and whether such a metabolic signature is associated with cardiovascular disease (CVD) risk. METHODS AND RESULTS: Our primary study cohort included 1859 participants from the Spanish PREDIMED trial, and validation cohorts included 6868 participants from the US Nurses' Health Studies I and II, and Health Professionals Follow-up Study (NHS/HPFS). Adherence to the Mediterranean diet was assessed using a validated Mediterranean Diet Adherence Screener (MEDAS), and plasma metabolome was profiled by liquid chromatography-tandem mass spectrometry. We observed substantial metabolomic variation with respect to Mediterranean diet adherence, with nearly one-third of the assayed metabolites significantly associated with MEDAS (false discovery rate < 0.05). Using elastic net regularized regressions, we identified a metabolic signature, comprising 67 metabolites, robustly correlated with Mediterranean diet adherence in both PREDIMED and NHS/HPFS (r = 0.28-0.37 between the signature and MEDAS; P = 3 × 10^-35 to 4 × 10^-118). In multivariable Cox regressions, the metabolic signature showed a significant inverse association with CVD incidence after adjusting for known risk factors (PREDIMED: hazard ratio [HR] per standard deviation increment in the signature = 0.71, P < 0.001; NHS/HPFS: HR = 0.85, P = 0.001), and the association persisted after further adjustment for MEDAS scores (PREDIMED: HR = 0.73, P = 0.004; NHS/HPFS: HR = 0.85, P = 0.004). Further genome-wide association analysis revealed that the metabolic signature was significantly associated with genetic loci involved in fatty acid and amino acid metabolism. Mendelian randomization analyses showed that the genetically inferred metabolic signature was significantly associated with risk of coronary heart disease (CHD) and stroke (odds ratios per SD increment in the genetically inferred metabolic signature = 0.92 for CHD and 0.91 for stroke; P < 0.001). CONCLUSIONS: We identified a metabolic signature that robustly reflects adherence and metabolic response to a Mediterranean diet, and predicts future CVD risk independent of traditional risk factors, in Spanish and US cohorts.
Subjects
Cardiovascular Diseases; Diet, Mediterranean; Cardiovascular Diseases/epidemiology; Follow-Up Studies; Genome-Wide Association Study; Humans; Metabolome; Risk Factors
ABSTRACT
BACKGROUND: Adult height has been associated with risk of several site-specific cancers, including melanoma. However, less attention has been given to non-melanoma skin cancer (NMSC). METHODS: We prospectively examined the risk of squamous cell carcinoma (SCC) and basal cell carcinoma (BCC) in relation to adult height in the Nurses' Health Study (NHS, n=117 863) and the Health Professionals Follow-up Study (HPFS, n=51 111). We also investigated the relationships between height-related genetic markers and risk of BCC and SCC in the genetic data sets of the NHS and HPFS (3898 BCC cases, and 8530 BCC controls; 527 SCC cases, and 8962 SCC controls). RESULTS: After controlling for potential confounding factors, the hazard ratios were 1.09 (95% CI: 1.02, 1.15) and 1.10 (95% CI: 1.07, 1.13) for the associations between every 10 cm increase in height and risk of SCC and BCC respectively. None of the 687 height-related single-nucleotide polymorphisms (SNPs) was significantly associated with the risk of SCC or BCC, nor were the genetic scores combining independent height-related loci. CONCLUSIONS: Our data from two large cohorts provide further evidence that height is associated with an increased risk of NMSC. More studies on height-related genetic loci and early-life exposures may help clarify the underlying mechanisms.
Subjects
Body Height/genetics; Carcinoma, Basal Cell/genetics; Carcinoma, Squamous Cell/genetics; Genetic Predisposition to Disease; Polymorphism, Single Nucleotide; Skin Neoplasms/genetics; Adult; Aged; Carcinoma, Basal Cell/epidemiology; Carcinoma, Squamous Cell/epidemiology; Case-Control Studies; Female; Follow-Up Studies; Health Occupations; Humans; Male; Middle Aged; Nurses; Skin Neoplasms/epidemiology
ABSTRACT
Growing evidence from both epidemiology and basic science suggests an inverse association between Alzheimer's disease (AD) and cancer. We examined the genetic relationship between AD and various cancer types using GWAS summary statistics from the IGAP and GAME-ON consortia. Sample size ranged from 9931 to 54,162; SNPs were imputed to the 1000 Genomes European panel. Our results based on cross-trait LD Score regression showed a significant positive genetic correlation between AD and five cancers combined (colon, breast, prostate, ovarian, lung; r_g = 0.17, P = 0.04), and specifically with breast cancer (ER-negative and overall; r_g = 0.21 and 0.18, P = 0.035 and 0.034) and lung cancer (adenocarcinoma, squamous cell carcinoma and overall; r_g = 0.31, 0.38 and 0.30, P = 0.029, 0.016, and 0.006). Estimating the genetic correlation in specific functional categories revealed mixed positive and negative signals, notably stronger at annotations associated with increased enhancer activity. This suggests a role of gene expression regulators in the shared genetic etiology between AD and cancer, and that some shared variants modulate disease risk concordantly while others have effects in opposite directions. Due to power issues, we did not detect cross-phenotype associations at individual SNPs. This genetic overlap is not likely driven by a handful of major loci. Our study is the first to examine the co-heritability of AD and cancer leveraging large-scale GWAS results. The functional categories highlighted in this study need further investigation to illustrate the details of the genetic sharing and to bridge between different levels of associations.
Subjects
Alzheimer Disease/genetics; Genome-Wide Association Study; Neoplasms/genetics; Polymorphism, Single Nucleotide; Alzheimer Disease/epidemiology; Female; Humans; Male; Neoplasms/epidemiology
ABSTRACT
BACKGROUND: Despite growing interest in the clinical translation of polygenic risk scores (PRSs), it remains uncertain to what extent genomic information can enhance the prediction of psychiatric outcomes beyond the data collected during clinical visits alone. OBJECTIVE: This study aimed to assess the clinical utility of incorporating PRSs into a suicide risk prediction model trained on electronic health records (EHRs) and patient-reported surveys among patients admitted to the emergency department. METHODS: Study participants were recruited from the psychiatric emergency department at Massachusetts General Hospital. There were 333 adult patients of European ancestry who had high-quality genotype data available through their participation in the Mass General Brigham Biobank. Multiple neuropsychiatric PRSs were added to a previously validated suicide prediction model in a prospective cohort enrolled between February 4, 2015, and March 13, 2017. Data analysis was performed from July 11, 2022, to August 31, 2023. Suicide attempt was defined using diagnostic codes from longitudinal EHRs combined with 6-month follow-up surveys. The clinical risk score for suicide attempt was calculated from an ensemble model trained using an EHR-based suicide risk score and a brief survey, and it was subsequently used to define the baseline model. We generated PRSs for depression, bipolar disorder, schizophrenia, suicide attempt, and externalizing traits using a Bayesian polygenic scoring method for European ancestry participants. Model performance was evaluated using area under the receiver operating characteristic curve (AUC), area under the precision-recall curve, and positive predictive values. RESULTS: Of the 333 patients (n=178, 53.5% male; mean age 36.8, SD 13.6 years; n=333, 100% non-Hispanic and n=324, 97.3% self-reported White), 28 (8.4%) had a suicide attempt within 6 months. Adding either the schizophrenia PRS or all PRSs to the baseline model resulted in the numerically highest discrimination (AUC 0.86, 95% CI 0.73-0.99) compared to the baseline model (AUC 0.84, 95% CI 0.70-0.98). However, the improvement in model performance was not statistically significant. CONCLUSIONS: In this study, incorporating genomic information into clinical prediction models for suicide attempt did not improve patient risk stratification. Larger studies that include more diverse participants are required to validate whether the inclusion of psychiatric PRSs in clinical prediction models can enhance the stratification of patients at risk of suicide attempts.
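The discrimination metric reported above (AUC) can be computed directly from predicted risks and observed outcomes. The Python sketch below uses the rank-based (Mann-Whitney) definition of AUC. The function name `auc` and the scores are hypothetical toy values, not study data, and nothing here reproduces the ensemble model or the confidence intervals in the abstract.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case receives a
    higher score than a randomly chosen non-case (ties count 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks for five patients (1 = suicide attempt).
labels = [0, 0, 1, 0, 1]
baseline_scores = [0.10, 0.40, 0.35, 0.20, 0.80]
print(f"baseline AUC: {auc(baseline_scores, labels):.3f}")  # 0.833
```

Comparing a baseline model with a PRS-augmented model then amounts to calling `auc` on each model's scores for the same patients. With only 28 events, as in the study, small numerical differences in AUC are expected to carry wide confidence intervals.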
ABSTRACT
OBJECTIVE: Treatment-resistant depression (TRD) occurs in roughly one-third of all individuals with major depressive disorder (MDD). Although research has suggested a significant common variant genetic component of liability to TRD, with heritability estimated at 8% when compared with non-treatment-resistant MDD, no replicated genetic loci have been identified, and the genetic architecture of TRD remains unclear. A key barrier to this work has been the paucity of adequately powered cohorts for investigation, largely because of the challenge in prospectively investigating this phenotype. The objective of this study was to perform a well-powered genetic study of TRD. METHODS: Using receipt of electroconvulsive therapy (ECT) as a surrogate for TRD, the authors applied standard machine learning methods to electronic health record data to derive predicted probabilities of receiving ECT. These probabilities were then applied as a quantitative trait in a genome-wide association study of 154,433 genotyped patients across four large biobanks. RESULTS: Heritability estimates ranged from 2% to 4.2%, and significant genetic overlap was observed with cognition, attention deficit hyperactivity disorder, schizophrenia, alcohol and smoking traits, and body mass index. Two genome-wide significant loci were identified, both previously implicated in metabolic traits, suggesting shared biology and potential pharmacological implications. CONCLUSIONS: This work provides support for the utility of estimation of disease probability for genomic investigation and provides insights into the genetic architecture and biology of TRD.
Subjects
Depressive Disorder, Major; Depressive Disorder, Treatment-Resistant; Electroconvulsive Therapy; Genome-Wide Association Study; Humans; Depressive Disorder, Treatment-Resistant/genetics; Depressive Disorder, Treatment-Resistant/therapy; Female; Male; Depressive Disorder, Major/genetics; Depressive Disorder, Major/therapy; Middle Aged; Machine Learning; Adult; Phenotype; Aged; Body Mass Index; Schizophrenia/genetics; Schizophrenia/therapy
ABSTRACT
Educational attainment (EduYears), a heritable trait often used as a proxy for cognitive ability, is associated with various health and social outcomes. Previous genome-wide association studies (GWASs) on EduYears have focused on samples of European (EUR) genetic ancestries. Here we present the first large-scale GWAS of EduYears in people of East Asian (EAS) ancestry (n = 176,400) and conduct a cross-ancestry meta-analysis with EduYears GWAS in people of EUR ancestry (n = 766,345). EduYears showed a high genetic correlation and power-adjusted transferability ratio between EAS and EUR. We also found similar functional enrichment, gene expression enrichment and cross-trait genetic correlations between the two populations. Cross-ancestry fine-mapping identified refined credible sets with a higher posterior inclusion probability than single-population fine-mapping. Polygenic prediction analysis in four independent EAS and EUR cohorts demonstrated transferability between populations. Our study supports the need for further research on diverse ancestries to increase our understanding of the genetic basis of educational attainment.
Subjects
Academic Success; East Asian People; Humans; Educational Status; Genome-Wide Association Study; Multifactorial Inheritance/genetics; White People
ABSTRACT
Genome-wide association studies (GWASs) have been mostly conducted in populations of European ancestry, which currently limits the transferability of their findings to other populations. Here, we show, through theory, simulations and applications to real data, that adjustment of GWAS analyses for polygenic scores (PGSs) increases the statistical power for discovery across all ancestries. We applied this method to analyze seven traits available in three large biobanks with participants of East Asian ancestry (n = 340,000 in total) and report 139 additional associations across traits. We also present a two-stage meta-analysis strategy whereby, in contributing cohorts, a PGS-adjusted GWAS is rerun using PGSs derived from a first round of a standard meta-analysis. On average, across traits, this approach yields a 1.26-fold increase in the number of detected associations (range 1.07- to 1.76-fold increase). Altogether, our study demonstrates the value of using PGSs to increase the power of GWASs in underrepresented populations and promotes such an analytical strategy for future GWAS meta-analyses.
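The power gain from PGS adjustment can be illustrated with a small simulation. The Python sketch below is not the authors' method or software. It assumes, for simplicity, a PGS that is independent of the tested SNP (as if built from other loci), and it approximates covariate adjustment by first residualizing the phenotype on the PGS. All effect sizes, the sample size, and the helper names are hypothetical.

```python
import random
import statistics


def simple_regression(x, y):
    """Least-squares fit of y ~ intercept + slope * x.

    Returns (slope, standard error of the slope).
    """
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, (rss / (n - 2) / sxx) ** 0.5


def residualize(y, covariate):
    """Remove the linear effect of one covariate from y."""
    slope, _ = simple_regression(covariate, y)
    mean_y, mean_c = statistics.fmean(y), statistics.fmean(covariate)
    return [yi - mean_y - slope * (ci - mean_c) for yi, ci in zip(y, covariate)]


random.seed(0)
n, maf = 5000, 0.3  # hypothetical sample size and allele frequency
snp = [(random.random() < maf) + (random.random() < maf) for _ in range(n)]
pgs = [random.gauss(0, 1) for _ in range(n)]  # PGS assumed independent of the SNP
pheno = [0.1 * g + 0.7 * s + random.gauss(0, 0.7) for g, s in zip(snp, pgs)]

beta_raw, se_raw = simple_regression(snp, pheno)
beta_adj, se_adj = simple_regression(snp, residualize(pheno, pgs))
print(f"standard GWAS:     beta={beta_raw:.3f}, SE={se_raw:.4f}")
print(f"PGS-adjusted GWAS: beta={beta_adj:.3f}, SE={se_adj:.4f}")
```

Because the PGS absorbs part of the phenotypic variance, the residual variance shrinks and the SNP effect is estimated with a smaller standard error, which is the source of the added power. A full analysis would fit the SNP and the PGS jointly with the other covariates rather than in two steps.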
Subjects
East Asian People; Genome-Wide Association Study; Multifactorial Inheritance; Humans; Genome-Wide Association Study/methods; Phenotype; Polymorphism, Single Nucleotide; East Asian People/genetics
ABSTRACT
Genome-wide association studies (GWASs) have identified tens of thousands of genetic loci associated with human complex traits. However, the majority of GWASs were conducted in individuals of European ancestries. Failure to capture global genetic diversity has limited genomic discovery and has impeded equitable delivery of genomic knowledge to diverse populations. Here we report findings from 102,900 individuals across 36 human quantitative traits in the Taiwan Biobank (TWB), a major biobank effort that broadens the population diversity of genetic studies in East Asia. We identified 968 novel genetic loci, pinpointed novel causal variants through statistical fine-mapping, compared the genetic architecture across TWB, Biobank Japan, and UK Biobank, and evaluated the utility of cross-phenotype, cross-population polygenic risk scores in disease risk prediction. These results demonstrated the potential to advance discovery through diversifying GWAS populations and provided insights into the common genetic basis of human complex traits in East Asia.
ABSTRACT
Polygenic risk scores (PRS) have attenuated cross-population predictive performance. As existing genome-wide association studies (GWAS) have been conducted predominantly in individuals of European descent, the limited transferability of PRS reduces their clinical value in non-European populations, and may exacerbate healthcare disparities. Recent efforts to level ancestry imbalance in genomic research have expanded the scale of non-European GWAS, although most remain underpowered. Here, we present a new PRS construction method, PRS-CSx, which improves cross-population polygenic prediction by integrating GWAS summary statistics from multiple populations. PRS-CSx couples genetic effects across populations via a shared continuous shrinkage (CS) prior, enabling more accurate effect size estimation by sharing information between summary statistics and leveraging linkage disequilibrium diversity across discovery samples, while inheriting computational efficiency and robustness from PRS-CS. We show that PRS-CSx outperforms alternative methods across traits with a wide range of genetic architectures, cross-population genetic overlaps and discovery GWAS sample sizes in simulations, and improves the prediction of quantitative traits and schizophrenia risk in non-European populations.
Subjects
Genetic Predisposition to Disease; Genome-Wide Association Study; Genetics, Population; Genome-Wide Association Study/methods; Humans; Linkage Disequilibrium; Multifactorial Inheritance/genetics; Risk Factors
ABSTRACT
The Taiwan Biobank (TWB) is an ongoing prospective study of >150,000 individuals aged 20-70 in Taiwan. A comprehensive list of phenotypes was collected for each consented participant at recruitment and follow-up visits through structured interviews and physical measurements. Biomarkers and genetic data were generated from blood and urine samples. We present here an overview of TWB's genetic data quality, population structure, and familial relationships in a cohort of predominantly Han Chinese ancestry, and highlight its important attributes and genetic findings thus far. A linkage to Taiwan's National Health Insurance database of >25 years and other registries is underway to enrich the phenotypic spectrum and enable deep and longitudinal genetic investigations. TWB provides one of the largest biobank resources for biomedical and public health research in East Asia that will contribute to our understanding of the genetic basis of health and disease in global populations through collaborative studies with other biobanks.
ABSTRACT
BACKGROUND: The developmental and epileptic encephalopathies (DEEs) are the most severe group of epilepsies which co-present with developmental delay and intellectual disability (ID). DEEs usually occur in people without a family history of epilepsy and have emerged as primarily monogenic, with damaging rare mutations found in 50% of patients. Little is known about the genetic architecture of patients with DEEs in whom no pathogenic variant is identified. Polygenic risk scoring (PRS) is a method that measures a person's common genetic burden for a trait or condition. Here, we used PRS to test whether genetic burden for epilepsy is relevant in individuals with DEEs, and other forms of epilepsy with ID. METHODS: Genetic data on 2,759 cases with DEEs, or epilepsy with ID presumed to have a monogenic basis, and 447,760 population-matched controls were analysed. We compared PRS for 'all epilepsy', 'focal epilepsy', and 'genetic generalised epilepsy' (GGE) between cases and controls. We performed pairwise comparisons between cases stratified for identifiable rare deleterious genetic variants and controls. FINDINGS: Cases of presumed monogenic severe epilepsy had an increased PRS for 'all epilepsy' (p<0.0001), 'focal epilepsy' (p<0.0001), and 'GGE' (p=0.0002) relative to controls, which explain between 0.08% and 3.3% of phenotypic variance. PRS was increased in cases both with and without an identified deleterious variant of major effect, and there was no significant difference in PRS between the two groups. INTERPRETATION: We provide evidence that common genetic variation contributes to the aetiology of DEEs and other forms of epilepsy with ID, even when there is a known pathogenic variant of major effect. These results provide insight into the genetic underpinnings of the severe epilepsies and warrant a shift in our understanding of the aetiology of the DEEs as complex, rather than monogenic, disorders. 
FUNDING: Science Foundation Ireland; Human Genome Research Institute; National Heart, Lung, and Blood Institute; German Research Foundation.
Subjects
Epilepsy, Generalized; Intellectual Disability; Epilepsy, Generalized/diagnosis; Epilepsy, Generalized/genetics; Genetic Variation; Humans; Multifactorial Inheritance; Mutation; Phenotype
ABSTRACT
BACKGROUND: Type 2 diabetes (T2D) is a worldwide scourge caused by both genetic and environmental risk factors that disproportionately afflicts communities of color. Leveraging existing large-scale genome-wide association studies (GWAS), polygenic risk scores (PRS) have shown promise to complement established clinical risk factors and intervention paradigms, and improve early diagnosis and prevention of T2D. However, to date, T2D PRS have been most widely developed and validated in individuals of European descent. Comprehensive assessment of T2D PRS in non-European populations is critical for equitable deployment of PRS to clinical practice that benefits global populations. METHODS: We integrated T2D GWAS in European, African, and East Asian populations to construct a trans-ancestry T2D PRS using a newly developed Bayesian polygenic modeling method, and assessed the prediction accuracy of the PRS in the multi-ethnic Electronic Medical Records and Genomics (eMERGE) study (11,945 cases; 57,694 controls), four Black cohorts (5137 cases; 9657 controls), and the Taiwan Biobank (4570 cases; 84,996 controls). We additionally evaluated a post hoc ancestry adjustment method that can express the polygenic risk on the same scale across ancestrally diverse individuals and facilitate the clinical implementation of the PRS in prospective cohorts. RESULTS: The trans-ancestry PRS was significantly associated with T2D status across the ancestral groups examined. The top 2% of the PRS distribution can identify individuals with an approximately 2.5- to 4.5-fold increase in T2D risk, which corresponds to the increased risk of T2D for first-degree relatives. The post hoc ancestry adjustment method eliminated major distributional differences in the PRS across ancestries without compromising its predictive performance. CONCLUSIONS: By integrating T2D GWAS from multiple populations, we developed and validated a trans-ancestry PRS, and demonstrated its potential as a meaningful index of risk among diverse patients in clinical settings. Our efforts represent the first step towards the implementation of the T2D PRS into routine healthcare.
Subjects
Diabetes Mellitus, Type 2; Genome-Wide Association Study; Bayes Theorem; Diabetes Mellitus, Type 2/genetics; Genetic Predisposition to Disease; Humans; Prospective Studies; Risk Factors
ABSTRACT
BACKGROUND: Clinical laboratory (lab) tests are used in clinical practice to diagnose, treat, and monitor disease conditions. Test results are stored in electronic health records (EHRs), and a growing number of EHRs are linked to patient DNA, offering unprecedented opportunities to query relationships between genetic risk for complex disease and quantitative physiological measurements collected on large populations. METHODS: A total of 3075 quantitative lab tests were extracted from Vanderbilt University Medical Center's (VUMC) EHR system and cleaned for population-level analysis according to our QualityLab protocol. Lab values extracted from BioVU were compared with previous population studies using heritability and genetic correlation analyses. We then tested the hypothesis that polygenic risk scores for biomarkers and complex disease are associated with biomarkers of disease extracted from the EHR. In a proof-of-concept analysis, we focused on lipids and coronary artery disease (CAD). We cleaned lab traits extracted from the EHR, performed lab-wide association scans (LabWAS) of the lipid and CAD polygenic risk scores across 315 heritable lab tests, and then replicated the pipeline and analyses in the Mass General Brigham Biobank (MGBB). RESULTS: Heritability estimates of lipid values (after cleaning with QualityLab) were comparable to previous reports, and polygenic scores for lipids were strongly associated with their referent lipid in a LabWAS. LabWAS of the polygenic score for CAD recapitulated canonical heart disease biomarker profiles, including decreased HDL and increased pre-medication LDL, triglycerides, blood glucose, and glycated hemoglobin (HgbA1C), in European and African descent populations. Notably, many of these associations remained even after adjusting for the presence of cardiovascular disease and were replicated in the MGBB. CONCLUSIONS: Polygenic risk scores can be used to identify biomarkers of complex disease in large-scale EHR-based genomic analyses, providing new avenues for discovery of novel biomarkers and deeper understanding of disease trajectories in pre-symptomatic individuals. We present two methods and associated software, QualityLab and LabWAS, to clean and analyze EHR labs at scale and perform a Lab-Wide Association Scan.
Subjects
Biomarkers/metabolism; Clinical Laboratory Techniques; Disease/genetics; Multifactorial Inheritance/genetics; Biological Specimen Banks; Coronary Artery Disease/blood; Coronary Artery Disease/genetics; Female; Genome-Wide Association Study; Humans; Lipids/blood; Male; Middle Aged; Reproducibility of Results
ABSTRACT
Polygenic risk scores (PRS) have shown promise in predicting human complex traits and diseases. Here, we present PRS-CS, a polygenic prediction method that infers posterior effect sizes of single nucleotide polymorphisms (SNPs) using genome-wide association summary statistics and an external linkage disequilibrium (LD) reference panel. PRS-CS utilizes a high-dimensional Bayesian regression framework, and is distinct from previous work by placing a continuous shrinkage (CS) prior on SNP effect sizes, which is robust to varying genetic architectures, provides substantial computational advantages, and enables multivariate modeling of local LD patterns. Simulation studies using data from the UK Biobank show that PRS-CS outperforms existing methods across a wide range of genetic architectures, especially when the training sample size is large. We apply PRS-CS to predict six common complex diseases and six quantitative traits in the Partners HealthCare Biobank, and further demonstrate the improvement of PRS-CS in prediction accuracy over alternative methods.
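Once per-SNP effect sizes are available, from PRS-CS or any other method, an individual's score is conventionally the effect-size-weighted sum of allele dosages. The Python sketch below shows only that final scoring step. It does not implement the continuous shrinkage prior or any part of the PRS-CS inference, and the function name, dosages, and effect sizes are hypothetical illustration values.

```python
def polygenic_score(dosages, effect_sizes):
    """Weighted sum of allele dosages (0, 1, or 2 copies of the effect
    allele per SNP) and per-SNP effect sizes."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("one effect size is required per SNP")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


# Hypothetical posterior effect sizes for three SNPs.
effect_sizes = [0.10, -0.20, 0.05]

# Two hypothetical individuals with different genotypes.
person_a = [0, 1, 2]
person_b = [2, 0, 1]
print(f"person A score: {polygenic_score(person_a, effect_sizes):.2f}")  # -0.10
print(f"person B score: {polygenic_score(person_b, effect_sizes):.2f}")  # 0.25
```

In practice the sum runs over hundreds of thousands of SNPs, and the raw score is usually standardized within a reference sample before it is interpreted.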