1.

Population Performance and Individual Agreement of Coronary Artery Disease Polygenic Risk Scores.

Abramowitz, Sarah A; Boulier, Kristin; Keat, Karl; Cardone, Katie M; Shivakumar, Manu; DePaolo, John; Judy, Renae; Kim, Dokyoon; Rader, Daniel J; Voight, Benjamin F; Pasaniuc, Bogdan; Levin, Michael G; Damrauer, Scott M.

medRxiv ; 2024 Jul 29.

Article in English | MEDLINE | ID: mdl-39108513

ABSTRACT

Importance: Polygenic risk scores (PRSs) for coronary artery disease (CAD) are a growing clinical and commercial reality. Whether existing scores provide similar individual-level assessments of disease liability is a critical consideration for clinical implementation that remains uncharacterized. Objective: Characterize the reliability of CAD PRSs that perform equivalently at the population level at predicting individual-level risk. Design: Cross-sectional Study. Setting: All of Us Research Program (AOU), Penn Medicine Biobank (PMBB), and UCLA ATLAS Precision Health Biobank. Participants: Volunteers of diverse genetic backgrounds enrolled in AOU, PMBB, and UCLA with available electronic health record and genotyping data. Exposures: Polygenic risk for CAD from previously published PRSs and new PRSs developed separately from the testing cohorts. Main Outcomes and Measures: Sets of CAD PRSs that perform population prediction equivalently were identified by comparing calibration and discrimination (Brier score and AUROC) of generalized linear models of prevalent CAD using Bayesian analysis of variance. Among equivalently performing scores, individual-level agreement between risk estimates was tested with intraclass correlation (ICC) and Light's Kappa, measures of inter-rater reliability. Results: 50 PRSs were calculated for 171,095 AOU participants. When included in a model of prevalent CAD, 48 scores had practically equivalent Brier scores and AUROCs (region of practical equivalence = 0.02). Across these scores, 84% of participants had at least one score in both the top and bottom risk quintile. Continuous agreement of individual risk predictions from the 48 scores was poor, with an ICC of 0.351 (95% CI; 0.349, 0.352). Agreement between two statistically equivalent scores was moderate, with an ICC of 0.649 (95% CI; 0.646, 0.652). 
Light's Kappa, used to evaluate consistency of assignment to high-risk thresholds, did not exceed 0.56 (interpreted as 'fair') across statistically and practically equivalent scores. Repeating the analysis among 41,193 PMBB and 50,748 UCLA participants yielded different sets of statistically and practically equivalent scores which also lacked strong individual agreement. Conclusions and Relevance: Across three diverse biobanks, CAD PRSs that performed equivalently at the population level produced unreliable individual risk estimates. Approaches to clinical implementation of CAD PRSs must consider the potential for discordant individual risk estimates from otherwise indistinguishable scores.
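
The quintile discordance reported in the abstract above (participants with at least one score in both the top and the bottom risk quintile) can be illustrated with a minimal sketch. This is not the authors' code: the function names `quintile_labels` and `discordant_fraction` are hypothetical, quintiles are assigned by simple rank, and the input is assumed to be one list of values per score, aligned by participant.

```python
def quintile_labels(values):
    """Assign each value a quintile from 0 (lowest) to 4 (highest) by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = min(4, rank * 5 // n)
    return labels


def discordant_fraction(score_columns):
    """Fraction of participants that some score places in the bottom
    quintile while another score places them in the top quintile.

    score_columns: list of lists, one list per score, aligned by participant.
    """
    per_score = [quintile_labels(col) for col in score_columns]
    n = len(score_columns[0])
    hits = 0
    for i in range(n):
        seen = {labels[i] for labels in per_score}
        if 0 in seen and 4 in seen:
            hits += 1
    return hits / n
```

With many scores per participant, this fraction can be high even when every score performs similarly at the population level, which is the pattern the abstract describes.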

2.

All of Us diversity and scale improve polygenic prediction contextually with greatest improvements for under-represented populations.

Tsuo, Kristin; Shi, Zhuozheng; Ge, Tian; Mandla, Ravi; Hou, Kangcheng; Ding, Yi; Pasaniuc, Bogdan; Wang, Ying; Martin, Alicia R.

bioRxiv ; 2024 Aug 06.

Article in English | MEDLINE | ID: mdl-39149254

ABSTRACT

Recent studies have demonstrated that polygenic risk scores (PRS) trained on multi-ancestry data can improve prediction accuracy in groups historically underrepresented in genomic studies, but the availability of linked health and genetic data from large-scale diverse cohorts representative of a wide spectrum of human diversity remains limited. To address this need, the All of Us research program (AoU) generated whole-genome sequences of 245,388 individuals who collectively reflect the diversity of the USA. Leveraging this resource and another widely-used population-scale biobank, the UK Biobank (UKB) with a half million participants, we developed PRS trained on multi-ancestry and multi-biobank data with up to ~750,000 participants for 32 common, complex traits and diseases across a range of genetic architectures. We then compared effects of ancestry, PRS methodology, and genetic architecture on PRS accuracy across a held out subset of ancestrally diverse AoU participants. Due to the more heterogeneous study design of AoU, we found lower heritability on average compared to UKB (0.075 vs 0.165), which limited the maximal achievable PRS accuracy in AoU. Overall, we found that the increased diversity of AoU significantly improved PRS performance in some participants in AoU, especially underrepresented individuals, across multiple phenotypes. Notably, maximizing sample size by combining discovery data across AoU and UKB is not the optimal approach for predicting some phenotypes in African ancestry populations; rather, using data from only AoU for these traits resulted in the greatest accuracy. This was especially true for less polygenic traits with large ancestry-enriched effects, such as neutrophil count (R²: 0.055 vs. 0.035 using AoU vs. cross-biobank meta-analysis, respectively, because of, e.g., DARC).
Lastly, we calculated individual-level PRS accuracies rather than grouping by continental ancestry, a critical step towards interpretability in precision medicine. Individualized PRS accuracy decays linearly as a function of ancestry divergence, but the slope was smaller using multi-ancestry GWAS compared to using European GWAS. Our results highlight the potential of biobanks with more balanced representations of human diversity to facilitate more accurate PRS for the individuals least represented in genomic studies.

3.

Calibrated prediction intervals for polygenic scores across diverse contexts.

Hou, Kangcheng; Xu, Ziqi; Ding, Yi; Mandla, Ravi; Shi, Zhuozheng; Boulier, Kristin; Harpak, Arbel; Pasaniuc, Bogdan.

Nat Genet ; 56(7): 1386-1396, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38886587

ABSTRACT

Polygenic scores (PGS) have emerged as the tool of choice for genomic prediction in a wide range of fields. We show that PGS performance varies broadly across contexts and biobanks. Contexts such as age, sex and income can impact PGS accuracy with similar magnitudes as genetic ancestry. Here we introduce an approach (CalPred) that models all contexts jointly to produce prediction intervals that vary across contexts to achieve calibration (include the trait with 90% probability), whereas existing methods are miscalibrated. In analyses of 72 traits across large and diverse biobanks (All of Us and UK Biobank), we find that prediction intervals required adjustment by up to 80% for quantitative traits. For disease traits, PGS-based predictions were miscalibrated across socioeconomic contexts such as annual household income levels, further highlighting the need of accounting for context information in PGS-based prediction across diverse populations.
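
The calibration criterion in the abstract above (a 90% prediction interval should contain the trait value about 90% of the time in every context) can be checked with a short sketch. This is not CalPred itself. It only measures empirical coverage per context, assuming Gaussian intervals built from a predicted mean and standard deviation; the function name `coverage_by_context` and its arguments are hypothetical.

```python
from collections import defaultdict
from statistics import NormalDist


def coverage_by_context(y, mean, sd, context, level=0.90):
    """Empirical coverage of central Gaussian prediction intervals,
    reported separately for each context label (e.g. an income bracket).

    y, mean, sd, context: equal-length sequences, one entry per individual.
    """
    # Half-width multiplier for a central interval (about 1.645 at level=0.90).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    inside = defaultdict(int)
    total = defaultdict(int)
    for yi, mi, si, ci in zip(y, mean, sd, context):
        total[ci] += 1
        if abs(yi - mi) <= z * si:
            inside[ci] += 1
    return {c: inside[c] / total[c] for c in total}
```

A context whose coverage falls well below or above the nominal level indicates that a single, context-independent interval width is miscalibrated for that group.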

Subject(s)

Genome-Wide Association Study , Models, Genetic , Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Genome-Wide Association Study/methods , Female , Male , Calibration , Biological Specimen Banks , Phenotype , Genomics/methods , Polymorphism, Single Nucleotide

4.

Multi-ancestry polygenic risk scores for venous thromboembolism.

Jee, Yon Ho; Thibord, Florian; Dominguez, Alicia; Sept, Corriene; Boulier, Kristin; Venkateswaran, Vidhya; Ding, Yi; Cherlin, Tess; Verma, Shefali Setia; Faro, Valeria Lo; Bartz, Traci M; Boland, Anne; Brody, Jennifer A; Deleuze, Jean-Francois; Emmerich, Joseph; Germain, Marine; Johnson, Andrew D; Kooperberg, Charles; Morange, Pierre-Emmanuel; Pankratz, Nathan; Psaty, Bruce M; Reiner, Alexander P; Smadja, David M; Sitlani, Colleen M; Suchon, Pierre; Tang, Weihong; Trégouët, David-Alexandre; Zöllner, Sebastian; Pasaniuc, Bogdan; Damrauer, Scott M; Sanna, Serena; Snieder, Harold; Kabrhel, Christopher; Smith, Nicholas L; Kraft, Peter.

Hum Mol Genet ; 2024 Jun 16.

Article in English | MEDLINE | ID: mdl-38879759

ABSTRACT

Venous thromboembolism (VTE) is a significant contributor to morbidity and mortality, with large disparities in incidence rates between Black and White Americans. Polygenic risk scores (PRSs) limited to variants discovered in genome-wide association studies in European-ancestry samples can identify European-ancestry individuals at high risk of VTE. However, there is limited evidence on whether high-dimensional PRS constructed using more sophisticated methods and more diverse training data can enhance the predictive ability and their utility across diverse populations. We developed PRSs for VTE using summary statistics from the International Network against Venous Thrombosis (INVENT) consortium genome-wide association studies meta-analyses of European- (71 771 cases and 1 059 740 controls) and African-ancestry samples (7482 cases and 129 975 controls). We used LDpred2 and PRS-CSx to construct ancestry-specific and multi-ancestry PRSs and evaluated their performance in an independent European- (6781 cases and 103 016 controls) and African-ancestry sample (1385 cases and 12 569 controls). Multi-ancestry PRSs with weights tuned in European-ancestry samples slightly outperformed ancestry-specific PRSs in European-ancestry test samples (e.g. the area under the receiver operating curve [AUC] was 0.609 for PRS-CSx_combinedEUR and 0.608 for PRS-CSxEUR [P = 0.00029]). Multi-ancestry PRSs with weights tuned in African-ancestry samples also outperformed ancestry-specific PRSs in African-ancestry test samples (PRS-CSxAFR: AUC = 0.58, PRS-CSx_combined AFR: AUC = 0.59), although this difference was not statistically significant (P = 0.34). The highest fifth percentile of the best-performing PRS was associated with 1.9-fold and 1.68-fold increased risk for VTE among European- and African-ancestry subjects, respectively, relative to those in the middle stratum. 
These findings suggest that the multi-ancestry PRS might be used to improve performance across diverse populations to identify individuals at highest risk for VTE.
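
The AUC values quoted in the abstract above summarize how often a case receives a higher score than a control. A minimal sketch of that definition follows, using the pairwise (Mann-Whitney) formulation. It is an illustration only, not the evaluation pipeline used in the study, and the function name `auc` is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve as the probability that a randomly chosen
    case (label 1) scores higher than a randomly chosen control (label 0).
    Ties count as one half.
    """
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

The pairwise loop is quadratic in sample size, so for biobank-scale data a rank-based computation would be preferable; the quadratic form is used here because it mirrors the definition directly.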

5.

Splicing-specific transcriptome-wide association uncovers genetic mechanisms for schizophrenia.

Hervoso, Jonatan L; Amoah, Kofi; Dodson, Jack; Choudhury, Mudra; Bhattacharya, Arjun; Quinones-Valdez, Giovanni; Pasaniuc, Bogdan; Xiao, Xinshu.

Am J Hum Genet ; 111(8): 1573-1587, 2024 Aug 08.

Article in English | MEDLINE | ID: mdl-38925119

ABSTRACT

Recent studies have highlighted the essential role of RNA splicing, a key mechanism of alternative RNA processing, in establishing connections between genetic variations and disease. Genetic loci influencing RNA splicing variations show considerable influence on complex traits, possibly surpassing those affecting total gene expression. Dysregulated RNA splicing has emerged as a major potential contributor to neurological and psychiatric disorders, likely due to the exceptionally high prevalence of alternatively spliced genes in the human brain. Nevertheless, establishing direct associations between genetically altered splicing and complex traits has remained an enduring challenge. We introduce Spliced-Transcriptome-Wide Associations (SpliTWAS) to integrate alternative splicing information with genome-wide association studies to pinpoint genes linked to traits through exon splicing events. We applied SpliTWAS to two schizophrenia (SCZ) RNA-sequencing datasets, BrainGVEX and CommonMind, revealing 137 and 88 trait-associated exons (in 84 and 67 genes), respectively. Enriched biological functions in the associated gene sets converged on neuronal function and development, immune cell activation, and cellular transport, which are highly relevant to SCZ. SpliTWAS variants impacted RNA-binding protein binding sites, revealing potential disruption of RNA-protein interactions affecting splicing. We extended the probabilistic fine-mapping method FOCUS to the exon level, identifying 36 genes and 48 exons as putatively causal for SCZ. We highlight VPS45 and APOPT1, where splicing of specific exons was associated with disease risk, eluding detection by conventional gene expression analysis. Collectively, this study supports the substantial role of alternative splicing in shaping the genetic basis of SCZ, providing a valuable approach for future investigations in this area.

Subject(s)

Alternative Splicing , Exons , Genome-Wide Association Study , Schizophrenia , Transcriptome , Humans , Schizophrenia/genetics , Alternative Splicing/genetics , Exons/genetics , Genetic Predisposition to Disease , RNA Splicing/genetics , Polymorphism, Single Nucleotide

6.

Electronic health record signatures identify undiagnosed patients with common variable immunodeficiency disease.

Johnson, Ruth; Stephens, Alexis V; Mester, Rachel; Knyazev, Sergey; Kohn, Lisa A; Freund, Malika K; Bondhus, Leroy; Hill, Brian L; Schwarz, Tommer; Zaitlen, Noah; Arboleda, Valerie A; Bastarache, Lisa A; Pasaniuc, Bogdan; Butte, Manish J.

Sci Transl Med ; 16(745): eade4510, 2024 May.

Article in English | MEDLINE | ID: mdl-38691621

ABSTRACT

Human inborn errors of immunity include rare disorders entailing functional and quantitative antibody deficiencies due to impaired B cells called the common variable immunodeficiency (CVID) phenotype. Patients with CVID face delayed diagnoses and treatments for 5 to 15 years after symptom onset because the disorders are rare (prevalence of ~1/25,000), and there is extensive heterogeneity in CVID phenotypes, ranging from infections to autoimmunity to inflammatory conditions, overlapping with other more common disorders. The prolonged diagnostic odyssey drives excessive system-wide costs before diagnosis. Because there is no single causal mechanism, there are no genetic tests to definitively diagnose CVID. Here, we present PheNet, a machine learning algorithm that identifies patients with CVID from their electronic health records (EHRs). PheNet learns phenotypic patterns from verified CVID cases and uses this knowledge to rank patients by likelihood of having CVID. PheNet could have diagnosed more than half of our patients with CVID 1 or more years earlier than they had been diagnosed. When applied to a large EHR dataset, followed by blinded chart review of the top 100 patients ranked by PheNet, we found that 74% were highly probable to have CVID. We externally validated PheNet using >6 million records from disparate medical systems in California and Tennessee. As artificial intelligence and machine learning make their way into health care, we show that algorithms such as PheNet can offer clinical benefits by expediting the diagnosis of rare diseases.

Subject(s)

Common Variable Immunodeficiency , Electronic Health Records , Humans , Common Variable Immunodeficiency/diagnosis , Machine Learning , Algorithms , Male , Female , Phenotype , Adult , Undiagnosed Diseases/diagnosis

7.

Brain cell-type shifts in Alzheimer's disease, autism, and schizophrenia interrogated using methylomics and genetics.

Yap, Chloe X; Vo, Daniel D; Heffel, Matthew G; Bhattacharya, Arjun; Wen, Cindy; Yang, Yuanhao; Kemper, Kathryn E; Zeng, Jian; Zheng, Zhili; Zhu, Zhihong; Hannon, Eilis; Vellame, Dorothea Seiler; Franklin, Alice; Caggiano, Christa; Wamsley, Brie; Geschwind, Daniel H; Zaitlen, Noah; Gusev, Alexander; Pasaniuc, Bogdan; Mill, Jonathan; Luo, Chongyuan; Gandal, Michael J.

Sci Adv ; 10(21): eadn7655, 2024 May 24.

Article in English | MEDLINE | ID: mdl-38781333

ABSTRACT

Few neuropsychiatric disorders have replicable biomarkers, prompting high-resolution and large-scale molecular studies. However, we still lack consensus on a more foundational question: whether quantitative shifts in cell types (the functional unit of life) contribute to neuropsychiatric disorders. Leveraging advances in human brain single-cell methylomics, we deconvolve seven major cell types using bulk DNA methylation profiling across 1270 postmortem brains, including from individuals diagnosed with Alzheimer's disease, schizophrenia, and autism. We observe and replicate cell-type compositional shifts for Alzheimer's disease (endothelial cell loss), autism (increased microglia), and schizophrenia (decreased oligodendrocytes), and find age- and sex-related changes. Multiple layers of evidence indicate that endothelial cell loss contributes to Alzheimer's disease, with comparable effect size to APOE genotype among older people. Genome-wide association identified five genetic loci related to cell-type composition, involving plausible genes for the neurovascular unit (P2RX5 and TRPV3) and excitatory neurons (DPY30 and MEMO1). These results implicate specific cell-type shifts in the pathophysiology of neuropsychiatric disorders.

Subject(s)

Alzheimer Disease , Autistic Disorder , Brain , DNA Methylation , Schizophrenia , Humans , Alzheimer Disease/genetics , Alzheimer Disease/pathology , Alzheimer Disease/metabolism , Schizophrenia/genetics , Schizophrenia/pathology , Brain/metabolism , Brain/pathology , Autistic Disorder/genetics , Autistic Disorder/pathology , Male , Female , Genome-Wide Association Study , Aged , Endothelial Cells/metabolism , Endothelial Cells/pathology , Epigenomics/methods , Middle Aged , Aged, 80 and over

8.

Cross-ancestry atlas of gene, isoform, and splicing regulation in the developing human brain.

Wen, Cindy; Margolis, Michael; Dai, Rujia; Zhang, Pan; Przytycki, Pawel F; Vo, Daniel D; Bhattacharya, Arjun; Matoba, Nana; Tang, Miao; Jiao, Chuan; Kim, Minsoo; Tsai, Ellen; Hoh, Celine; Aygün, Nil; Walker, Rebecca L; Chatzinakos, Christos; Clarke, Declan; Pratt, Henry; Peters, Mette A; Gerstein, Mark; Daskalakis, Nikolaos P; Weng, Zhiping; Jaffe, Andrew E; Kleinman, Joel E; Hyde, Thomas M; Weinberger, Daniel R; Bray, Nicholas J; Sestan, Nenad; Geschwind, Daniel H; Roeder, Kathryn; Gusev, Alexander; Pasaniuc, Bogdan; Stein, Jason L; Love, Michael I; Pollard, Katherine S; Liu, Chunyu; Gandal, Michael J.

Science ; 384(6698): eadh0829, 2024 May 24.

Article in English | MEDLINE | ID: mdl-38781368

ABSTRACT

Neuropsychiatric genome-wide association studies (GWASs), including those for autism spectrum disorder and schizophrenia, show strong enrichment for regulatory elements in the developing brain. However, prioritizing risk genes and mechanisms is challenging without a unified regulatory atlas. Across 672 diverse developing human brains, we identified 15,752 genes harboring gene, isoform, and/or splicing quantitative trait loci, mapping 3739 to cellular contexts. Gene expression heritability drops during development, likely reflecting both increasing cellular heterogeneity and the intrinsic properties of neuronal maturation. Isoform-level regulation, particularly in the second trimester, mediated the largest proportion of GWAS heritability. Through colocalization, we prioritized mechanisms for about 60% of GWAS loci across five disorders, exceeding adult brain findings. Finally, we contextualized results within gene and isoform coexpression networks, revealing the comprehensive landscape of transcriptome regulation in development and disease.

Subject(s)

Alternative Splicing , Brain , Gene Expression Regulation, Developmental , Mental Disorders , Humans , Atlases as Topic , Autism Spectrum Disorder/genetics , Brain/metabolism , Brain/growth & development , Brain/embryology , Gene Regulatory Networks , Genome-Wide Association Study , Protein Isoforms/genetics , Protein Isoforms/metabolism , Quantitative Trait Loci , Schizophrenia/genetics , Transcriptome , Mental Disorders/genetics

9.

Developmental isoform diversity in the human neocortex informs neuropsychiatric risk mechanisms.

Patowary, Ashok; Zhang, Pan; Jops, Connor; Vuong, Celine K; Ge, Xinzhou; Hou, Kangcheng; Kim, Minsoo; Gong, Naihua; Margolis, Michael; Vo, Daniel; Wang, Xusheng; Liu, Chunyu; Pasaniuc, Bogdan; Li, Jingyi Jessica; Gandal, Michael J; de la Torre-Ubieta, Luis.

Science ; 384(6698): eadh7688, 2024 May 24.

Article in English | MEDLINE | ID: mdl-38781356

ABSTRACT

RNA splicing is highly prevalent in the brain and has strong links to neuropsychiatric disorders; yet, the role of cell type-specific splicing and transcript-isoform diversity during human brain development has not been systematically investigated. In this work, we leveraged single-molecule long-read sequencing to deeply profile the full-length transcriptome of the germinal zone and cortical plate regions of the developing human neocortex at tissue and single-cell resolution. We identified 214,516 distinct isoforms, of which 72.6% were novel (not previously annotated in Gencode version 33), and uncovered a substantial contribution of transcript-isoform diversity, regulated by RNA binding proteins, in defining cellular identity in the developing neocortex. We leveraged this comprehensive isoform-centric gene annotation to reprioritize thousands of rare de novo risk variants and elucidate genetic risk mechanisms for neuropsychiatric disorders.

Subject(s)

Mental Disorders , Neocortex , Neurogenesis , Protein Isoforms , RNA Splicing , Single-Cell Analysis , Transcriptome , Humans , Alternative Splicing , Genetic Predisposition to Disease , Mental Disorders/genetics , Molecular Sequence Annotation , Neocortex/metabolism , Neocortex/embryology , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Neurogenesis/genetics

10.

Generalizability of PGS₃₁₃ for breast cancer risk in a Los Angeles biobank.

Shang, Helen; Ding, Yi; Venkateswaran, Vidhya; Boulier, Kristin; Kathuria-Prakash, Nikhita; Malidarreh, Parisa Boodaghi; Luber, Jacob M; Pasaniuc, Bogdan.

HGG Adv ; 5(3): 100302, 2024 Jul 18.

Article in English | MEDLINE | ID: mdl-38704641

ABSTRACT

Polygenic scores (PGSs) summarize the combined effect of common risk variants and are associated with breast cancer risk in patients without identifiable monogenic risk factors. One of the most well-validated PGSs in breast cancer to date is PGS313, which was developed from a Northern European biobank but has shown attenuated performance in non-European ancestries. We further investigate the generalizability of the PGS313 for American women of European (EA), African (AFR), Asian (EAA), and Latinx (HL) ancestry within one institution with a singular electronic health record (EHR) system, genotyping platform, and quality control process. We found that the PGS313 achieved overlapping areas under the receiver operator characteristic (ROC) curve (AUCs) in females of HL (AUC = 0.68, 95% confidence interval [CI] = 0.65-0.71) and EA ancestry (AUC = 0.70, 95% CI = 0.69-0.71) but lower AUCs for the AFR and EAA populations (AFR: AUC = 0.61, 95% CI = 0.56-0.65; EAA: AUC = 0.64, 95% CI = 0.60-0.68). While PGS313 is associated with hormone-receptor-positive (HR+) disease in EA Americans (odds ratio [OR] = 1.42, 95% CI = 1.16-1.64), this association is lost in African, Latinx, and Asian Americans. In summary, we found that PGS313 was significantly associated with breast cancer but with attenuated accuracy in women of AFR and EAA descent within a singular health system in Los Angeles. Our work further highlights the need for additional validation in diverse cohorts prior to the clinical implementation of PGSs.

Subject(s)

Biological Specimen Banks , Breast Neoplasms , Genetic Predisposition to Disease , Humans , Breast Neoplasms/genetics , Breast Neoplasms/epidemiology , Breast Neoplasms/ethnology , Female , Los Angeles/epidemiology , Middle Aged , Risk Factors , Multifactorial Inheritance , ROC Curve , Adult , Aged , Polymorphism, Single Nucleotide

11.

Admix-kit: an integrated toolkit and pipeline for genetic analyses of admixed populations.

Hou, Kangcheng; Gogarten, Stephanie; Kim, Joohyun; Hua, Xing; Dias, Julie-Alexia; Sun, Quan; Wang, Ying; Tan, Taotao; Atkinson, Elizabeth G; Martin, Alicia; Shortt, Jonathan; Hirbo, Jibril; Li, Yun; Pasaniuc, Bogdan; Zhang, Haoyu.

Bioinformatics ; 40(4), 2024 Mar 29.

Article in English | MEDLINE | ID: mdl-38490256

ABSTRACT

SUMMARY: Admixed populations, with their unique and diverse genetic backgrounds, are often underrepresented in genetic studies. This oversight not only limits our understanding but also exacerbates existing health disparities. One major barrier has been the lack of efficient tools tailored for the special challenges of genetic studies of admixed populations. Here, we present admix-kit, an integrated toolkit and pipeline for genetic analyses of admixed populations. Admix-kit implements a suite of methods to facilitate genotype and phenotype simulation, association testing, genetic architecture inference, and polygenic scoring in admixed populations. AVAILABILITY AND IMPLEMENTATION: Admix-kit package is open-source and available at https://github.com/KangchengHou/admix-kit. Additionally, users can use the pipeline designed for admixed genotype simulation available at https://github.com/UW-GAC/admix-kit_workflow.

Subject(s)

Software , Genotype , Phenotype

12.

Improving genetic risk modeling of dementia from real-world data in underrepresented populations.

Chang, Timothy; Fu, Mingzhou; Valiente-Banuet, Leopoldo; Wadhwa, Satpal; Pasaniuc, Bogdan; Vossel, Keith.

Res Sq ; 2024 Feb 15.

Article in English | MEDLINE | ID: mdl-38410460

ABSTRACT

BACKGROUND: Genetic risk modeling for dementia offers significant benefits, but studies based on real-world data, particularly for underrepresented populations, are limited. METHODS: We employed an Elastic Net model for dementia risk prediction using single-nucleotide polymorphisms prioritized by functional genomic data from multiple neurodegenerative disease genome-wide association studies. We compared this model with APOE and polygenic risk score models across genetic ancestry groups, using electronic health records from UCLA Health for discovery and All of Us cohort for validation. RESULTS: Our model significantly outperforms other models across multiple ancestries, improving the area-under-precision-recall curve by 21-61% and the area-under-the-receiver-operating characteristic by 10-21% compared to the APOE and the polygenic risk score models. We identified shared and ancestry-specific risk genes and biological pathways, reinforcing and adding to existing knowledge. CONCLUSIONS: Our study highlights benefits of integrating functional mapping, multiple neurodegenerative diseases, and machine learning for genetic risk models in diverse populations. Our findings hold potential for refining precision medicine strategies in dementia diagnosis.

13.

Improving genetic risk modeling of dementia from real-world data in underrepresented populations.

Fu, Mingzhou; Valiente-Banuet, Leopoldo; Wadhwa, Satpal S; Pasaniuc, Bogdan; Vossel, Keith; Chang, Timothy S.

medRxiv ; 2024 Feb 06.

Article in English | MEDLINE | ID: mdl-38370649

ABSTRACT

BACKGROUND: Genetic risk modeling for dementia offers significant benefits, but studies based on real-world data, particularly for underrepresented populations, are limited. METHODS: We employed an Elastic Net model for dementia risk prediction using single-nucleotide polymorphisms prioritized by functional genomic data from multiple neurodegenerative disease genome-wide association studies. We compared this model with APOE and polygenic risk score models across genetic ancestry groups, using electronic health records from UCLA Health for discovery and All of Us cohort for validation. RESULTS: Our model significantly outperforms other models across multiple ancestries, improving the area-under-precision-recall curve by 21-61% and the area-under-the-receiver-operating characteristic by 10-21% compared to the APOE and the polygenic risk score models. We identified shared and ancestry-specific risk genes and biological pathways, reinforcing and adding to existing knowledge. CONCLUSIONS: Our study highlights benefits of integrating functional mapping, multiple neurodegenerative diseases, and machine learning for genetic risk models in diverse populations. Our findings hold potential for refining precision medicine strategies in dementia diagnosis.

14.

Multi-class Modeling Identifies Shared Genetic Risk for Late-onset Epilepsy and Alzheimer's Disease.

Fu, Mingzhou; Tran, Thai; Eskin, Eleazar; Lajonchere, Clara; Pasaniuc, Bogdan; Geschwind, Daniel H; Vossel, Keith; Chang, Timothy S.

medRxiv ; 2024 Feb 06.

Article in English | MEDLINE | ID: mdl-38370677

ABSTRACT

Background: Previous studies have established a strong link between late-onset epilepsy (LOE) and Alzheimer's disease (AD). However, their shared genetic risk beyond the APOE gene remains unclear. Our study sought to examine the shared genetic factors of AD and LOE, interpret the biological pathways involved, and evaluate how AD onset may be mediated by LOE and shared genetic risks. Methods: We defined phenotypes using phecodes mapped from diagnosis codes, using records from patients aged 60-90. A two-step Least Absolute Shrinkage and Selection Operator (LASSO) workflow was used to identify shared genetic variants based on prior AD GWAS integrated with functional genomic data. We calculated an AD-LOE shared risk score and used it as a proxy in a causal mediation analysis. We used electronic health records from an academic health center (UCLA Health) for discovery analyses and validated our findings in a multi-institutional EHR database (All of Us). Results: The two-step LASSO method identified 34 shared genetic loci between AD and LOE, including the APOE region. These loci were mapped to 65 genes, which showed enrichment in molecular functions and pathways such as tau protein binding and lipoprotein metabolism. Individuals with high predicted shared risk scores have a higher risk of developing AD, LOE, or both in their later life compared to those with low risk scores. LOE partially mediates the effect of AD-LOE shared genetic risk on AD (15% proportion mediated on average). Validation results from All of Us were consistent with findings from the UCLA sample. Conclusions: We employed a machine learning approach to identify shared genetic risks of AD and LOE. In addition to providing substantial evidence for the significant contribution of the APOE-TOMM40-APOC1 gene cluster to shared risk, we uncovered novel genes that may contribute.
Our study is one of the first to utilize All of Us genetic data to investigate AD, and provides valuable insights into the potential common and disease-specific mechanisms underlying AD and LOE, which could have profound implications for the future of disease prevention and the development of targeted treatment strategies to combat the co-occurrence of these two diseases.

15.

Cell-type deconvolution of bulk-blood RNA-seq reveals biological insights into neuropsychiatric disorders.

Boltz, Toni; Schwarz, Tommer; Bot, Merel; Hou, Kangcheng; Caggiano, Christa; Lapinska, Sandra; Duan, Chenda; Boks, Marco P; Kahn, Rene S; Zaitlen, Noah; Pasaniuc, Bogdan; Ophoff, Roel.

Am J Hum Genet ; 111(2): 323-337, 2024 Feb 01.

Article in English | MEDLINE | ID: mdl-38306997

ABSTRACT

Genome-wide association studies (GWASs) have uncovered susceptibility loci associated with psychiatric disorders such as bipolar disorder (BP) and schizophrenia (SCZ). However, most of these loci are in non-coding regions of the genome, and the causal mechanisms linking genetic variation to disease risk are unknown. Expression quantitative trait locus (eQTL) analysis of bulk tissue is a common approach used for deciphering underlying mechanisms, although this can obscure cell-type-specific signals and thus mask trait-relevant mechanisms. Although single-cell sequencing can be prohibitively expensive in large cohorts, computationally inferred cell-type proportions and cell-type gene expression estimates have the potential to overcome these problems and advance mechanistic studies. Using bulk RNA-seq from 1,730 samples derived from whole blood in a cohort ascertained from individuals with BP and SCZ, this study estimated cell-type proportions and their relation with disease status and medication. For each cell type, we found between 2,875 and 4,629 eGenes (genes with an associated eQTL), including 1,211 that are not found on the basis of bulk expression alone. We performed a colocalization test between cell-type eQTLs and various traits and identified hundreds of associations that occur between cell-type eQTLs and GWASs but that are not detected in bulk eQTLs. Finally, we investigated the effects of lithium use on the regulation of cell-type expression loci and found examples of genes that are differentially regulated according to lithium use. Our study suggests that applying computational methods to large bulk RNA-seq datasets of non-brain tissue can identify disease-relevant, cell-type-specific biology of psychiatric disorders and psychiatric medication.

Subject(s)

Genome-Wide Association Study , Lithium , Humans , Genome-Wide Association Study/methods , RNA-Seq , Quantitative Trait Loci/genetics , Phenotype , Polymorphism, Single Nucleotide , Genetic Predisposition to Disease

16.

Polygenic scores for tobacco use provide insights into systemic health risks in a diverse EHR-linked biobank in Los Angeles.

Venkateswaran, Vidhya; Boulier, Kristin; Ding, Yi; Johnson, Ruth; Bhattacharya, Arjun; Pasaniuc, Bogdan.

Transl Psychiatry ; 14(1): 38, 2024 Jan 18.

Article in English | MEDLINE | ID: mdl-38238290

ABSTRACT

Tobacco use is a major risk factor for many diseases and is heavily influenced by environmental factors with significant underlying genetic contributions. Here, we evaluated the predictive performance, risk stratification, and potential systemic health effects of tobacco use disorder (TUD) predisposing germline variants using a European-ancestry-derived polygenic score (PGS) in 24,202 participants from the multi-ancestry, hospital-based UCLA ATLAS biobank. Among genetically inferred ancestry groups (GIAs), TUD-PGS was significantly associated with TUD in European American (EA) (OR: 1.20, CI: [1.16, 1.24]), Hispanic/Latin American (HL) (OR: 1.19, CI: [1.11, 1.28]), and East Asian American (EAA) (OR: 1.18, CI: [1.06, 1.31]) GIAs but not in African American (AA) GIA (OR: 1.04, CI: [0.93, 1.17]). Similarly, TUD-PGS offered strong risk stratification across PGS quantiles in EA and HL GIAs and inconsistently in EAA and AA GIAs. In a cross-ancestry phenome-wide association meta-analysis, TUD-PGS was associated with cardiometabolic, respiratory, and psychiatric phecodes (17 phecodes at P < 2.7E-05). In individuals with no history of smoking, the top TUD-PGS associations with obesity and alcohol-related disorders (P = 3.54E-07, 1.61E-06) persist. Mendelian Randomization (MR) analysis provides evidence of a causal association between adiposity measures and tobacco use. Inconsistent predictive performance of the TUD-PGS across GIAs motivates the inclusion of multiple ancestry populations at all levels of genetic research of tobacco use for equitable clinical translation of TUD-PGS. Phenome associations suggest that TUD-predisposed individuals may require comprehensive tobacco use prevention and management approaches to address underlying addictive tendencies.

Subject(s)

Biological Specimen Banks , Tobacco Use Disorder , Humans , Los Angeles , Tobacco Use , Tobacco Use Disorder/genetics , Risk Factors , Obesity , Genome-Wide Association Study

17.

Multi-ancestry polygenic risk scores for venous thromboembolism.

Jee, Yon Ho; Thibord, Florian; Dominguez, Alicia; Sept, Corriene; Boulier, Kristin; Venkateswaran, Vidhya; Ding, Yi; Cherlin, Tess; Verma, Shefali Setia; Faro, Valeria Lo; Bartz, Traci M; Boland, Anne; Brody, Jennifer A; Deleuze, Jean-Francois; Emmerich, Joseph; Germain, Marine; Johnson, Andrew D; Kooperberg, Charles; Morange, Pierre-Emmanuel; Pankratz, Nathan; Psaty, Bruce M; Reiner, Alexander P; Smadja, David M; Sitlani, Colleen M; Suchon, Pierre; Tang, Weihong; Trégouët, David-Alexandre; Zöllner, Sebastian; Pasaniuc, Bogdan; Damrauer, Scott M; Sanna, Serena; Snieder, Harold; Kabrhel, Christopher; Smith, Nicholas L; Kraft, Peter.

medRxiv ; 2024 Jan 10.

Article in English | MEDLINE | ID: mdl-38260294

ABSTRACT

Venous thromboembolism (VTE) is a significant contributor to morbidity and mortality, with large disparities in incidence rates between Black and White Americans. Polygenic risk scores (PRSs) limited to variants discovered in genome-wide association studies in European-ancestry samples can identify European-ancestry individuals at high risk of VTE. However, there is limited evidence on whether high-dimensional PRS constructed using more sophisticated methods and more diverse training data can enhance the predictive ability and their utility across diverse populations. We developed PRSs for VTE using summary statistics from the International Network against Venous Thrombosis (INVENT) consortium GWAS meta-analyses of European- (71,771 cases and 1,059,740 controls) and African-ancestry samples (7,482 cases and 129,975 controls). We used LDpred2 and PRSCSx to construct ancestry-specific and multi-ancestry PRSs and evaluated their performance in an independent European- (6,261 cases and 88,238 controls) and African-ancestry sample (1,385 cases and 12,569 controls). Multi-ancestry PRSs with weights tuned in European- and African-ancestry samples, respectively, outperformed ancestry-specific PRSs in European- (PRSCSXEUR: AUC=0.61 (0.60, 0.61), PRSCSX_combinedEUR: AUC=0.61 (0.60, 0.62)) and African-ancestry test samples (PRSCSXAFR: AUC=0.58 (0.57, 0.6), PRSCSX_combined AFR: AUC=0.59 (0.57, 0.60)). The highest fifth percentile of the best-performing PRS was associated with 1.9-fold and 1.68-fold increased risk for VTE among European- and African-ancestry subjects, respectively, relative to those in the middle stratum. These findings suggest that the multi-ancestry PRS may be used to identify individuals at highest risk for VTE and provide guidance for the most effective treatment strategy across diverse populations.

18.

Principles and methods for transferring polygenic risk scores across global populations.

Kachuri, Linda; Chatterjee, Nilanjan; Hirbo, Jibril; Schaid, Daniel J; Martin, Iman; Kullo, Iftikhar J; Kenny, Eimear E; Pasaniuc, Bogdan; Witte, John S; Ge, Tian.

Nat Rev Genet ; 25(1): 8-25, 2024 Jan.

Article in English | MEDLINE | ID: mdl-37620596

ABSTRACT

Polygenic risk scores (PRSs) summarize the genetic predisposition of a complex human trait or disease and may become a valuable tool for advancing precision medicine. However, PRSs that are developed in populations of predominantly European genetic ancestries can increase health disparities due to poor predictive performance in individuals of diverse and complex genetic ancestries. We describe genetic and modifiable risk factors that limit the transferability of PRSs across populations and review the strengths and weaknesses of existing PRS construction methods for diverse ancestries. Developing PRSs that benefit global populations in research and clinical settings provides an opportunity for innovation and is essential for health equity.

Subject(s)

Genetic Predisposition to Disease , Humans , Risk Factors , Multifactorial Inheritance , Precision Medicine , Genome-Wide Association Study

19.

Session Introduction: Overcoming health disparities in precision medicine.

De La Vega, Francisco M; Barnes, Kathleen C; Fox, Keolu; Ioannidis, Alexander; Kenny, Eimear; Mathias, Rasika A; Pasaniuc, Bogdan.

Pac Symp Biocomput ; 29: 322-326, 2024.

Article in English | MEDLINE | ID: mdl-38160289

ABSTRACT

The following sections are included: Overview; Dealing with the lack of diversity in current research datasets; Development of fair machine learning algorithms; Race, genetic ancestry, and population structure; Conclusion; Acknowledgments.

Subject(s)

Computational Biology , Precision Medicine , Humans , Machine Learning , Health Inequities

20.

Interplay Of Serum Bilirubin and Tobacco Smoking with Lung and Head and Neck Cancers in a Diverse, EHR-linked Los Angeles Biobank.

Venkateswaran, Vidhya; Petter, Ella; Boulier, Kristin; Ding, Yi; Bhattacharya, Arjun; Pasaniuc, Bogdan.

Res Sq ; 2023 Oct 24.

Article in English | MEDLINE | ID: mdl-37961486

ABSTRACT

Background: Bilirubin is a potent antioxidant with a protective role in many diseases. We examined the relationships between serum bilirubin (SB) levels, tobacco smoking (a known cause of low SB), and aerodigestive cancers, grouped as lung cancers (LC) and head and neck cancers (HNC). Methods: We examined the associations between SB, LC, and HNC using data from 393,210 participants from a real-world, diverse, de-identified data repository and biobank linked to the UCLA Health system. We employed regression models, propensity score matching, and polygenic scores to investigate the associations and interactions between SB, tobacco smoking, LC, and HNC. Results: Current tobacco smokers showed lower SB (-0.04 mg/dL, 95% CI: [-0.04, -0.03]), compared to never-smokers. Lower SB levels were observed in HNC and LC cases (-0.10 mg/dL, CI [-0.13, -0.09] and -0.09 mg/dL, CI [-0.1, -0.07], respectively) compared to cancer-free controls, with the effect persisting after adjusting for smoking. SB levels were inversely associated with HNC and LC risk (ORs per SD change in SB: 0.64, CI [0.59, 0.69] and 0.57, CI [0.43, 0.75], respectively). Lastly, a polygenic score (PGS) for SB was associated with LC (OR per SD change of SB-PGS: 0.71, CI [0.67, 0.76]). Conclusions: Low SB levels are associated with an increased risk of both HNC and LC, independent of the effect of tobacco smoking. Additionally, tobacco smoking demonstrated a strong interaction with SB on LC risk. Lastly, genetically predicted low SB (using a polygenic score) is negatively associated with LC. These findings suggest that SB could serve as a potential early and low-cost biomarker for LC and HNC. The interaction with tobacco smoking suggests that smokers with lower bilirubin could likely be at higher risk for LC compared to never smokers, suggesting the utility of SB in risk stratification for patients at risk for LC. Lastly, the results of the polygenic score analyses suggest potential shared biological pathways between the genetic control of SB and the risk of LC development.
