Results 1 - 20 of 77
1.
Am J Hum Genet ; 2024 May 08.
Article in English | MEDLINE | ID: mdl-38733992

ABSTRACT

Splicing-based transcriptome-wide association studies (splicing-TWASs) of breast cancer have the potential to identify susceptibility genes. However, existing splicing-TWASs test the association of individual excised introns in breast tissue only and thus have limited power to detect susceptibility genes. In this study, we performed a multi-tissue joint splicing-TWAS that integrated splicing-TWAS signals of multiple excised introns in each gene across 11 tissues that are potentially relevant to breast cancer risk. We utilized summary statistics from a meta-analysis that combined genome-wide association study (GWAS) results of 424,650 women of European ancestry. Splicing-level prediction models were trained in GTEx (v.8) data. We identified 240 genes by the multi-tissue joint splicing-TWAS at the Bonferroni-corrected significance level; in the tissue-specific splicing-TWAS that combined TWAS signals of excised introns in genes in breast tissue only, we identified nine additional significant genes. Of these 249 genes, 88 genes in 62 loci have not been reported by previous TWASs, and 17 genes in seven loci are at least 1 Mb away from published GWAS index variants. By comparing the results of our splicing-TWASs with previous gene-expression-based TWASs that used the same summary statistics and expression prediction models trained in the same reference panel, we found that 110 genes in 70 loci are identified only by the splicing-TWASs. Our results showed that for many genes, expression quantitative trait loci (eQTL) did not show a significant impact on breast cancer risk, whereas splicing quantitative trait loci (sQTL) showed a strong impact through intron excision events.

2.
Am J Hum Genet ; 111(3): 445-455, 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38320554

ABSTRACT

Regulation of transcription and translation are mechanisms through which genetic variants affect complex traits. Expression quantitative trait locus (eQTL) studies have been more successful at identifying cis-eQTL (within 1 Mb of the transcription start site) than trans-eQTL. Here, we tested the cis component of gene expression for association with observed plasma protein levels to identify cis- and trans-acting genes that regulate protein levels. We used transcriptome prediction models from 49 Genotype-Tissue Expression (GTEx) Project tissues to predict the cis component of gene expression and tested the predicted expression of every gene in every tissue for association with the observed abundance of 3,622 plasma proteins measured in 3,301 individuals from the INTERVAL study. We tested significant results for replication in 971 individuals from the Trans-omics for Precision Medicine (TOPMed) Multi-Ethnic Study of Atherosclerosis (MESA). We found 1,168 and 1,210 cis- and trans-acting associations that replicated in TOPMed (FDR < 0.05) with a median expected true positive rate (π1) across tissues of 0.806 and 0.390, respectively. The target proteins of trans-acting genes were enriched for transcription factor binding sites and autoimmune diseases in the GWAS catalog. Furthermore, we found a higher correlation between predicted expression and protein levels of the same underlying gene (R = 0.17) than observed expression (R = 0.10, p = 7.50 × 10⁻¹¹). This indicates the cis-acting genetically regulated (heritable) component of gene expression is more consistent across tissues than total observed expression (genetics + environment) and is useful in uncovering the function of SNPs associated with complex traits.


Subject(s)
Proteome , Transcriptome , Humans , Transcriptome/genetics , Proteome/genetics , Multifactorial Inheritance , Quantitative Trait Loci/genetics , Genome-Wide Association Study , Polymorphism, Single Nucleotide/genetics
3.
HGG Adv ; 4(4): 100216, 2023 Oct 12.
Article in English | MEDLINE | ID: mdl-37869564

ABSTRACT

Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to different populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that methods that leverage shared regulatory effects across different conditions, in this case, across different populations, may improve cross-population transcriptome prediction. To test this hypothesis, we made transcriptome prediction models for use in transcriptome-wide association studies (TWASs) using different methods (elastic net, joint-tissue imputation [JTI], matrix expression quantitative trait loci [Matrix eQTL], multivariate adaptive shrinkage in R [MASHR], and transcriptome-integrated genetic association resource [TIGAR]) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in TWASs, we integrated publicly available multiethnic genome-wide association study (GWAS) summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study and Pan-ancestry genetic analysis of the UK Biobank (PanUKBB) with our developed transcriptome prediction models. In regard to transcriptome prediction accuracy, MASHR models performed better or the same as other methods in both population-matched and cross-population transcriptome predictions. Furthermore, in multiethnic TWASs, MASHR models yielded more discoveries that replicate in both PAGE and PanUKBB across all methods analyzed, including loci previously mapped in GWASs and loci previously not found in GWASs. Overall, our study demonstrates the importance of using methods that benefit from different populations' effect size estimates in order to improve TWASs for multiethnic or underrepresented populations.


Subject(s)
Genome-Wide Association Study , Transcriptome , Humans , Transcriptome/genetics , Quantitative Trait Loci/genetics , Gene Frequency , Linkage Disequilibrium
4.
bioRxiv ; 2023 Oct 20.
Article in English | MEDLINE | ID: mdl-37904952

ABSTRACT

Hundreds of thousands of loci have been associated with complex traits via genome-wide association studies (GWAS), but an understanding of the mechanistic connection between GWAS loci and disease remains elusive. Genetic predictors of molecular traits are useful for identifying the mediating roles of molecular traits and prioritizing actionable targets for intervention, as demonstrated in transcriptome-wide association studies (TWAS) and related studies. Given the widespread polygenicity of complex traits, it is imperative to understand the effect of polygenicity on the validity of these mediator-trait association tests. We found that for highly polygenic target traits, the standard test based on linear regression is inflated (E[χ²_TWAS] > 1). This inflation has implications for all TWAS and related methods where the complex trait can be highly polygenic, even if the mediating trait is sparse. We derive an asymptotic expression of the inflation, estimate the inflation for gene expression, metabolites, and brain image derived features, and propose a solution to correct the inflation.

5.
HGG Adv ; 4(4): 100233, 2023 10 12.
Article in English | MEDLINE | ID: mdl-37663543

ABSTRACT

In this study we examined how genetic risk for asthma associates with different features of the disease and with other medical conditions and traits. Using summary statistics from two multi-ancestry genome-wide association studies of asthma, we modeled polygenic risk scores (PRSs) and validated their predictive performance in the UK Biobank. We then performed phenome-wide association studies of the asthma PRSs with 371 heritable traits in the UK Biobank. We identified 228 total significant associations across a variety of organ systems, including associations that varied by PRS model, sex, age of asthma onset, ancestry, and human leukocyte antigen region alleles. Our results highlight pervasive pleiotropy between asthma and numerous other traits and conditions and elucidate pathways that contribute to asthma and its comorbidities.


Subject(s)
Asthma , Genome-Wide Association Study , Humans , Asthma/genetics , Risk Factors , Multifactorial Inheritance/genetics , Phenomics
6.
Cancer Epidemiol Biomarkers Prev ; 32(9): 1198-1207, 2023 09 01.
Article in English | MEDLINE | ID: mdl-37409955

ABSTRACT

BACKGROUND: Predicting protein levels from genotypes for proteome-wide association studies (PWAS) may provide insight into the mechanisms underlying cancer susceptibility. METHODS: We performed PWAS of breast, endometrial, ovarian, and prostate cancers and their subtypes in several large European-ancestry discovery consortia (effective sample size: 237,483 cases/317,006 controls) and tested the results for replication in an independent European-ancestry GWAS (31,969 cases/410,350 controls). We performed PWAS using the cancer GWAS summary statistics and two sets of plasma protein prediction models, followed by colocalization analysis. RESULTS: Using Atherosclerosis Risk in Communities (ARIC) models, we identified 93 protein-cancer associations [false discovery rate (FDR) < 0.05]. We then performed a meta-analysis of the discovery and replication PWAS, resulting in 61 significant protein-cancer associations (FDR < 0.05). Ten of 15 protein-cancer pairs that could be tested using Trans-Omics for Precision Medicine (TOPMed) protein prediction models replicated with the same directions of effect in both cancer GWAS (P < 0.05). To further support our results, we applied Bayesian colocalization analysis and found colocalized SNPs for SERPINA3 protein levels and prostate cancer (posterior probability, PP = 0.65) and SNUPN protein levels and breast cancer (PP = 0.62). CONCLUSIONS: We used PWAS to identify potential biomarkers of hormone-related cancer risk. SNPs in SERPINA3 and SNUPN did not reach genome-wide significance for cancer in the original GWAS, highlighting the power of PWAS for novel locus discovery, with the added advantage of providing directions of protein effect. IMPACT: PWAS and colocalization are promising methods to identify potential molecular mechanisms underlying complex traits.


Subject(s)
Endometrial Neoplasms , Prostatic Neoplasms , Male , Female , Humans , Proteome/genetics , Genetic Predisposition to Disease , Prostate , Bayes Theorem , Genome-Wide Association Study , Endometrial Neoplasms/genetics , Prostatic Neoplasms/genetics , Blood Proteins , Polymorphism, Single Nucleotide
7.
Am J Hum Genet ; 110(6): 950-962, 2023 06 01.
Article in English | MEDLINE | ID: mdl-37164006

ABSTRACT

Genome-wide association studies (GWASs) have identified more than 200 genomic loci for breast cancer risk, but specific causal genes in most of these loci have not been identified. In fact, transcriptome-wide association studies (TWASs) of breast cancer performed using gene expression prediction models trained in breast tissue have yet to clearly identify most target genes. To identify candidate genes, we performed a GWAS analysis in a breast cancer dataset from UK Biobank (UKB) and combined the results with the GWAS results of the Breast Cancer Association Consortium (BCAC) by a meta-analysis. Using the summary statistics from the meta-analysis, we performed a joint TWAS analysis that combined TWAS signals from multiple tissues. We used expression prediction models trained in 11 tissues that are potentially relevant to breast cancer from the Genotype-Tissue Expression (GTEx) data. In the GWAS analysis, we identified eight loci distinct from those reported previously. In the TWAS analysis, we identified 309 genes at 108 genomic loci to be significantly associated with breast cancer at the Bonferroni threshold. Of these, 17 genes were located in eight regions that were at least 1 Mb away from published GWAS hits. The remaining TWAS-significant genes were located in 100 known genomic loci from previous GWASs of breast cancer. We found that 21 genes located in known GWAS loci remained statistically significant after conditioning on previous GWAS index variants. Our study provides insights into breast cancer genetics through mapping candidate target genes in a large proportion of known GWAS loci and discovering multiple new loci.


Subject(s)
Breast Neoplasms , Transcriptome , Humans , Female , Transcriptome/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Breast Neoplasms/genetics , Quantitative Trait Loci/genetics , Polymorphism, Single Nucleotide/genetics
8.
Mol Syst Biol ; 19(8): e11407, 2023 08 08.
Article in English | MEDLINE | ID: mdl-37232043

ABSTRACT

How do aberrations in widely expressed genes lead to tissue-selective hereditary diseases? Previous attempts to answer this question were limited to testing a few candidate mechanisms. To answer this question at a larger scale, we developed "Tissue Risk Assessment of Causality by Expression" (TRACE), a machine learning approach to predict genes that underlie tissue-selective diseases and selectivity-related features. TRACE utilized 4,744 biologically interpretable tissue-specific gene features that were inferred from heterogeneous omics datasets. Application of TRACE to 1,031 disease genes uncovered known and novel selectivity-related features, the most common of which was previously overlooked. Next, we created a catalog of tissue-associated risks for 18,927 protein-coding genes (https://netbio.bgu.ac.il/trace/). As proof-of-concept, we prioritized candidate disease genes identified in 48 rare-disease patients. TRACE ranked the verified disease gene among the patient's candidate genes significantly better than gene prioritization methods that rank by gene constraint or tissue expression. Thus, tissue selectivity combined with machine learning enhances genetic and clinical understanding of hereditary diseases.


Subject(s)
Machine Learning , Rare Diseases , Humans , Rare Diseases/genetics , Risk Assessment , Causality
9.
bioRxiv ; 2023 May 20.
Article in English | MEDLINE | ID: mdl-36798214

ABSTRACT

Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to different populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that methods that leverage shared regulatory effects across different conditions, in this case, across different populations, may improve cross-population transcriptome prediction. To test this hypothesis, we made transcriptome prediction models for use in transcriptome-wide association studies (TWAS) using different methods (Elastic Net, Joint-Tissue Imputation (JTI), Matrix eQTL, Multivariate Adaptive Shrinkage in R (MASHR), and Transcriptome-Integrated Genetic Association Resource (TIGAR)) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in TWAS, we integrated publicly available multi-ethnic genome-wide association study (GWAS) summary statistics from the Population Architecture using Genomics and Epidemiology Study (PAGE) and Pan-UK Biobank with our developed transcriptome prediction models. In regard to transcriptome prediction accuracy, MASHR models performed better or the same as other methods in both population-matched and cross-population transcriptome predictions. Furthermore, in multi-ethnic TWAS, MASHR models yielded more discoveries that replicate in both PAGE and PanUKBB across all methods analyzed, including loci previously mapped in GWAS and new loci previously not found in GWAS. Overall, our study demonstrates the importance of using methods that benefit from different populations' effect size estimates in order to improve TWAS for multi-ethnic or underrepresented populations.

10.
Am J Hum Genet ; 110(1): 44-57, 2023 01 05.
Article in English | MEDLINE | ID: mdl-36608684

ABSTRACT

Integrative genetic association methods have shown great promise in post-GWAS (genome-wide association study) analyses, in which one of the most challenging tasks is identifying putative causal genes and uncovering molecular mechanisms of complex traits. Recent studies suggest that prevailing computational approaches, including transcriptome-wide association studies (TWASs) and colocalization analysis, are individually imperfect, but their joint usage can yield robust and powerful inference results. This paper presents INTACT, a computational framework to integrate probabilistic evidence from these distinct types of analyses and implicate putative causal genes. This procedure is flexible and can work with a wide range of existing integrative analysis approaches. It has the unique ability to quantify the uncertainty of implicated genes, enabling rigorous control of false-positive discoveries. Taking advantage of this highly desirable feature, we further propose an efficient algorithm, INTACT-GSE, for gene set enrichment analysis based on the integrated probabilistic evidence. We examine the proposed computational methods and illustrate their improved performance over the existing approaches through simulation studies. We apply the proposed methods to analyze the multi-tissue eQTL data from the GTEx project and eight large-scale complex- and molecular-trait GWAS datasets from multiple consortia and the UK Biobank. Overall, we find that the proposed methods markedly improve the existing putative gene implication methods and are particularly advantageous in evaluating and identifying key gene sets and biological pathways underlying complex traits.


Subject(s)
Genome-Wide Association Study , Transcriptome , Humans , Transcriptome/genetics , Genome-Wide Association Study/methods , Multifactorial Inheritance/genetics , Quantitative Trait Loci/genetics , Computer Simulation , Polymorphism, Single Nucleotide/genetics , Genetic Predisposition to Disease
12.
Nat Comput Sci ; 3(5): 403-417, 2023 May.
Article in English | MEDLINE | ID: mdl-38177845

ABSTRACT

Human diseases are traditionally studied as singular, independent entities, limiting researchers' capacity to view human illnesses as dependent states in a complex, homeostatic system. Here, using time-stamped clinical records of over 151 million unique Americans, we construct a disease representation as points in a continuous, high-dimensional space, where diseases with similar etiology and manifestations lie near one another. We use the UK Biobank cohort, with half a million participants, to perform a genome-wide association study of newly defined human quantitative traits reflecting individuals' health states, corresponding to patient positions in our disease space. We discover 116 genetic associations involving 108 genetic loci and then use ten disease constellations resulting from clustering analysis of diseases in the embedding space, as well as 30 common diseases, to demonstrate that these genetic associations can be used to robustly predict various morbidities.


Subject(s)
Genetic Loci , Genome-Wide Association Study , Humans , United States , Genome-Wide Association Study/methods , Phenotype
13.
Nat Commun ; 13(1): 6712, 2022 11 07.
Article in English | MEDLINE | ID: mdl-36344522

ABSTRACT

Asthma is a heterogeneous, complex syndrome, and identifying asthma endotypes has been challenging. We hypothesize that distinct endotypes of asthma arise in disparate genetic variation and life-time environmental exposure backgrounds, and that disease comorbidity patterns serve as a surrogate for such genetic and exposure variations. Here, we computationally discover 22 distinct comorbid disease patterns among individuals with asthma (asthma comorbidity subgroups) using diagnosis records for >151 M US residents, and re-identify 11 of the 22 subgroups in the much smaller UK Biobank. GWASs to discern asthma risk loci for individuals within each subgroup and in all subgroups combined reveal 109 independent risk loci, of which 52 are replicated in multi-ancestry meta-analysis across different ethnicity subsamples in UK Biobank, US BioVU, and BioBank Japan. Fourteen loci confer asthma risk in multiple subgroups and in all subgroups combined. Importantly, another six loci confer asthma risk in only one subgroup. The strength of association between asthma and each of 44 health-related phenotypes also varies dramatically across subgroups. This work reveals subpopulations of asthma patients distinguished by comorbidity patterns, asthma risk loci, gene expression, and health-related phenotypes, and so reveals different asthma endotypes.


Subject(s)
Asthma , Humans , Asthma/epidemiology , Asthma/genetics , Genome-Wide Association Study , Phenotype , Comorbidity , Japan/epidemiology
14.
Genome Med ; 14(1): 55, 2022 05 24.
Article in English | MEDLINE | ID: mdl-35606880

ABSTRACT

BACKGROUND: Genome-wide association studies of asthma have revealed robust associations with variation across the human leukocyte antigen (HLA) complex with independent associations in the HLA class I and class II regions for both childhood-onset asthma (COA) and adult-onset asthma (AOA). However, the specific variants and genes contributing to risk are unknown. METHODS: We used Bayesian approaches to perform genetic fine-mapping for COA and AOA (n=9432 and 21,556, respectively; n=318,167 shared controls) in White British individuals from the UK Biobank and to perform expression quantitative trait locus (eQTL) fine-mapping in immune (lymphoblastoid cell lines, n=398; peripheral blood mononuclear cells, n=132) and airway (nasal epithelial cells, n=188) cells from ethnically diverse individuals. We also examined putatively causal protein coding variation from protein crystal structures and conducted replication studies in independent multi-ethnic cohorts from the UK Biobank (COA n=1686; AOA n=3666; controls n=56,063). RESULTS: Genetic fine-mapping revealed both shared and distinct causal variation between COA and AOA in the class I region but only distinct causal variation in the class II region. Both gene expression levels and amino acid variation contributed to risk. Our results from eQTL fine-mapping and amino acid visualization suggested that the HLA-DQA1*03:01 allele and variation associated with expression of the nonclassical HLA-DQA2 and HLA-DQB2 genes accounted entirely for the most significant association with AOA in GWAS. Our studies also suggested a potentially prominent role for HLA-C protein coding variation in the class I region in COA. We replicated putatively causal variant associations in a multi-ethnic cohort. CONCLUSIONS: We highlight roles for both gene expression and protein coding variation in asthma risk and identified putatively causal variation and genes in the HLA region. A convergence of genomic, transcriptional, and protein coding evidence implicates the HLA-DQA2 and HLA-DQB2 genes and HLA-DQA1*03:01 allele in AOA.


Subject(s)
Asthma , Genome-Wide Association Study , Adult , Amino Acids/genetics , Asthma/genetics , Bayes Theorem , Child , Coenzyme A/genetics , Genetic Predisposition to Disease , Humans , Leukocytes, Mononuclear , Polymorphism, Single Nucleotide
15.
Am J Hum Genet ; 109(5): 857-870, 2022 05 05.
Article in English | MEDLINE | ID: mdl-35385699

ABSTRACT

While polygenic risk scores (PRSs) enable early identification of genetic risk for chronic obstructive pulmonary disease (COPD), predictive performance is limited when the discovery and target populations are not well matched. Hypothesizing that the biological mechanisms of disease are shared across ancestry groups, we introduce a PrediXcan-derived polygenic transcriptome risk score (PTRS) to improve cross-ethnic portability of risk prediction. We constructed the PTRS using summary statistics from application of PrediXcan on large-scale GWASs of lung function (forced expiratory volume in 1 s [FEV1] and its ratio to forced vital capacity [FEV1/FVC]) in the UK Biobank. We examined prediction performance and cross-ethnic portability of PTRS through smoking-stratified analyses both on 29,381 multi-ethnic participants from TOPMed population/family-based cohorts and on 11,771 multi-ethnic participants from TOPMed COPD-enriched studies. Analyses were carried out for two dichotomous COPD traits (moderate-to-severe and severe COPD) and two quantitative lung function traits (FEV1 and FEV1/FVC). While the proposed PTRS showed weaker associations with disease than PRS for European ancestry, the PTRS showed stronger association with COPD than PRS for African Americans (e.g., odds ratio [OR] = 1.24 [95% confidence interval [CI]: 1.08-1.43] for PTRS versus 1.10 [0.96-1.26] for PRS among heavy smokers with ≥ 40 pack-years of smoking) for moderate-to-severe COPD. Cross-ethnic portability of the PTRS was significantly higher than the PRS (paired t test p < 2.2 × 10⁻¹⁶ with portability gains ranging from 5% to 28%) for both dichotomous COPD traits and across all smoking strata. Our study demonstrates the value of PTRS for improved cross-ethnic portability compared to PRS in predicting COPD risk.


Subject(s)
Pulmonary Disease, Chronic Obstructive , Transcriptome , Humans , Lung , National Heart, Lung, and Blood Institute (U.S.) , Pulmonary Disease, Chronic Obstructive/genetics , Risk Factors , United States/epidemiology
16.
PLoS One ; 17(2): e0264341, 2022.
Article in English | MEDLINE | ID: mdl-35202437

ABSTRACT

Genetically regulated gene expression has helped elucidate the biological mechanisms underlying complex traits. Improved high-throughput technology allows similar interrogation of the genetically regulated proteome for understanding complex trait mechanisms. Here, we used the Trans-omics for Precision Medicine (TOPMed) Multi-omics pilot study, which comprises data from Multi-Ethnic Study of Atherosclerosis (MESA), to optimize genetic predictors of the plasma proteome for genetically regulated proteome-wide association studies (PWAS) in diverse populations. We built predictive models for protein abundances using data collected in TOPMed MESA, for which we have measured 1,305 proteins by a SOMAscan assay. We compared predictive models built via elastic net regression to models integrating posterior inclusion probabilities estimated by fine-mapping SNPs prior to elastic net. In order to investigate the transferability of predictive models across ancestries, we built protein prediction models in all four of the TOPMed MESA populations, African American (n = 183), Chinese (n = 71), European (n = 416), and Hispanic/Latino (n = 301), as well as in all populations combined. As expected, fine-mapping produced more significant protein prediction models, especially in African ancestries populations, potentially increasing opportunity for discovery. When we tested our TOPMed MESA models in the independent European INTERVAL study, fine-mapping improved cross-ancestries prediction for some proteins. Using GWAS summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study, which comprises ∼50,000 Hispanic/Latinos, African Americans, Asians, Native Hawaiians, and Native Americans, we applied S-PrediXcan to perform PWAS for 28 complex traits. The most protein-trait associations were discovered, colocalized, and replicated in large independent GWAS using proteome prediction model training populations with similar ancestries to PAGE. At current training population sample sizes, performance between baseline and fine-mapped protein prediction models in PWAS was similar, highlighting the utility of elastic net. Our predictive models in diverse populations are publicly available for use in proteome mapping methods at https://doi.org/10.5281/zenodo.4837327.


Subject(s)
Atherosclerosis/genetics , Genetic Association Studies , Models, Genetic , Proteins/genetics , Proteome/genetics , Atherosclerosis/ethnology , Female , Gene Frequency , Humans , Male , Pilot Projects , Polymorphism, Single Nucleotide , Quantitative Trait Loci
17.
Genome Biol ; 23(1): 23, 2022 01 13.
Article in English | MEDLINE | ID: mdl-35027082

ABSTRACT

BACKGROUND: Polygenic risk scores (PRS) are valuable to translate the results of genome-wide association studies (GWAS) into clinical practice. To date, most GWAS have been based on individuals of European ancestry, leading to poor performance in populations of non-European ancestry. RESULTS: We introduce the polygenic transcriptome risk score (PTRS), which is based on predicted transcript levels (rather than SNPs), and explore the portability of PTRS across populations using UK Biobank data. CONCLUSIONS: We show that PTRS has a significantly higher portability (Wilcoxon p=0.013) in the African-descent samples, where the loss of performance is most acute, with better performance than PRS when used in combination.


Subject(s)
Genome-Wide Association Study , Transcriptome , Genetic Predisposition to Disease , Humans , Multifactorial Inheritance , Polymorphism, Single Nucleotide , Risk Factors
18.
HGG Adv ; 2(2)2021 Apr 08.
Article in English | MEDLINE | ID: mdl-33937878

ABSTRACT

Transcriptome prediction methods such as PrediXcan and FUSION have become popular in complex trait mapping. Most transcriptome prediction models have been trained in European populations using methods that make parametric linear assumptions like the elastic net (EN). To potentially further optimize imputation performance of gene expression across global populations, we built transcriptome prediction models using both linear and non-linear machine learning (ML) algorithms and evaluated their performance in comparison to EN. We trained models using genotype and blood monocyte transcriptome data from the Multi-Ethnic Study of Atherosclerosis (MESA) comprising individuals of African, Hispanic, and European ancestries and tested them using genotype and whole-blood transcriptome data from the Modeling the Epidemiology Transition Study (METS) comprising individuals of African ancestries. We show that the prediction performance is highest when the training and the testing population share similar ancestries regardless of the prediction algorithm used. While EN generally outperformed random forest (RF), support vector regression (SVR), and K nearest neighbor (KNN), we found that RF outperformed EN for some genes, particularly between disparate ancestries, suggesting potential robustness and reduced variability of RF imputation performance across global populations. When applied to a high-density lipoprotein (HDL) phenotype, we show including RF prediction models in PrediXcan revealed potential gene associations missed by EN models. Therefore, by integrating other ML modeling into PrediXcan and diversifying our training populations to include more global ancestries, we may uncover new genes associated with complex traits.

19.
Cell ; 184(10): 2633-2648.e19, 2021 05 13.
Article in English | MEDLINE | ID: mdl-33864768

ABSTRACT

Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. This included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations to body mass index.


Subject(s)
Disease/genetics , Multifactorial Inheritance/genetics , Population/genetics , RNA, Long Noncoding/genetics , Transcriptome , Coronary Artery Disease/genetics , Diabetes Mellitus, Type 1/genetics , Diabetes Mellitus, Type 2/genetics , Gene Expression Profiling , Genetic Variation , Humans , Inflammatory Bowel Diseases/genetics , Organ Specificity/genetics , Quantitative Trait Loci
20.
Nat Commun ; 12(1): 1424, 2021 03 03.
Article in English | MEDLINE | ID: mdl-33658504

ABSTRACT

Genetic studies of the transcriptome help bridge the gap between genetic variation and phenotypes. To maximize the potential of such studies, efficient methods to identify expression quantitative trait loci (eQTLs) and perform fine-mapping and genetic prediction of gene expression traits are needed. Current methods that leverage both total read counts and allele-specific expression to identify eQTLs are generally computationally intractable for large transcriptomic studies. Here, we describe a unified framework that addresses these needs and is scalable to thousands of samples. Using simulations and data from GTEx, we demonstrate its calibration and performance. For example, mixQTL shows a power gain equivalent to a 29% increase in sample size for genes with sufficient allele-specific read coverage. To showcase the potential of mixQTL, we apply it to 49 GTEx tissues and find 20% additional eQTLs (FDR < 0.05, per tissue) that are significantly more enriched among trait associated variants and candidate cis-regulatory elements compared to the standard approach.


Subject(s)
Alleles , Chromosome Mapping/methods , Quantitative Trait Loci , Databases, Genetic , Genome-Wide Association Study , Human Genome Project , Humans , Models, Genetic , Models, Statistical , Regulatory Sequences, Nucleic Acid