Search | BVS - MINISTRY OF HEALTH

Improving polygenic risk prediction in admixed populations by explicitly modeling ancestral-differential effects via GAUDI.

Sun, Quan; Rowland, Bryce T; Chen, Jiawen; Mikhaylova, Anna V; Avery, Christy; Peters, Ulrike; Lundin, Jessica; Matise, Tara; Buyske, Steve; Tao, Ran; Mathias, Rasika A; Reiner, Alexander P; Auer, Paul L; Cox, Nancy J; Kooperberg, Charles; Thornton, Timothy A; Raffield, Laura M; Li, Yun.

Nat Commun ; 15(1): 1016, 2024 Feb 03.

Article in English | MEDLINE | ID: mdl-38310129

ABSTRACT

Polygenic risk scores (PRS) have shown successes in clinics, but most PRS methods focus only on participants with distinct primary continental ancestry without accommodating recently-admixed individuals with mosaic continental ancestry backgrounds for different segments of their genomes. Here, we develop GAUDI, a novel penalized-regression-based method specifically designed for admixed individuals. GAUDI explicitly models ancestry-differential effects while borrowing information across segments with shared ancestry in admixed genomes. We demonstrate marked advantages of GAUDI over other methods through comprehensive simulation and real data analyses for traits with associated variants exhibiting ancestral-differential effects. Leveraging data from the Women's Health Initiative study, we show that GAUDI improves PRS prediction of white blood cell count and C-reactive protein in African Americans by > 64% compared to alternative methods, and even outperforms PRS-CSx with large European GWAS for some scenarios. We believe GAUDI will be a valuable tool to mitigate disparities in PRS performance in admixed individuals.
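The idea of "ancestry-differential effects" can be made concrete at scoring time. The minimal sketch below assumes each allele copy of an admixed individual is tagged with the local ancestry of the segment it lies on, and that per-ancestry effect sizes have already been estimated. It covers only that scoring step: it is not GAUDI's penalized-regression fitting procedure. The function name, the AFR/EUR labels, and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple

# One allele copy: (effect-allele dosage 0 or 1, local ancestry label).
AlleleCopy = Tuple[int, str]


def ancestry_aware_prs(
    haplotypes: List[Tuple[AlleleCopy, AlleleCopy]],
    weights: List[Dict[str, float]],
) -> float:
    """Polygenic score for one admixed individual (hypothetical helper).

    haplotypes[v] holds the two allele copies at variant v, each tagged with
    the local ancestry of its segment; weights[v] maps each ancestry label to
    an effect size for variant v, assumed to be estimated beforehand.
    """
    score = 0.0
    for (copy_a, copy_b), ancestry_effects in zip(haplotypes, weights):
        for allele, ancestry in (copy_a, copy_b):
            # Each copy is weighted by the effect for its own local ancestry.
            score += allele * ancestry_effects[ancestry]
    return score


if __name__ == "__main__":
    # Hypothetical example: variant 0 has a shared effect across ancestries,
    # variant 1 has an ancestry-differential effect.
    weights = [{"AFR": 0.20, "EUR": 0.20}, {"AFR": 0.50, "EUR": 0.10}]
    person = [((1, "AFR"), (1, "EUR")), ((1, "AFR"), (0, "EUR"))]
    print(ancestry_aware_prs(person, weights))
```

If the effect allele at the second variant sat on a EUR-ancestry segment instead, the score would drop; an ancestry-agnostic PRS with a single weight per variant cannot represent that difference.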

Subjects

Black or African American , Genetic Risk Stratification , Software , Humans , Black or African American/genetics , Computer Simulation , Genetic Predisposition to Disease , Genome, Human , Genome-Wide Association Study/methods , Phenotype , Risk Factors

Multivariate adaptive shrinkage improves cross-population transcriptome prediction and association studies in underrepresented populations.

Araujo, Daniel S; Nguyen, Chris; Hu, Xiaowei; Mikhaylova, Anna V; Gignoux, Chris; Ardlie, Kristin; Taylor, Kent D; Durda, Peter; Liu, Yongmei; Papanicolaou, George; Cho, Michael H; Rich, Stephen S; Rotter, Jerome I; Im, Hae Kyung; Manichaikul, Ani; Wheeler, Heather E.

HGG Adv ; 4(4): 100216, 2023 Oct 12.

Article in English | MEDLINE | ID: mdl-37869564

ABSTRACT

Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to different populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that methods that leverage shared regulatory effects across different conditions (in this case, across different populations) may improve cross-population transcriptome prediction. To test this hypothesis, we made transcriptome prediction models for use in transcriptome-wide association studies (TWASs) using different methods (elastic net, joint-tissue imputation [JTI], matrix expression quantitative trait loci [Matrix eQTL], multivariate adaptive shrinkage in R [MASHR], and transcriptome-integrated genetic association resource [TIGAR]) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in TWASs, we integrated publicly available multiethnic genome-wide association study (GWAS) summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study and Pan-ancestry genetic analysis of the UK Biobank (PanUKBB) with our developed transcriptome prediction models. In terms of transcriptome prediction accuracy, MASHR models performed better than or as well as other methods in both population-matched and cross-population transcriptome predictions. Furthermore, in multiethnic TWASs, MASHR models yielded more discoveries that replicated in both PAGE and PanUKBB than any of the other methods analyzed, including loci previously mapped in GWASs and loci not previously found in GWASs. Overall, our study demonstrates the importance of using methods that benefit from different populations' effect size estimates in order to improve TWASs for multiethnic or underrepresented populations.

Subjects

Genome-Wide Association Study , Transcriptome , Humans , Transcriptome/genetics , Quantitative Trait Loci/genetics , Gene Frequency , Linkage Disequilibrium

Multivariate adaptive shrinkage improves cross-population transcriptome prediction for transcriptome-wide association studies in underrepresented populations.

bioRxiv ; 2023 May 20.

Article in English | MEDLINE | ID: mdl-36798214

ABSTRACT

Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to different populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that methods that leverage shared regulatory effects across different conditions (in this case, across different populations) may improve cross-population transcriptome prediction. To test this hypothesis, we made transcriptome prediction models for use in transcriptome-wide association studies (TWAS) using different methods (Elastic Net, Joint-Tissue Imputation (JTI), Matrix eQTL, Multivariate Adaptive Shrinkage in R (MASHR), and Transcriptome-Integrated Genetic Association Resource (TIGAR)) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in TWAS, we integrated publicly available multi-ethnic genome-wide association study (GWAS) summary statistics from the Population Architecture using Genomics and Epidemiology Study (PAGE) and Pan-UK Biobank (PanUKBB) with our developed transcriptome prediction models. In terms of transcriptome prediction accuracy, MASHR models performed better than or as well as other methods in both population-matched and cross-population transcriptome predictions. Furthermore, in multi-ethnic TWAS, MASHR models yielded more discoveries that replicated in both PAGE and PanUKBB than any of the other methods analyzed, including loci previously mapped in GWAS and new loci not previously found in GWAS. Overall, our study demonstrates the importance of using methods that benefit from different populations' effect size estimates in order to improve TWAS for multi-ethnic or underrepresented populations.

Protein prediction for trait mapping in diverse populations.

Schubert, Ryan; Geoffroy, Elyse; Gregga, Isabelle; Mulford, Ashley J; Aguet, Francois; Ardlie, Kristin; Gerszten, Robert; Clish, Clary; Van Den Berg, David; Taylor, Kent D; Durda, Peter; Johnson, W Craig; Cornell, Elaine; Guo, Xiuqing; Liu, Yongmei; Tracy, Russell; Conomos, Matthew; Blackwell, Tom; Papanicolaou, George; Lappalainen, Tuuli; Mikhaylova, Anna V; Thornton, Timothy A; Cho, Michael H; Gignoux, Christopher R; Lange, Leslie; Lange, Ethan; Rich, Stephen S; Rotter, Jerome I; Manichaikul, Ani; Im, Hae Kyung; Wheeler, Heather E.

PLoS One ; 17(2): e0264341, 2022.

Article in English | MEDLINE | ID: mdl-35202437

ABSTRACT

Genetically regulated gene expression has helped elucidate the biological mechanisms underlying complex traits. Improved high-throughput technology allows similar interrogation of the genetically regulated proteome for understanding complex trait mechanisms. Here, we used the Trans-omics for Precision Medicine (TOPMed) Multi-omics pilot study, which comprises data from the Multi-Ethnic Study of Atherosclerosis (MESA), to optimize genetic predictors of the plasma proteome for genetically regulated proteome-wide association studies (PWAS) in diverse populations. We built predictive models for protein abundances using data collected in TOPMed MESA, in which 1,305 proteins were measured with a SOMAscan assay. We compared predictive models built via elastic net regression with models that first fine-map SNPs and then integrate the resulting posterior inclusion probabilities into elastic net. To investigate the transferability of predictive models across ancestries, we built protein prediction models in each of the four TOPMed MESA populations, African American (n = 183), Chinese (n = 71), European (n = 416), and Hispanic/Latino (n = 301), as well as in all populations combined. As expected, fine-mapping produced more significant protein prediction models, especially in African-ancestry populations, potentially increasing the opportunity for discovery. When we tested our TOPMed MESA models in the independent European INTERVAL study, fine-mapping improved cross-ancestry prediction for some proteins. Using GWAS summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study, which comprises ∼50,000 Hispanic/Latinos, African Americans, Asians, Native Hawaiians, and Native Americans, we applied S-PrediXcan to perform PWAS for 28 complex traits. The most protein-trait associations were discovered, colocalized, and replicated in large independent GWAS when the proteome prediction models were trained in populations with ancestries similar to PAGE. At current training population sample sizes, the performance of baseline and fine-mapped protein prediction models in PWAS was similar, highlighting the utility of elastic net. Our predictive models in diverse populations are publicly available for use in proteome mapping methods at https://doi.org/10.5281/zenodo.4837327.
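S-PrediXcan, used above for the PWAS, needs only GWAS summary statistics: the association z-score for a predicted protein g is approximated as z_g ≈ Σ_l w_l (σ_l / σ_g) z_l, where w_l are the prediction-model weights, z_l the GWAS z-scores, σ_l the standard deviation of SNP l's dosage, and σ_g² = wᵀΓw the variance of the predicted abundance given the SNP covariance matrix Γ from a reference panel. The sketch below implements that approximation for a single model with plain Python; the function name and toy inputs are hypothetical, and a real analysis would use the released model weights and an LD reference matched to the GWAS population.

```python
import math
from typing import List, Sequence


def s_predixcan_z(
    weights: Sequence[float],
    gwas_z: Sequence[float],
    snp_cov: List[List[float]],
) -> float:
    """Approximate protein-trait association z-score from summary statistics.

    weights: prediction-model weights w_l for the model's SNPs
    gwas_z:  GWAS z-scores (beta_hat / se) for the same SNPs, same allele coding
    snp_cov: covariance matrix Gamma of SNP dosages from a reference panel
    """
    n = len(weights)
    # Variance of the predicted abundance: sigma_g^2 = w' Gamma w
    var_g = sum(
        weights[i] * snp_cov[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    sigma_g = math.sqrt(var_g)
    # z_g ~= sum_l w_l * (sigma_l / sigma_g) * z_l, with sigma_l = sqrt(Gamma_ll)
    return sum(
        weights[k] * math.sqrt(snp_cov[k][k]) / sigma_g * gwas_z[k] for k in range(n)
    )
```

A quick sanity check: with a single SNP the expression reduces to that SNP's GWAS z-score, carrying the sign of its weight.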

Subjects

Atherosclerosis/genetics , Genetic Association Studies , Models, Genetic , Proteins/genetics , Proteome/genetics , Atherosclerosis/ethnology , Female , Gene Frequency , Humans , Male , Pilot Projects , Polymorphism, Single Nucleotide , Quantitative Trait Loci

Whole-genome sequencing in diverse subjects identifies genetic correlates of leukocyte traits: The NHLBI TOPMed program.

Mikhaylova, Anna V; McHugh, Caitlin P; Polfus, Linda M; Raffield, Laura M; Boorgula, Meher Preethi; Blackwell, Thomas W; Brody, Jennifer A; Broome, Jai; Chami, Nathalie; Chen, Ming-Huei; Conomos, Matthew P; Cox, Corey; Curran, Joanne E; Daya, Michelle; Ekunwe, Lynette; Glahn, David C; Heard-Costa, Nancy; Highland, Heather M; Hobbs, Brian D; Ilboudo, Yann; Jain, Deepti; Lange, Leslie A; Miller-Fleming, Tyne W; Min, Nancy; Moon, Jee-Young; Preuss, Michael H; Rosen, Jonathon; Ryan, Kathleen; Smith, Albert V; Sun, Quan; Surendran, Praveen; de Vries, Paul S; Walter, Klaudia; Wang, Zhe; Wheeler, Marsha; Yanek, Lisa R; Zhong, Xue; Abecasis, Goncalo R; Almasy, Laura; Barnes, Kathleen C; Beaty, Terri H; Becker, Lewis C; Blangero, John; Boerwinkle, Eric; Butterworth, Adam S; Chavan, Sameer; Cho, Michael H; Choquet, Hélène; Correa, Adolfo; Cox, Nancy.

Am J Hum Genet ; 108(10): 1836-1851, 2021 10 07.

Article in English | MEDLINE | ID: mdl-34582791

ABSTRACT

Many common and rare variants associated with hematologic traits have been discovered through imputation on large-scale reference panels. However, the majority of genome-wide association studies (GWASs) have been conducted in Europeans, and determining causal variants has proved challenging. We performed a GWAS of total leukocyte, neutrophil, lymphocyte, monocyte, eosinophil, and basophil counts using 109,563,748 variants on the autosomes and the X chromosome in the Trans-Omics for Precision Medicine (TOPMed) program, which included data from 61,802 individuals of diverse ancestry. We discovered and replicated 7 leukocyte trait associations, including (1) an association between lower eosinophil count and a noncoding variant in the chromosome X pseudo-autosomal region (PAR), located between the cytokine receptor genes CSF2RA and CRLF2; and (2) associations between single variants found predominantly among African Americans at the S1PR3 (9q22.1) and HBB (11p15.4) loci and monocyte and lymphocyte counts, respectively. We further provide evidence that the newly discovered eosinophil-lowering chromosome X PAR variant might be associated with reduced susceptibility to common allergic diseases such as atopic dermatitis and asthma. Additionally, we found that a burden of very rare FLT3 (13q12.2) variants was associated with monocyte counts. Together, these results emphasize the utility of whole-genome sequencing in diverse samples for identifying associations missed by European-ancestry-driven GWASs.
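A gene-level burden analysis, like the FLT3 result above, collapses many very rare variants into one score per person before testing. The sketch below shows the simplest unweighted version: count rare minor alleles per person, then compute an ordinary least-squares t-statistic for trait on burden. It is an illustration only, not the test used in the study; the MAF cutoff is an assumed value, the helper names are hypothetical, and it omits the covariate, ancestry, and relatedness adjustments a real TOPMed analysis would require.

```python
import math
from typing import List, Sequence


def burden_scores(
    genotypes: List[Sequence[int]],
    mafs: Sequence[float],
    maf_threshold: float = 0.001,  # assumed "very rare" cutoff, for illustration
) -> List[int]:
    """Count minor alleles at variants with MAF below the threshold, per person.

    genotypes[i][v] is the minor-allele count (0, 1, or 2) of person i at variant v.
    """
    rare = [v for v, maf in enumerate(mafs) if maf < maf_threshold]
    return [sum(person[v] for v in rare) for person in genotypes]


def burden_t_stat(burden: Sequence[float], trait: Sequence[float]) -> float:
    """Unadjusted OLS t-statistic for the slope of trait ~ burden."""
    n = len(trait)
    mean_x = sum(burden) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in burden)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(burden, trait))
    beta = sxy / sxx
    residuals = [y - mean_y - beta * (x - mean_x) for x, y in zip(burden, trait)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    return beta / math.sqrt(sigma2 / sxx)
```

Collapsing trades per-variant resolution for power: individually, very rare variants are carried by too few people to test one at a time.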

Subjects

Asthma/epidemiology , Biomarkers/metabolism , Dermatitis, Atopic/epidemiology , Leukocytes/pathology , Polymorphism, Single Nucleotide , Pulmonary Disease, Chronic Obstructive/epidemiology , Quantitative Trait Loci , Asthma/genetics , Asthma/metabolism , Asthma/pathology , Dermatitis, Atopic/genetics , Dermatitis, Atopic/metabolism , Dermatitis, Atopic/pathology , Genetic Predisposition to Disease , Genome, Human , Genome-Wide Association Study , Humans , National Heart, Lung, and Blood Institute (U.S.) , Phenotype , Prognosis , Proteome/analysis , Proteome/metabolism , Pulmonary Disease, Chronic Obstructive/genetics , Pulmonary Disease, Chronic Obstructive/metabolism , Pulmonary Disease, Chronic Obstructive/pathology , United Kingdom/epidemiology , United States/epidemiology , Whole Genome Sequencing

On the cross-population generalizability of gene expression prediction models.

Keys, Kevin L; Mak, Angel C Y; White, Marquitta J; Eckalbar, Walter L; Dahl, Andrew W; Mefford, Joel; Mikhaylova, Anna V; Contreras, María G; Elhawary, Jennifer R; Eng, Celeste; Hu, Donglei; Huntsman, Scott; Oh, Sam S; Salazar, Sandra; Lenoir, Michael A; Ye, Jimmie C; Thornton, Timothy A; Zaitlen, Noah; Burchard, Esteban G; Gignoux, Christopher R.

PLoS Genet ; 16(8): e1008927, 2020 08.

Article in English | MEDLINE | ID: mdl-32797036

ABSTRACT

The genetic control of gene expression is a core component of human physiology. For the past several years, transcriptome-wide association studies have leveraged large datasets of linked genotype and RNA sequencing information to create a powerful gene-based test of association that has been used in dozens of studies. While numerous discoveries have been made, the populations in the training data are overwhelmingly of European descent, and little is known about the generalizability of these models to other populations. Here, we test for cross-population generalizability of gene expression prediction models using a dataset of African American individuals with RNA-Seq data in whole blood. We find that the default models trained in large datasets such as GTEx and DGN fare poorly in African Americans, with a notable reduction in prediction accuracy when compared to European Americans. We replicate these limitations in cross-population generalizability using the five populations in the GEUVADIS dataset. Via realistic simulations of both populations and gene expression, we show that accurate cross-population generalizability of transcriptome prediction only arises when eQTL architecture is substantially shared across populations. In contrast, models with non-identical eQTLs showed patterns similar to real-world data. Therefore, generating RNA-Seq data in diverse populations is a critical step towards multi-ethnic utility of gene expression prediction.

Subjects

Black or African American/genetics , Genome-Wide Association Study/methods , Models, Genetic , Transcriptome , Gene Expression Profiling/methods , Gene Expression Profiling/standards , Genome-Wide Association Study/standards , Humans , Quantitative Trait Loci , RNA-Seq/methods , RNA-Seq/standards , Reference Standards

Accuracy of Gene Expression Prediction From Genotype Data With PrediXcan Varies Across and Within Continental Populations.

Mikhaylova, Anna V; Thornton, Timothy A.

Front Genet ; 10: 261, 2019.

Article in English | MEDLINE | ID: mdl-31001318

ABSTRACT

Using genetic data to predict gene expression has garnered significant attention in recent years. PrediXcan has become one of the most widely used gene-based methods for testing associations between predicted gene expression values and a phenotype, which has facilitated novel insights into the relationship between complex traits and the component of gene expression that can be attributed to genetic variation. The gene expression prediction models for PrediXcan were developed using supervised machine learning methods and training data from the Depression Genes and Networks (DGN) study and the Genotype-Tissue Expression (GTEx) project, where the majority of subjects are of European descent. Many genetic studies, however, include samples from multi-ethnic populations. In this paper, we evaluate the accuracy of PrediXcan for predicting gene expression in diverse populations. Using transcriptomic data from the GEUVADIS (Genetic European Variation in Disease) RNA sequencing project and whole genome sequencing data from the 1000 Genomes project, we evaluate and compare the predictive performance of PrediXcan in an African population (Yoruban) and four European ancestry populations for thousands of genes. We evaluate a range of models from the PrediXcan weight databases and use Pearson's correlation coefficient to assess gene expression prediction accuracy with PrediXcan. We find that the predictive performance of PrediXcan varies substantially among populations from different continents (F-test p-value < 2.2 × 10⁻¹⁶), with lower prediction accuracy in the Yoruban population from West Africa than in the European-ancestry populations. Moreover, not only do we find differences in predictive performance between populations from different continents, we also find highly significant differences in prediction accuracy among the four European ancestry populations considered (F-test p-value < 2.2 × 10⁻¹⁶). Finally, while there is variability in prediction accuracy across different PrediXcan weight databases, we also find consistency in the qualitative performance of PrediXcan for the five populations considered, with the African ancestry population having the lowest accuracy across databases.
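The evaluation described above has two parts: PrediXcan-style models predict a gene's expression as a weighted sum of allele dosages, and accuracy is measured as Pearson's correlation between predicted and measured expression, computed separately in each population. The sketch below reproduces that per-population computation for one gene. The weight dictionary is a hypothetical stand-in for entries from a PrediXcan weight database, and the function names are illustrative.

```python
import math
from typing import Dict, List, Sequence


def predict_expression(dosages: Sequence[float], weights: Dict[int, float]) -> float:
    """Predicted expression: sum of model weights times allele dosages.

    weights maps a SNP index to its weight (hypothetical stand-in for a
    gene's entries in a PrediXcan weight database).
    """
    return sum(w * dosages[snp] for snp, w in weights.items())


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def accuracy_by_population(
    genotypes: Dict[str, List[Sequence[float]]],
    observed: Dict[str, Sequence[float]],
    weights: Dict[int, float],
) -> Dict[str, float]:
    """Pearson r between predicted and measured expression for each population."""
    result = {}
    for population, people in genotypes.items():
        predicted = [predict_expression(d, weights) for d in people]
        result[population] = pearson_r(predicted, observed[population])
    return result
```

Repeating this over thousands of genes yields a distribution of correlations per population, which the study then compared across populations with F-tests; that comparison step is not shown here.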
