ABSTRACT
Regulation of transcription and translation are mechanisms through which genetic variants affect complex traits. Expression quantitative trait locus (eQTL) studies have been more successful at identifying cis-eQTL (within 1 Mb of the transcription start site) than trans-eQTL. Here, we tested the cis component of gene expression for association with observed plasma protein levels to identify cis- and trans-acting genes that regulate protein levels. We used transcriptome prediction models from 49 Genotype-Tissue Expression (GTEx) Project tissues to predict the cis component of gene expression and tested the predicted expression of every gene in every tissue for association with the observed abundance of 3,622 plasma proteins measured in 3,301 individuals from the INTERVAL study. We tested significant results for replication in 971 individuals from the Trans-omics for Precision Medicine (TOPMed) Multi-Ethnic Study of Atherosclerosis (MESA). We found 1,168 cis-acting and 1,210 trans-acting associations that replicated in TOPMed (FDR < 0.05), with a median expected true positive rate (π1) across tissues of 0.806 and 0.390, respectively. The target proteins of trans-acting genes were enriched for transcription factor binding sites and autoimmune diseases in the GWAS catalog. Furthermore, we found a higher correlation between predicted expression and protein levels of the same underlying gene (R = 0.17) than between observed expression and protein levels (R = 0.10, p = 7.50 × 10⁻¹¹). This indicates that the cis-acting genetically regulated (heritable) component of gene expression is more consistent across tissues than total observed expression (genetics + environment) and is useful in uncovering the function of SNPs associated with complex traits.
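As a minimal sketch of the general idea (not the study's pipeline), the predicted cis component of one gene can be computed as a weighted sum of allele dosages, PrediXcan-style, and then tested against a protein's measured abundance with a Pearson correlation t-test. The SNP identifiers, weights, protein values, and the choice of an unadjusted correlation test are hypothetical; the abstract does not state covariates or the exact association model.

```python
import math

def predict_expression(dosages, weights):
    """Predicted (genetically regulated) expression per individual.

    dosages: list of dicts mapping SNP id -> allele dosage (0 to 2).
    weights: dict mapping SNP id -> model weight (hypothetical values).
    A SNP missing from an individual's dict contributes nothing.
    """
    return [sum(w * person.get(snp, 0.0) for snp, w in weights.items())
            for person in dosages]

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t(r, n):
    """t statistic with n - 2 degrees of freedom for H0: r = 0 (undefined if |r| == 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical toy data: two cis-SNPs, four individuals, one plasma protein.
weights = {"rs_a": 0.5, "rs_b": -0.2}
dosages = [{"rs_a": 0, "rs_b": 1}, {"rs_a": 1, "rs_b": 1},
           {"rs_a": 2, "rs_b": 0}, {"rs_a": 1, "rs_b": 2}]
protein = [1.1, 1.9, 3.2, 1.4]

pred = predict_expression(dosages, weights)
r = pearson_r(pred, protein)
t = correlation_t(r, len(pred))
```

In a genome-wide scan this test is repeated for every gene, tissue, and protein, so multiple-testing control (such as the FDR threshold reported above) is essential.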
Subjects
Proteome , Transcriptome , Humans , Transcriptome/genetics , Proteome/genetics , Multifactorial Inheritance , Quantitative Trait Loci/genetics , Genome-Wide Association Study , Polymorphism, Single Nucleotide/genetics
ABSTRACT
Polygenic risk scores (PRS) have shown success in clinical settings, but most PRS methods focus only on participants with distinct primary continental ancestry and do not accommodate recently admixed individuals, whose genomes carry different continental ancestries on different segments. Here, we develop GAUDI, a novel penalized-regression-based method specifically designed for admixed individuals. GAUDI explicitly models ancestry-differential effects while borrowing information across segments with shared ancestry in admixed genomes. We demonstrate marked advantages of GAUDI over other methods through comprehensive simulations and real data analyses for traits whose associated variants exhibit ancestry-differential effects. Leveraging data from the Women's Health Initiative study, we show that GAUDI improves PRS prediction of white blood cell count and C-reactive protein in African Americans by more than 64% compared with alternative methods, and even outperforms PRS-CSx with large European GWAS in some scenarios. We believe GAUDI will be a valuable tool to mitigate disparities in PRS performance in admixed individuals.
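The idea of modeling ancestry-differential effects while borrowing information across ancestries can be illustrated with a small sketch under stated assumptions: each variant carries separate African- and European-ancestry effect sizes, an individual's dosage is split by the local ancestry of the haplotype segment carrying each allele, and a fused-lasso-style penalty shrinks the two effects toward each other. The data layout, names, and penalty form are hypothetical illustrations; this is not GAUDI's fitting procedure.

```python
from dataclasses import dataclass

@dataclass
class AncestryDosage:
    """Effect-allele counts at one variant, split by local ancestry (hypothetical layout)."""
    afr: float  # alleles on African-ancestry haplotype segments
    eur: float  # alleles on European-ancestry haplotype segments

def ancestry_aware_prs(dosages, beta_afr, beta_eur):
    """Sum over variants of ancestry-specific dosage times ancestry-specific effect."""
    return sum(d.afr * ba + d.eur * be
               for d, ba, be in zip(dosages, beta_afr, beta_eur))

def fused_penalty(beta_afr, beta_eur, lam_sparse, lam_fuse):
    """lam_sparse * (L1 norm of all effects) + lam_fuse * sum |beta_afr - beta_eur|.

    The fusion term favors equal effects across ancestries (information sharing)
    while still allowing ancestry-differential effects where the data support them.
    """
    sparsity = sum(abs(a) + abs(e) for a, e in zip(beta_afr, beta_eur))
    fusion = sum(abs(a - e) for a, e in zip(beta_afr, beta_eur))
    return lam_sparse * sparsity + lam_fuse * fusion

# Hypothetical admixed individual with two variants.
person = [AncestryDosage(afr=1, eur=0), AncestryDosage(afr=1, eur=1)]
score = ancestry_aware_prs(person, beta_afr=[0.2, 0.1], beta_eur=[0.3, 0.4])
penalty = fused_penalty([0.2, 0.1], [0.3, 0.4], lam_sparse=1.0, lam_fuse=2.0)
```

Setting `lam_fuse` very large forces a single shared effect per variant (a standard PRS), while setting it to zero fits each ancestry independently; intermediate values trade off between the two.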
Subjects
Black or African American , Genetic Risk Score , Software , Humans , Black or African American/genetics , Computer Simulation , Genetic Predisposition to Disease , Genome, Human , Genome-Wide Association Study/methods , Phenotype , Risk Factors
ABSTRACT
Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to different populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that methods that leverage shared regulatory effects across different conditions, in this case, across different populations, may improve cross-population transcriptome prediction. To test this hypothesis, we made transcriptome prediction models for use in transcriptome-wide association studies (TWASs) using different methods (elastic net, joint-tissue imputation [JTI], matrix expression quantitative trait loci [Matrix eQTL], multivariate adaptive shrinkage in R [MASHR], and transcriptome-integrated genetic association resource [TIGAR]) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in TWASs, we integrated publicly available multiethnic genome-wide association study (GWAS) summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study and Pan-ancestry genetic analysis of the UK Biobank (PanUKBB) with our developed transcriptome prediction models. In regard to transcriptome prediction accuracy, MASHR models performed better than or as well as other methods in both population-matched and cross-population transcriptome predictions. Furthermore, in multiethnic TWASs, MASHR models yielded the most discoveries that replicated in both PAGE and PanUKBB among all methods analyzed, including loci previously mapped in GWASs and loci not previously found in GWASs. Overall, our study demonstrates the importance of using methods that benefit from different populations' effect size estimates in order to improve TWASs for multiethnic or underrepresented populations.
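Elastic net, one of the methods compared, fits per-gene SNP weights by penalized linear regression. The sketch below uses coordinate descent on the glmnet-style objective (1/2n)·||y − Xb||² + λ·[α·||b||₁ + (1 − α)/2·||b||²], which is an assumed parameterization; the study's software, cis-window, and tuning (for example, cross-validated λ) are not described in the abstract, and the toy data are hypothetical.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net_cd(X, y, lam, alpha=0.5, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||^2).

    X: n rows of p genotype dosages; y: n expression values.
    Returns (intercept, weights).
    """
    n, p = len(X), len(X[0])
    # Center predictors and response so the intercept is not penalized.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residuals when all weights are 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic SNP in this sample: weight stays 0
            # Inner product of SNP j with the partial residual that excludes SNP j's own fit.
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta  # keep residuals in sync with beta
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta

def predict(intercept, beta, row):
    """Predicted expression for one individual's dosage vector."""
    return intercept + sum(b * x for b, x in zip(beta, row))
```

Cross-population accuracy would then be assessed by fitting in one population and correlating `predict` outputs with observed expression in a held-out sample from the same or a different population.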
Subjects
Genome-Wide Association Study , Transcriptome , Humans , Transcriptome/genetics , Quantitative Trait Loci/genetics , Gene Frequency , Linkage Disequilibrium
ABSTRACT
Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to different populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that methods that leverage shared regulatory effects across different conditions (in this case, across different populations) may improve cross-population transcriptome prediction. To test this hypothesis, we made transcriptome prediction models for use in transcriptome-wide association studies (TWAS) using different methods (Elastic Net, Joint-Tissue Imputation (JTI), Matrix eQTL, Multivariate Adaptive Shrinkage in R (MASHR), and Transcriptome-Integrated Genetic Association Resource (TIGAR)) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in TWAS, we integrated publicly available multi-ethnic genome-wide association study (GWAS) summary statistics from the Population Architecture using Genomics and Epidemiology Study (PAGE) and Pan-UK Biobank (PanUKBB) with our developed transcriptome prediction models. In regard to transcriptome prediction accuracy, MASHR models performed better than or as well as other methods in both population-matched and cross-population transcriptome predictions. Furthermore, in multi-ethnic TWAS, MASHR models yielded the most discoveries that replicated in both PAGE and PanUKBB among all methods analyzed, including loci previously mapped in GWAS and new loci not previously found in GWAS. Overall, our study demonstrates the importance of using methods that benefit from different populations' effect size estimates in order to improve TWAS for multi-ethnic or underrepresented populations.
ABSTRACT
Genetically regulated gene expression has helped elucidate the biological mechanisms underlying complex traits. Improved high-throughput technology allows similar interrogation of the genetically regulated proteome for understanding complex trait mechanisms. Here, we used the Trans-omics for Precision Medicine (TOPMed) Multi-omics pilot study, which comprises data from the Multi-Ethnic Study of Atherosclerosis (MESA), to optimize genetic predictors of the plasma proteome for genetically regulated proteome-wide association studies (PWAS) in diverse populations. We built predictive models for protein abundances using data collected in TOPMed MESA, in which 1,305 proteins were measured by a SOMAscan assay. We compared predictive models built via elastic net regression to models integrating posterior inclusion probabilities estimated by fine-mapping SNPs prior to elastic net. To investigate the transferability of predictive models across ancestries, we built protein prediction models in all four TOPMed MESA populations, African American (n = 183), Chinese (n = 71), European (n = 416), and Hispanic/Latino (n = 301), as well as in all populations combined. As expected, fine-mapping produced more significant protein prediction models, especially in African ancestry populations, potentially increasing opportunity for discovery. When we tested our TOPMed MESA models in the independent European INTERVAL study, fine-mapping improved cross-ancestry prediction for some proteins. Using GWAS summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study, which comprises ~50,000 Hispanic/Latinos, African Americans, Asians, Native Hawaiians, and Native Americans, we applied S-PrediXcan to perform PWAS for 28 complex traits. The most protein-trait associations were discovered, colocalized, and replicated in large independent GWAS when using proteome prediction models trained in populations with ancestries similar to PAGE. 
At current training population sample sizes, performance between baseline and fine-mapped protein prediction models in PWAS was similar, highlighting the utility of elastic net. Our predictive models in diverse populations are publicly available for use in proteome mapping methods at https://doi.org/10.5281/zenodo.4837327.
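S-PrediXcan tests a genetically predicted trait (here, protein abundance) against GWAS summary statistics without individual-level genotypes. Its published approximation is Z_g ≈ Σ_l w_lg · (σ̂_l / σ̂_g) · (β̂_l / se(β̂_l)), where σ̂_g² = Σ_l Σ_k w_lg · w_kg · Γ_lk and Γ is the covariance of SNP dosages in a reference panel. The sketch below implements only that formula; the weights, z-scores, and covariance values are hypothetical toy inputs, and allele harmonization between the prediction model and the GWAS is assumed to have been done already.

```python
import math

def s_predixcan_z(weights, gwas_z, snp_cov):
    """Approximate protein-level z-score from GWAS summary statistics.

    weights: prediction-model weights w_l for the SNPs in one protein model.
    gwas_z:  GWAS z-scores beta_l / se(beta_l) for the same SNPs and allele coding.
    snp_cov: covariance matrix Gamma of SNP dosages from a reference panel.
    """
    p = len(weights)
    # Variance of the predicted trait: sigma_g^2 = w' Gamma w
    var_g = sum(weights[l] * weights[k] * snp_cov[l][k]
                for l in range(p) for k in range(p))
    if var_g <= 0:
        raise ValueError("predicted trait has zero variance in the reference panel")
    sigma_g = math.sqrt(var_g)
    # Each SNP's z-score is scaled by its weight and its SD relative to the trait's SD.
    return sum(weights[l] * math.sqrt(snp_cov[l][l]) / sigma_g * gwas_z[l]
               for l in range(p))

# Hypothetical two-SNP protein model with uncorrelated SNPs of equal variance.
z_two = s_predixcan_z([1.0, 1.0], [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
```

With a single SNP the result reduces to the GWAS z-score times the sign of the weight, which is a quick sanity check on any implementation.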
Subjects
Atherosclerosis/genetics , Genetic Association Studies , Models, Genetic , Proteins/genetics , Proteome/genetics , Atherosclerosis/ethnology , Female , Gene Frequency , Humans , Male , Pilot Projects , Polymorphism, Single Nucleotide , Quantitative Trait Loci
ABSTRACT
Many common and rare variants associated with hematologic traits have been discovered through imputation on large-scale reference panels. However, the majority of genome-wide association studies (GWASs) have been conducted in Europeans, and determining causal variants has proved challenging. We performed a GWAS of total leukocyte, neutrophil, lymphocyte, monocyte, eosinophil, and basophil counts generated from 109,563,748 variants in the autosomes and the X chromosome in the Trans-Omics for Precision Medicine (TOPMed) program, which included data from 61,802 individuals of diverse ancestry. We discovered and replicated 7 leukocyte trait associations, including (1) the association between a chromosome X, pseudo-autosomal region (PAR), noncoding variant located between cytokine receptor genes (CSF2RA and CRLF2) and lower eosinophil count; and (2) associations between single variants found predominantly among African Americans at the S1PR3 (9q22.1) and HBB (11p15.4) loci and monocyte and lymphocyte counts, respectively. We further provide evidence indicating that the newly discovered eosinophil-lowering chromosome X PAR variant might be associated with reduced susceptibility to common allergic diseases such as atopic dermatitis and asthma. Additionally, we found a burden of very rare FLT3 (13q12.2) variants associated with monocyte counts. Together, these results emphasize the utility of whole-genome sequencing in diverse samples in identifying associations missed by European-ancestry-driven GWASs.
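A gene-based burden test collapses many very rare variants in a gene into one score per person, so that variants too rare to test individually can be tested jointly. The sketch below assumes a hypothetical minor allele frequency cutoff of 0.1% and an unadjusted least-squares slope; the abstract does not give the threshold or the test used, and a real analysis would adjust for covariates, ancestry, and relatedness.

```python
def burden_scores(genotypes, mafs, maf_threshold=0.001):
    """Per-individual count of qualifying rare alleles in one gene.

    genotypes: one list per individual of alternate-allele counts (0, 1, 2) per variant.
    mafs: minor allele frequency of each variant, same order as the genotype columns.
    maf_threshold: hypothetical cutoff defining "very rare" variants.
    """
    keep = [j for j, f in enumerate(mafs) if f < maf_threshold]
    return [sum(g[j] for j in keep) for g in genotypes]

def ols_slope(x, y):
    """Least-squares slope of trait y on burden score x (no covariates)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("burden score has no variation in this sample")
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

# Hypothetical: 3 individuals, 3 variants, of which variants 0 and 2 are very rare.
scores = burden_scores([[0, 1, 0], [1, 0, 1], [0, 0, 0]], mafs=[0.0005, 0.2, 0.0002])
```

Collapsing assumes most qualifying variants act in the same direction; when effects are mixed, variance-component tests are usually preferred.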
Subjects
Asthma/epidemiology , Biomarkers/metabolism , Dermatitis, Atopic/epidemiology , Leukocytes/pathology , Polymorphism, Single Nucleotide , Pulmonary Disease, Chronic Obstructive/epidemiology , Quantitative Trait Loci , Asthma/genetics , Asthma/metabolism , Asthma/pathology , Dermatitis, Atopic/genetics , Dermatitis, Atopic/metabolism , Dermatitis, Atopic/pathology , Genetic Predisposition to Disease , Genome, Human , Genome-Wide Association Study , Humans , National Heart, Lung, and Blood Institute (U.S.) , Phenotype , Prognosis , Proteome/analysis , Proteome/metabolism , Pulmonary Disease, Chronic Obstructive/genetics , Pulmonary Disease, Chronic Obstructive/metabolism , Pulmonary Disease, Chronic Obstructive/pathology , United Kingdom/epidemiology , United States/epidemiology , Whole Genome Sequencing
ABSTRACT
The genetic control of gene expression is a core component of human physiology. For the past several years, transcriptome-wide association studies have leveraged large datasets of linked genotype and RNA sequencing information to create a powerful gene-based test of association that has been used in dozens of studies. While numerous discoveries have been made, the populations in the training data are overwhelmingly of European descent, and little is known about the generalizability of these models to other populations. Here, we test for cross-population generalizability of gene expression prediction models using a dataset of African American individuals with RNA-Seq data in whole blood. We find that the default models trained in large datasets such as GTEx and DGN fare poorly in African Americans, with a notable reduction in prediction accuracy when compared to European Americans. We replicate these limitations in cross-population generalizability using the five populations in the GEUVADIS dataset. Via realistic simulations of both populations and gene expression, we show that accurate cross-population generalizability of transcriptome prediction only arises when eQTL architecture is substantially shared across populations. In contrast, models with non-identical eQTLs showed patterns similar to real-world data. Therefore, generating RNA-Seq data in diverse populations is a critical step towards multi-ethnic utility of gene expression prediction.
Subjects
Black or African American/genetics , Genome-Wide Association Study/methods , Models, Genetic , Transcriptome , Gene Expression Profiling/methods , Gene Expression Profiling/standards , Genome-Wide Association Study/standards , Humans , Quantitative Trait Loci , RNA-Seq/methods , RNA-Seq/standards , Reference Standards
ABSTRACT
This article presents regional-level data that can be used for comparative territorial studies of innovation dynamics. The dataset covers 50 indicators grouped into a matrix of 5 elements of a regional innovation system (human resources, HR; infrastructure; research and development sector, R&D; innovative milieu; framework conditions) and 5 components of innovation security (economic; scientific and technological, S&T; social; political; geo-ecological). This complex set of interrelated data makes it possible to identify the catalysts and inhibitors that significantly affect the sustainable development of a particular regional innovation system. The innovation security approach makes it possible to consider the locus of innovation processes, account for the relationships between individual components of regional innovation systems, and acknowledge the unique properties of each region. The database includes statistics for all 85 regions of the Russian Federation for 2015 and 2016. Regions are differentiated into coastal and inland regions, which makes it possible to identify development patterns influenced by the global trend of coastalization.
ABSTRACT
Using genetic data to predict gene expression has garnered significant attention in recent years. PrediXcan has become one of the most widely used gene-based methods for testing associations between predicted gene expression values and a phenotype, which has facilitated novel insights into the relationship between complex traits and the component of gene expression that can be attributed to genetic variation. The gene expression prediction models for PrediXcan were developed using supervised machine learning methods and training data from the Depression Genes and Networks (DGN) study and the Genotype-Tissue Expression (GTEx) project, where the majority of subjects are of European descent. Many genetic studies, however, include samples from multi-ethnic populations, and in this paper we evaluate the accuracy of PrediXcan for predicting gene expression in diverse populations. Using transcriptomic data from the GEUVADIS (Genetic European Variation in Disease) RNA sequencing project and whole genome sequencing data from the 1000 Genomes project, we evaluate and compare the predictive performance of PrediXcan in an African population (Yoruba) and four European ancestry populations for thousands of genes. We evaluate a range of models from the PrediXcan weight databases and use Pearson's correlation coefficient to assess gene expression prediction accuracy with PrediXcan. From our evaluation, we find that the predictive performance of PrediXcan varies substantially among populations from different continents (F-test p-value < 2.2 × 10⁻¹⁶), where prediction accuracy is lower in the Yoruba population from West Africa compared to the European-ancestry populations. Moreover, not only do we find differences in predictive performance between populations from different continents, we also find highly significant differences in prediction accuracy among the four European ancestry populations considered (F-test p-value < 2.2 × 10⁻¹⁶). 
Finally, while there is variability in prediction accuracy across different PrediXcan weight databases, we also find consistency in the qualitative performance of PrediXcan for the five populations considered, with the African ancestry population having the lowest accuracy across databases.
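One way to frame the population comparison is a one-way ANOVA on per-gene prediction accuracies (Pearson r between predicted and observed expression), grouped by population. The sketch below computes the F statistic under that assumed framing; the r values and population labels are hypothetical toy inputs, and the p-value would come from the F distribution with (k − 1, n − k) degrees of freedom.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA.

    groups: dict mapping population label -> list of per-gene accuracies (e.g. Pearson r).
    Returns (F, df_between, df_within).
    """
    samples = [list(v) for v in groups.values()]
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]
    # Between-group variation: how far each population mean sits from the grand mean.
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within-group variation: spread of genes around their own population mean.
    ss_within = sum((v - m) ** 2 for s, m in zip(samples, means) for v in s)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

# Hypothetical per-gene accuracies for two populations.
f_stat, df1, df2 = one_way_anova_f({"YRI": [0.1, 0.2, 0.3], "CEU": [0.4, 0.5, 0.6]})
```

Because the same genes are evaluated in every population, a paired or mixed-model comparison could be more powerful; the one-way form is shown only because it is the simplest F-test.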
ABSTRACT
This data article presents macroeconomic data that can be used for comparative territorial studies. The data cover a sample of 413 regions (national administrative-territorial units corresponding to the second level of the European Commission's common classification of territorial units for statistics, that is, NUTS 2 level regions of the European Union, and comparable administrative-territorial units outside the EU) of 48 European countries, including Cyprus, Turkey, the European part of Russia, and two partially recognized states, the Republic of Kosovo and the Pridnestrovian Moldavian Republic. The statistical database covers the five-year period 2010-2014. This dataset was created to enhance our understanding of contemporary coastalization dynamics in Europe. Although the coastal regions of European countries are highly developed and remain attractive for human settlement, industrial location, and investment flows, their contribution to the socio-economic development of Europe is unclear. The reported data cover key indicators traditionally used in comparative analysis of regional development: average annual population, gross regional product (GRP) in purchasing power parity (PPP), labor productivity, population density, and GRP (PPP) per sq. km. Accounting for differences in the geo-economic position of European regions makes it possible to distinguish four subtypes of regions, with particular emphasis on the coastal area: coastal border, coastal other, coastal hinterland, and inland other. Additional emphasis is placed on differentiating the performance indicators of regions by their border geo-economic position: border regions, which have a state border over land, lake, or river surface, and midland regions, which are all other non-border regions. These data are intended to serve as a comparative benchmark for the coastal border subgroup of regions against the totality of border and midland regions.
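The derived indicators are simple ratios, but aggregating them over a subgroup of regions requires care. The sketch below assumes that population density is average annual population per km², GRP (PPP) per km² is GRP divided by area, and labor productivity is GRP per employed person; the dataset's exact definitions and units are not given here, and the field names (including `employed`) and figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    subtype: str        # e.g. "coastal border", "coastal hinterland", "inland other"
    population: float   # average annual population
    area_km2: float
    grp_ppp: float      # gross regional product at purchasing power parity
    employed: float     # hypothetical field: employed persons

    def population_density(self) -> float:
        return self.population / self.area_km2

    def grp_per_km2(self) -> float:
        return self.grp_ppp / self.area_km2

    def labor_productivity(self) -> float:
        # Assumed definition: GRP (PPP) per employed person.
        return self.grp_ppp / self.employed

def subgroup_density(regions, subtype):
    """Pooled density (total population / total area), not the mean of regional densities."""
    chosen = [r for r in regions if r.subtype == subtype]
    return sum(r.population for r in chosen) / sum(r.area_km2 for r in chosen)

# Hypothetical regions.
regions = [
    Region("A", "coastal border", 1000, 50, 2000, 400),
    Region("B", "coastal border", 2000, 150, 9000, 1000),
    Region("C", "inland other", 500, 100, 800, 200),
]
```

For regions A and B the pooled density is 15 people per km², whereas the unweighted mean of their densities is about 16.7, so the choice of aggregation matters when benchmarking subgroups.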
ABSTRACT
Human herpesvirus 6 (HHV-6) species have a unique ability to integrate into chromosomal telomeres. Mendelian inheritance via gametocyte integration results in HHV-6 in every nucleated cell. The epidemiology and clinical effect of inherited chromosomally integrated HHV-6 (iciHHV-6) in hematopoietic cell transplant (HCT) recipients are unclear. We identified 4319 allogeneic HCT donor-recipient pairs (8638 subjects) with archived pre-HCT peripheral blood mononuclear cell samples. We screened these samples for iciHHV-6 and compared characteristics of HCT recipients and donors with iciHHV-6 with those of recipients and donors without iciHHV-6, respectively. We calculated Kaplan-Meier probability estimates and Cox proportional hazards models for post-HCT outcomes based on recipient and donor iciHHV-6 status. We identified 60 HCT recipients (1.4%) and 40 donors (0.9%) with iciHHV-6; both recipient and donor harbored iciHHV-6 in 13 HCTs. Thus, there were 87 HCTs (2%) in which the recipient, donor, or both harbored iciHHV-6. Grade 2-4 acute graft-versus-host disease (GVHD) was more frequent when recipients or donors had iciHHV-6 (adjusted hazard ratios [HRs], 1.7-1.9; P = .004-.001). Cytomegalovirus viremia (any and high-level) was more frequent among recipients with iciHHV-6 (adjusted HRs, 1.7-3.1; P = .001-.040). Inherited ciHHV-6 status did not significantly affect the risk of chronic GVHD, hematopoietic cell engraftment, overall mortality, or nonrelapse mortality. Screening for iciHHV-6 could guide donor selection and post-HCT risk stratification and treatment. Further study is needed to replicate these findings and identify potential mechanisms.
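Kaplan-Meier estimates are running products of conditional survival probabilities at each event time, with censored subjects leaving the risk set without counting as events. A minimal sketch of the product-limit estimator follows; the follow-up times are hypothetical. Outcomes such as GVHD are often summarized as cumulative incidence with competing risks (for example, death before GVHD), which this simple estimator does not handle.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate with right censoring.

    times:  follow-up time for each subject.
    events: 1 if the outcome occurred at that time, 0 if censored.
    Returns a list of (time, survival) pairs at each distinct event time.
    Subjects censored at an event time are counted as still at risk at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Tally all subjects sharing this time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve

# Hypothetical follow-up (days) for five recipients; 1 = event, 0 = censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

The cumulative probability of the event by time t, ignoring competing risks, is 1 minus the survival value at t.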
Subjects
Chromosomes, Human/genetics , Chromosomes, Human/virology , Hematopoietic Stem Cell Transplantation , Herpesvirus 6, Human/genetics , Inheritance Patterns/genetics , Tissue Donors , Acute Disease , Adult , Chronic Disease , Female , Graft vs Host Disease/genetics , Humans , Incidence , Kaplan-Meier Estimate , Male , Middle Aged , Multivariate Analysis , Probability , Proportional Hazards Models , Risk Factors , Treatment Outcome
ABSTRACT
BACKGROUND: Herpes simplex virus type 1 (HSV-1) is prevalent worldwide and causes mucocutaneous infections of the oral area. We aimed to define the frequency and anatomic distribution of HSV-1 reactivation in the facial area in persons with a history of oral herpes. METHODS: Eight immunocompetent HSV-1-seropositive adults were evaluated for shedding of HSV-1 from 12 separate orofacial sites (8 from oral mucosa, 2 from nose, and 2 from conjunctiva) 5 days a week and from the oral cavity 7 days a week for approximately 5 consecutive weeks by an HSV DNA PCR assay. Symptoms and lesions were recorded by participants. RESULTS: Herpes simplex virus type 1 was detected from at least 1 site on 77 (26.5%) of 291 days. The most frequent site of shedding was the oral mucosa, with widespread shedding throughout the oral cavity. The lesional shedding rate was 36.4% (4 of 11 days with lesions), and the asymptomatic shedding rate was 27.1% (65 of 240 nonlesional days). In individual participants, the median rate of HSV shedding by HSV PCR was 19.7% of days (range, 11%-63%). CONCLUSIONS: Reactivation of HSV-1 on the oral mucosa is common and usually asymptomatic. However, HSV-1 is rarely found in tears and nasal mucosa. Frequent oral shedding of HSV-1 may increase the risk of transmitting the virus to both the oral and genital mucosa of sexual partners.
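The reported shedding rates are proportions of swab days on which HSV-1 was detected. The arithmetic below reproduces the overall, lesional, and asymptomatic rates from the counts stated in the abstract, and sketches how a per-participant median rate could be computed under a hypothetical record layout (one (participant, detected) pair per swabbing day).

```python
from collections import defaultdict
from statistics import median

def shedding_rate(positive_days, total_days):
    """Percentage of days with HSV-1 detected."""
    return 100.0 * positive_days / total_days

def median_participant_rate(records):
    """Median of per-participant shedding rates.

    records: iterable of (participant_id, detected) pairs, one per swabbing day
    (hypothetical layout).
    """
    counts = defaultdict(lambda: [0, 0])  # participant -> [positive days, total days]
    for pid, detected in records:
        counts[pid][1] += 1
        if detected:
            counts[pid][0] += 1
    return median(shedding_rate(pos, tot) for pos, tot in counts.values())

overall = shedding_rate(77, 291)       # any site
lesional = shedding_rate(4, 11)        # days with lesions
asymptomatic = shedding_rate(65, 240)  # nonlesional days
```

Pooled rates weight every swab day equally, so participants with many positive days dominate them; the per-participant median is less sensitive to one frequent shedder.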