Results 1 - 20 of 63
1.
medRxiv ; 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38699369

ABSTRACT

Multi-ancestry statistical fine-mapping of cis-molecular quantitative trait loci (cis-molQTL) aims to improve the precision of distinguishing causal cis-molQTLs from tagging variants. However, existing approaches fail to reflect shared genetic architectures. To solve this limitation, we present the Sum of Shared Single Effects (SuShiE) model, which leverages LD heterogeneity to improve fine-mapping precision, infer cross-ancestry effect size correlations, and estimate ancestry-specific expression prediction weights. We apply SuShiE to mRNA expression measured in PBMCs (n=956) and LCLs (n=814) together with plasma protein levels (n=854) from individuals of diverse ancestries in the TOPMed MESA and GENOA studies. We find SuShiE fine-maps cis-molQTLs for 16% more genes compared with baselines while prioritizing fewer variants with greater functional enrichment. SuShiE infers highly consistent cis-molQTL architectures across ancestries on average; however, we also find evidence of heterogeneity at genes with predicted loss-of-function intolerance, suggesting that environmental interactions may partially explain differences in cis-molQTL effect sizes across ancestries. Lastly, we leverage estimated cis-molQTL effect-sizes to perform individual-level TWAS and PWAS on six white blood cell-related traits in AOU Biobank individuals (n=86k), and identify 44 more genes compared with baselines, further highlighting its benefits in identifying genes relevant for complex disease risk. Overall, SuShiE provides new insights into the cis-genetic architecture of molecular traits.
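
As a hedged illustration of what "prioritizing fewer variants" can mean in fine-mapping: single-effect models (which the name "Sum of Shared Single Effects" suggests SuShiE builds on) assign each variant a posterior probability of being the causal one, and a credible set is the smallest group of variants whose combined probability reaches a target coverage. The Python sketch below assumes those probabilities are already available. The function name `credible_set`, the 0.95 default, and the example numbers are illustrative assumptions and are not taken from the SuShiE software.

```python
def credible_set(posterior, coverage=0.95):
    """Return indices of the smallest set of variants whose normalized
    posterior mass reaches `coverage` (hypothetical helper, not SuShiE's API)."""
    if not posterior:
        raise ValueError("posterior must contain at least one variant")
    total = sum(posterior)
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")

    # Rank variants from most to least probable.
    ranked = sorted(range(len(posterior)), key=lambda i: posterior[i], reverse=True)

    chosen, mass = [], 0.0
    for idx in ranked:
        chosen.append(idx)
        mass += posterior[idx] / total  # normalize so the mass sums to 1
        if mass >= coverage:
            break
    return chosen


if __name__ == "__main__":
    # Made-up posterior probabilities for five variants at one locus.
    example = [0.70, 0.20, 0.06, 0.03, 0.01]
    print(credible_set(example))
```

A smaller credible set at the same coverage means fewer candidate variants to follow up, which is the sense in which a more precise fine-mapping method narrows the search.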

2.
Cell Genom ; 4(4): 100526, 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38537633

ABSTRACT

Hispanic/Latino children have the highest risk of acute lymphoblastic leukemia (ALL) in the US compared to other racial/ethnic groups, yet the basis of this remains incompletely understood. Through genetic fine-mapping analyses, we identified a new independent childhood ALL risk signal near IKZF1 in self-reported Hispanic/Latino individuals, but not in non-Hispanic White individuals, with an effect size of ∼1.44 (95% confidence interval = 1.33-1.55) and a risk allele frequency of ∼18% in Hispanic/Latino populations and <0.5% in European populations. This risk allele was positively associated with Indigenous American ancestry, showed evidence of selection in human history, and was associated with reduced IKZF1 expression. We identified a putative causal variant in a downstream enhancer that is most active in pro-B cells and interacts with the IKZF1 promoter. This variant disrupts IKZF1 autoregulation at this enhancer and results in reduced enhancer activity in B cell progenitors. Our study reveals a genetic basis for the increased ALL risk in Hispanic/Latino children.


Subject(s)
Genetic Predisposition to Disease; Precursor Cell Lymphoblastic Leukemia-Lymphoma; Humans; Child; Genetic Predisposition to Disease/genetics; Polymorphism, Single Nucleotide; Transcription Factors/genetics; Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics; Hispanic or Latino/genetics; Ikaros Transcription Factor/genetics
3.
Arthritis Res Ther ; 26(1): 47, 2024 Feb 10.
Article in English | MEDLINE | ID: mdl-38336809

ABSTRACT

BACKGROUND: Juvenile idiopathic arthritis (JIA) is one of the most prevalent rheumatic disorders in children and is classified as an autoimmune disease (AID). While a robust genetic contribution to JIA etiology has been established, the exact pathogenesis remains unclear. METHODS: To prioritize biologically interpretable susceptibility genes and proteins for JIA, we conducted transcriptome-wide and proteome-wide association studies (TWAS/PWAS). Then, to understand the genetic architecture of JIA, we systematically analyzed single-nucleotide polymorphism (SNP)-based heritability, a signature of natural selection, and polygenicity. Next, we conducted HLA typing using multi-ethnicity RNA sequencing data. Additionally, we examined the T cell receptor (TCR) repertoire at a single-cell level to explore the potential links between immunity and JIA risk. RESULTS: We have identified 19 TWAS genes and two PWAS proteins associated with JIA risks. Furthermore, we observe that the heritability and cell type enrichment analysis of JIA are enriched in T lymphocytes and HLA regions and that JIA shows higher polygenicity compared to other AIDs. In multi-ancestry HLA typing, B*45:01 is more prevalent in African JIA patients than in European JIA patients, whereas DQA1*01:01, DQA1*03:01, and DRB1*04:01 exhibit a higher frequency in European JIA patients. Using single-cell immune repertoire analysis, we identify clonally expanded T cell subpopulations in JIA patients, including CXCL13+BHLHE40+ TH cells which are significantly associated with JIA risks. CONCLUSION: Our findings shed new light on the pathogenesis of JIA and provide a strong foundation for future mechanistic studies aimed at uncovering the molecular drivers of JIA.


Subject(s)
Arthritis, Juvenile; Child; Humans; Arthritis, Juvenile/genetics; Genetic Predisposition to Disease/genetics; Proteins/genetics; Alleles
4.
Hum Mol Genet ; 33(8): 687-697, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38263910

ABSTRACT

BACKGROUND: Expansion of genome-wide association studies across population groups is needed to improve our understanding of shared and unique genetic contributions to breast cancer. We performed association and replication studies guided by a priori linkage findings from African ancestry (AA) relative pairs. METHODS: We performed fixed-effect inverse-variance weighted meta-analysis under three significant AA breast cancer linkage peaks (3q26-27, 12q22-23, and 16q21-22) in 9,241 AA cases and 10,193 AA controls. We examined associations with overall breast cancer as well as estrogen receptor (ER)-positive and negative subtypes (193,132 SNPs). We replicated associations in the African-ancestry Breast Cancer Genetic Consortium (AABCG). RESULTS: In AA women, we identified two associations on chr12q for overall breast cancer (rs1420647, OR = 1.15, $p = 2.50 \times 10^{-6}$; rs12322371, OR = 1.14, $p = 3.15 \times 10^{-6}$), and one for ER-negative breast cancer (rs77006600, OR = 1.67, $p = 3.51 \times 10^{-6}$). On chr3, we identified two associations with ER-negative disease (rs184090918, OR = 3.70, $p = 1.23 \times 10^{-5}$; rs76959804, OR = 3.57, $p = 1.77 \times 10^{-5}$) and on chr16q we identified an association with ER-negative disease (rs34147411, OR = 1.62, $p = 8.82 \times 10^{-6}$). In the replication study, the chr3 associations were significant and effect sizes were larger (rs184090918, OR: 6.66, 95% CI: 1.43, 31.01; rs76959804, OR: 5.24, 95% CI: 1.70, 16.16). CONCLUSION: The two chr3 SNPs are upstream of open chromatin ENSR00000710716, a regulatory feature that is actively regulated in mammary tissues, providing evidence that variants in this chr3 region may have a regulatory role in our target organ. Our study provides support for breast cancer variant discovery using prioritization based on linkage evidence.


Subject(s)
Black People; Breast Neoplasms; Genetic Predisposition to Disease; Female; Humans; Black People/genetics; Breast Neoplasms/genetics; Genome-Wide Association Study; Polymorphism, Single Nucleotide
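
The fixed-effect inverse-variance weighted meta-analysis named in the abstract above can be sketched in a few lines of Python. This is a minimal illustration of the textbook estimator, assuming each study reports a log odds ratio and its standard error for the same SNP. The function name `ivw_fixed_effect` and the input numbers are hypothetical and do not come from the study.

```python
import math


def ivw_fixed_effect(log_odds, std_errs):
    """Combine per-study log odds ratios with inverse-variance weights.

    Returns (pooled log OR, pooled standard error, z statistic, two-sided p).
    Hypothetical helper for illustration only.
    """
    if len(log_odds) != len(std_errs) or not log_odds:
        raise ValueError("need one standard error per effect estimate")

    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se ** 2) for se in std_errs]
    total_weight = sum(weights)

    pooled = sum(w * b for w, b in zip(weights, log_odds)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, z, p_value


if __name__ == "__main__":
    # Made-up estimates from three studies of a single SNP.
    beta, se, z, p = ivw_fixed_effect([0.12, 0.18, 0.15], [0.05, 0.08, 0.06])
    print(f"pooled OR = {math.exp(beta):.3f}, z = {z:.2f}, p = {p:.3g}")
```

The fixed-effect form assumes every study estimates the same underlying effect, so more precise studies (smaller standard errors) dominate the pooled estimate.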
5.
Nat Commun ; 15(1): 522, 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-38225224

ABSTRACT

Expression Quantitative Trait Loci (eQTLs) are critical to understanding the mechanisms underlying disease-associated genomic loci. Nearly all protein-coding genes in the human genome have been associated with one or more eQTLs. Here we introduce a multi-variant generalization of allelic Fold Change (aFC), aFC-n, to enable quantification of the cis-regulatory effects in multi-eQTL genes under the assumption that all eQTLs are known and conditionally independent. Applying aFC-n to 458,465 eQTLs in the Genotype-Tissue Expression (GTEx) project data, we demonstrate significant improvements in accuracy over the original model in estimating the eQTL effect sizes and in predicting genetically regulated gene expression over the current tools. We characterize some of the empirical properties of the eQTL data and use this framework to assess the current state of eQTL data in terms of characterizing cis-regulatory landscape in individual genomes. Notably, we show that 77.4% of the genes with an allelic imbalance in a sample show 0.5 log2 fold or more of residual imbalance after accounting for the eQTL data underlining the remaining gap in characterizing regulatory landscape in individual genomes. We further contrast this gap across tissue types, and ancestry backgrounds to identify its correlates and guide future studies.


Subject(s)
Genomics; Quantitative Trait Loci; Humans; Haplotypes; Quantitative Trait Loci/genetics; Alleles; Genome-Wide Association Study; Polymorphism, Single Nucleotide; Gene Expression Profiling
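
To make the idea of "residual imbalance after accounting for the eQTL data" in the abstract above concrete, here is a rough Python sketch. It assumes a simple multiplicative model in which each alternative allele on a haplotype scales that haplotype's expression by its fold change, so that log2 effects add up across independent eQTLs. That model, the function names, and the example numbers are assumptions made for illustration, and the published aFC-n method may differ in detail. Only the 0.5 log2-fold threshold comes from the abstract.

```python
def haplotype_expression(base_log2, log2_afc, alleles):
    """Expression of one haplotype under the assumed multiplicative model.

    base_log2: log2 expression of the all-reference haplotype
    log2_afc:  per-eQTL log2 allelic fold changes
    alleles:   0/1 flags, 1 where the haplotype carries the alternative allele
    """
    return 2.0 ** (base_log2 + sum(s * a for s, a in zip(log2_afc, alleles)))


def predicted_log2_imbalance(log2_afc, hap1, hap2):
    """Predicted log2(haplotype 1 / haplotype 2) expression ratio.

    The shared baseline cancels, leaving only eQTLs at which the
    two haplotypes carry different alleles.
    """
    return sum(s * (a1 - a2) for s, a1, a2 in zip(log2_afc, hap1, hap2))


def has_residual_imbalance(observed_log2, log2_afc, hap1, hap2, threshold=0.5):
    """True when the observed imbalance differs from the prediction by at
    least `threshold` log2 units (0.5 is the cutoff quoted in the abstract)."""
    residual = observed_log2 - predicted_log2_imbalance(log2_afc, hap1, hap2)
    return abs(residual) >= threshold


if __name__ == "__main__":
    effects = [1.0, -0.5]        # made-up log2 fold changes for two eQTLs
    hap1, hap2 = [1, 0], [0, 1]  # made-up phased genotypes
    print(predicted_log2_imbalance(effects, hap1, hap2))
    print(has_residual_imbalance(2.2, effects, hap1, hap2))
```

Under this reading, a gene flagged by `has_residual_imbalance` shows more allelic imbalance than its known eQTLs can explain, pointing to regulatory variation that has not yet been catalogued.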
6.
bioRxiv ; 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38293199

ABSTRACT

Accurate identification of human leukocyte antigen (HLA) alleles is essential for various clinical and research applications, such as transplant matching and drug sensitivities. Recent advances in RNA-seq technology have made it possible to impute HLA types from sequencing data, spurring the development of a large number of computational HLA typing tools. However, the relative performance of these tools is unknown, limiting the ability for clinical and biomedical research to make informed choices regarding which tools to use. Here we report the study design of a comprehensive benchmarking of the performance of 12 HLA callers across 682 RNA-seq samples from 8 datasets with molecularly defined gold standard at 5 loci, HLA-A, -B, -C, -DRB1, and -DQB1. For each HLA typing tool, we will comprehensively assess their accuracy, compare default with optimized parameters, and examine for discrepancies in accuracy at the allele and loci levels. We will also evaluate the computational expense of each HLA caller measured in terms of CPU time and RAM. We also plan to evaluate the influence of read length over the HLA region on accuracy for each tool. Most notably, we will examine the performance of HLA callers across European and African groups, to determine discrepancies in accuracy associated with ancestry. We hypothesize that RNA-Seq HLA callers are capable of returning high-quality results, but the tools that offer a good balance between accuracy and computational expensiveness for all ancestry groups are yet to be developed. We believe that our study will provide clinicians and researchers with clear guidance to inform their selection of an appropriate HLA caller.

7.
Hum Mol Genet ; 33(2): 170-181, 2024 Jan 07.
Article in English | MEDLINE | ID: mdl-37824084

ABSTRACT

Stroke, characterized by sudden neurological deficits, is the second leading cause of death worldwide. Although genome-wide association studies (GWAS) have successfully identified many genomic regions associated with ischemic stroke (IS), the genes underlying risk and their regulatory mechanisms remain elusive. Here, we integrate a large-scale GWAS (N = 1,296,908) for IS together with molecular QTLs data, including mRNA, splicing, enhancer RNA (eRNA), and protein expression data from up to 50 tissues (total N = 11,588). We identify 136 genes/eRNA/proteins associated with IS risk across 60 independent genomic regions and find IS risk is most enriched for eQTLs in arterial and brain-related tissues. Focusing on IS-relevant tissues, we prioritize 9 genes/proteins using probabilistic fine-mapping TWAS analyses. In addition, we discover that blood cell traits, particularly reticulocyte cells, have shared genetic contributions with IS using TWAS-based pheWAS and genetic correlation analysis. Lastly, we integrate our findings with a large-scale pharmacological database and identify a secondary bile acid, deoxycholic acid, as a potential therapeutic component. Our work highlights IS risk genes/splicing-sites/enhancer activity/proteins with their phenotypic consequences using relevant tissues and identifies potential therapeutic candidates for IS.


Subject(s)
Ischemic Stroke; Transcriptome; Humans; Genome-Wide Association Study; Ischemic Stroke/genetics; Genomics; Phenotype; Genetic Predisposition to Disease; Polymorphism, Single Nucleotide/genetics
8.
Am J Hum Genet ; 110(12): 2077-2091, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38065072

ABSTRACT

Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide association studies (GWASs) are a powerful way to find genetic loci associated with phenotypes. GWASs are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix (local eGRM) given the ARG. Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to analyze two chromosomes containing known body size loci in a sample of Native Hawaiians. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.


Subject(s)
Genetics, Population; Genome-Wide Association Study; Quantitative Trait Loci; Humans; Chromosome Mapping/methods; Models, Genetic; Phenotype; Quantitative Trait Loci/genetics; Native Hawaiian or Other Pacific Islander/genetics
9.
iScience ; 26(11): 108181, 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-37953948

ABSTRACT

Latent factor models, like principal component analysis (PCA), provide a statistical framework to infer low-rank representation in various biological contexts. However, feature selection is challenging when this low-rank structure manifests from a sparse subspace. We introduce SuSiE PCA, a scalable sparse latent factor approach that evaluates uncertainty in contributing variables through posterior inclusion probabilities. We validate our model in extensive simulations and demonstrate that SuSiE PCA outperforms other approaches in signal detection and model robustness. We apply SuSiE PCA to multi-tissue expression quantitative trait loci (eQTLs) data from GTEx v8 and identify tissue-specific factors and their contributing eGenes. We further investigate its performance on the large-scale perturbation data and find that SuSiE PCA identifies modules with a higher enrichment of ribosome-related genes than sparse PCA (false discovery rate [FDR] = $9.2 \times 10^{-82}$ vs. $1.4 \times 10^{-33}$), while being ∼18× faster. Overall, SuSiE PCA provides an efficient tool to identify relevant features in high-dimensional biological data.

10.
bioRxiv ; 2023 Oct 13.
Article in English | MEDLINE | ID: mdl-37873208

ABSTRACT

The demographic history of a population drives the pattern of genetic variation and is encoded in the gene-genealogical trees of the sampled alleles. However, existing methods to infer demographic history from genetic data tend to use relatively low-dimensional summaries of the genealogy, such as allele frequency spectra. As a step toward capturing more of the information encoded in the genome-wide sequence of genealogical trees, here we propose a novel framework called the genealogical likelihood (gLike), which derives the full likelihood of a genealogical tree under any hypothesized demographic history. Employing a graph-based structure, gLike summarizes across independent trees the relationships among all lineages in a tree with all possible trajectories of population memberships through time and efficiently computes the exact marginal probability under a parameterized demographic model. Through extensive simulations and empirical applications on populations that have experienced multiple admixtures, we showed that gLike can accurately estimate dozens of demographic parameters when the true genealogy is known, including ancestral population sizes, admixture timing, and admixture proportions. Moreover, when using genealogical trees inferred from genetic data, we showed that gLike outperformed conventional demographic inference methods that leverage only the allele-frequency spectrum and yielded parameter estimates that align with established historical knowledge of the past demographic histories for populations like Latino Americans and Native Hawaiians. Furthermore, our framework can trace ancestral histories by analyzing a sample from the admixed population without proxies for its source populations, removing the need to sample ancestral populations that may no longer exist. 
Taken together, our proposed gLike framework harnesses underutilized genealogical information to offer exceptional sensitivity and accuracy in inferring complex demographies for humans and other species, particularly as estimation of genome-wide genealogies improves.

11.
Am J Hum Genet ; 110(11): 1853-1862, 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37875120

ABSTRACT

The heritability explained by local ancestry markers in an admixed population ($h_\gamma^2$) provides crucial insight into the genetic architecture of a complex disease or trait. Estimation of $h_\gamma^2$ can be susceptible to biases due to population structure in ancestral populations. Here, we present heritability estimation from admixture mapping summary statistics (HAMSTA), an approach that uses summary statistics from admixture mapping to infer heritability explained by local ancestry while adjusting for biases due to ancestral stratification. Through extensive simulations, we demonstrate that HAMSTA $h_\gamma^2$ estimates are approximately unbiased and are robust to ancestral stratification compared to existing approaches. In the presence of ancestral stratification, we show a HAMSTA-derived sampling scheme provides a calibrated family-wise error rate (FWER) of ∼5% for admixture mapping, unlike existing FWER estimation approaches. We apply HAMSTA to 20 quantitative phenotypes of up to 15,988 self-reported African American individuals in the Population Architecture using Genomics and Epidemiology (PAGE) study. We observe that $\hat{h}_\gamma^2$ in the 20 phenotypes ranges from 0.0025 to 0.033 (mean $\hat{h}_\gamma^2 = 0.012 \pm 9.2 \times 10^{-4}$), which translates to $\hat{h}^2$ ranging from 0.062 to 0.85 (mean $\hat{h}^2 = 0.30 \pm 0.023$). Across these phenotypes we find little evidence of inflation due to ancestral population stratification in current admixture mapping studies (mean inflation factor of 0.99 ± 0.001). Overall, HAMSTA provides a fast and powerful approach to estimate genome-wide heritability and evaluate biases in test statistics of admixture mapping studies.


Subject(s)
Black or African American; Genetics, Population; Humans; Chromosome Mapping; Phenotype; Polymorphism, Single Nucleotide/genetics
12.
Am J Hum Genet ; 110(11): 1863-1874, 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37879338

ABSTRACT

Genome-wide association studies (GWASs) across thousands of traits have revealed the pervasive pleiotropy of trait-associated genetic variants. While methods have been proposed to characterize pleiotropic components across groups of phenotypes, scaling these approaches to ultra-large-scale biobanks has been challenging. Here, we propose FactorGo, a scalable variational factor analysis model to identify and characterize pleiotropic components using biobank GWAS summary data. In extensive simulations, we observe that FactorGo outperforms the state-of-the-art (model-free) approach tSVD in capturing latent pleiotropic factors across phenotypes while maintaining a similar computational cost. We apply FactorGo to estimate 100 latent pleiotropic factors from GWAS summary data of 2,483 phenotypes measured in European-ancestry Pan-UK BioBank individuals (N = 420,531). Next, we find that factors from FactorGo are more enriched with relevant tissue-specific annotations than those identified by tSVD (p = 2.58E-10) and validate our approach by recapitulating brain-specific enrichment for BMI and the height-related connection between reproductive system and muscular-skeletal growth. Finally, our analyses suggest shared etiologies between rheumatoid arthritis and periodontal condition in addition to alkaline phosphatase as a candidate prognostic biomarker for prostate cancer. Overall, FactorGo improves our biological understanding of shared etiologies across thousands of GWASs.


Subject(s)
Arthritis, Rheumatoid; Genome-Wide Association Study; Male; Humans; Genome-Wide Association Study/methods; Multifactorial Inheritance; Phenotype; Brain; Arthritis, Rheumatoid/genetics; Polymorphism, Single Nucleotide/genetics; Genetic Pleiotropy
13.
bioRxiv ; 2023 Apr 18.
Article in English | MEDLINE | ID: mdl-37131817

ABSTRACT

The heritability explained by local ancestry markers in an admixed population, $h_\gamma^2$, provides crucial insight into the genetic architecture of a complex disease or trait. Estimation of $h_\gamma^2$ can be susceptible to biases due to population structure in ancestral populations. Here, we present a novel approach, Heritability estimation from Admixture Mapping Summary STAtistics (HAMSTA), which uses summary statistics from admixture mapping to infer heritability explained by local ancestry while adjusting for biases due to ancestral stratification. Through extensive simulations, we demonstrate that HAMSTA $h_\gamma^2$ estimates are approximately unbiased and are robust to ancestral stratification compared to existing approaches. In the presence of ancestral stratification, we show a HAMSTA-derived sampling scheme provides a calibrated family-wise error rate (FWER) of ∼5% for admixture mapping, unlike existing FWER estimation approaches. We apply HAMSTA to 20 quantitative phenotypes of up to 15,988 self-reported African American individuals in the Population Architecture using Genomics and Epidemiology (PAGE) study. We observe that $\hat{h}_\gamma^2$ in the 20 phenotypes ranges from 0.0025 to 0.033 (mean $\hat{h}_\gamma^2 = 0.012 \pm 9.2 \times 10^{-4}$), which translates to $\hat{h}^2$ ranging from 0.062 to 0.85 (mean $\hat{h}^2 = 0.30 \pm 0.023$). Across these phenotypes we find little evidence of inflation due to ancestral population stratification in current admixture mapping studies (mean inflation factor of 0.99 ± 0.001). Overall, HAMSTA provides a fast and powerful approach to estimate genome-wide heritability and evaluate biases in test statistics of admixture mapping studies.

14.
Genome Res ; 33(4): 511-524, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37037626

ABSTRACT

Understanding the impact of DNA variation on human traits is a fundamental question in human genetics. Variable number tandem repeats (VNTRs) make up ∼3% of the human genome but are often excluded from association analysis owing to poor read mappability or divergent repeat content. Although methods exist to estimate VNTR length from short-read data, it is known that VNTRs vary in both length and repeat (motif) composition. Here, we use a repeat-pangenome graph (RPGG) constructed on 35 haplotype-resolved assemblies to detect variation in both VNTR length and repeat composition. We align population-scale data from the Genotype-Tissue Expression (GTEx) Consortium to examine how variations in sequence composition may be linked to expression, including cases independent of overall VNTR length. We find that 9422 out of 39,125 VNTRs are associated with nearby gene expression through motif variations, of which only 23.4% are accessible from length. Fine-mapping identifies 174 genes to be likely driven by variation in certain VNTR motifs and not overall length. We highlight two genes, CACNA1C and RNF213, that have expression associated with motif variation, showing the utility of RPGG analysis as a new approach for trait association in multiallelic and highly variable loci.


Subject(s)
Adenosine Triphosphatases; Minisatellite Repeats; Humans; Minisatellite Repeats/genetics; Phenotype; Haplotypes; Gene Expression; Adenosine Triphosphatases/genetics; Ubiquitin-Protein Ligases/genetics
15.
bioRxiv ; 2023 Apr 08.
Article in English | MEDLINE | ID: mdl-37066144

ABSTRACT

Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide Association Studies (GWAS) are a powerful way to find genetic loci associated with phenotypes. GWAS are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix given the ARG (local eGRM). Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to identify a large-effect BMI locus, the CREBRF gene, in a sample of Native Hawaiians in which it was not previously detectable by GWAS because of a lack of population-specific imputation resources. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.

16.
medRxiv ; 2023 Mar 31.
Article in English | MEDLINE | ID: mdl-37034585

ABSTRACT

Stroke, characterized by sudden neurological deficits, is the second leading cause of death worldwide. Although genome-wide association studies (GWAS) have successfully identified many genomic regions associated with ischemic stroke (IS), the genes underlying risk and their regulatory mechanisms remain elusive. Here, we integrate a large-scale GWAS (N=1,296,908) for IS together with mRNA, splicing, enhancer RNA (eRNA) and protein expression data (N=11,588) from 50 tissues. We identify 136 genes/eRNA/proteins associated with IS risk across 54 independent genomic regions and find IS risk is most enriched for eQTLs in arterial and brain-related tissues. Focusing on IS-relevant tissues, we prioritize 9 genes/proteins using probabilistic fine-mapping TWAS analyses. In addition, we discover that blood cell traits, particularly reticulocyte cells, have shared genetic contributions with IS using TWAS-based pheWAS and genetic correlation analysis. Lastly, we integrate our findings with a large-scale pharmacological database and identify a secondary bile acid, deoxycholic acid, as a potential therapeutic component. Our work highlights IS risk genes/splicing-sites/enhancer activity/proteins with their phenotypic consequences using relevant tissues and identifies potential therapeutic candidates for IS.

17.
medRxiv ; 2023 Mar 29.
Article in English | MEDLINE | ID: mdl-37034739

ABSTRACT

Genome-wide association studies (GWAS) across thousands of traits have revealed the pervasive pleiotropy of trait-associated genetic variants. While methods have been proposed to characterize pleiotropic components across groups of phenotypes, scaling these approaches to ultra large-scale biobanks has been challenging. Here, we propose FactorGo, a scalable variational factor analysis model to identify and characterize pleiotropic components using biobank GWAS summary data. In extensive simulations, we observe that FactorGo outperforms the state-of-the-art (model-free) approach tSVD in capturing latent pleiotropic factors across phenotypes, while maintaining a similar computational cost. We apply FactorGo to estimate 100 latent pleiotropic factors from GWAS summary data of 2,483 phenotypes measured in European-ancestry Pan-UK BioBank individuals (N=420,531). Next, we find that factors from FactorGo are more enriched with relevant tissue-specific annotations than those identified by tSVD (P=2.58E-10), and validate our approach by recapitulating brain-specific enrichment for BMI and the height-related connection between reproductive system and muscular-skeletal growth. Finally, our analyses suggest novel shared etiologies between rheumatoid arthritis and periodontal condition, in addition to alkaline phosphatase as a candidate prognostic biomarker for prostate cancer. Overall, FactorGo improves our biological understanding of shared etiologies across thousands of GWAS.

18.
Bioinformatics ; 39(5), 2023 May 04.
Article in English | MEDLINE | ID: mdl-37099718

ABSTRACT

SUMMARY: Genome-wide association studies (GWASs) have identified numerous genetic variants associated with complex disease risk; however, most of these associations are non-coding, complicating identifying their proximal target gene. Transcriptome-wide association studies (TWASs) have been proposed to mitigate this gap by integrating expression quantitative trait loci (eQTL) data with GWAS data. Numerous methodological advancements have been made for TWAS, yet each approach requires ad hoc simulations to demonstrate feasibility. Here, we present twas_sim, a computationally scalable and easily extendable tool for simplified performance evaluation and power analysis for TWAS methods. AVAILABILITY AND IMPLEMENTATION: Software and documentation are available at https://github.com/mancusolab/twas_sim.


Subject(s)
Genome-Wide Association Study; Transcriptome; Humans; Genome-Wide Association Study/methods; Gene Expression Profiling; Computer Simulation; Software; Polymorphism, Single Nucleotide; Genetic Predisposition to Disease
19.
J Natl Cancer Inst ; 115(6): 712-732, 2023 Jun 08.
Article in English | MEDLINE | ID: mdl-36929942

ABSTRACT

BACKGROUND: The shared inherited genetic contribution to risk of different cancers is not fully known. In this study, we leverage results from 12 cancer genome-wide association studies (GWAS) to quantify pairwise genome-wide genetic correlations across cancers and identify novel cancer susceptibility loci. METHODS: We collected GWAS summary statistics for 12 solid cancers based on 376 759 participants with cancer and 532 864 participants without cancer of European ancestry. The included cancer types were breast, colorectal, endometrial, esophageal, glioma, head and neck, lung, melanoma, ovarian, pancreatic, prostate, and renal cancers. We conducted cross-cancer GWAS and transcriptome-wide association studies to discover novel cancer susceptibility loci. Finally, we assessed the extent of variant-specific pleiotropy among cancers at known and newly identified cancer susceptibility loci. RESULTS: We observed widespread but modest genome-wide genetic correlations across cancers. In cross-cancer GWAS and transcriptome-wide association studies, we identified 15 novel cancer susceptibility loci. Additionally, we identified multiple variants at 77 distinct loci with strong evidence of being associated with at least 2 cancer types by testing for pleiotropy at known cancer susceptibility loci. CONCLUSIONS: Overall, these results suggest that some genetic risk variants are shared among cancers, though much of cancer heritability is cancer-specific and thus tissue-specific. The increase in statistical power associated with larger sample sizes in cross-disease analysis allows for the identification of novel susceptibility regions. Future studies incorporating data on multiple cancer types are likely to identify additional regions associated with the risk of multiple cancer types.


Subject(s)
Genome-Wide Association Study; Neoplasms; Male; Humans; Genome-Wide Association Study/methods; Genetic Predisposition to Disease; Neoplasms/genetics; Risk Factors; Transcriptome; Polymorphism, Single Nucleotide
20.
Clin Epigenetics ; 14(1): 158, 2022 Dec 01.
Article in English | MEDLINE | ID: mdl-36457128

ABSTRACT

BACKGROUND: Epigenome-wide association studies (EWAS) have helped to define the associations between DNA methylation and many clinicopathologic and developmental traits. Since DNA methylation is affected by genetic variation at certain loci, EWAS associations may be potentially influenced by genetic effects. However, a formal assessment of the value of incorporating genetic variation in EWAS evaluations is lacking especially for multiethnic populations. METHODS: Using single nucleotide polymorphism (SNP) from Illumina Omni Express or Affymetrix PMDA arrays and DNA methylation data from the Illumina 450 K or EPIC array from 1638 newborns of diverse genetic ancestries, we generated DNA methylation quantitative trait loci (mQTL) databases for both array types. We then investigated associations between neonatal DNA methylation and birthweight (incorporating gestational age) using EWAS modeling, and reported how EWAS results were influenced by controlling for mQTLs. RESULTS: For CpGs on the 450 K array, an average of 15.4% CpGs were assigned as mQTLs, while on the EPIC array, 23.0% CpGs were matched to mQTLs (adjusted P value < 0.05). The CpGs associated with SNPs were enriched in the CpG island shore regions. Correcting for mQTLs in the EWAS model for birthweight helped to increase significance levels for top hits. For CpGs overlapping genes associated with birthweight-related pathways (nutrition metabolism, biosynthesis, for example), accounting for mQTLs changed their regression coefficients more dramatically (> 20%) than for other random CpGs. CONCLUSION: DNA methylation levels at circa 20% CpGs in the genome were affected by common SNP genotypes. EWAS model fit significantly improved when taking these genetic effects into consideration. Genetic effects were stronger on CpGs overlapping genetic elements associated with control of gene expression.


Subject(s)
Epigenome; Quantitative Trait Loci; Infant, Newborn; Humans; DNA Methylation; Birth Weight/genetics; CpG Islands