Results 1 - 20 of 77
1.
Cell ; 184(10): 2633-2648.e19, 2021 05 13.
Article in English | MEDLINE | ID: mdl-33864768

ABSTRACT

Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. This included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations to body mass index.


Subject(s)
Disease/genetics , Multifactorial Inheritance/genetics , Population/genetics , RNA, Long Noncoding/genetics , Transcriptome , Coronary Artery Disease/genetics , Diabetes Mellitus, Type 1/genetics , Diabetes Mellitus, Type 2/genetics , Gene Expression Profiling , Genetic Variation , Humans , Inflammatory Bowel Diseases/genetics , Organ Specificity/genetics , Quantitative Trait Loci
2.
Am J Hum Genet ; 111(6): 1100-1113, 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38733992

ABSTRACT

Splicing-based transcriptome-wide association studies (splicing-TWASs) of breast cancer have the potential to identify susceptibility genes. However, existing splicing-TWASs test the association of individual excised introns in breast tissue only and thus have limited power to detect susceptibility genes. In this study, we performed a multi-tissue joint splicing-TWAS that integrated splicing-TWAS signals of multiple excised introns in each gene across 11 tissues that are potentially relevant to breast cancer risk. We utilized summary statistics from a meta-analysis that combined genome-wide association study (GWAS) results of 424,650 women of European ancestry. Splicing-level prediction models were trained in GTEx (v.8) data. We identified 240 genes by the multi-tissue joint splicing-TWAS at the Bonferroni-corrected significance level; in the tissue-specific splicing-TWAS that combined TWAS signals of excised introns in genes in breast tissue only, we identified nine additional significant genes. Of these 249 genes, 88 genes in 62 loci have not been reported by previous TWASs, and 17 genes in seven loci are at least 1 Mb away from published GWAS index variants. By comparing the results of our splicing-TWASs with previous gene-expression-based TWASs that used the same summary statistics and expression prediction models trained in the same reference panel, we found 110 genes in 70 loci that were identified only by the splicing-TWASs. Our results showed that for many genes, expression quantitative trait loci (eQTL) did not show a significant impact on breast cancer risk, whereas splicing quantitative trait loci (sQTL) showed a strong impact through intron excision events.


Subject(s)
Breast Neoplasms , Genetic Predisposition to Disease , Genome-Wide Association Study , RNA Splicing , Transcriptome , Humans , Breast Neoplasms/genetics , Female , RNA Splicing/genetics , Introns/genetics , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Gene Expression Profiling
3.
Am J Hum Genet ; 111(3): 445-455, 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38320554

ABSTRACT

Regulation of transcription and translation are mechanisms through which genetic variants affect complex traits. Expression quantitative trait locus (eQTL) studies have been more successful at identifying cis-eQTL (within 1 Mb of the transcription start site) than trans-eQTL. Here, we tested the cis component of gene expression for association with observed plasma protein levels to identify cis- and trans-acting genes that regulate protein levels. We used transcriptome prediction models from 49 Genotype-Tissue Expression (GTEx) Project tissues to predict the cis component of gene expression and tested the predicted expression of every gene in every tissue for association with the observed abundance of 3,622 plasma proteins measured in 3,301 individuals from the INTERVAL study. We tested significant results for replication in 971 individuals from the Trans-omics for Precision Medicine (TOPMed) Multi-Ethnic Study of Atherosclerosis (MESA). We found 1,168 and 1,210 cis- and trans-acting associations that replicated in TOPMed (FDR < 0.05) with a median expected true positive rate (π1) across tissues of 0.806 and 0.390, respectively. The target proteins of trans-acting genes were enriched for transcription factor binding sites and autoimmune diseases in the GWAS catalog. Furthermore, we found a higher correlation between predicted expression and protein levels of the same underlying gene (R = 0.17) than observed expression (R = 0.10, p = 7.50 × 10⁻¹¹). This indicates the cis-acting genetically regulated (heritable) component of gene expression is more consistent across tissues than total observed expression (genetics + environment) and is useful in uncovering the function of SNPs associated with complex traits.


Subject(s)
Proteome , Transcriptome , Humans , Transcriptome/genetics , Proteome/genetics , Multifactorial Inheritance , Quantitative Trait Loci/genetics , Genome-Wide Association Study , Polymorphism, Single Nucleotide/genetics
4.
Am J Hum Genet ; 110(1): 44-57, 2023 01 05.
Article in English | MEDLINE | ID: mdl-36608684

ABSTRACT

Integrative genetic association methods have shown great promise in post-GWAS (genome-wide association study) analyses, in which one of the most challenging tasks is identifying putative causal genes and uncovering molecular mechanisms of complex traits. Recent studies suggest that prevailing computational approaches, including transcriptome-wide association studies (TWASs) and colocalization analysis, are individually imperfect, but their joint usage can yield robust and powerful inference results. This paper presents INTACT, a computational framework to integrate probabilistic evidence from these distinct types of analyses and implicate putative causal genes. This procedure is flexible and can work with a wide range of existing integrative analysis approaches. It has the unique ability to quantify the uncertainty of implicated genes, enabling rigorous control of false-positive discoveries. Taking advantage of this highly desirable feature, we further propose an efficient algorithm, INTACT-GSE, for gene set enrichment analysis based on the integrated probabilistic evidence. We examine the proposed computational methods and illustrate their improved performance over the existing approaches through simulation studies. We apply the proposed methods to analyze the multi-tissue eQTL data from the GTEx project and eight large-scale complex- and molecular-trait GWAS datasets from multiple consortia and the UK Biobank. Overall, we find that the proposed methods markedly improve the existing putative gene implication methods and are particularly advantageous in evaluating and identifying key gene sets and biological pathways underlying complex traits.


Subject(s)
Genome-Wide Association Study , Transcriptome , Humans , Transcriptome/genetics , Genome-Wide Association Study/methods , Multifactorial Inheritance/genetics , Quantitative Trait Loci/genetics , Computer Simulation , Polymorphism, Single Nucleotide/genetics , Genetic Predisposition to Disease
5.
Am J Hum Genet ; 110(6): 950-962, 2023 06 01.
Article in English | MEDLINE | ID: mdl-37164006

ABSTRACT

Genome-wide association studies (GWASs) have identified more than 200 genomic loci for breast cancer risk, but specific causal genes in most of these loci have not been identified. In fact, transcriptome-wide association studies (TWASs) of breast cancer performed using gene expression prediction models trained in breast tissue have yet to clearly identify most target genes. To identify candidate genes, we performed a GWAS analysis in a breast cancer dataset from UK Biobank (UKB) and combined the results with the GWAS results of the Breast Cancer Association Consortium (BCAC) by a meta-analysis. Using the summary statistics from the meta-analysis, we performed a joint TWAS analysis that combined TWAS signals from multiple tissues. We used expression prediction models trained in 11 tissues that are potentially relevant to breast cancer from the Genotype-Tissue Expression (GTEx) data. In the GWAS analysis, we identified eight loci distinct from those reported previously. In the TWAS analysis, we identified 309 genes at 108 genomic loci to be significantly associated with breast cancer at the Bonferroni threshold. Of these, 17 genes were located in eight regions that were at least 1 Mb away from published GWAS hits. The remaining TWAS-significant genes were located in 100 known genomic loci from previous GWASs of breast cancer. We found that 21 genes located in known GWAS loci remained statistically significant after conditioning on previous GWAS index variants. Our study provides insights into breast cancer genetics through mapping candidate target genes in a large proportion of known GWAS loci and discovering multiple new loci.


Subject(s)
Breast Neoplasms , Transcriptome , Humans , Female , Transcriptome/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Breast Neoplasms/genetics , Quantitative Trait Loci/genetics , Polymorphism, Single Nucleotide/genetics
6.
Am J Hum Genet ; 109(5): 857-870, 2022 05 05.
Article in English | MEDLINE | ID: mdl-35385699

ABSTRACT

While polygenic risk scores (PRSs) enable early identification of genetic risk for chronic obstructive pulmonary disease (COPD), predictive performance is limited when the discovery and target populations are not well matched. Hypothesizing that the biological mechanisms of disease are shared across ancestry groups, we introduce a PrediXcan-derived polygenic transcriptome risk score (PTRS) to improve cross-ethnic portability of risk prediction. We constructed the PTRS using summary statistics from application of PrediXcan on large-scale GWASs of lung function (forced expiratory volume in 1 s [FEV1] and its ratio to forced vital capacity [FEV1/FVC]) in the UK Biobank. We examined prediction performance and cross-ethnic portability of PTRS through smoking-stratified analyses both on 29,381 multi-ethnic participants from TOPMed population/family-based cohorts and on 11,771 multi-ethnic participants from TOPMed COPD-enriched studies. Analyses were carried out for two dichotomous COPD traits (moderate-to-severe and severe COPD) and two quantitative lung function traits (FEV1 and FEV1/FVC). While the proposed PTRS showed weaker associations with disease than PRS for European ancestry, the PTRS showed stronger association with COPD than PRS for African Americans (e.g., odds ratio [OR] = 1.24 [95% confidence interval [CI]: 1.08-1.43] for PTRS versus 1.10 [0.96-1.26] for PRS among heavy smokers with ≥ 40 pack-years of smoking) for moderate-to-severe COPD. Cross-ethnic portability of the PTRS was significantly higher than the PRS (paired t test p < 2.2 × 10⁻¹⁶ with portability gains ranging from 5% to 28%) for both dichotomous COPD traits and across all smoking strata. Our study demonstrates the value of PTRS for improved cross-ethnic portability compared to PRS in predicting COPD risk.


Subject(s)
Pulmonary Disease, Chronic Obstructive , Transcriptome , Humans , Lung , National Heart, Lung, and Blood Institute (U.S.) , Pulmonary Disease, Chronic Obstructive/genetics , Risk Factors , United States/epidemiology
7.
Am J Hum Genet ; 108(1): 25-35, 2021 01 07.
Article in English | MEDLINE | ID: mdl-33308443

ABSTRACT

Colocalization analysis has emerged as a powerful tool to uncover the overlapping of causal variants responsible for both molecular and complex disease phenotypes. The findings from colocalization analysis yield insights into the molecular pathways of complex diseases. In this paper, we conduct an in-depth investigation of the promise and limitations of the available colocalization analysis approaches. Focusing on variant-level colocalization approaches, we first establish the connections between various existing methods. We proceed to discuss the impacts of various controllable analytical factors and uncontrollable practical factors on outcomes of colocalization analysis through realistic simulations and real data examples. We identify a single analytical factor, the specification of prior enrichment levels, which can lead to severe inflation of false-positive colocalization findings. Meanwhile, the combination of many other analytical and practical factors all lead to diminished power. Consequently, we recommend the following strategies for the best practice of colocalization analysis: (1) estimating prior enrichment level from the observed data and (2) separating fine-mapping and colocalization analysis. Our analysis of 4,091 complex traits and the multi-tissue expression quantitative trait loci (eQTL) data from the GTEx (v.8) suggests that colocalizations of molecular QTLs and causal complex trait associations are widespread. However, only a small proportion can be confidently identified from currently available data due to a lack of power. Our findings set a benchmark for current and future integrative genetic association analysis applications.


Subject(s)
Genome-Wide Association Study/methods , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Genetic Predisposition to Disease/genetics , Humans , Linkage Disequilibrium/genetics , Phenotype
8.
Mol Syst Biol ; 19(8): e11407, 2023 08 08.
Article in English | MEDLINE | ID: mdl-37232043

ABSTRACT

How do aberrations in widely expressed genes lead to tissue-selective hereditary diseases? Previous attempts to answer this question were limited to testing a few candidate mechanisms. To answer this question at a larger scale, we developed "Tissue Risk Assessment of Causality by Expression" (TRACE), a machine learning approach to predict genes that underlie tissue-selective diseases and selectivity-related features. TRACE utilized 4,744 biologically interpretable tissue-specific gene features that were inferred from heterogeneous omics datasets. Application of TRACE to 1,031 disease genes uncovered known and novel selectivity-related features, the most common of which was previously overlooked. Next, we created a catalog of tissue-associated risks for 18,927 protein-coding genes (https://netbio.bgu.ac.il/trace/). As proof-of-concept, we prioritized candidate disease genes identified in 48 rare-disease patients. TRACE ranked the verified disease gene among the patient's candidate genes significantly better than gene prioritization methods that rank by gene constraint or tissue expression. Thus, tissue selectivity combined with machine learning enhances genetic and clinical understanding of hereditary diseases.


Subject(s)
Machine Learning , Rare Diseases , Humans , Rare Diseases/genetics , Risk Assessment , Causality
9.
PLoS Genet ; 15(1): e1007889, 2019 01.
Article in English | MEDLINE | ID: mdl-30668570

ABSTRACT

Integration of genome-wide association studies (GWAS) and expression quantitative trait loci (eQTL) studies is needed to improve our understanding of the biological mechanisms underlying GWAS hits, and our ability to identify therapeutic targets. Gene-level association methods such as PrediXcan can prioritize candidate targets. However, limited eQTL sample sizes and absence of relevant developmental and disease context restrict our ability to detect associations. Here we propose an efficient statistical method (MultiXcan) that leverages the substantial sharing of eQTLs across tissues and contexts to improve our ability to identify potential target genes. MultiXcan integrates evidence across multiple panels using multivariate regression, which naturally takes into account the correlation structure. We apply our method to simulated and real traits from the UK Biobank and show that, in realistic settings, we can detect a larger set of significantly associated genes than using each panel separately. To improve applicability, we developed a summary result-based extension called S-MultiXcan, which we show yields highly concordant results with the individual level version when LD is well matched. Our multivariate model-based approach allowed us to use the individual level results as a gold standard to calibrate and develop a robust implementation of the summary-based extension. Results from our analysis as well as software and necessary resources to apply our method are publicly available.


Subject(s)
Genome-Wide Association Study/statistics & numerical data , Quantitative Trait Loci/genetics , Transcriptome/genetics , Gene Expression/genetics , Humans , Polymorphism, Single Nucleotide/genetics , Software/statistics & numerical data
10.
Hum Mol Genet ; 28(7): 1212-1224, 2019 04 01.
Article in English | MEDLINE | ID: mdl-30624610

ABSTRACT

Interpretation of genetic association results is difficult because signals often lack biological context. To generate hypotheses of the functional genetic etiology of complex cardiometabolic traits, we estimated the genetically determined component of gene expression from common variants using PrediXcan (1) and determined genes with differential predicted expression by trait. PrediXcan imputes tissue-specific expression levels from genetic variation using variant-level effect on gene expression in transcriptome data. To explore the value of imputed genetically regulated gene expression (GReX) models across different ancestral populations, we evaluated imputed expression levels for predictive accuracy genome-wide in RNA sequence data in samples drawn from European-ancestry and African-ancestry populations and identified substantial predictive power using European-derived models in a non-European target population. We then tested the association of GReX on 15 cardiometabolic traits including blood lipid levels, body mass index, height, blood pressure, fasting glucose and insulin, RR interval, fibrinogen level, factor VII level and white blood cell and platelet counts in 15 755 individuals across three ancestry groups, resulting in 20 novel gene-phenotype associations reaching experiment-wide significance across ancestries. In addition, we identified 18 significant novel gene-phenotype associations in our ancestry-specific analyses. Top associations were assessed for additional support via query of S-PrediXcan (2) results derived from publicly available genome-wide association studies summary data. Collectively, these findings illustrate the utility of transcriptome-based imputation models for discovery of cardiometabolic effect genes in a diverse dataset.


Subject(s)
Forecasting/methods , Metabolome/genetics , Metabolome/physiology , Adult , Aged , Blood Pressure , Body Mass Index , Chromosome Mapping/methods , Ethnicity/genetics , Female , Genetic Association Studies/methods , Genome-Wide Association Study/methods , Humans , Male , Middle Aged , Multifactorial Inheritance/genetics , Phenotype , Polymorphism, Single Nucleotide/genetics , Transcriptome/genetics , White People/genetics
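
The imputation step described in the abstract of record 10 (predicting expression from genetic variation using variant-level effects) can be illustrated as a weighted sum of genotype dosages. The following is a minimal sketch under stated assumptions, not the PrediXcan implementation: the variant IDs, weights, dosages, and the `predict_expression` helper are all hypothetical, and treating a missing variant as contributing zero is a simplification chosen for the example.

```python
# Hypothetical per-variant weights for one gene in one tissue
# (illustrative values, not taken from any published prediction model).
weights = {"rs_a": 0.30, "rs_b": -0.15, "rs_c": 0.05}

# Hypothetical genotype dosages (0-2 copies of the effect allele) per individual.
dosages = {
    "ind1": {"rs_a": 2, "rs_b": 0, "rs_c": 1},
    "ind2": {"rs_a": 0, "rs_b": 2, "rs_c": 1},
    "ind3": {"rs_a": 1, "rs_b": 1},  # rs_c is missing for this individual
}


def predict_expression(weights, genotype):
    """Return the weighted sum of dosages for one individual.

    Assumption for this sketch: a variant absent from the genotype
    contributes 0 to the sum.
    """
    return sum(w * genotype.get(snp, 0.0) for snp, w in weights.items())


# Genetically predicted expression for each individual.
predicted = {ind: predict_expression(weights, g) for ind, g in dosages.items()}

for ind, value in predicted.items():
    print(ind, round(value, 3))
```

In an association analysis of the kind the abstract describes, the predicted values would then be tested against each trait; that step is omitted here.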
11.
PLoS Genet ; 14(8): e1007586, 2018 08.
Article in English | MEDLINE | ID: mdl-30096133

ABSTRACT

For many complex traits, gene regulation is likely to play a crucial mechanistic role. How the genetic architectures of complex traits vary between populations and subsequent effects on genetic prediction are not well understood, in part due to the historical paucity of GWAS in populations of non-European ancestry. We used data from the MESA (Multi-Ethnic Study of Atherosclerosis) cohort to characterize the genetic architecture of gene expression within and between diverse populations. Genotype and monocyte gene expression were available in individuals with African American (AFA, n = 233), Hispanic (HIS, n = 352), and European (CAU, n = 578) ancestry. We performed expression quantitative trait loci (eQTL) mapping in each population and show genetic correlation of gene expression depends on shared ancestry proportions. Using elastic net modeling with cross validation to optimize genotypic predictors of gene expression in each population, we show the genetic architecture of gene expression for most predictable genes is sparse. We found the best predicted gene in each population, TACSTD2 in AFA and CHURC1 in CAU and HIS, had similar prediction performance across populations with R² > 0.8 in each population. However, we identified a subset of genes that are well-predicted in one population, but poorly predicted in another. We show these differences in predictive performance are due to allele frequency differences between populations. Using genotype weights trained in MESA to predict gene expression in independent populations showed that a training set with ancestry similar to the test set is better at predicting gene expression in test populations, demonstrating an urgent need for diverse population sampling in genomics. Our predictive models and performance statistics in diverse cohorts are made publicly available for use in transcriptome mapping methods at https://github.com/WheelerLab/DivPop.


Subject(s)
Ethnicity/genetics , Gene Expression Regulation , Genetics, Population , Black or African American/genetics , Antigens, Neoplasm/genetics , Antigens, Neoplasm/metabolism , Cell Adhesion Molecules/genetics , Cell Adhesion Molecules/metabolism , Chromosome Mapping , Gene Frequency , Genome-Wide Association Study , Genomics , Genotyping Techniques , Hispanic or Latino/genetics , Humans , Models, Genetic , Multifactorial Inheritance , Phenotype , Quantitative Trait Loci , Transcriptome , White People/genetics
12.
Genet Epidemiol ; 43(6): 596-608, 2019 09.
Article in English | MEDLINE | ID: mdl-30950127

ABSTRACT

Regulation of gene expression is an important mechanism through which genetic variation can affect complex traits. A substantial portion of gene expression variation can be explained by both local (cis) and distal (trans) genetic variation. Much progress has been made in uncovering cis-acting expression quantitative trait loci (cis-eQTL), but trans-eQTL have been more difficult to identify and replicate. Here we take advantage of our ability to predict the cis component of gene expression coupled with gene mapping methods such as PrediXcan to identify high confidence candidate trans-acting genes and their targets. That is, we correlate the cis component of gene expression with observed expression of genes in different chromosomes. Leveraging the shared cis-acting regulation across tissues, we combine the evidence of association across all available Genotype-Tissue Expression Project tissues and find 2,356 trans-acting/target gene pairs with high mappability scores. Reassuringly, trans-acting genes are enriched in transcription and nucleic acid binding pathways and target genes are enriched in known transcription factor binding sites. Interestingly, trans-acting genes are more significantly associated with selected complex traits and diseases than target or background genes, consistent with percolating trans effects. Our scripts and summary statistics are publicly available for future studies of trans-acting gene regulation.


Subject(s)
Cardiovascular Diseases/genetics , Gene Expression Regulation , Genetic Association Studies , Multifactorial Inheritance , Quantitative Trait Loci , Trans-Activators/genetics , Transcription, Genetic , Chromosome Mapping , Genome, Human , Humans , Transcriptome
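
The core idea in the abstract of record 12, correlating the predicted cis component of one gene with the observed expression of genes on different chromosomes, can be sketched as below. This is only an illustration under stated assumptions: the gene names, chromosome labels, expression values, and the `trans_candidates` helper are hypothetical, and the sketch uses a plain Pearson correlation in a single tissue. It does not reproduce the multi-tissue combination or mappability filtering mentioned in the abstract.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical predicted cis component of a candidate regulator gene,
# stored as (chromosome, values across five individuals).
predicted_cis = {"GENE_A": ("chr1", [0.1, 0.4, 0.2, 0.9, 0.7])}

# Hypothetical observed expression of candidate target genes.
observed = {
    "GENE_B": ("chr5", [1.0, 1.7, 1.1, 2.6, 2.3]),
    "GENE_C": ("chr1", [0.5, 0.4, 0.6, 0.5, 0.4]),  # same chromosome as GENE_A
}


def trans_candidates(predicted_cis, observed):
    """Correlate each regulator with targets located on a different chromosome."""
    pairs = []
    for regulator, (reg_chr, pred) in predicted_cis.items():
        for target, (tgt_chr, obs) in observed.items():
            if reg_chr == tgt_chr:
                continue  # skip same-chromosome pairs; only trans pairs are kept
            pairs.append((regulator, target, pearson(pred, obs)))
    return pairs


for regulator, target, r in trans_candidates(predicted_cis, observed):
    print(regulator, target, round(r, 3))
```

A real analysis would also need a significance test and multiple-testing correction across all regulator/target pairs.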
13.
Bioinformatics ; 35(11): 1971-1973, 2019 06 01.
Article in English | MEDLINE | ID: mdl-30395166

ABSTRACT

SUMMARY: Large biobanks, such as UK Biobank with half a million participants, are changing the scale and availability of genotypic and phenotypic data for researchers to ask fundamental questions about the biology of health and disease. The breadth of the UK Biobank data is enabling discoveries at an unprecedented pace. However, this size and complexity pose new challenges to investigators who need to keep the accruing data up to date, comply with potential consent changes, and efficiently and reproducibly extract subsets of the data to answer specific scientific questions. Here we propose a tool called ukbREST designed for the UK Biobank study (easily extensible to other biobanks), which allows authorized users to efficiently retrieve phenotypic and genetic data. It exposes a REST API that makes data highly accessible inside a private and secure network, allowing the data specification in a human readable text format easily shareable with other researchers. These characteristics make ukbREST an important tool to make biobanks' valuable data more readily accessible to the research community and facilitate reproducibility of the analysis, a key aspect of science. AVAILABILITY AND IMPLEMENTATION: It is implemented in Python using the Flask-RESTful framework for the API, and it is under the MIT license. It works with PostgreSQL and a Docker image is available for easy deployment. The source code and documentation are available on GitHub: https://github.com/hakyimlab/ukbrest.


Subject(s)
Software , Biological Specimen Banks , Documentation , Humans , Reproducibility of Results
14.
PLoS Genet ; 13(9): e1006727, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28957356

ABSTRACT

Genome-wide association studies (GWAS) have identified more than 90 susceptibility loci for breast cancer, but the underlying biology of those associations needs to be further elucidated. More genetic factors for breast cancer are yet to be identified but sample size constraints preclude the identification of individual genetic variants with weak effects using traditional GWAS methods. To address this challenge, we utilized a gene-level expression-based method, implemented in the MetaXcan software, to predict gene expression levels for 11,536 genes using expression quantitative trait loci and examine the genetically-predicted expression of specific genes for association with overall breast cancer risk and estrogen receptor (ER)-negative breast cancer risk. Using GWAS datasets from a Challenge launched by National Cancer Institute, we identified TP53INP2 (tumor protein p53-inducible nuclear protein 2) at 20q11.22 to be significantly associated with ER-negative breast cancer (Z = -5.013, p = 5.35×10⁻⁷, Bonferroni threshold = 4.33×10⁻⁶). The association was consistent across four GWAS datasets, representing European, African and Asian ancestry populations. There are 6 single nucleotide polymorphisms (SNPs) included in the prediction of TP53INP2 expression and five of them were associated with estrogen-receptor negative breast cancer, although none of the SNP-level associations reached genome-wide significance. We conducted a replication study using a dataset outside of the Challenge, and found the association between TP53INP2 and ER-negative breast cancer was significant (p = 5.07×10⁻³). Expression of HP (16q22.2) showed a suggestive association with ER-negative breast cancer in the discovery phase (Z = 4.30, p = 1.70×10⁻⁵) although the association was not significant after Bonferroni adjustment. Of the 249 genes that are within 250 kb of known breast cancer susceptibility loci identified from previous GWAS, 20 genes (8.0%) were statistically significantly associated with ER-negative breast cancer (p < 0.05), compared to 582 (5.2%) of 11,287 genes that are not close to previous GWAS loci. This study demonstrated that expression-based gene mapping is a promising approach for identifying cancer susceptibility genes.


Subject(s)
Breast Neoplasms/genetics , Estrogen Receptor alpha/genetics , Haptoglobins/genetics , Nuclear Proteins/genetics , Breast Neoplasms/pathology , Female , Gene Expression Regulation, Neoplastic , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Polymorphism, Single Nucleotide
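
The figures quoted in the abstract of record 14 can be checked with a few lines of arithmetic. The sketch below rests on assumptions the abstract does not state explicitly: that the Bonferroni threshold is 0.05 divided by the 11,536 genes tested, and that the reported p-value is a two-sided normal tail probability for the Z score. The variable names and the `two_sided_p` helper are introduced only for this example.

```python
from math import erfc, sqrt

# Assumption: Bonferroni threshold = alpha / number of genes tested, alpha = 0.05.
n_genes_tested = 11536
bonferroni = 0.05 / n_genes_tested


def two_sided_p(z):
    """Two-sided tail probability of a standard normal for a given Z score."""
    return erfc(abs(z) / sqrt(2))


# Assumption: the reported p-value for TP53INP2 is a two-sided normal p-value.
p_tp53inp2 = two_sided_p(-5.013)

# Counts reported in the abstract: significant genes (p < 0.05) near vs. far
# from known GWAS loci.
near_sig, near_total = 20, 249
far_sig, far_total = 582, 11287

prop_near = near_sig / near_total
prop_far = far_sig / far_total

# Odds ratio of being significant for genes near known loci vs. the rest.
odds_ratio = (near_sig / (near_total - near_sig)) / (far_sig / (far_total - far_sig))

print(f"Bonferroni threshold: {bonferroni:.3g}")
print(f"Two-sided p for Z = -5.013: {p_tp53inp2:.3g}")
print(f"Near known loci: {prop_near:.1%}; elsewhere: {prop_far:.1%}")
print(f"Odds ratio: {odds_ratio:.2f}")
```

Under these assumptions the two gene groups sum to the 11,536 genes tested, and the computed threshold and proportions agree with the reported values to the precision given in the abstract.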
15.
Am J Hum Genet ; 98(4): 697-708, 2016 Apr 07.
Article in English | MEDLINE | ID: mdl-27040689

ABSTRACT

Gene expression and its regulation can vary substantially across tissue types. In order to generate knowledge about gene expression in human tissues, the Genotype-Tissue Expression (GTEx) program has collected transcriptome data in a wide variety of tissue types from post-mortem donors. However, many tissue types are difficult to access and are not collected in every GTEx individual. Furthermore, in non-GTEx studies, the accessibility of certain tissue types greatly limits the feasibility and scale of studies of multi-tissue expression. In this work, we developed multi-tissue imputation methods to impute gene expression in uncollected or inaccessible tissues. Via simulation studies, we showed that the proposed methods outperform existing imputation methods in multi-tissue expression imputation and that incorporating imputed expression data can improve power to detect phenotype-expression correlations. By analyzing data from nine selected tissue types in the GTEx pilot project, we demonstrated that harnessing expression quantitative trait loci (eQTLs) and tissue-tissue expression-level correlations can aid imputation of transcriptome data from uncollected GTEx tissues. More importantly, we showed that by using GTEx data as a reference, one can impute expression levels in inaccessible tissues in non-GTEx expression studies.


Subject(s)
Gene Expression Regulation , Genotype , Quantitative Trait Loci , Transcriptome , Humans , Phenotype , Pilot Projects , Reproducibility of Results
16.
PLoS Genet ; 12(11): e1006423, 2016 Nov.
Artículo en Inglés | MEDLINE | ID: mdl-27835642

RESUMEN

Understanding the genetic architecture of gene expression traits is key to elucidating the underlying mechanisms of complex traits. Here, for the first time, we perform a systematic survey of the heritability and the distribution of effect sizes across all representative tissues in the human body. We find that local h2 can be relatively well characterized with 59% of expressed genes showing significant h2 (FDR < 0.1) in the DGN whole blood cohort. However, current sample sizes (n ≤ 922) do not allow us to compute distal h2. Bayesian Sparse Linear Mixed Model (BSLMM) analysis provides strong evidence that the genetic contribution to local expression traits is dominated by a handful of genetic variants rather than by the collective contribution of a large number of variants each of modest size. In other words, the local architecture of gene expression traits is sparse rather than polygenic across all 40 tissues (from DGN and GTEx) examined. This result is confirmed by the sparsity of optimal performing gene expression predictors via elastic net modeling. To further explore the tissue context specificity, we decompose the expression traits into cross-tissue and tissue-specific components using a novel Orthogonal Tissue Decomposition (OTD) approach. Through a series of simulations we show that the cross-tissue and tissue-specific components are identifiable via OTD. Heritability and sparsity estimates of these derived expression phenotypes show similar characteristics to the original traits. Consistent properties relative to prior GTEx multi-tissue analysis results suggest that these traits reflect the expected biology. Finally, we apply this knowledge to develop prediction models of gene expression traits for all tissues. The prediction models, heritability, and prediction performance R2 for original and decomposed expression phenotypes are made publicly available (https://github.com/hakyimlab/PrediXcan).


Subject(s)
Gene Expression Regulation/genetics , Models, Genetic , Organ Specificity/genetics , Quantitative Trait, Heritable , Bayes Theorem , Genotype , Humans , Multifactorial Inheritance/genetics , Phenotype , Polymorphism, Single Nucleotide/genetics , Sample Size
17.
PLoS Genet ; 11(1): e1004876, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25625282

ABSTRACT

Genome-wide association studies (GWAS) for fasting glucose (FG) and insulin (FI) have identified common variant signals which explain 4.8% and 1.2% of trait variance, respectively. It is hypothesized that low-frequency and rare variants could contribute substantially to unexplained genetic variance. To test this, we analyzed exome-array data from up to 33,231 non-diabetic individuals of European ancestry. We found exome-wide significant (P < 5×10⁻⁷) evidence for two loci not previously highlighted by common variant GWAS: GLP1R (p.Ala316Thr, minor allele frequency (MAF) = 1.5%) influencing FG levels, and URB2 (p.Glu594Val, MAF = 0.1%) influencing FI levels. Coding variant associations can highlight potential effector genes at (non-coding) GWAS signals. At the G6PC2/ABCB11 locus, we identified multiple coding variants in G6PC2 (p.Val219Leu, p.His177Tyr, and p.Tyr207Ser) influencing FG levels, conditionally independent of each other and the non-coding GWAS signal. In vitro assays demonstrate that these associated coding alleles result in reduced protein abundance via proteasomal degradation, establishing G6PC2 as an effector gene at this locus. Reconciliation of single-variant associations and functional effects was only possible when haplotype phase was considered. In contrast to earlier reports suggesting that, paradoxically, glucose-raising alleles at this locus are protective against type 2 diabetes (T2D), the p.Val219Leu G6PC2 variant displayed a modest but directionally consistent association with T2D risk. Coding variant associations for glycemic traits in GWAS signals highlight PCSK1, RREB1, and ZHX3 as likely effector transcripts. These coding variant association signals do not have a major impact on the trait variance explained, but they do provide valuable biological insights.


Subject(s)
Blood Glucose/genetics , Diabetes Mellitus, Type 2/genetics , Glucose-6-Phosphatase/genetics , Insulin/blood , Diabetes Mellitus, Type 2/blood , Diabetes Mellitus, Type 2/pathology , Exome/genetics , Gene Frequency , Genome-Wide Association Study , Glucagon-Like Peptide-1 Receptor , Glycemic Index/genetics , Humans , Insulin/genetics , Polymorphism, Single Nucleotide , Receptors, Glucagon/genetics
18.
Hum Genet ; 136(10): 1363-1373, 2017 10.
Article in English | MEDLINE | ID: mdl-28836065

ABSTRACT

Uterine fibroids are benign tumors of the uterus affecting up to 77% of women by menopause. They are the leading indication for hysterectomy, and account for $34 billion annually in the United States. Race/ethnicity and age are the strongest known risk factors. African American (AA) women have higher prevalence, earlier onset, and larger and more numerous fibroids than European American women. We conducted a multi-stage genome-wide association study (GWAS) of fibroid risk among AA women followed by in silico genetically predicted gene expression profiling of top hits. In Stage 1, cases and controls were confirmed by pelvic imaging, genotyped and imputed to 1000 Genomes. Stage 2 used self-reported fibroid and GWAS data from 23andMe, Inc. and the Black Women's Health Study. Associations with fibroid risk were modeled using logistic regression adjusted for principal components, followed by meta-analysis of results. We observed a significant association among 3399 AA cases and 4764 AA controls at rs739187 (risk-allele frequency = 0.27) in CYTH4 (OR (95% confidence interval) = 1.23 (1.16-1.30), p value = 7.82 × 10⁻⁹). Evaluation of the genetic association results with MetaXcan identified lower predicted gene expression of CYTH4 in thyroid tissue as significantly associated with fibroid risk (p value = 5.86 × 10⁻⁸). In this first multi-stage GWAS for fibroids among AA women, we identified a novel risk locus for fibroids within CYTH4 that impacts gene expression in thyroid and has potential biological relevance for fibroids.


Subject(s)
Black or African American/genetics , Cell Adhesion Molecules , Gene Expression Regulation, Neoplastic , Gene Frequency , Guanine Nucleotide Exchange Factors , Leiomyoma , Neoplasm Proteins , Uterine Neoplasms , Adult , Alleles , Cell Adhesion Molecules/biosynthesis , Cell Adhesion Molecules/genetics , Female , Genetic Loci , Genome-Wide Association Study , Guanine Nucleotide Exchange Factors/biosynthesis , Guanine Nucleotide Exchange Factors/genetics , Humans , Leiomyoma/genetics , Leiomyoma/metabolism , Middle Aged , Neoplasm Proteins/biosynthesis , Neoplasm Proteins/genetics , Risk Factors , Uterine Neoplasms/genetics , Uterine Neoplasms/metabolism
19.
Hum Genet ; 136(11-12): 1497-1498, 2017 11.
Article in English | MEDLINE | ID: mdl-28975356

ABSTRACT

The article "A multi-stage genome-wide association study of uterine fibroids in African Americans", written by Jacklyn N. Hellwege, was originally published Online First without open access. After publication in volume 136, issue 10, pages 1363-1373, the author decided to opt for Open Choice and to make the article an open access publication. Therefore, the copyright of the article has been changed to

20.
Genet Epidemiol ; 38(5): 402-15, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24799323

ABSTRACT

High-confidence prediction of complex traits such as disease risk or drug response is an ultimate goal of personalized medicine. Although genome-wide association studies have discovered thousands of well-replicated polymorphisms associated with a broad spectrum of complex traits, the combined predictive power of these associations for any given trait is generally too low to be of clinical relevance. We propose a novel systems approach to complex trait prediction, which leverages and integrates similarity in genetic, transcriptomic, or other omics-level data. We translate the omic similarity into phenotypic similarity using a method called Kriging, commonly used in geostatistics and machine learning. Our method called OmicKriging emphasizes the use of a wide variety of systems-level data, such as those increasingly made available by comprehensive surveys of the genome, transcriptome, and epigenome, for complex trait prediction. Furthermore, our OmicKriging framework allows easy integration of prior information on the function of subsets of omics-level data from heterogeneous sources without the sometimes heavy computational burden of Bayesian approaches. Using seven disease datasets from the Wellcome Trust Case Control Consortium (WTCCC), we show that OmicKriging allows simple integration of sparse and highly polygenic components yielding comparable performance at a fraction of the computing time of a recently published Bayesian sparse linear mixed model method. Using a cellular growth phenotype, we show that integrating mRNA and microRNA expression data substantially increases performance over either dataset alone. Using clinical statin response, we show improved prediction over existing methods. We provide an R package to implement OmicKriging (http://www.scandb.org/newinterface/tools/OmicKriging.html).


Subject(s)
Computational Biology/methods , Genetic Predisposition to Disease/genetics , Multifactorial Inheritance/genetics , Bayes Theorem , Case-Control Studies , Cell Growth Processes/genetics , Cholesterol, LDL/blood , Humans , MicroRNAs/genetics , Models, Genetic , Phenotype , RNA, Messenger/genetics , Simvastatin/pharmacology , Software , Systems Biology/methods , Time Factors
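
The OmicKriging abstract above describes translating omic similarity into phenotypic similarity through Kriging. The following is a minimal illustrative sketch of that idea, not the authors' R package or its API. The function names `composite_similarity` and `krige_predict` are hypothetical, and the choices of simple kriging around the training mean, weights that sum to 1, and a small ridge term for numerical stability are assumptions made here for illustration.

```python
import numpy as np


def composite_similarity(matrices, weights):
    """Combine several sample-by-sample similarity matrices into one.

    Assumption: the weights are non-negative and sum to 1.
    """
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * np.asarray(m, dtype=float) for w, m in zip(weights, matrices))


def krige_predict(similarity, y_train, train_idx, test_idx, ridge=1e-6):
    """Predict test phenotypes as similarity-weighted training phenotypes.

    Simple kriging: y_test = mu + K_test,train @ inv(K_train,train) @ (y_train - mu).
    The ridge term is an assumption added to keep the solve well conditioned.
    """
    similarity = np.asarray(similarity, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    k_tt = similarity[np.ix_(train_idx, train_idx)]
    k_st = similarity[np.ix_(test_idx, train_idx)]
    mu = y_train.mean()
    k_tt = k_tt + ridge * np.eye(len(train_idx))
    alpha = np.linalg.solve(k_tt, y_train - mu)
    return mu + k_st @ alpha


if __name__ == "__main__":
    # Toy data: two hypothetical similarity matrices over three samples.
    genetic = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
    expression = np.eye(3)
    combined = composite_similarity([genetic, expression], [0.5, 0.5])
    print(krige_predict(combined, [2.0, 4.0], [0, 1], [2]))
```

In this sketch each data type (for example genetic or transcriptomic) contributes one similarity matrix, and the weights control how much each contributes to the prediction.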