Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 14 de 14
Filtrar
1.
Genome Biol ; 25(1): 28, 2024 Jan 22.
Artigo em Inglês | MEDLINE | ID: mdl-38254214

RESUMO

Genetic regulation of gene expression is a complex process, with genetic effects known to vary across cellular contexts such as cell types and environmental conditions. We developed SURGE, a method for unsupervised discovery of context-specific expression quantitative trait loci (eQTLs) from single-cell transcriptomic data. This allows discovery of the contexts or cell types modulating genetic regulation without prior knowledge. Applied to peripheral blood single-cell eQTL data, SURGE contexts capture continuous representations of distinct cell types and groupings of biologically related cell types. We demonstrate the disease-relevance of SURGE context-specific eQTLs using colocalization analysis and stratified LD-score regression.


Assuntos
Perfilação da Expressão Gênica , Regulação da Expressão Gênica , Locos de Características Quantitativas , Transcriptoma , Análise de Sequência de RNA
2.
medRxiv ; 2023 Dec 04.
Artigo em Inglês | MEDLINE | ID: mdl-38106023

RESUMO

The genetic architecture of human diseases and complex traits has been extensively studied, but little is known about the relationship of causal disease effect sizes between proximal SNPs, which have largely been assumed to be independent. We introduce a new method, LD SNP-pair effect correlation regression (LDSPEC), to estimate the correlation of causal disease effect sizes of derived alleles between proximal SNPs, depending on their allele frequencies, LD, and functional annotations; LDSPEC produced robust estimates in simulations across various genetic architectures. We applied LDSPEC to 70 diseases and complex traits from the UK Biobank (average N=306K), meta-analyzing results across diseases/traits. We detected significantly nonzero effect correlations for proximal SNP pairs (e.g., -0.37±0.09 for low-frequency positive-LD 0-100bp SNP pairs) that decayed with distance (e.g., -0.07±0.01 for low-frequency positive-LD 1-10kb), varied with allele frequency (e.g., -0.15±0.04 for common positive-LD 0-100bp), and varied with LD between SNPs (e.g., +0.12±0.05 for common negative-LD 0-100bp) (because we consider derived alleles, positive-LD and negative-LD SNP pairs may yield very different results). We further determined that SNP pairs with shared functions had stronger effect correlations that spanned longer genomic distances, e.g., -0.37±0.08 for low-frequency positive-LD same-gene promoter SNP pairs (average genomic distance of 47kb (due to alternative splicing)) and -0.32±0.04 for low-frequency positive-LD H3K27ac 0-1kb SNP pairs. Consequently, SNP-heritability estimates were substantially smaller than estimates of the sum of causal effect size variances across all SNPs (ratio of 0.87±0.02 across diseases/traits), particularly for certain functional annotations (e.g., 0.78±0.01 for common Super enhancer SNPs)-even though these quantities are widely assumed to be equal. We recapitulated our findings via forward simulations with an evolutionary model involving stabilizing selection, implicating the action of linkage masking, whereby haplotypes containing linked SNPs with opposite effects on disease have reduced effects on fitness and escape negative selection.

3.
medRxiv ; 2023 Nov 08.
Artigo em Inglês | MEDLINE | ID: mdl-37961337

RESUMO

Heritable diseases often manifest in a highly tissue-specific manner, with different disease loci mediated by genes in distinct tissues or cell types. We propose Tissue-Gene Fine-Mapping (TGFM), a fine-mapping method that infers the posterior probability (PIP) for each gene-tissue pair to mediate a disease locus by analyzing GWAS summary statistics (and in-sample LD) and leveraging eQTL data from diverse tissues to build cis-predicted expression models; TGFM also assigns PIPs to causal variants that are not mediated by gene expression in assayed genes and tissues. TGFM accounts for both co-regulation across genes and tissues and LD between SNPs (generalizing existing fine-mapping methods), and incorporates genome-wide estimates of each tissue's contribution to disease as tissue-level priors. TGFM was well-calibrated and moderately well-powered in simulations; unlike previous methods, TGFM was able to attain correct calibration by modeling uncertainty in cis-predicted expression models. We applied TGFM to 45 UK Biobank diseases/traits (average N=316K) using eQTL data from 38 GTEx tissues. TGFM identified an average of 147 PIP > 0.5 causal genetic elements per disease/trait, of which 11% were gene-tissue pairs. Implicated gene-tissue pairs were concentrated in known disease-critical tissues, and causal genes were strongly enriched in disease-relevant gene sets. Causal gene-tissue pairs identified by TGFM recapitulated known biology (e.g., TPO-thyroid for Hypothyroidism), but also included biologically plausible novel findings (e.g., SLC20A2-artery aorta for Diastolic blood pressure). Further application of TGFM to single-cell eQTL data from 9 cell types in peripheral blood mononuclear cells (PBMC), analyzed jointly with GTEx tissues, identified 30 additional causal gene-PBMC cell type pairs at PIP > 0.5-primarily for autoimmune disease and blood cell traits, including the well-established role of CTLA4 in CD8+ T cells for All autoimmune disease. In conclusion, TGFM is a robust and powerful method for fine-mapping causal tissues and genes at disease-associated loci.

4.
Cell Genom ; 3(10): 100401, 2023 Oct 11.
Artigo em Inglês | MEDLINE | ID: mdl-37868038

RESUMO

Each human genome has tens of thousands of rare genetic variants; however, identifying impactful rare variants remains a major challenge. We demonstrate how use of personal multi-omics can enable identification of impactful rare variants by using the Multi-Ethnic Study of Atherosclerosis, which included several hundred individuals, with whole-genome sequencing, transcriptomes, methylomes, and proteomes collected across two time points, 10 years apart. We evaluated each multi-omics phenotype's ability to separately and jointly inform functional rare variation. By combining expression and protein data, we observed rare stop variants 62 times and rare frameshift variants 216 times as frequently as controls, compared to 13-27 times as frequently for expression or protein effects alone. We extended a Bayesian hierarchical model, "Watershed," to prioritize specific rare variants underlying multi-omics signals across the regulatory cascade. With this approach, we identified rare variants that exhibited large effect sizes on multiple complex traits including height, schizophrenia, and Alzheimer's disease.

5.
Nat Commun ; 14(1): 6317, 2023 10 09.
Artigo em Inglês | MEDLINE | ID: mdl-37813843

RESUMO

Differential allele-specific expression (ASE) is a powerful tool to study context-specific cis-regulation of gene expression. Such effects can reflect the interaction between genetic or epigenetic factors and a measured context or condition. Single-cell RNA sequencing (scRNA-seq) allows the measurement of ASE at individual-cell resolution, but there is a lack of statistical methods to analyze such data. We present Differential Allelic Expression using Single-Cell data (DAESC), a powerful method for differential ASE analysis using scRNA-seq from multiple individuals, with statistical behavior confirmed through simulation. DAESC accounts for non-independence between cells from the same individual and incorporates implicit haplotype phasing. Application to data from 105 induced pluripotent stem cell (iPSC) lines identifies 657 genes dynamically regulated during endoderm differentiation, with enrichment for changes in chromatin state. Application to a type-2 diabetes dataset identifies several differentially regulated genes between patients and controls in pancreatic endocrine cells. DAESC is a powerful method for single-cell ASE analysis and can uncover novel insights on gene regulation.


Assuntos
Diabetes Mellitus Tipo 2 , Regulação da Expressão Gênica , Humanos , Alelos , Diferenciação Celular/genética , Simulação por Computador , Diabetes Mellitus Tipo 2/metabolismo , Análise de Célula Única/métodos , Análise de Sequência de RNA/métodos , Perfilação da Expressão Gênica/métodos
6.
Res Sq ; 2023 Dec 15.
Artigo em Inglês | MEDLINE | ID: mdl-38168385

RESUMO

The genetic architecture of human diseases and complex traits has been extensively studied, but little is known about the relationship of causal disease effect sizes between proximal SNPs, which have largely been assumed to be independent. We introduce a new method, LD SNP-pair effect correlation regression (LDSPEC), to estimate the correlation of causal disease effect sizes of derived alleles between proximal SNPs, depending on their allele frequencies, LD, and functional annotations; LDSPEC produced robust estimates in simulations across various genetic architectures. We applied LDSPEC to 70 diseases and complex traits from the UK Biobank (average N=306K), meta-analyzing results across diseases/traits. We detected significantly nonzero effect correlations for proximal SNP pairs (e.g., -0.37±0.09 for low-frequency positive-LD 0-100bp SNP pairs) that decayed with distance (e.g., -0.07±0.01 for low-frequency positive-LD 1-10kb), varied with allele frequency (e.g., -0.15±0.04 for common positive-LD 0-100bp), and varied with LD between SNPs (e.g., +0.12±0.05 for common negative-LD 0-100bp) (because we consider derived alleles, positive-LD and negative-LD SNP pairs may yield very different results). We further determined that SNP pairs with shared functions had stronger effect correlations that spanned longer genomic distances, e.g., -0.37±0.08 for low-frequency positive-LD same-gene promoter SNP pairs (average genomic distance of 47kb (due to alternative splicing)) and -0.32±0.04 for low-frequency positive-LD H3K27ac 0-1kb SNP pairs. Consequently, SNP-heritability estimates were substantially smaller than estimates of the sum of causal effect size variances across all SNPs (ratio of 0.87±0.02 across diseases/traits), particularly for certain functional annotations (e.g., 0.78±0.01 for common Super enhancer SNPs)-even though these quantities are widely assumed to be equal. We recapitulated our findings via forward simulations with an evolutionary model involving stabilizing selection, implicating the action of linkage masking, whereby haplotypes containing linked SNPs with opposite effects on disease have reduced effects on fitness and escape negative selection.

7.
Elife ; 112022 02 10.
Artigo em Inglês | MEDLINE | ID: mdl-35142607

RESUMO

Practically all studies of gene expression in humans to date have been performed in a relatively small number of adult tissues. Gene regulation is highly dynamic and context-dependent. In order to better understand the connection between gene regulation and complex phenotypes, including disease, we need to be able to study gene expression in more cell types, tissues, and states that are relevant to human phenotypes. In particular, we need to characterize gene expression in early development cell types, as mutations that affect developmental processes may be of particular relevance to complex traits. To address this challenge, we propose to use embryoid bodies (EBs), which are organoids that contain a multitude of cell types in dynamic states. EBs provide a system in which one can study dynamic regulatory processes at an unprecedentedly high resolution. To explore the utility of EBs, we systematically explored cellular and gene expression heterogeneity in EBs from multiple individuals. We characterized the various cell types that arise from EBs, the extent to which they recapitulate gene expression in vivo, and the relative contribution of technical and biological factors to variability in gene expression, cell composition, and differentiation efficiency. Our results highlight the utility of EBs as a new model system for mapping dynamic inter-individual regulatory differences in a large variety of cell types.


One major goal of human genetics is to understand how changes in the way genes are regulated affect human traits, including disease susceptibility. To date, most studies of gene regulation have been performed in adult tissues, such as liver or kidney tissue, that were collected at a single time point. Yet, gene regulation is highly dynamic and context-dependent, meaning that it is important to gather data from a greater variety of cell types at different stages of their development. Additionally, observing which genes switch on and off in response to external treatments can shed light on how genetic variation can drive errors in gene regulation and cause diseases. Stem cells can produce more cells like themselves or differentiate ­ acquire the characteristics ­ of many cell types. These cells have been used in the laboratory to research gene regulation. Unfortunately, these studies often fail to capture the complex spatial and temporal dynamics of stem cell differentiation; in particular, these studies are unable to observe gene regulation in the transient cell types that appear early in embryonic development. To overcome these limitations, scientists developed systems such as embryoid bodies: three-dimensional aggregates of stem cells that, when grown under certain conditions, spontaneously develop into a variety of cell types. Rhodes, Barr et al. wanted to assess the utility of embryoid bodies as a model to study how genes are dynamically regulated in different cell types, by different individuals who have distinct genetic makeups. To do this, they grew embryoid bodies made from human stem cells from different individuals to examine which genes switched on and off as the stem cells that formed the embryoid bodies differentiated into different types of cells. The results showed that it was possible to grow embryoid bodies derived from genetically distinct individuals that consistently produce diverse cell types, similar to those found during human fetal development. Rhodes, Barr et al.'s findings suggest that embryoid bodies are a useful model to study gene regulation across individuals with different genetic backgrounds. This could accelerate research into how genetics are associated with disease by capturing gene regulatory dynamics at an unprecedentedly high spatial and temporal resolution. Additionally, embryoid bodies could be used to explore how exposure to different environmental factors during early development affect disease-related outcomes in adulthood in different individuals.


Assuntos
Diferenciação Celular/genética , Corpos Embrioides/citologia , Regulação da Expressão Gênica , Linhagem Celular , Corpos Embrioides/metabolismo , Feminino , Genoma Humano , Humanos , Células-Tronco Pluripotentes Induzidas , Masculino , Análise de Sequência de RNA
8.
PLoS Genet ; 18(1): e1009666, 2022 01.
Artigo em Inglês | MEDLINE | ID: mdl-35061661

RESUMO

Dynamic and temporally specific gene regulatory changes may underlie unexplained genetic associations with complex disease. During a dynamic process such as cellular differentiation, the overall cell type composition of a tissue (or an in vitro culture) and the gene regulatory profile of each cell can both experience significant changes over time. To identify these dynamic effects in high resolution, we collected single-cell RNA-sequencing data over a differentiation time course from induced pluripotent stem cells to cardiomyocytes, sampled at 7 unique time points in 19 human cell lines. We employed a flexible approach to map dynamic eQTLs whose effects vary significantly over the course of bifurcating differentiation trajectories, including many whose effects are specific to one of these two lineages. Our study design allowed us to distinguish true dynamic eQTLs affecting a specific cell lineage from expression changes driven by potentially non-genetic differences between cell lines such as cell composition. Additionally, we used the cell type profiles learned from single-cell data to deconvolve and re-analyze data from matched bulk RNA-seq samples. Using this approach, we were able to identify a large number of novel dynamic eQTLs in single cell data while also attributing dynamic effects in bulk to a particular lineage. Overall, we found that using single cell data to uncover dynamic eQTLs can provide new insight into the gene regulatory changes that occur among heterogeneous cell types during cardiomyocyte differentiation.


Assuntos
Perfilação da Expressão Gênica/métodos , Células-Tronco Pluripotentes Induzidas/citologia , Miócitos Cardíacos/citologia , Análise de Célula Única/métodos , Técnicas de Cultura de Células , Diferenciação Celular , Linhagem Celular , Linhagem da Célula , Regulação da Expressão Gênica , Humanos , Células-Tronco Pluripotentes Induzidas/química , Miócitos Cardíacos/química , RNA-Seq
9.
Am J Hum Genet ; 109(2): 223-239, 2022 02 03.
Artigo em Inglês | MEDLINE | ID: mdl-35085493

RESUMO

Uncovering the functional impact of genetic variation on gene expression is important in understanding tissue biology and the pathogenesis of complex traits. Despite large efforts to map expression quantitative trait loci (eQTLs) across many human tissues, our ability to translate those findings to understanding human disease has been incomplete, and the majority of disease loci are not explained by association with expression of a target gene. Cell-type specificity and the presence of multiple independent causal variants for many eQTLs are potential confounders contributing to the apparent discrepancy with disease loci. In this study, we investigate the tissue specificity of genetic effects on gene expression and the overlap with disease loci while considering the presence of multiple causal variants within and across tissues. We find evidence of pervasive tissue specificity of eQTLs, often masked by linkage disequilibrium that misleads traditional meta-analytic approaches. We propose CAFEH (colocalization and fine-mapping in the presence of allelic heterogeneity), a Bayesian method that integrates genetic association data across multiple traits, incorporating linkage disequilibrium to identify causal variants. CAFEH outperforms previous approaches in colocalization and fine-mapping. Using CAFEH, we show that genes with highly tissue-specific genetic effects are under greater selection, enriched in differentiation and developmental processes, and more likely to be involved in human disease. Last, we demonstrate that CAFEH can efficiently leverage the widespread allelic heterogeneity in genetic regulation of gene expression to prioritize the target tissue in genome-wide association complex trait loci, thereby improving our ability to interpret complex trait genetics.


Assuntos
Alelos , Regulação da Expressão Gênica , Heterogeneidade Genética , Genoma Humano , Herança Multifatorial , Tecido Adiposo/metabolismo , Teorema de Bayes , Mapeamento Cromossômico , Fibroblastos/metabolismo , Estudo de Associação Genômica Ampla , Ventrículos do Coração/metabolismo , Humanos , Desequilíbrio de Ligação , Especificidade de Órgãos , Locos de Características Quantitativas , Glândula Tireoide/metabolismo
10.
Cell ; 184(10): 2633-2648.e19, 2021 05 13.
Artigo em Inglês | MEDLINE | ID: mdl-33864768

RESUMO

Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. This included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations to body mass index.


Assuntos
Doença/genética , Herança Multifatorial/genética , População/genética , RNA Longo não Codificante/genética , Transcriptoma , Doença da Artéria Coronariana/genética , Diabetes Mellitus Tipo 1/genética , Diabetes Mellitus Tipo 2/genética , Perfilação da Expressão Gênica , Variação Genética , Humanos , Doenças Inflamatórias Intestinais/genética , Especificidade de Órgãos/genética , Locos de Características Quantitativas
11.
Science ; 369(6509)2020 09 11.
Artigo em Inglês | MEDLINE | ID: mdl-32913073

RESUMO

Rare genetic variants are abundant across the human genome, and identifying their function and phenotypic impact is a major challenge. Measuring aberrant gene expression has aided in identifying functional, large-effect rare variants (RVs). Here, we expanded detection of genetically driven transcriptome abnormalities by analyzing gene expression, allele-specific expression, and alternative splicing from multitissue RNA-sequencing data, and demonstrate that each signal informs unique classes of RVs. We developed Watershed, a probabilistic model that integrates multiple genomic and transcriptomic signals to predict variant function, validated these predictions in additional cohorts and through experimental assays, and used them to assess RVs in the UK Biobank, the Million Veterans Program, and the Jackson Heart Study. Our results link thousands of RVs to diverse molecular effects and provide evidence to associate RVs affecting the transcriptome with human traits.


Assuntos
Variação Genética , Genoma Humano , Herança Multifatorial , Transcriptoma , Humanos , Especificidade de Órgãos
12.
Nat Med ; 25(6): 911-919, 2019 06.
Artigo em Inglês | MEDLINE | ID: mdl-31160820

RESUMO

It is estimated that 350 million individuals worldwide suffer from rare diseases, which are predominantly caused by mutation in a single gene1. The current molecular diagnostic rate is estimated at 50%, with whole-exome sequencing (WES) among the most successful approaches2-5. For patients in whom WES is uninformative, RNA sequencing (RNA-seq) has shown diagnostic utility in specific tissues and diseases6-8. This includes muscle biopsies from patients with undiagnosed rare muscle disorders6,9, and cultured fibroblasts from patients with mitochondrial disorders7. However, for many individuals, biopsies are not performed for clinical care, and tissues are difficult to access. We sought to assess the utility of RNA-seq from blood as a diagnostic tool for rare diseases of different pathophysiologies. We generated whole-blood RNA-seq from 94 individuals with undiagnosed rare diseases spanning 16 diverse disease categories. We developed a robust approach to compare data from these individuals with large sets of RNA-seq data for controls (n = 1,594 unrelated controls and n = 49 family members) and demonstrated the impacts of expression, splicing, gene and variant filtering strategies on disease gene identification. Across our cohort, we observed that RNA-seq yields a 7.5% diagnostic rate, and an additional 16.7% with improved candidate gene resolution.


Assuntos
Doenças Raras/genética , Ceramidase Ácida/genética , Estudos de Casos e Controles , Criança , Pré-Escolar , Estudos de Coortes , Feminino , Variação Genética , Humanos , Masculino , Modelos Genéticos , Mutação , Oxirredutases atuantes sobre Doadores de Grupo CH-CH/genética , Canais de Potássio/genética , RNA/sangue , RNA/genética , Splicing de RNA/genética , Doenças Raras/sangue , Análise de Sequência de RNA , Sequenciamento do Exoma
13.
Nature ; 550(7675): 239-243, 2017 10 11.
Artigo em Inglês | MEDLINE | ID: mdl-29022581

RESUMO

Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.


Assuntos
Perfilação da Expressão Gênica , Variação Genética/genética , Especificidade de Órgãos/genética , Teorema de Bayes , Feminino , Genoma Humano/genética , Genômica , Genótipo , Humanos , Masculino , Modelos Genéticos , Análise de Sequência de RNA
14.
Nat Genet ; 47(8): 955-61, 2015 Aug.
Artigo em Inglês | MEDLINE | ID: mdl-26075791

RESUMO

Most variants implicated in common human disease by genome-wide association studies (GWAS) lie in noncoding sequence intervals. Despite the suggestion that regulatory element disruption represents a common theme, identifying causal risk variants within implicated genomic regions remains a major challenge. Here we present a new sequence-based computational method to predict the effect of regulatory variation, using a classifier (gkm-SVM) that encodes cell type-specific regulatory sequence vocabularies. The induced change in the gkm-SVM score, deltaSVM, quantifies the effect of variants. We show that deltaSVM accurately predicts the impact of SNPs on DNase I sensitivity in their native genomic contexts and accurately predicts the results of dense mutagenesis of several enhancers in reporter assays. Previously validated GWAS SNPs yield large deltaSVM scores, and we predict new risk-conferring SNPs for several autoimmune diseases. Thus, deltaSVM provides a powerful computational approach to systematically identify functional regulatory variants.


Assuntos
Biologia Computacional/métodos , Predisposição Genética para Doença/genética , Polimorfismo de Nucleotídeo Único , Sequências Reguladoras de Ácido Nucleico/genética , Animais , Doenças Autoimunes/genética , Sequência de Bases , Linhagem Celular Tumoral , Desoxirribonuclease I/metabolismo , Elementos Facilitadores Genéticos/genética , Genoma Humano/genética , Estudo de Associação Genômica Ampla/métodos , Células Hep G2 , Humanos , Camundongos , Mutação , Locos de Características Quantitativas/genética , Reprodutibilidade dos Testes , Fatores de Risco , Máquina de Vetores de Suporte
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...