ABSTRACT
Mapping the functional human genome and the impact of genetic variants is often limited to samples from populations of European descent. To help overcome this limitation, we measured gene expression using RNA sequencing in lymphoblastoid cell lines (LCLs) from 599 individuals from six African populations to identify novel transcripts, including those not represented in the hg38 reference genome. Using whole genomes from the 1000 Genomes Project and 164 Maasai individuals, we identified 8,881 expression and 6,949 splicing quantitative trait loci (eQTLs/sQTLs), and 2,611 structural variants associated with gene expression (SV-eQTLs). We further profiled chromatin accessibility using ATAC-Seq in a subset of 100 representative individuals to identify chromatin accessibility quantitative trait loci (caQTLs) and allele-specific chromatin accessibility, and we provide predictions of the functional effects of 78.9 million variants on chromatin accessibility. Using this map of eQTLs and caQTLs, we fine-mapped GWAS signals for a range of complex diseases. Combined, this work expands global functional genomic data to identify novel transcripts, functional elements, and variants, to understand the population genetic history of molecular quantitative trait loci, and to further resolve the genetic basis of multiple human traits and diseases.
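An eQTL association of the kind counted above comes down to asking, for one variant and one gene, whether genotype dosage predicts expression. A minimal sketch of that single test, assuming normalized expression values and alt-allele dosages coded 0/1/2; the function name `eqtl_test` is hypothetical, and the study's actual pipeline, covariates, and multiple-testing procedure are not described in this abstract:

```python
import numpy as np


def eqtl_test(dosage: np.ndarray, expression: np.ndarray) -> tuple[float, float]:
    """Hypothetical single-pair eQTL test: OLS of expression ~ intercept + beta * dosage.

    dosage: alt-allele counts (0, 1, 2) per individual.
    expression: normalized expression of one gene per individual.
    Returns (beta, t_statistic) for the genotype effect.
    """
    X = np.column_stack([np.ones(len(dosage)), dosage.astype(float)])
    coef, _, _, _ = np.linalg.lstsq(X, expression, rcond=None)
    residuals = expression - X @ coef
    n, p = X.shape
    sigma2 = residuals @ residuals / (n - p)       # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)          # covariance of the coefficients
    beta = coef[1]
    return float(beta), float(beta / np.sqrt(cov[1, 1]))


dosage = np.array([0, 0, 1, 1, 2, 2])
expression = np.array([1.0, 1.1, 1.4, 1.6, 2.0, 2.1])
beta, t = eqtl_test(dosage, expression)  # beta is 0.5 for these toy values
```

Production eQTL mapping typically also regresses out covariates (for example ancestry principal components) and corrects for the very large number of variant-gene tests; this sketch omits both.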
ABSTRACT
CROP-Seq combines gene silencing by CRISPR interference with single-cell RNA sequencing. Here, we applied CROP-Seq to study adipogenesis and adipocyte biology. The human preadipocyte SGBS cell line expressing KRAB-dCas9 was transduced with an sgRNA library. Following selection, individual cells were captured using microfluidics at different timepoints during adipogenesis. Bioinformatic analysis of the transcriptomic data was used to determine the knockdown effects and the dysregulated pathways, and to predict cellular phenotypes. Single-cell transcriptomes recapitulated adipogenesis states. For all targets, over 400 differentially expressed genes were identified at one or more timepoints. As a validation of our approach, knockdown of PPARG and CEBPB (which encode key proadipogenic transcription factors) inhibited adipogenesis. Gene set enrichment analysis generated hypotheses regarding the molecular functions of novel genes. MAFF knockdown led to downregulation of the transcriptional response to the proinflammatory cytokine TNF-α in preadipocytes and to decreased CXCL-16 and IL-6 secretion. TIPARP knockdown resulted in increased expression of adipogenesis markers. In summary, this powerful, hypothesis-free tool can identify novel regulators of adipogenesis and of preadipocyte and adipocyte function associated with metabolic disease.
NEW & NOTEWORTHY: Genomics efforts have identified many genomic loci associated with metabolic traits, many of which are tied to adipose tissue function. However, determining the causal genes and their mechanisms of action in metabolism is a time-consuming process. Here, we use an approach that determines the transcriptional outcome of knocking down multiple candidate genes at the same time in a human cell model of adipogenesis.
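Determining a knockdown effect in CROP-Seq data starts with comparing the targeted gene's expression in cells carrying a given sgRNA against control cells. A minimal sketch, assuming per-cell normalized expression values and a non-targeting sgRNA control group (both assumptions; the abstract does not specify the study's controls or statistics). `knockdown_log2fc` is a hypothetical helper, not part of any CROP-Seq toolkit:

```python
import numpy as np


def knockdown_log2fc(perturbed, control, pseudocount=1.0):
    """Hypothetical helper: log2 fold change of the targeted gene's mean
    normalized expression in sgRNA-carrying cells vs. non-targeting control cells.

    A negative value indicates knockdown; the pseudocount avoids log(0).
    """
    mean_perturbed = np.mean(perturbed)
    mean_control = np.mean(control)
    return float(np.log2((mean_perturbed + pseudocount) / (mean_control + pseudocount)))


# Toy values: mean 1.0 in perturbed cells vs 3.0 in controls -> log2(2 / 4) = -1.0
effect = knockdown_log2fc([1, 1, 2, 0], [3, 3, 3, 3])
```

A full analysis would test every gene, not only the target, for differential expression between these groups at each timepoint.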
Subject(s)
Metabolic Diseases , RNA, Guide, CRISPR-Cas Systems , Humans , Adipogenesis/genetics , Adipocytes/metabolism , Cell Line , Metabolic Diseases/metabolism , Cell Differentiation/genetics
ABSTRACT
A major challenge in human genetics is to identify the molecular mechanisms of trait-associated and disease-associated variants. To this end, quantitative trait locus (QTL) mapping, which links genetic variants to intermediate molecular phenotypes such as gene expression and splicing, has been widely adopted1,2. However, despite successes, the molecular basis for a considerable fraction of trait-associated and disease-associated variants remains unclear3,4. Here we show that ADAR-mediated adenosine-to-inosine RNA editing, a post-transcriptional event vital for suppressing cellular double-stranded RNA (dsRNA)-mediated innate immune interferon responses5-11, is an important potential mechanism underlying genetic variants associated with common inflammatory diseases. We identified and characterized 30,319 cis-RNA editing QTLs (edQTLs) across 49 human tissues. These edQTLs were significantly enriched in genome-wide association study signals for autoimmune and immune-mediated diseases. Colocalization analysis of edQTLs with disease risk loci further pinpointed key, putatively immunogenic dsRNAs formed not only by inverted-repeat Alu elements, as expected, but also, unexpectedly, by highly over-represented cis-natural antisense transcripts. Furthermore, inflammatory disease risk variants, in aggregate, were associated with reduced editing of nearby dsRNAs and induced interferon responses in inflammatory diseases. This unique directional effect agrees with the established mechanism that lack of RNA editing by ADAR1 leads to the specific activation of the dsRNA sensor MDA5 and subsequent interferon responses and inflammation7-9. Our findings implicate cellular dsRNA editing and sensing as a previously underappreciated mechanism of common inflammatory diseases.
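An edQTL relates genotype to the editing level at a site, so the phenotype itself must first be quantified. Because inosine is read as guanosine during sequencing, a common per-site measure is the fraction of G reads among A and G reads. A minimal sketch, assuming per-site A and G read counts are already available; the `min_coverage` cutoff of 10 reads is an arbitrary assumption, not the study's threshold:

```python
import numpy as np


def editing_level(a_reads, g_reads, min_coverage=10):
    """Per-site A-to-I editing level estimated as G / (A + G).

    Sites with fewer than `min_coverage` A+G reads (an assumed cutoff)
    are returned as NaN rather than given an unreliable estimate.
    """
    a = np.asarray(a_reads, dtype=float)
    g = np.asarray(g_reads, dtype=float)
    coverage = a + g
    level = np.full_like(coverage, np.nan)
    ok = coverage >= min_coverage
    level[ok] = g[ok] / coverage[ok]
    return level


# Site 1: 10 G of 40 reads -> 0.25; site 2: only 3 reads -> NaN
levels = editing_level([30, 2], [10, 1])
```

These per-individual levels can then serve as the phenotype in a genotype association test, in the same way expression does for an eQTL.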
Subject(s)
Adenosine Deaminase , Genetic Predisposition to Disease , Immune System Diseases , Inflammation , RNA Editing , RNA, Double-Stranded , Adenosine/metabolism , Adenosine Deaminase/genetics , Adenosine Deaminase/metabolism , Alu Elements/genetics , Autoimmune Diseases/genetics , Autoimmune Diseases/immunology , Autoimmune Diseases/pathology , Genome-Wide Association Study , Humans , Immune System Diseases/genetics , Immune System Diseases/immunology , Immune System Diseases/pathology , Immunity, Innate , Inflammation/genetics , Inflammation/immunology , Inflammation/pathology , Inosine/metabolism , Interferon-Induced Helicase, IFIH1/metabolism , Interferons/genetics , Interferons/immunology , Quantitative Trait Loci/genetics , RNA Editing/genetics , RNA, Double-Stranded/genetics , RNA-Binding Proteins/metabolism
ABSTRACT
Polygenic risk scores (PRSs) quantify the contribution of multiple genetic loci to an individual's likelihood of developing a complex trait or disease. However, existing PRSs estimate this likelihood from common genetic variants, excluding the impact of rare variants. Here, we report a method to identify rare variants associated with outlier gene expression and to integrate their impact into PRS predictions for body mass index (BMI), obesity, and bariatric surgery. Comparing individuals in the top and bottom 10% of expression outlier-associated rare variant burden, while controlling for the common variant PRS, we observed a 20.8% increase in risk for obesity (p = 3 × 10⁻¹⁴), a 62.3% increase in risk for severe obesity (p = 1 × 10⁻⁶), and a median of 5.29 years earlier onset of bariatric surgery (p = 0.008). These predictions were more significant than those obtained by integrating the effects of rare protein-truncating variants (PTVs): expression outlier-associated rare variants explained on average 19% more phenotypic variance than PTVs (p = 2 × 10⁻¹⁵). We replicated these findings using data from the Million Veteran Program and demonstrated that PRSs across multiple traits and diseases can benefit from the inclusion of expression outlier-associated rare variants identified through population-scale transcriptome sequencing.
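"Controlling for common variant PRS" can be pictured as a regression in which the rare-variant burden enters alongside the PRS, so its coefficient reflects the burden effect at a fixed PRS. A minimal sketch for a continuous trait such as BMI, assuming the PRS is a weighted sum of allele dosages and the burden is a simple per-person count of qualifying rare variants (both simplifications; the abstract does not give the study's scoring or its models for binary outcomes such as obesity). Function names are hypothetical:

```python
import numpy as np


def polygenic_risk_score(dosages: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Common-variant PRS: weighted sum of allele dosages per individual.

    dosages: (n_individuals, n_variants), values in [0, 2].
    weights: (n_variants,) per-variant effect sizes, e.g. GWAS betas.
    """
    return dosages @ weights


def fit_burden_model(trait: np.ndarray, prs: np.ndarray, burden: np.ndarray) -> np.ndarray:
    """OLS fit of trait ~ intercept + PRS + rare-variant burden.

    The burden coefficient estimates the rare-variant effect with the
    common-variant PRS held fixed. Returns [intercept, beta_prs, beta_burden].
    """
    X = np.column_stack([np.ones(len(trait)), prs, burden])
    coef, *_ = np.linalg.lstsq(X, trait, rcond=None)
    return coef


# Toy data built so that trait = 25 + 2 * PRS + 0.5 * burden exactly.
dosages = np.array([[0, 1], [1, 1], [2, 0], [0, 2], [1, 0], [2, 2]], dtype=float)
prs = polygenic_risk_score(dosages, np.array([0.3, -0.1]))
burden = np.array([0, 1, 0, 2, 1, 3], dtype=float)
trait = 25 + 2 * prs + 0.5 * burden
coef = fit_burden_model(trait, prs, burden)  # approximately [25.0, 2.0, 0.5]
```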
Subject(s)
Multifactorial Inheritance , Obesity , Body Mass Index , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Multifactorial Inheritance/genetics , Obesity/genetics , Phenotype , Risk Factors
ABSTRACT
Associations between genetic variation and traits often fall in noncoding regions with strong linkage disequilibrium (LD), where a single causal variant is assumed to underlie each association. We applied a massively parallel reporter assay (MPRA) to functionally evaluate genetic variants in high local LD at independent cis-expression quantitative trait loci (eQTLs). We found that 17.7% of eQTLs exhibit more than one major allelic effect among variants in tight LD. The detected regulatory variants were highly and specifically enriched for activating chromatin structures and allelic transcription factor binding. Integrating MPRA profiles with eQTL/complex trait colocalizations across 114 human traits and diseases identified causal variant sets, demonstrating how genetic association signals can manifest through multiple, tightly linked causal variants.
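An "allelic effect" in an MPRA is usually read from the difference in reporter activity between the two alleles of a variant, where activity is the ratio of RNA (transcribed barcodes) to DNA (input barcodes). A minimal sketch, assuming counts already normalized for library size and a simple log2 ratio with a pseudocount; real MPRA analyses generally use dedicated count-based statistical models across replicates, which this does not reproduce. `allelic_skew` is a hypothetical name:

```python
import numpy as np


def allelic_skew(ref_rna, ref_dna, alt_rna, alt_dna, pseudocount=1.0):
    """Hypothetical MPRA allelic effect: mean log2(RNA/DNA) of the alt allele
    minus that of the ref allele, across barcodes or replicates.

    Inputs are assumed to be library-size-normalized counts.
    Positive values mean the alt allele drives more reporter expression.
    """
    ref = np.log2((np.asarray(ref_rna) + pseudocount) / (np.asarray(ref_dna) + pseudocount))
    alt = np.log2((np.asarray(alt_rna) + pseudocount) / (np.asarray(alt_dna) + pseudocount))
    return float(np.mean(alt) - np.mean(ref))


# Ref activity log2(10/10) = 0, alt activity log2(20/10) = 1 -> skew of 1.0
skew = allelic_skew([9, 9], [9, 9], [19, 19], [9, 9])
```

Scoring every variant in a high-LD block this way is what allows more than one variant in the block to show an effect.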
Subject(s)
Genetic Variation , Linkage Disequilibrium , Multifactorial Inheritance , Quantitative Trait Loci , Alleles , Asthma/genetics , Chromatin/metabolism , Genetic Predisposition to Disease , Genome, Human , Genome-Wide Association Study , Haplotypes , Histone Code , Humans , Inflammatory Bowel Diseases/genetics , Multiple Sclerosis/genetics , Phenotype , Platelet Count , Transcription Factors/genetics , Transcription Factors/metabolism , Untranslated Regions
ABSTRACT
BACKGROUND: Identifying causal genes for polygenic human diseases has been extremely challenging, and our understanding of how physiological and pharmacological stimuli modulate genetic risk at disease-associated loci is limited. Specifically, insulin resistance (IR), a common feature of cardiometabolic diseases including type 2 diabetes, obesity, and dyslipidemia, lacks well-powered genome-wide association studies (GWAS); therefore, few associated loci and causal genes have been identified. METHODS: Here, we perform and integrate linkage disequilibrium (LD)-adjusted colocalization analyses across nine cardiometabolic traits (fasting insulin, fasting glucose, insulin sensitivity, insulin sensitivity index, type 2 diabetes, triglycerides, high-density lipoprotein, body mass index, and waist-hip ratio) combined with expression and splicing quantitative trait loci (eQTLs and sQTLs) from five metabolically relevant human tissues (subcutaneous and visceral adipose, skeletal muscle, liver, and pancreas). To elucidate the upstream regulators and functional mechanisms of these genes, we integrate their transcriptional responses to 21 relevant physiological and pharmacological perturbations in human adipocytes, hepatocytes, and skeletal muscle cells and map their protein-protein interactions. RESULTS: We identify 470 colocalized loci and prioritize 207 loci with a single colocalized gene. Patterns of shared colocalizations across traits and tissues highlight different potential roles for colocalized genes in cardiometabolic disease and distinguish several genes involved in pancreatic β-cell function from others with a more direct role in skeletal muscle, liver, and adipose tissues. Among the genes at loci with a single colocalized gene, 42 were regulated by insulin and 35 by glucose in perturbation experiments, including 17 regulated by both. Other metabolic perturbations regulated the expression of 30 additional genes not regulated by glucose or insulin, pointing to other potential upstream regulators of candidate causal genes. CONCLUSIONS: Our use of transcriptional responses under metabolic perturbations to contextualize genetic associations from our custom colocalization approach provides a list of likely causal genes and their upstream regulators in the context of IR-associated cardiometabolic risk.
Subject(s)
Cardiovascular Diseases , Diabetes Mellitus, Type 2 , Insulin Resistance , Cardiovascular Diseases/genetics , Diabetes Mellitus, Type 2/genetics , Genome-Wide Association Study , Humans , Insulin Resistance/genetics , Quantitative Trait Loci
ABSTRACT
Complex traits and diseases can be influenced by both genetics and environment. However, given the large number of environmental stimuli and the limited statistical power of gene-by-environment testing, identifying and prioritizing specific disease-relevant environmental exposures remains a critical challenge. We propose a framework that leverages transcriptional responses to environmental perturbations to identify disease-relevant perturbations that can modulate genetic risk for complex traits and to inform the functions of genetic variants associated with those traits. We perturbed human skeletal-muscle-, fat-, and liver-relevant cell lines with 21 perturbations affecting insulin resistance, glucose homeostasis, and metabolic regulation in humans and identified thousands of environmentally responsive genes. By combining these data with GWASs of 31 distinct polygenic traits, we show that the heritability of multiple traits is enriched in regions surrounding genes responsive to specific perturbations and, further, that environmentally responsive genes are enriched for associations with specific diseases and phenotypes from the GWAS Catalog. Overall, we demonstrate the advantages of large-scale characterization of transcriptional changes in diversely stimulated and pathologically relevant cells for identifying disease-relevant perturbations.
Subject(s)
Gene-Environment Interaction , Genetic Predisposition to Disease , Genome-Wide Association Study , Multifactorial Inheritance , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Autoimmune Diseases/etiology , Autoimmune Diseases/pathology , Humans , Mental Disorders/etiology , Mental Disorders/pathology , Metabolic Diseases/etiology , Metabolic Diseases/pathology , Phenotype
ABSTRACT
Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, identifying the subset of the thousands of lncRNA genes with disease or trait relevance remains a major challenge. To systematically characterize these genes, we used Genotype-Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. These included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations with body mass index.
Subject(s)
Disease/genetics , Multifactorial Inheritance/genetics , Population/genetics , RNA, Long Noncoding/genetics , Transcriptome , Coronary Artery Disease/genetics , Diabetes Mellitus, Type 1/genetics , Diabetes Mellitus, Type 2/genetics , Gene Expression Profiling , Genetic Variation , Humans , Inflammatory Bowel Diseases/genetics , Organ Specificity/genetics , Quantitative Trait Loci
ABSTRACT
Induced pluripotent stem cells (iPSCs) are an established cellular system for studying the impact of genetic variants in derived cell types and developmental contexts. However, the disease impact of genetic variants in the pluripotent state itself is less well known. Here, we integrate data from 1,367 human iPSC lines to comprehensively map common and rare regulatory variants in human pluripotent cells. Using this population-scale resource, we report hundreds of new colocalization events for human traits specific to iPSCs and find increased power to identify rare regulatory variants compared with somatic tissues. Finally, we demonstrate how iPSCs enable the identification of causal genes for rare diseases.
Subject(s)
Genetic Variation , Induced Pluripotent Stem Cells/physiology , Quantitative Trait Loci , Bardet-Biedl Syndrome/genetics , Calcium Channels/genetics , Cell Line , Cerebellar Ataxia/genetics , DNA Methylation , Gene Expression , Humans , Induced Pluripotent Stem Cells/cytology , Polymorphism, Single Nucleotide , Proteins/genetics , Rare Diseases/genetics , Regulatory Sequences, Nucleic Acid , Sequence Analysis, RNA , Whole Genome Sequencing
ABSTRACT
Genome-wide association studies of neurological diseases have identified thousands of variants associated with disease phenotypes. However, most of these variants do not alter coding sequences, making it difficult to assign their function. Here, we present a multi-omic epigenetic atlas of the adult human brain through profiling of single-cell chromatin accessibility landscapes and three-dimensional chromatin interactions of diverse adult brain regions across a cohort of cognitively healthy individuals. We developed a machine-learning classifier to integrate this multi-omic framework and predict dozens of functional SNPs for Alzheimer's and Parkinson's diseases, nominating target genes and cell types for previously orphaned loci from genome-wide association studies. Moreover, we dissected the complex inverted haplotype of the MAPT (encoding tau) Parkinson's disease risk locus, identifying putative ectopic regulatory interactions in neurons that may mediate this disease association. This work expands understanding of inherited variation and provides a roadmap for the epigenomic dissection of causal regulatory variation in disease.
Subject(s)
Alzheimer Disease/genetics , Brain/anatomy & histology , Neurons/physiology , Parkinson Disease/genetics , Adult , Atlases as Topic , Biological Variation, Population , Chromatin Assembly and Disassembly , Cohort Studies , Enhancer Elements, Genetic , Epigenomics , Genetic Heterogeneity , Genetic Predisposition to Disease , Genome-Wide Association Study , Haplotypes , Humans , Machine Learning , Polymorphism, Single Nucleotide , Promoter Regions, Genetic , tau Proteins/genetics
ABSTRACT
Genetic variation in the FAM13A (Family with Sequence Similarity 13 Member A) locus has been associated with several glycemic and metabolic traits in genome-wide association studies (GWAS). Here, we demonstrate that in humans, FAM13A alleles are associated with increased FAM13A expression in subcutaneous adipose tissue (SAT) and an insulin resistance-related phenotype (e.g., higher waist-to-hip ratio and fasting insulin levels, but lower body fat). In human adipocyte models, knockdown of FAM13A in preadipocytes accelerates adipocyte differentiation. In mice, Fam13a knockout (KO) animals have a lower visceral to subcutaneous fat (VAT/SAT) ratio after a high-fat diet challenge than their wild-type counterparts. Subcutaneous adipocytes in KO mice show a size distribution shifted toward a greater number of smaller adipocytes, along with improved adipogenic potential. Our results indicate that GWAS-associated variants within the FAM13A locus alter adipose FAM13A expression, which in turn regulates adipocyte differentiation and contributes to changes in body fat distribution.
Subject(s)
Adipocytes/metabolism , Body Fat Distribution , GTPase-Activating Proteins/genetics , Adipogenesis/genetics , Animals , Cell Differentiation/genetics , GTPase-Activating Proteins/metabolism , Gene Knockdown Techniques , Genetic Loci , Genome-Wide Association Study , HEK293 Cells , Humans , Insulin Resistance/genetics , Intra-Abdominal Fat/metabolism , Male , Metabolomics , Mice, Inbred C57BL , Mice, Knockout , Phenotype , Polymorphism, Single Nucleotide/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Subcutaneous Fat/metabolism
ABSTRACT
BACKGROUND: Molecular and cellular changes are intrinsic to aging and age-related diseases. Prior cross-sectional studies have investigated the combined effects of age and genetics on gene expression and alternative splicing; however, there has been no long-term, longitudinal characterization of these molecular changes, especially in older age. RESULTS: We perform RNA sequencing in whole blood from the same individuals at ages 70 and 80 to quantify how gene expression, alternative splicing, and their genetic regulation are altered during this 10-year period of advanced aging at both the population and the individual level. We observe that, later in life, individuals remain more similar to their own expression profiles than to the profiles of other individuals of the same age. We identify 1291 genes differentially expressed and 294 genes alternatively spliced with age, as well as 529 genes with outlying individual trajectories. Further, we observe a strong correlation of genetic effects on expression and splicing between the two ages, with a small subset of tested genes showing a reduction in genetic associations with expression and splicing in older age. CONCLUSIONS: These findings demonstrate that, although the transcriptome and its genetic regulation are mostly stable late in life, a small subset of genes is dynamic and is characterized by a reduction in genetic regulation, most likely due to increasing environmental variance with age.
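The observation that individuals stay closer to their own earlier profile than to their peers can be quantified by correlating each person's age-70 profile with every person's age-80 profile and comparing the diagonal (self) with the off-diagonal (others). A minimal sketch, assuming rows aligned by individual and Pearson correlation across genes as the similarity measure; the study's actual metric is not stated in the abstract, and `self_vs_other_correlation` is a hypothetical name:

```python
import numpy as np


def self_vs_other_correlation(profiles_70: np.ndarray, profiles_80: np.ndarray):
    """Compare within-person vs between-person similarity of expression profiles.

    profiles_70, profiles_80: (n_individuals, n_genes), rows aligned by individual.
    Returns (self_r, mean_other_r), each of length n_individuals.
    """
    n = profiles_70.shape[0]
    # np.corrcoef treats rows as variables; the top-right block holds
    # r(person i at 70, person j at 80).
    corr = np.corrcoef(profiles_70, profiles_80)[:n, n:]
    self_r = np.diag(corr)
    mean_other_r = (corr.sum(axis=1) - self_r) / (n - 1)
    return self_r, mean_other_r


p70 = np.array([[1, 5, 2, 8, 3], [7, 1, 6, 2, 9], [4, 4, 9, 1, 1]], dtype=float)
drift = np.array([[0.1, -0.1, 0.2, 0, 0.1], [0, 0.2, -0.1, 0.1, 0], [0.1, 0, 0, -0.2, 0.1]])
self_r, other_r = self_vs_other_correlation(p70, p70 + drift)
```

With real data the profiles would first be normalized and restricted to expressed genes; the toy matrices only show the shape of the comparison.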
Subject(s)
Aging/genetics , Alternative Splicing , Gene Expression Regulation , Aged , Aged, 80 and over , Aging/metabolism , Female , Humans , Male
ABSTRACT
Heart failure is a leading cause of mortality, yet our understanding of the genetic interactions underlying this disease remains incomplete. Here, we harvest 1352 healthy and failing human hearts directly from transplant center operating rooms, and obtain genome-wide genotyping and gene expression measurements for a subset of 313. We build failing and non-failing cardiac regulatory gene networks, revealing important regulators and cardiac expression quantitative trait loci (eQTLs). PPP1R3A emerges as a regulator whose network connectivity changes significantly between health and disease. RNA sequencing after PPP1R3A knockdown validates network-based predictions, and highlights metabolic pathway regulation associated with increased cardiomyocyte size and perturbed respiratory metabolism. Mice lacking PPP1R3A are protected against pressure-overload heart failure. We present a global gene interaction map of the human heart failure transition, identify previously unreported cardiac eQTLs, and demonstrate the discovery potential of disease-specific networks through the description of PPP1R3A as a central regulator in heart failure.
Subject(s)
Gene Regulatory Networks/genetics , Heart Failure/genetics , Myocytes, Cardiac/pathology , Phosphoprotein Phosphatases/metabolism , Animals , Benzeneacetamides , Cells, Cultured , Datasets as Topic , Disease Models, Animal , Female , Gene Expression Profiling/methods , Gene Expression Regulation , Gene Knockdown Techniques , Genome-Wide Association Study , Heart Failure/etiology , Heart Failure/metabolism , Heart Failure/pathology , Humans , Male , Metabolic Networks and Pathways/genetics , Mice , Mice, Knockout , Middle Aged , Phosphoprotein Phosphatases/genetics , Primary Cell Culture , Pyridines , Quantitative Trait Loci/genetics , Rats , Rats, Sprague-Dawley , Sequence Analysis, RNA/methods
ABSTRACT
The retinal pigment epithelium (RPE) serves vital roles in ocular development and retinal homeostasis but has limited representation in large-scale functional genomics datasets. Understanding how common human genetic variants affect RPE gene expression could elucidate the sources of phenotypic variability in selected monogenic ocular diseases and pinpoint causal genes at genome-wide association study (GWAS) loci. We interrogated the genetics of gene expression in cultured human fetal RPE (fRPE) cells under two metabolic conditions and discovered hundreds of shared or condition-specific expression or splicing quantitative trait loci (e/sQTLs). Colocalization of fRPE e/sQTLs with age-related macular degeneration (AMD) and myopia GWAS data suggests new candidate genes, as well as mechanisms by which a common RDH5 allele contributes to both increased AMD risk and decreased myopia risk. Our study highlights the unique transcriptomic characteristics of fRPE and provides a resource to connect e/sQTLs in a critical ocular cell type to monogenic and complex eye disorders.
Subject(s)
Retinal Pigment Epithelium/metabolism , Alcohol Oxidoreductases/genetics , Cells, Cultured , Chromosome Mapping , Energy Metabolism , Fetus/cytology , Fetus/metabolism , Gene Expression , Genetic Variation , Genome-Wide Association Study , Humans , Macular Degeneration/genetics , Myopia/genetics , Nonsense Mediated mRNA Decay , Quantitative Trait Loci , Retinal Pigment Epithelium/cytology , Retinal Pigment Epithelium/embryology , Risk Factors , Transcriptome
ABSTRACT
Genetic studies of complex traits have mainly identified associations with noncoding variants. To further determine the contribution of regulatory variation, we combined whole-genome and transcriptome data for 624 individuals from Sardinia to identify common and rare variants that influence gene expression and splicing. We identified 21,183 expression quantitative trait loci (eQTLs) and 6,768 splicing quantitative trait loci (sQTLs), including 619 new QTLs. We identified high-frequency QTLs and found evidence of selection near genes involved in malarial resistance and increased multiple sclerosis risk, reflecting the epidemiological history of Sardinia. Using family relationships, we identified 809 segregating expression outliers (median z score of 2.97), averaging 13.3 genes per individual. Outlier genes were enriched for proximal rare variants, providing a new approach to study large-effect regulatory variants and their relevance to traits. Our results provide insight into the effects of regulatory variants and their relationship to population history and individual genetic risk.
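Expression outliers of the kind summarized above by a median z score are typically found by standardizing each gene's expression across individuals and flagging extreme values. A minimal sketch of that per-gene z-score step only, assuming a normalized individuals-by-genes matrix; the family-based segregation analysis the study used is not reproduced, and the |z| ≥ 2.0 threshold is an arbitrary assumption, not the study's cutoff:

```python
import numpy as np


def expression_outliers(expr: np.ndarray, threshold: float = 2.0):
    """Per-gene z-scores across individuals and a boolean outlier mask.

    expr: (n_individuals, n_genes) normalized expression (hypothetical input).
    threshold: |z| cutoff, an assumed value for illustration.
    Genes with zero variance would divide by zero and should be filtered first.
    """
    mu = expr.mean(axis=0)
    sd = expr.std(axis=0, ddof=1)
    z = (expr - mu) / sd
    return z, np.abs(z) >= threshold


# Gene 0: one individual far above the rest; gene 1: a smooth gradient, no outlier.
expr = np.column_stack([np.array([0.0] * 9 + [10.0]), np.arange(10, dtype=float)])
z, flags = expression_outliers(expr)  # only individual 9 is flagged, for gene 0
```

With few individuals the largest attainable |z| is bounded, so outlier thresholds are only meaningful in reasonably large samples.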