ABSTRACT
Common single-nucleotide polymorphisms (SNPs) are predicted to collectively explain 40-50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes [1]. Here, using data from a genome-wide association study of 5.4 million individuals of diverse ancestries, we show that 12,111 independent SNPs that are significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a mean size of around 90 kb, covering about 21% of the genome. The density of independent associations varies across the genome and the regions of increased density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs (or all SNPs in the HapMap 3 panel [2]) account for 40% (45%) of phenotypic variance in populations of European ancestry but only around 10-20% (14-24%) in populations of other ancestries. Effect sizes, associated regions and gene prioritization are similar across ancestries, indicating that reduced prediction accuracy is likely to be explained by linkage disequilibrium and differences in allele frequency within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than are needed to implicate causal genes and variants. Overall, this study provides a comprehensive map of specific genomic regions that contain the vast majority of common height-associated variants. Although this map is saturated for populations of European ancestry, further research is needed to achieve equivalent saturation in other ancestries.
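The segment figures quoted in the abstract above (7,209 segments, mean size around 90 kb, about 21% of the genome) can be cross-checked with a line of arithmetic. The sketch below is only an illustration: the genome length of roughly 3.1 Gb is an assumption that the abstract does not state, and the function name is hypothetical.

```python
# Rough consistency check of the figures quoted in the abstract above.
N_SEGMENTS = 7_209        # non-overlapping genomic segments (from the abstract)
MEAN_SEGMENT_BP = 90_000  # "around 90 kb" mean segment size (from the abstract)
GENOME_BP = 3.1e9         # ASSUMPTION: approximate human genome length, not in the abstract


def covered_fraction(n_segments: int, mean_bp: float, genome_bp: float) -> float:
    """Return the fraction of the genome spanned by non-overlapping segments."""
    return n_segments * mean_bp / genome_bp


fraction = covered_fraction(N_SEGMENTS, MEAN_SEGMENT_BP, GENOME_BP)
covered_mb = N_SEGMENTS * MEAN_SEGMENT_BP / 1e6
print(f"Covered: {covered_mb:.0f} Mb, about {fraction:.0%} of the genome")
```

Under that assumed genome length, the product comes to roughly 649 Mb, which is consistent with the "about 21%" stated in the abstract.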
Subject(s)
Body Height, Chromosome Mapping, Single Nucleotide Polymorphism, Humans, Body Height/genetics, Gene Frequency/genetics, Human Genome/genetics, Genome-Wide Association Study, Haplotypes/genetics, Linkage Disequilibrium/genetics, Single Nucleotide Polymorphism/genetics, Europe/ethnology, Sample Size, Phenotype
ABSTRACT
Genetic studies have identified ≥240 loci associated with the risk of type 2 diabetes (T2D), yet most of these loci lie in non-coding regions, masking the underlying molecular mechanisms. Recent studies investigating mRNA expression in human pancreatic islets have yielded important insights into the molecular drivers of normal islet function and T2D pathophysiology. However, similar studies investigating microRNA (miRNA) expression remain limited. Here, we present data from 63 individuals, the largest sequencing-based analysis of miRNA expression in human islets to date. We characterized the genetic regulation of miRNA expression by decomposing the expression of highly heritable miRNAs into cis- and trans-acting genetic components and mapping cis-acting loci associated with miRNA expression [miRNA-expression quantitative trait loci (eQTLs)]. We found i) 84 heritable miRNAs, primarily regulated by trans-acting genetic effects, and ii) 5 miRNA-eQTLs. We also used several different strategies to identify T2D-associated miRNAs. First, we colocalized miRNA-eQTLs with genetic loci associated with T2D and multiple glycemic traits, identifying one miRNA, miR-1908, that shares genetic signals for blood glucose and glycated hemoglobin (HbA1c). Next, we intersected miRNA seed regions and predicted target sites with credible set SNPs associated with T2D and glycemic traits and found 32 miRNAs that may have altered binding and function due to disrupted seed regions. Finally, we performed differential expression analysis and identified 14 miRNAs associated with T2D status (including miR-187-3p, miR-21-5p, miR-668, and miR-199b-5p) and 4 miRNAs associated with a polygenic score for HbA1c levels (miR-216a, miR-25, miR-30a-3p, and miR-30a-5p).
Subject(s)
Type 2 Diabetes Mellitus, Islets of Langerhans, MicroRNAs, Humans, MicroRNAs/metabolism, Type 2 Diabetes Mellitus/genetics, Type 2 Diabetes Mellitus/metabolism, Glycated Hemoglobin, Islets of Langerhans/metabolism, Quantitative Trait Loci/genetics
ABSTRACT
Genetic association studies have identified hundreds of independent signals associated with type 2 diabetes (T2D) and related traits. Despite these successes, the identification of specific causal variants underlying a genetic association signal remains challenging. In this study, we describe a deep learning (DL) method to analyze the impact of sequence variants on enhancers. Focusing on pancreatic islets, a T2D relevant tissue, we show that our model learns islet-specific transcription factor (TF) regulatory patterns and can be used to prioritize candidate causal variants. At 101 genetic signals associated with T2D and related glycemic traits where multiple variants occur in linkage disequilibrium, our method nominates a single causal variant for each association signal, including three variants previously shown to alter reporter activity in islet-relevant cell types. For another signal associated with blood glucose levels, we biochemically test all candidate causal variants from statistical fine-mapping using a pancreatic islet beta cell line and show biochemical evidence of allelic effects on TF binding for the model-prioritized variant. To aid in future research, we publicly distribute our model and islet enhancer perturbation scores across ~67 million genetic variants. We anticipate that DL methods like the one presented in this study will enhance the prioritization of candidate causal variants for functional studies.
Subject(s)
Deep Learning, Type 2 Diabetes Mellitus, Genetic Enhancer Elements, Islets of Langerhans, Type 2 Diabetes Mellitus/genetics, Type 2 Diabetes Mellitus/metabolism, Type 2 Diabetes Mellitus/pathology, Islets of Langerhans/metabolism, Islets of Langerhans/pathology, Genetic Variation, Humans, Computer Simulation
ABSTRACT
Transcriptomics data have been integrated with genome-wide association studies (GWASs) to help understand disease/trait molecular mechanisms. The utility of metabolomics, integrated with transcriptomics and disease GWASs, to understand molecular mechanisms for metabolite levels or diseases has not been thoroughly evaluated. We performed probabilistic transcriptome-wide association and locus-level colocalization analyses to integrate transcriptomics results for 49 tissues in 706 individuals from the GTEx project, metabolomics results for 1,391 plasma metabolites in 6,136 Finnish men from the METSIM study, and GWAS results for 2,861 disease traits in 260,405 Finnish individuals from the FinnGen study. We found that genetic variants that regulate metabolite levels were more likely to influence gene expression and disease risk compared to the ones that do not. Integrating transcriptomics with metabolomics results prioritized 397 genes for 521 metabolites, including 496 previously identified gene-metabolite pairs with strong functional connections and suggested 33.3% of such gene-metabolite pairs shared the same causal variants with genetic associations of gene expression. Integrating transcriptomics and metabolomics individually with FinnGen GWAS results identified 1,597 genes for 790 disease traits. Integrating transcriptomics and metabolomics jointly with FinnGen GWAS results helped pinpoint metabolic pathways from genes to diseases. We identified putative causal effects of UGT1A1/UGT1A4 expression on gallbladder disorders through regulating plasma (E,E)-bilirubin levels, of SLC22A5 expression on nasal polyps and plasma carnitine levels through distinct pathways, and of LIPC expression on age-related macular degeneration through glycerophospholipid metabolic pathways. Our study highlights the power of integrating multiple sets of molecular traits and GWAS results to deepen understanding of disease pathophysiology.
Subject(s)
Genome-Wide Association Study, Transcriptome, Bilirubin, Carnitine, Glycerophospholipids, Humans, Male, Metabolomics, Quantitative Trait Loci/genetics, Solute Carrier Family 22 Member 5/genetics, Transcriptome/genetics
ABSTRACT
Large-scale gene sequencing studies for complex traits have the potential to identify causal genes with therapeutic implications. We performed gene-based association testing of blood lipid levels with rare (minor allele frequency < 1%) predicted damaging coding variation by using sequence data from >170,000 individuals from multiple ancestries: 97,493 European, 30,025 South Asian, 16,507 African, 16,440 Hispanic/Latino, 10,420 East Asian, and 1,182 Samoan. We identified 35 genes associated with circulating lipid levels; some of these genes have not been previously associated with lipid levels when using rare coding variation from population-based samples. We prioritize 32 genes in array-based genome-wide association study (GWAS) loci based on aggregations of rare coding variants; three (EVI5, SH2B3, and PLIN1) had no prior association of rare coding variants with lipid levels. Most of our associated genes showed evidence of association among multiple ancestries. Finally, we observed an enrichment of gene-based associations for low-density lipoprotein cholesterol drug target genes and for genes closest to GWAS index single-nucleotide polymorphisms (SNPs). Our results demonstrate that gene-based associations can be beneficial for drug target development and provide evidence that the gene closest to the array-based GWAS index SNP is often the functional gene for blood lipid levels.
Subject(s)
Exome, Genetic Variation, Genome-Wide Association Study, Lipids/blood, Open Reading Frames, Alleles, Blood Glucose/genetics, Case-Control Studies, Computational Biology/methods, Genetic Databases, Type 2 Diabetes Mellitus/genetics, Type 2 Diabetes Mellitus/metabolism, Genetic Predisposition to Disease, Population Genetics, Genome-Wide Association Study/methods, Humans, Lipid Metabolism/genetics, Liver/metabolism, Liver/pathology, Molecular Sequence Annotation, Multifactorial Inheritance, Phenotype, Single Nucleotide Polymorphism
ABSTRACT
AIMS/HYPOTHESIS: Disruption of pancreatic islet function and glucose homeostasis can lead to the development of sustained hyperglycaemia, beta cell glucotoxicity and subsequently type 2 diabetes. In this study, we explored the effects of in vitro hyperglycaemic conditions on human pancreatic islet gene expression across 24 h in six pancreatic cell types: alpha; beta; gamma; delta; ductal; and acinar. We hypothesised that genes associated with hyperglycaemic conditions may be relevant to the onset and progression of diabetes.
METHODS: We exposed human pancreatic islets from two donors to low (2.8 mmol/l) and high (15.0 mmol/l) glucose concentrations over 24 h in vitro. To assess the transcriptome, we performed single-cell RNA-seq (scRNA-seq) at seven time points. We modelled time as both a discrete and continuous variable to determine momentary and longitudinal changes in transcription associated with islet time in culture or glucose exposure. Additionally, we integrated genomic features and genetic summary statistics to nominate candidate effector genes. For three of these genes, we functionally characterised the effect on insulin production and secretion using CRISPR interference to knock down gene expression in EndoC-βH1 cells, followed by a glucose-stimulated insulin secretion assay.
RESULTS: In the discrete time models, we identified 1344 genes associated with time and 668 genes associated with glucose exposure across all cell types and time points. In the continuous time models, we identified 1311 genes associated with time, 345 genes associated with glucose exposure and 418 genes associated with interaction effects between time and glucose across all cell types. By integrating these expression profiles with summary statistics from genetic association studies, we identified 2449 candidate effector genes for type 2 diabetes, HbA1c, random blood glucose and fasting blood glucose. Of these candidate effector genes, we showed that three (ERO1B, HNRNPA2B1 and RHOBTB3) exhibited an effect on glucose-stimulated insulin production and secretion in EndoC-βH1 cells.
CONCLUSIONS/INTERPRETATION: The findings of our study provide an in-depth characterisation of the 24 h transcriptomic response of human pancreatic islets to glucose exposure at a single-cell resolution. By integrating differentially expressed genes with genetic signals for type 2 diabetes and glucose-related traits, we provide insights into the molecular mechanisms underlying glucose homeostasis. Finally, we provide functional evidence to support the role of three candidate effector genes in insulin secretion and production.
DATA AVAILABILITY: The scRNA-seq data from the 24 h glucose exposure experiment performed in this study are available in the database of Genotypes and Phenotypes (dbGaP; https://www.ncbi.nlm.nih.gov/gap/) with accession no. phs001188.v3.p1. Study metadata and summary statistics for the differential expression, gene set enrichment and candidate effector gene prediction analyses are available in the Zenodo data repository (https://zenodo.org/) under accession number 11123248. The code used in this study is publicly available at https://github.com/CollinsLabBioComp/publication-islet_glucose_timecourse.
Subject(s)
Gene Expression Profiling, Glucose, Islets of Langerhans, Single-Cell Analysis, Humans, Islets of Langerhans/metabolism, Islets of Langerhans/drug effects, Glucose/pharmacology, Glucose/metabolism, Transcriptome, Type 2 Diabetes Mellitus/genetics, Type 2 Diabetes Mellitus/metabolism, Insulin/metabolism, Insulin-Secreting Cells/metabolism, Insulin-Secreting Cells/drug effects, Hyperglycemia/genetics, Hyperglycemia/metabolism
ABSTRACT
Identifying the molecular mechanisms by which genome-wide association study (GWAS) loci influence traits remains challenging. Chromatin accessibility quantitative trait loci (caQTLs) help identify GWAS loci that may alter GWAS traits by modulating chromatin structure, but caQTLs have been identified in a limited set of human tissues. Here we mapped caQTLs in 20 human liver samples and identified 3,123 caQTLs. The caQTL variants are enriched in liver tissue promoter and enhancer states and frequently disrupt binding motifs of transcription factors expressed in liver. We predicted target genes for 861 caQTL peaks using proximity, chromatin interactions, correlation with promoter accessibility or gene expression, and colocalization with expression QTLs. Using GWAS signals for 19 liver function and/or cardiometabolic traits, we identified 110 colocalized caQTLs and GWAS signals, 56 of which contained a predicted caPeak target gene. At the LITAF LDL-cholesterol GWAS locus, we validated that a caQTL variant showed allelic differences in protein binding and transcriptional activity. These caQTLs contribute to the epigenomic characterization of human liver and help identify molecular mechanisms and genes at GWAS loci.
Subject(s)
Chromatin/metabolism, Liver/metabolism, Quantitative Trait Loci, Amino Acid Motifs, Binding Sites, Chromatin Assembly and Disassembly, Genetic Enhancer Elements, Genetic Variation, Genome-Wide Association Study, Humans, Genetic Promoter Regions, Protein Binding, Transcription Factors/chemistry, Transcription Factors/metabolism, Transcriptome
ABSTRACT
Patients with classic hydroa vacciniforme-like lymphoproliferative disorder (HVLPD) typically have high levels of Epstein-Barr virus (EBV) DNA in T cells and/or natural killer (NK) cells in blood and skin lesions induced by sun exposure that are infiltrated with EBV-infected lymphocytes. HVLPD is very rare in the United States and Europe but more common in Asia and South America. The disease can progress to a systemic form that may result in fatal lymphoma. We report our 11-year experience with 16 HVLPD patients from the United States and England and found that whites were less likely to develop systemic EBV disease (1/10) than nonwhites (5/6). All (10/10) of the white patients were generally in good health at last follow-up, while two-thirds (4/6) of the nonwhite patients required hematopoietic stem cell transplantation. Nonwhite patients had later age of onset of HVLPD than white patients (median age, 8 vs 5 years) and higher levels of EBV DNA (median, 1 515 000 vs 250 000 copies/ml) and more often had low numbers of NK cells (83% vs 50% of patients) and T-cell clones in the blood (83% vs 30% of patients). RNA-sequencing analysis of an HVLPD skin lesion in a white patient compared with his normal skin showed increased expression of interferon-γ and chemokines that attract T cells and NK cells. Thus, white patients with HVLPD were less likely to have systemic disease with EBV and had a much better prognosis than nonwhite patients. This trial was registered at www.clinicaltrials.gov as #NCT00369421 and #NCT00032513.
Subject(s)
Epstein-Barr Virus Infections/pathology, Hydroa Vacciniforme/virology, Lymphoproliferative Disorders/pathology, Lymphoproliferative Disorders/virology, Child, Preschool Child, Epstein-Barr Virus Infections/ethnology, Epstein-Barr Virus Infections/immunology, Female, Humans, Lymphoproliferative Disorders/ethnology, Male, White People
ABSTRACT
Genome-wide association studies (GWAS) have identified >100 independent SNPs that modulate the risk of type 2 diabetes (T2D) and related traits. However, the pathogenic mechanisms of most of these SNPs remain elusive. Here, we examined genomic, epigenomic, and transcriptomic profiles in human pancreatic islets to understand the links between genetic variation, chromatin landscape, and gene expression in the context of T2D. We first integrated genome and transcriptome variation across 112 islet samples to produce dense cis-expression quantitative trait loci (cis-eQTL) maps. Additional integration with chromatin-state maps for islets and other diverse tissue types revealed that cis-eQTLs for islet-specific genes are specifically and significantly enriched in islet stretch enhancers. High-resolution chromatin accessibility profiling using assay for transposase-accessible chromatin sequencing (ATAC-seq) in two islet samples enabled us to identify specific transcription factor (TF) footprints embedded in active regulatory elements, which are highly enriched for islet cis-eQTLs. Aggregate allelic bias signatures in TF footprints enabled us to reconstruct TF binding affinities de novo from genetic data, which supports the high quality of the TF footprint predictions. Interestingly, we found that T2D GWAS loci were strikingly and specifically enriched in islet Regulatory Factor X (RFX) footprints. Remarkably, within and across independent loci, T2D risk alleles that overlap with RFX footprints uniformly disrupt the RFX motifs at high-information content positions. Together, these results suggest that common regulatory variations have shaped islet TF footprints and the transcriptome and that a confluent RFX regulatory grammar plays a significant role in the genetic component of T2D predisposition.
Subject(s)
Type 2 Diabetes Mellitus/genetics, Genetic Predisposition to Disease, Human Genome, Islets of Langerhans/metabolism, Quantitative Trait Loci, Transcriptome, Alleles, Base Sequence, Binding Sites, Chromatin/chemistry, Chromatin/metabolism, Type 2 Diabetes Mellitus/metabolism, Type 2 Diabetes Mellitus/pathology, Genetic Epigenesis, Gene Expression Profiling, Genetic Variation, Genome-Wide Association Study, Genomic Imprinting, Humans, Islets of Langerhans/pathology, Single Nucleotide Polymorphism, Protein Binding, Protein Isoforms/genetics, Protein Isoforms/metabolism, Regulatory Factor X Transcription Factors/genetics, Regulatory Factor X Transcription Factors/metabolism
ABSTRACT
Genome-wide association studies (GWAS) have identified >500 common variants associated with quantitative metabolic traits, but in aggregate such variants explain at most 20-30% of the heritable component of population variation in these traits. To further investigate the impact of genotypic variation on metabolic traits, we conducted re-sequencing studies in >6,000 members of a Finnish population cohort (The Northern Finland Birth Cohort of 1966 [NFBC]) and a type 2 diabetes case-control sample (The Finland-United States Investigation of NIDDM Genetics [FUSION] study). By sequencing the coding sequence and 5' and 3' untranslated regions of 78 genes at 17 GWAS loci associated with one or more of six metabolic traits (serum levels of fasting HDL-C, LDL-C, total cholesterol, triglycerides, plasma glucose, and insulin), and conducting both single-variant and gene-level association tests, we obtained a more complete understanding of phenotype-genotype associations at eight of these loci. At all eight of these loci, the identification of new associations provides significant evidence for multiple genetic signals to one or more phenotypes, and at two loci, in the genes ABCA1 and CETP, we found significant gene-level evidence of association to non-synonymous variants with MAF<1%. Additionally, two potentially deleterious variants that demonstrated significant associations (rs138726309, a missense variant in G6PC2, and rs28933094, a missense variant in LIPC) were considerably more common in these Finnish samples than in European reference populations, supporting our prior hypothesis that deleterious variants could attain high frequencies in this isolated population, likely due to the effects of population bottlenecks. Our results highlight the value of large, well-phenotyped samples for rare-variant association analysis, and the challenge of evaluating the phenotypic impact of such variants.
Subject(s)
HDL Cholesterol/genetics, Cholesterol/genetics, Genome-Wide Association Study, Quantitative Trait Loci, Cholesterol/metabolism, HDL Cholesterol/metabolism, Finland, Genotype, High-Throughput Nucleotide Sequencing, Humans, Linkage Disequilibrium, Phenotype, Population Groups, White People
ABSTRACT
Genome-wide association studies (GWAS) have identified ~100 loci associated with blood lipid levels, but much of the trait heritability remains unexplained, and at most loci the identities of the trait-influencing variants remain unknown. We conducted a trans-ethnic fine-mapping study at 18, 22, and 18 GWAS loci on the Metabochip for their association with triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), and low-density lipoprotein cholesterol (LDL-C), respectively, in individuals of African American (n = 6,832), East Asian (n = 9,449), and European (n = 10,829) ancestry. We aimed to identify the variants with strongest association at each locus, identify additional and population-specific signals, refine association signals, and assess the relative significance of previously described functional variants. Among the 58 loci, 33 exhibited evidence of association at P < 1 × 10⁻⁴ in at least one ancestry group. Sequential conditional analyses revealed that ten, nine, and four loci in African Americans, Europeans, and East Asians, respectively, exhibited two or more signals. At these loci, accounting for all signals led to a 1.3- to 1.8-fold increase in the explained phenotypic variance compared to the strongest signals. Distinct signals across ancestry groups were identified at PCSK9 and APOA5. Trans-ethnic analyses narrowed the signals to smaller sets of variants at GCKR, PPP1R3B, ABO, LCAT, and ABCA1. Of 27 variants reported previously to have functional effects, 74% exhibited the strongest association at the respective signal. In conclusion, trans-ethnic high-density genotyping and analysis confirm the presence of allelic heterogeneity, allow the identification of population-specific variants, and limit the number of candidate SNPs for functional studies.
Subject(s)
Apolipoproteins A/genetics, Genome-Wide Association Study, Proprotein Convertases/genetics, Serine Endopeptidases/genetics, Black or African American/genetics, Apolipoprotein A-V, HDL Cholesterol/blood, HDL Cholesterol/genetics, LDL Cholesterol/blood, LDL Cholesterol/genetics, Humans, HDL Lipoproteins/blood, HDL Lipoproteins/genetics, LDL Lipoproteins/blood, LDL Lipoproteins/genetics, Proprotein Convertase 9, Triglycerides/blood, Triglycerides/genetics, White People/genetics
ABSTRACT
Transgenic animals are extensively used to model human disease. Typically, the transgene copy number is estimated, but the exact integration site and configuration of the foreign DNA remain uncharacterized. When transgenes have been closely examined, some unexpected configurations have been found. Here, we describe a method to recover transgene insertion sites and assess structural rearrangements of host and transgene DNA using microarray hybridization and targeted sequence capture. We used information about the transgene insertion site to develop a polymerase chain reaction genotyping assay to distinguish heterozygous from homozygous transgenic animals. Although we worked with a bacterial artificial chromosome transgenic mouse line, this method can be used to analyse the integration site and configuration of any foreign DNA in a sequenced genome.
Subject(s)
Genotyping Techniques, High-Throughput Nucleotide Sequencing, Oligonucleotide Array Sequence Analysis/methods, DNA Sequence Analysis, Transgenes, Animals, Bacterial Artificial Chromosomes, Mice, Transgenic Mice, Polymerase Chain Reaction
ABSTRACT
We developed an efficient CRISPR prime editing protocol and generated isogenic induced pluripotent stem cell (iPSC) lines carrying heterozygous or homozygous alleles for putatively causal single nucleotide variants at six type 2 diabetes loci (ABCC8, MTNR1B, TCF7L2, HNF4A, CAMK1D, and GCK). Our two-step sequence-based approach, which first identifies the transfected cell pools with the highest fraction of edited cells, significantly reduced the downstream effort needed to isolate single clones of edited cells. We found that prime editing can make targeted genetic changes in iPSCs and that optimization of system components and guide RNA designs was critical to achieve acceptable efficiency. Systems utilizing PEmax, epegRNA modifications, and MLH1dn provided significant benefit, producing editing efficiencies of 36-73%. Editing success and the pegRNA design optimization required for each variant differed depending on the sequence at the target site. With attention to design, prime editing is a promising approach to generate isogenic iPSC lines, enabling the study of specific genetic changes in a common genetic background.
Subject(s)
Type 2 Diabetes Mellitus, Induced Pluripotent Stem Cells, Humans, Clustered Regularly Interspaced Short Palindromic Repeats/genetics, CRISPR-Cas Systems/genetics, Gene Editing, CRISPR-Cas Systems Guide RNA
ABSTRACT
The hypothalamus, composed of several nuclei, is essential for maintaining our body's homeostasis. The arcuate nucleus (ARC), located in the mediobasal hypothalamus, contains neuronal populations with eminent roles in energy and glucose homeostasis as well as reproduction. These neuronal populations are of great interest for translational research. To fulfill this promise, we used a robotic cell culture platform to provide a scalable and chemically defined approach for differentiating human pluripotent stem cells (hPSCs) into pro-opiomelanocortin (POMC), somatostatin (SST), tyrosine hydroxylase (TH) and gonadotropin-releasing hormone (GnRH) neuronal subpopulations with an ARC-like signature. This robust approach is reproducible across several distinct hPSC lines and exhibits a stepwise induction of key ventral diencephalon and ARC markers in transcriptomic profiling experiments. This is further corroborated by direct comparison to human fetal hypothalamus, and the enriched expression of genes implicated in obesity and type 2 diabetes (T2D). Genome-wide chromatin accessibility profiling by ATAC-seq identified accessible regulatory regions that can be utilized to predict candidate enhancers related to metabolic disorders and hypothalamic development. In-depth molecular, cellular, and functional experiments unveiled the responsiveness of the hPSC-derived hypothalamic neurons to hormonal stimuli, such as insulin, neuropeptides including kisspeptin, and incretin mimetic drugs such as Exendin-4, highlighting their potential utility as physiologically relevant cellular models for disease studies. In addition, differential glucose and insulin treatments uncovered adaptability within the generated ARC neurons in the dynamic regulation of POMC and insulin receptors. In summary, the establishment of this model represents a novel, chemically defined, and scalable platform for manufacturing large numbers of hypothalamic arcuate neurons and serves as a valuable resource for modeling metabolic and reproductive disorders.
ABSTRACT
Massively parallel DNA sequencing technologies have greatly increased our ability to generate large amounts of sequencing data at a rapid pace. Several methods have been developed to enrich for genomic regions of interest for targeted sequencing. We have compared three of these methods: Molecular Inversion Probes (MIP), Solution Hybrid Selection (SHS), and Microarray-based Genomic Selection (MGS). Using HapMap DNA samples, we compared each of these methods with respect to their ability to capture an identical set of exons and evolutionarily conserved regions associated with 528 genes (2.61 Mb). For sequence analysis, we developed and used a novel Bayesian genotype-assigning algorithm, Most Probable Genotype (MPG). All three capture methods were effective, but sensitivities (percentage of targeted bases associated with high-quality genotypes) varied for an equivalent amount of pass-filtered sequence: for example, 70% (MIP), 84% (SHS), and 91% (MGS) for 400 Mb. In contrast, all methods yielded similar accuracies of >99.84% when compared to Infinium 1M SNP BeadChip-derived genotypes and >99.998% when compared to 30-fold coverage whole-genome shotgun sequencing data. We also observed a low false-positive rate with all three methods; of the heterozygous positions identified by each of the capture methods, >99.57% agreed with 1M SNP BeadChip, and >98.840% agreed with the whole-genome shotgun data. In addition, we successfully piloted the genomic enrichment of a set of 12 pooled samples via the MGS method using molecular bar codes. We find that these three genomic enrichment methods are highly accurate and practical, with sensitivities comparable to that of 30-fold coverage whole-genome shotgun data.
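The abstract above defines sensitivity as the percentage of targeted bases associated with high-quality genotypes, and reports accuracy as agreement with BeadChip or whole-genome shotgun genotypes. A minimal sketch of how two such percentages could be computed is given here. It is not the authors' pipeline or the MPG algorithm: the dictionary representation, the function names, the reading of accuracy as simple concordance, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical representation: position -> genotype call, or None when no
# high-quality genotype could be assigned at that targeted base.
Calls = Dict[int, Optional[str]]


def sensitivity(calls: Calls, targeted_positions: List[int]) -> float:
    """Percentage of targeted bases that received a high-quality genotype."""
    called = sum(1 for pos in targeted_positions if calls.get(pos) is not None)
    return 100.0 * called / len(targeted_positions)


def concordance(calls: Calls, reference: Dict[int, str]) -> float:
    """Percentage of called genotypes that agree with a reference call set,
    restricted to positions genotyped in both."""
    shared = [pos for pos, gt in calls.items() if gt is not None and pos in reference]
    agree = sum(1 for pos in shared if calls[pos] == reference[pos])
    return 100.0 * agree / len(shared)


# Toy data, invented for illustration only.
targets = [100, 101, 102, 103, 104]
capture_calls: Calls = {100: "AA", 101: "AG", 102: None, 103: "CC", 104: "TT"}
array_calls = {100: "AA", 101: "AG", 103: "CT"}

print(sensitivity(capture_calls, targets))      # 4 of 5 targeted bases called
print(concordance(capture_calls, array_calls))  # 2 of 3 shared calls agree
```

Keeping the two measures separate mirrors the abstract's observation that the capture methods differed in sensitivity while yielding similar accuracies.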
Subject(s)
Type 2 Diabetes Mellitus/genetics, Human Genome, Oligonucleotide Array Sequence Analysis/methods, DNA Sequence Analysis/methods, Algorithms, Bayes Theorem, DNA/genetics, DNA Probes/genetics, Exons, Genotype, Humans, Reproducibility of Results, Sensitivity and Specificity
ABSTRACT
Disruption of pancreatic islet function and glucose homeostasis can lead to the development of sustained hyperglycemia, beta cell glucotoxicity, and ultimately type 2 diabetes (T2D). In this study, we sought to explore the effects of hyperglycemia on human pancreatic islet (HPI) gene expression by exposing HPIs from two donors to low (2.8mM) and high (15.0mM) glucose concentrations over 24 hours, assaying the transcriptome at seven time points using single-cell RNA sequencing (scRNA-seq). We modeled time as both a discrete and continuous variable to determine momentary and longitudinal changes in transcription associated with islet time in culture or glucose exposure. Across all cell types, we identified 1,528 genes associated with time, 1,185 genes associated with glucose exposure, and 845 genes associated with interaction effects between time and glucose. We clustered differentially expressed genes across cell types and found 347 modules of genes with similar expression patterns across time and glucose conditions, including two beta cell modules enriched in genes associated with T2D. Finally, by integrating genomic features from this study and genetic summary statistics for T2D and related traits, we nominate 363 candidate effector genes that may underlie genetic associations for T2D and related traits.
ABSTRACT
Genetic studies have identified numerous loci associated with type 2 diabetes (T2D), but the functional roles of many loci remain unexplored. Here, we engineered isogenic knockout human embryonic stem cell lines for 20 genes associated with T2D risk. We examined the impacts of each knockout on β cell differentiation, functions, and survival. We generated gene expression and chromatin accessibility profiles on β cells derived from each knockout line. Analyses of T2D-association signals overlapping HNF4A-dependent ATAC peaks identified a likely causal variant at the FAIM2 T2D-association signal. Additionally, the integrative association analyses identified four genes (CP, RNASE1, PCSK1N, and GSTA2) associated with insulin production, and two genes (TAGLN3 and DHRS2) associated with β cell sensitivity to lipotoxicity. Finally, we leveraged deep ATAC-seq read coverage to assess allele-specific imbalance at variants heterozygous in the parental line and identified a single likely functional variant at each of 23 T2D-association signals.
Subject(s)
Diabetes Mellitus, Type 2; Human Embryonic Stem Cells; Insulin-Secreting Cells; Humans; Diabetes Mellitus, Type 2/genetics; Diabetes Mellitus, Type 2/metabolism; Human Embryonic Stem Cells/metabolism; Genetic Predisposition to Disease; Genome-Wide Association Study; Insulin-Secreting Cells/metabolism; Polymorphism, Single Nucleotide; Carbonyl Reductase (NADPH)/genetics; Carbonyl Reductase (NADPH)/metabolism
ABSTRACT
Genetic studies have identified numerous loci associated with type 2 diabetes (T2D), but the functional role of many loci has remained unexplored. In this study, we engineered isogenic knockout human embryonic stem cell (hESC) lines for 20 genes associated with T2D risk. We systematically examined β-cell differentiation, insulin production and secretion, and survival. We performed RNA-seq and ATAC-seq on hESC-β cells from each knockout line. Analyses of T2D GWAS signals overlapping with HNF4A-dependent ATAC peaks identified a specific SNP as a likely causal variant. In addition, we performed integrative association analyses and identified four genes (CP, RNASE1, PCSK1N and GSTA2) associated with insulin production, and two genes (TAGLN3 and DHRS2) associated with sensitivity to lipotoxicity. Finally, we leveraged deep ATAC-seq read coverage to assess allele-specific imbalance at variants heterozygous in the parental hESC line, to identify a single likely functional variant at each of 23 T2D GWAS signals.
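Illustrative note: the abstract above does not say which statistical test was used to assess allele-specific imbalance. A common choice, assumed here purely for illustration, is a two-sided exact binomial test of reference versus alternate read counts at a heterozygous variant against an expected allelic fraction of 0.5. The function name `allelic_imbalance_p` and the example read counts are hypothetical and are not taken from the study.

```python
from math import exp, lgamma, log


def _log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of the binomial probability of k successes in n trials."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1.0 - p))


def allelic_imbalance_p(ref_reads: int, alt_reads: int,
                        p_null: float = 0.5) -> float:
    """Two-sided exact binomial p-value for allelic imbalance.

    Assumption: under no imbalance, each read at a heterozygous site
    carries the reference allele with probability p_null (0 < p_null < 1).
    The p-value sums the probabilities of all outcomes that are no more
    likely than the observed one.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence either way
    log_obs = _log_binom_pmf(ref_reads, n, p_null)
    total = 0.0
    for k in range(n + 1):
        log_pk = _log_binom_pmf(k, n, p_null)
        # Small tolerance so ties with the observed outcome are included.
        if log_pk <= log_obs + 1e-9:
            total += exp(log_pk)
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical read counts at one heterozygous variant.
    print(allelic_imbalance_p(ref_reads=70, alt_reads=30))
    print(allelic_imbalance_p(ref_reads=52, alt_reads=48))
```

In practice, analyses of this kind usually also correct for reference-mapping bias and for multiple testing across variants; neither is handled in this sketch.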
ABSTRACT
Complete characterization of the genetic effects on gene expression is needed to elucidate tissue biology and the etiology of complex traits. Here, we analyzed 2,344 subcutaneous adipose tissue samples and identified 34K conditionally distinct expression quantitative trait locus (eQTL) signals in 18K genes. Over half of eQTL genes exhibited at least two eQTL signals. Compared to primary signals, non-primary signals had lower effect sizes, lower minor allele frequencies, and less promoter enrichment; they corresponded to genes with higher heritability and higher tolerance for loss of function. Colocalization of eQTL with conditionally distinct genome-wide association study signals for 28 cardiometabolic traits identified 3,605 eQTL signals for 1,861 genes. Inclusion of non-primary eQTL signals increased colocalized signals by 46%. Among 30 genes with ≥2 pairs of colocalized signals, 21 showed a mediating gene dosage effect on the trait. Thus, expanded eQTL identification reveals more mechanisms underlying complex traits and improves understanding of the complexity of gene expression regulation.
ABSTRACT
Hereditary congenital facial paresis type 1 (HCFP1) is an autosomal dominant disorder of absent or limited facial movement that maps to chromosome 3q21-q22 and is hypothesized to result from facial branchial motor neuron (FBMN) maldevelopment. In the present study, we report that HCFP1 results from heterozygous duplications within a neuron-specific GATA2 regulatory region that includes two enhancers and one silencer, and from noncoding single-nucleotide variants (SNVs) within the silencer. Some SNVs impair binding of NR2F1 to the silencer in vitro and in vivo and attenuate in vivo enhancer reporter expression in FBMNs. Gata2 and its effector Gata3 are essential for inner-ear efferent neuron (IEE) but not FBMN development. A humanized HCFP1 mouse model extends Gata2 expression, favors the formation of IEEs over FBMNs and is rescued by conditional loss of Gata3. These findings highlight the importance of temporal gene regulation in development and of noncoding variation in rare mendelian disease.