Abstract
Sequence-specific DNA binding by transcription factors (TFs) is a crucial step in gene regulation. However, current high-throughput in vitro approaches cannot reliably detect lower affinity TF-DNA interactions, which play key roles in gene regulation. Here, we developed PADIT-seq (protein affinity to DNA by in vitro transcription and RNA sequencing) to assay TF binding preferences to all 10-bp DNA sequences at far greater sensitivity than prior approaches. The expanded catalogs of low affinity DNA binding sites for the human TFs HOXD13 and EGR1 revealed that nucleotides flanking high affinity DNA binding sites create overlapping lower affinity sites that together modulate TF genomic occupancy in vivo. Formation of such extended recognition sequences stems from an inherent property of TF binding sites to interweave with each other and expands the genomic sequence space for identifying noncoding variants that directly alter TF binding. One-Sentence Summary: Overlapping DNA binding sites underlie TF genomic occupancy through their inherent propensity to interweave with each other.
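To make the idea of overlapping sites concrete, the following minimal sketch enumerates the sequence space of all 10-bp sites and lists every 10-bp window in a longer sequence. It is an illustration only, not the PADIT-seq analysis pipeline; the function names and the 14-bp example sequence are hypothetical and do not come from the paper.

```python
from itertools import product

K = 10  # site length assayed, as stated in the abstract


def all_kmers(k: int = K):
    """Yield every DNA sequence of length k (4**k sequences in total)."""
    for bases in product("ACGT", repeat=k):
        yield "".join(bases)


def overlapping_kmers(seq: str, k: int = K):
    """Return (offset, k-mer) pairs for every window of length k in seq."""
    seq = seq.upper()
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


# Hypothetical 14-bp sequence, chosen only for illustration.
example = "GGCCCAATAAAACC"
windows = overlapping_kmers(example)

print(f"size of the 10-mer space: {4 ** K}")
for offset, kmer in windows:
    print(offset, kmer)
```

A 14-bp sequence contains five overlapping 10-bp windows, so each flanking nucleotide contributes to several candidate sites at once. In a real analysis each window would be looked up in a measured affinity table, which this sketch does not include.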
Abstract
Cis-regulatory elements (cis-REs) include promoters, enhancers, and insulators that regulate gene expression programs via binding of transcription factors. ATAC-seq technology effectively identifies active cis-REs in a given cell type (including from single cells) by mapping accessible chromatin at base-pair resolution. However, these maps are not immediately useful for inferring specific functions of cis-REs. For this purpose, we developed a deep learning framework (CoRE-ATAC) with novel data encoders that integrate DNA sequence (reference or personal genotypes) with ATAC-seq cut sites and read pileups. CoRE-ATAC was trained on 4 cell types (n = 6 samples/replicates) and accurately predicted known cis-RE functions from 7 cell types (n = 40 samples) that were not used in model training (mean average precision = 0.80, mean F1 score = 0.70). CoRE-ATAC enhancer predictions from 19 human islet samples coincided with genetically modulated gain/loss of enhancer activity, which was confirmed by massively parallel reporter assays (MPRAs). Finally, CoRE-ATAC effectively inferred cis-RE function from aggregate single nucleus ATAC-seq (snATAC) data from human blood-derived immune cells that overlapped with known functional annotations in sorted immune cells, which established the efficacy of these models to study cis-RE functions of rare cells without the need for cell sorting. ATAC-seq maps from primary human cells reveal individual- and cell-specific variation in cis-RE activity. CoRE-ATAC increases the functional resolution of these maps, a critical step for studying regulatory disruptions behind diseases.
Subjects
Chromatin Immunoprecipitation Sequencing/methods; Deep Learning; Regulatory Sequences, Nucleic Acid/genetics; Single-Cell Analysis/methods; Cells, Cultured; Computational Biology; DNA/analysis; DNA/genetics; Humans; Islets of Langerhans/cytology; Monocytes/cytology
Abstract
Genome-wide association studies (GWAS) have linked single nucleotide polymorphisms (SNPs) at >250 loci in the human genome to type 2 diabetes (T2D) risk. For each locus, identifying the functional variant(s) among multiple SNPs in high linkage disequilibrium is critical to understand molecular mechanisms underlying T2D genetic risk. Using massively parallel reporter assays (MPRA), we test the cis-regulatory effects of SNPs associated with T2D and altered in vivo islet chromatin accessibility in MIN6 β cells under steady state and pathophysiologic endoplasmic reticulum (ER) stress conditions. We identify 1,982/6,621 (29.9%) SNP-containing elements that activate transcription in MIN6 and 879 SNP alleles that modulate MPRA activity. Multiple T2D-associated SNPs alter the activity of short interspersed nuclear element (SINE)-containing elements that are strongly induced by ER stress. We identify 220 functional variants at 104 T2D association signals, narrowing 54 signals to a single candidate SNP. Together, this study identifies elements driving β cell steady state and ER stress-responsive transcriptional activation, nominates causal T2D SNPs, and uncovers potential roles for repetitive elements in β cell transcriptional stress response and T2D genetics.
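The proportions quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the text; the helper name `percent` and the derived quantities (share of signals narrowed to one SNP, mean variants per signal) are illustrative additions, not figures reported by the authors.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Counts taken directly from the abstract.
active_pct = percent(1982, 6621)   # elements that activate transcription
single_snp_pct = percent(54, 104)  # signals narrowed to one candidate SNP
variants_per_signal = 220 / 104    # mean functional variants per signal

print(f"active elements: {active_pct:.1f}%")
print(f"signals resolved to a single SNP: {single_snp_pct:.1f}%")
print(f"functional variants per signal: {variants_per_signal:.2f}")
```

The first value reproduces the 29.9% stated in the abstract; the other two are simple ratios of the reported counts.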
Subjects
Diabetes Mellitus, Type 2/genetics; Endoplasmic Reticulum Stress/genetics; Insulin-Secreting Cells/pathology; Polymorphism, Single Nucleotide; Transcriptional Activation/genetics; Alleles; Animals; Cell Line; Chromatin/metabolism; Diabetes Mellitus, Type 2/pathology; Genome-Wide Association Study; Humans; Mice; Quantitative Trait Loci; Short Interspersed Nucleotide Elements/genetics
Abstract
Enhancers are cis-acting sequences that regulate transcription rates of their target genes in a cell-specific manner and harbor disease-associated sequence variants in cognate cell types. Many complex diseases are associated with enhancer malfunction, necessitating the discovery and study of enhancers from clinical samples. Assay for Transposase Accessible Chromatin (ATAC-seq) technology can interrogate chromatin accessibility from small cell numbers and facilitate studying enhancers in pathologies. However, on average, only ~35% of open chromatin regions (OCRs) from ATAC-seq samples map to enhancers. We developed a neural network-based model, Predicting Enhancers from ATAC-Seq data (PEAS), to effectively infer enhancers from clinical ATAC-seq samples by extracting ATAC-seq data features and integrating these with sequence-related features (e.g., GC ratio). PEAS recapitulated ChromHMM-defined enhancers in CD14+ monocytes, CD4+ T cells, GM12878, peripheral blood mononuclear cells, and pancreatic islets. PEAS models trained on these five cell types effectively predicted enhancers in four cell types that were not used in model training (EndoC-βH1, naïve CD8+ T, MCF7, and K562 cells). Finally, PEAS inferred individual-specific enhancers from 19 islet ATAC-seq samples and revealed variability in enhancer activity across individuals, including variability driven by genetic differences. PEAS is an easy-to-use tool developed to study enhancers in pathologies by taking advantage of the increasing number of clinical epigenomes.
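As an illustration of the kind of sequence-related feature mentioned above, a GC ratio can be computed from the DNA sequence of an open chromatin region. This is a minimal sketch under stated assumptions: the function name `gc_ratio` is hypothetical, ambiguous bases such as N are ignored here, and PEAS itself may define or normalize the feature differently.

```python
def gc_ratio(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C.

    Assumption: ambiguous symbols such as N are excluded from the
    denominator; a sequence with no unambiguous bases returns 0.0.
    """
    seq = seq.upper()
    unambiguous = sum(seq.count(base) for base in "ACGT")
    if unambiguous == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / unambiguous


# Hypothetical region sequence, for illustration only.
region = "GGCCAATT"
print(gc_ratio(region))
```

In a feature table this value would sit alongside ATAC-seq-derived features for the same region before being passed to a classifier.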
Subjects
Binding Sites; Enhancer Elements, Genetic; Neural Networks, Computer; Transposases/metabolism; Cell Line; Computational Biology/methods; Gene Expression Profiling; High-Throughput Nucleotide Sequencing; Humans; ROC Curve; Sensitivity and Specificity; Sequence Analysis, DNA; Transcriptome; Transposases/chemistry
Abstract
Type 2 diabetes (T2D) is a complex disorder in which both genetic and environmental risk factors contribute to islet dysfunction and failure. Genome-wide association studies (GWAS) have linked single nucleotide polymorphisms (SNPs), most of which are noncoding, in >200 loci to islet dysfunction and T2D. Identification of the putative causal variants and their target genes and whether they lead to gain or loss of function remains challenging. Here, we profiled chromatin accessibility in pancreatic islet samples from 19 genotyped individuals and identified 2,949 SNPs associated with in vivo cis-regulatory element use (i.e., chromatin accessibility quantitative trait loci [caQTL]). Among the caQTLs tested (n = 13) using luciferase reporter assays in MIN6 β-cells, more than half exhibited effects on enhancer activity that were consistent with in vivo chromatin accessibility changes. Importantly, islet caQTL analysis nominated putative causal SNPs in 13 T2D-associated GWAS loci, linking 7 and 6 T2D risk alleles, respectively, to gain or loss of in vivo chromatin accessibility. By investigating the effect of genetic variants on chromatin accessibility in islets, this study is an important step forward in translating T2D-associated GWAS SNPs into functional molecular consequences.
Subjects
Chromatin/metabolism; Diabetes Mellitus, Type 2/genetics; Islets of Langerhans/metabolism; Alleles; Chromatin/genetics; Diabetes Mellitus, Type 2/metabolism; Genetic Predisposition to Disease; Genotype; Humans
Abstract
Genome-wide association studies (GWASs) and functional genomics approaches implicate enhancer disruption in islet dysfunction and type 2 diabetes (T2D) risk. We applied genetic fine-mapping and functional (epi)genomic approaches to a T2D- and proinsulin-associated 15q22.2 locus to identify a most likely causal variant, determine its direction of effect, and elucidate plausible target genes. Fine-mapping and conditional analyses of proinsulin levels of 8,635 non-diabetic individuals from the METSIM study support a single association signal represented by a cluster of 16 strongly associated (p < 10⁻¹⁷) variants in high linkage disequilibrium (r² > 0.8) with the GWAS index SNP rs7172432. These variants reside in an evolutionarily and functionally conserved islet and β cell stretch or super enhancer; the most strongly associated variant (rs7163757, p = 3 × 10⁻¹⁹) overlaps a conserved islet open chromatin site. DNA sequence containing the rs7163757 risk allele displayed 2-fold higher enhancer activity than the non-risk allele in reporter assays (p < 0.01) and was differentially bound by β cell nuclear extract proteins. Transcription factor NFAT specifically potentiated risk-allele enhancer activity and altered patterns of nuclear protein binding to the risk allele in vitro, suggesting that it could be a factor mediating risk-allele effects. Finally, the rs7163757 proinsulin-raising and T2D risk allele (C) was associated with increased expression of C2CD4B, and possibly C2CD4A, both of which were induced by inflammatory cytokines, in human islets. Together, these data suggest that rs7163757 contributes to genetic risk of islet dysfunction and T2D by increasing NFAT-mediated islet enhancer activity and modulating C2CD4B, and possibly C2CD4A, expression in (patho)physiologic states.
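The linkage disequilibrium threshold quoted above (r² > 0.8) refers to the squared correlation between alleles at two variants. The sketch below computes r² from phased haplotypes using the standard definition r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), where D = p_AB − p_A p_B. The function name `ld_r2` and the toy haplotypes are hypothetical; the study's own LD estimates came from its cohort data and may have used different software and unphased genotypes.

```python
def ld_r2(haplotypes):
    """Squared allelic correlation between two biallelic variants.

    haplotypes: sequence of (a, b) pairs, each allele coded 0 or 1,
    one pair per phased chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a variant is monomorphic")
    return d * d / denom


# Toy haplotypes (hypothetical): alleles always travel together.
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
# Toy haplotypes (hypothetical): alleles are independent.
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(ld_r2(linked), ld_r2(unlinked))
```

Variants with r² close to 1 are statistically hard to tell apart by association alone, which is why functional assays such as those described in the abstract are used to prioritize among them.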
Subjects
Calcium-Binding Proteins/genetics; Conserved Sequence; Enhancer Elements, Genetic/genetics; Evolution, Molecular; Islets of Langerhans/pathology; Mutation/genetics; Nuclear Proteins/genetics; Transcription Factors/genetics; Aged; Alleles; Animals; Base Sequence; Calcium-Binding Proteins/metabolism; Cell Line; Chromatin/metabolism; Chromosomes, Human, Pair 15/genetics; Cytokines/metabolism; DNA, Intergenic/genetics; Humans; Inflammation Mediators/metabolism; Mice; Middle Aged; NFATC Transcription Factors/metabolism; Physical Chromosome Mapping; Polymorphism, Single Nucleotide/genetics; Proinsulin/metabolism; Rats; Risk Factors
Abstract
Pancreatic islet dysfunction and beta cell failure are hallmarks of type 2 diabetes mellitus (T2DM) pathogenesis. In this review, we discuss how genome-wide association studies (GWASs) and recent developments in islet (epi)genome and transcriptome profiling (particularly single cell analyses) are providing novel insights into the genetic, environmental, and cellular contributions to islet (dys)function and T2DM pathogenesis. Moving forward, study designs that interrogate and model genetic variation [e.g., allelic profiling and (epi)genome editing] will be critical to dissect the molecular genetics of T2DM pathogenesis, to build next-generation cellular and animal models, and to develop precision medicine approaches to detect, treat, and prevent islet (dys)function and T2DM.