ABSTRACT
Oct4, Sox2, Klf4, and cMyc (OSKM) reprogram somatic cells to pluripotency. To gain a mechanistic understanding of their function, we mapped OSKM-binding, stage-specific transcription factors (TFs), and chromatin states in discrete reprogramming stages and performed loss- and gain-of-function experiments. We found that OSK predominantly bind active somatic enhancers early in reprogramming and immediately initiate their inactivation genome-wide by inducing the redistribution of somatic TFs away from somatic enhancers to sites elsewhere engaged by OSK, recruiting Hdac1, and repressing the somatic TF Fra1. Pluripotency enhancer selection is a stepwise process that also begins early in reprogramming through collaborative binding of OSK at sites with high OSK-motif density. Most pluripotency enhancers are selected later in the process and require OS and other pluripotency TFs. Somatic and pluripotency TFs modulate reprogramming efficiency when overexpressed by altering OSK targeting, somatic-enhancer inactivation, and pluripotency enhancer selection. Together, our data indicate that collaborative interactions among OSK and with stage-specific TFs direct both somatic-enhancer inactivation and pluripotency-enhancer selection to drive reprogramming.
Subjects
Cellular Reprogramming , Transcription Factors/metabolism , Animals , Chromatin/metabolism , Fibroblasts/metabolism , Histone Code , Kruppel-Like Factor 4 , Kruppel-Like Transcription Factors/metabolism , Mice , Octamer Transcription Factor-3/metabolism , Proto-Oncogene Proteins c-fos/metabolism , Proto-Oncogene Proteins c-myc/metabolism , Transcriptional Regulatory Elements , SOXB1 Transcription Factors/metabolism , Transcriptional Silencer Elements
ABSTRACT
Hundreds of chromatin regulators (CRs) control chromatin structure and function by catalyzing and binding histone modifications, yet the rules governing these key processes remain obscure. Here, we present a systematic approach to infer CR function. We developed ChIP-string, a meso-scale assay that combines chromatin immunoprecipitation with a signature readout of 487 representative loci. We applied ChIP-string to screen 145 antibodies, thereby identifying effective reagents, which we used to map the genome-wide binding of 29 CRs in two cell types. We found that specific combinations of CRs colocalize in characteristic patterns at distinct chromatin environments, at genes of coherent functions, and at distal regulatory elements. When comparing between cell types, CRs redistribute to different loci but maintain their modular and combinatorial associations. Our work provides a multiplex method that substantially enhances the ability to monitor CR binding, presents a large resource of CR maps, and reveals common principles for combinatorial CR function.
Subjects
Chromatin Immunoprecipitation/methods , Chromatin/metabolism , Genomics/methods , Histone Code , Chromatin/chemistry , Chromatin Assembly and Disassembly , Embryonic Stem Cells , Genome , Humans , K562 Cells
ABSTRACT
MOTIVATION: Genome-wide maps of epigenetic modifications are powerful resources for non-coding genome annotation. Maps of multiple epigenetic marks have been integrated into cell or tissue type-specific chromatin state annotations for many cell or tissue types. With the increasing availability of multiple chromatin state maps for biologically similar samples, there is a need for methods that can effectively summarize the information about chromatin state annotations within groups of samples and identify differences across groups of samples at a high resolution. RESULTS: We developed CSREP, which takes as input chromatin state annotations for a group of samples. CSREP then probabilistically estimates the state at each genomic position and derives a representative chromatin state map for the group. CSREP uses an ensemble of multi-class logistic regression classifiers that predict the chromatin state assignment of each sample given the state maps from all other samples. The difference in CSREP's probability assignments for the two groups can be used to identify genomic locations with differential chromatin state assignments. Using groups of chromatin state maps of a diverse set of cell and tissue types, we demonstrate the advantages of using CSREP to summarize chromatin state maps and identify biologically relevant differences between groups at a high resolution. AVAILABILITY AND IMPLEMENTATION: The CSREP source code and generated data are available at http://github.com/ernstlab/csrep. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
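To make the idea of a per-position probabilistic summary and a between-group difference concrete, the following is a minimal sketch. It is not CSREP itself: instead of the ensemble of multi-class logistic regression classifiers described above, it uses a much simpler baseline (empirical state frequencies at each genomic bin), and all function names and toy state maps are hypothetical.

```python
from collections import Counter


def state_probabilities(group, n_states):
    """Empirical state frequencies at each genomic bin for a group of samples.

    group: list of samples, each a list of integer state labels (one per bin).
    Returns one probability list of length n_states per bin.
    NOTE: a frequency baseline, assumed here for illustration; CSREP
    estimates these probabilities with logistic regression classifiers.
    """
    n_bins = len(group[0])
    probs = []
    for pos in range(n_bins):
        counts = Counter(sample[pos] for sample in group)
        probs.append([counts.get(s, 0) / len(group) for s in range(n_states)])
    return probs


def representative_map(probs):
    """Pick the most probable state at each bin."""
    return [max(range(len(p)), key=lambda s: p[s]) for p in probs]


def differential_scores(probs_a, probs_b):
    """Per-bin, per-state difference in probability between two groups."""
    return [[a - b for a, b in zip(pa, pb)] for pa, pb in zip(probs_a, probs_b)]


# Toy input: 3 samples per group, 4 genomic bins, 3 chromatin states (0, 1, 2).
group_a = [[0, 1, 2, 2], [0, 1, 2, 1], [0, 1, 2, 2]]
group_b = [[0, 0, 2, 1], [0, 0, 2, 1], [0, 1, 2, 1]]

pa = state_probabilities(group_a, 3)
pb = state_probabilities(group_b, 3)
print(representative_map(pa))
print(representative_map(pb))
print(differential_scores(pa, pb))
```

Bins where the difference vector is far from zero (here the second and fourth bins) are the candidates for differential chromatin state assignment between the two groups.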
Subjects
Chromatin , Genomics , Chromatin/genetics , Genomics/methods , Genome , Software , Chromosome Mapping
ABSTRACT
Three-dimensional physical interactions within chromosomes dynamically regulate gene expression in a tissue-specific manner. However, the 3D organization of chromosomes during human brain development and its role in regulating gene networks dysregulated in neurodevelopmental disorders, such as autism or schizophrenia, are unknown. Here we generate high-resolution 3D maps of chromatin contacts during human corticogenesis, permitting large-scale annotation of previously uncharacterized regulatory relationships relevant to the evolution of human cognition and disease. Our analyses identify hundreds of genes that physically interact with enhancers gained on the human lineage, many of which are under purifying selection and associated with human cognitive function. We integrate chromatin contacts with non-coding variants identified in schizophrenia genome-wide association studies (GWAS), highlighting multiple candidate schizophrenia risk genes and pathways, including transcription factors involved in neurogenesis, and cholinergic signalling molecules, several of which are supported by independent expression quantitative trait loci and gene expression analyses. Genome editing in human neural progenitors suggests that one of these distal schizophrenia GWAS loci regulates FOXG1 expression, supporting its potential role as a schizophrenia risk gene. This work provides a framework for understanding the effect of non-coding regulatory elements on human brain development and the evolution of cognition, and highlights novel mechanisms underlying neuropsychiatric disorders.
Subjects
Brain/embryology , Brain/metabolism , Chromatin/chemistry , Chromatin/genetics , Human Chromosomes/chemistry , Human Chromosomes/genetics , Developmental Gene Expression Regulation , Nucleic Acid Conformation , Chromatin/metabolism , Human Chromosomes/metabolism , Cognition , Genetic Enhancer Elements/genetics , Genetic Epigenesis , Forkhead Transcription Factors/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Humans , Nerve Tissue Proteins/genetics , Neural Stem Cells/metabolism , Neurogenesis , Organ Specificity , Single Nucleotide Polymorphism/genetics , Genetic Promoter Regions/genetics , Reproducibility of Results , Schizophrenia/genetics , Schizophrenia/pathology
ABSTRACT
Many disease risk loci identified in genome-wide association studies are present in non-coding regions of the genome. Previous studies have found enrichment of expression quantitative trait loci (eQTLs) in disease risk loci, indicating that identifying causal variants for gene expression is important for elucidating the genetic basis of not only gene expression but also complex traits. However, detecting causal variants is challenging due to complex genetic correlation among variants known as linkage disequilibrium (LD) and the presence of multiple causal variants within a locus. Although several fine-mapping approaches have been developed to overcome these challenges, they may produce large sets of putative causal variants when true causal variants are in high LD with many non-causal variants. In eQTL studies, there is an additional source of information that can be used to improve fine-mapping called allelic imbalance (AIM) that measures imbalance in gene expression on two chromosomes of a diploid organism. In this work, we develop a novel statistical method that leverages both AIM and total expression data to detect causal variants that regulate gene expression. We illustrate through simulations and application to 10 tissues of the Genotype-Tissue Expression (GTEx) dataset that our method identifies the true causal variants with higher specificity than an approach that uses only eQTL information. Across all tissues and genes, our method achieves a median reduction rate of 11% in the number of putative causal variants. We use chromatin state data from the Roadmap Epigenomics Consortium to show that the putative causal variants identified by our method are enriched for active regions of the genome, providing orthogonal support that our method identifies causal variants with increased specificity.
Subjects
Allelic Imbalance , Chromatin/genetics , Chromosome Mapping/methods , Quantitative Trait Loci , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Linkage Disequilibrium , Multifactorial Inheritance , Single Nucleotide Polymorphism
ABSTRACT
MOTIVATION: Chromatin interactions play an important role in genome architecture and gene regulation. The Hi-C assay generates such interaction maps genome-wide, but at relatively low resolutions (e.g. 5-25 kb), which is substantially coarser than the resolution of transcription factor binding sites or open chromatin sites that are potential sources of such interactions. RESULTS: To predict the sources of Hi-C-identified interactions at a high resolution (e.g. 100 bp), we developed a computational method that integrates data from DNase-seq and ChIP-seq of TFs and histone marks. Our method, χ-CNN, uses these data to first train a convolutional neural network (CNN) to discriminate between called Hi-C interactions and non-interactions. χ-CNN then predicts the high-resolution source of each Hi-C interaction using a feature attribution method. We show these predictions recover original Hi-C peaks after extending them to be coarser. We also show χ-CNN predictions enrich for evolutionarily conserved bases, eQTLs and CTCF motifs, supporting their biological significance. χ-CNN provides an approach for analyzing important aspects of genome architecture and gene regulation at a higher resolution than previously possible. AVAILABILITY AND IMPLEMENTATION: χ-CNN software is available on GitHub (https://github.com/ernstlab/X-CNN). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Chromatin , Genome , Histone Code , Computer Neural Networks , Software
ABSTRACT
The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.
Subjects
Genetic Epigenesis/genetics , Epigenomics , Human Genome/genetics , Base Sequence , Cell Lineage/genetics , Cultured Cells , Chromatin/chemistry , Chromatin/genetics , Chromatin/metabolism , Human Chromosomes/chemistry , Human Chromosomes/genetics , Human Chromosomes/metabolism , DNA/chemistry , DNA/genetics , DNA/metabolism , DNA Methylation , Datasets as Topic , Genetic Enhancer Elements/genetics , Genetic Variation/genetics , Genome-Wide Association Study , Histones/metabolism , Humans , Organ Specificity/genetics , RNA/genetics , Reference Values
ABSTRACT
MOTIVATION: Expression quantitative trait loci (eQTLs) are genetic variants that affect gene expression. In eQTL studies, one important task is to find eGenes, that is, genes whose expression is associated with at least one eQTL. The standard statistical method to determine whether a gene is an eGene requires association testing at all nearby variants and a permutation test to correct for multiple testing. The standard method, however, does not consider genomic annotation of the variants. In practice, variants near gene transcription start sites (TSSs) or certain histone modifications are likely to regulate gene expression. In this article, we introduce a novel eGene detection method that considers this empirical evidence and thereby increases the statistical power. RESULTS: We applied our method to the liver Genotype-Tissue Expression (GTEx) data using distance from TSSs, DNase hypersensitivity sites, and six histone modifications as the genomic annotations for the variants. Each of these annotations helped us detect more candidate eGenes. Distance from TSS appears to be the most important annotation; specifically, using this annotation, our method discovered 50% more candidate eGenes than the standard permutation method. CONTACT: buhm.han@amc.seoul.kr or eeskin@cs.ucla.edu.
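The standard procedure mentioned in this abstract (association testing at every nearby variant, followed by a permutation test to correct for multiple testing) can be sketched roughly as follows. This is only an illustration of that baseline, not the annotation-aware method the authors introduce. The choice of absolute Pearson correlation as the association statistic, the function names, and the toy genotype and expression values are all assumptions made for the example.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no association signal
    return sxy / (sxx * syy) ** 0.5


def max_abs_corr(genotypes, expression):
    """Strongest association over all variants near the gene."""
    return max(abs(pearson(g, expression)) for g in genotypes)


def egene_pvalue(genotypes, expression, n_perm=1000, seed=0):
    """Permutation p-value for the gene-level (max over variants) statistic.

    Shuffling expression across individuals breaks any genotype-expression
    link while preserving the correlation structure among variants, so the
    permuted maxima form the null distribution for the observed maximum.
    """
    rng = random.Random(seed)
    observed = max_abs_corr(genotypes, expression)
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_abs_corr(genotypes, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: 12 individuals, 3 variants coded as 0/1/2 allele counts.
genotypes = [
    [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
    [2, 0, 1, 1, 0, 2, 2, 1, 0, 0, 1, 2],
]
expression = [1.0, 1.2, 0.9, 1.1, 2.0, 2.1, 1.9, 2.2, 3.0, 3.1, 2.9, 3.2]

p_value = egene_pvalue(genotypes, expression)
print(p_value)
```

A gene would be called a candidate eGene when this p-value falls below the chosen significance threshold. Because the statistic is a maximum over variants, testing more variants raises the bar that the observed association must clear.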
Subjects
Genomics , Genetic Variation , Genotype , Single Nucleotide Polymorphism , Quantitative Trait Loci
ABSTRACT
Chromatin profiling has emerged as a powerful means of genome annotation and detection of regulatory activity. The approach is especially well suited to the characterization of non-coding portions of the genome, which critically contribute to cellular phenotypes yet remain largely uncharted. Here we map nine chromatin marks across nine cell types to systematically characterize regulatory elements, their cell-type specificities and their functional interactions. Focusing on cell-type-specific patterns of promoters and enhancers, we define multicell activity profiles for chromatin state, gene expression, regulatory motif enrichment and regulator expression. We use correlations between these profiles to link enhancers to putative target genes, and predict the cell-type-specific activators and repressors that modulate them. The resulting annotations and regulatory predictions have implications for the interpretation of genome-wide association studies. Top-scoring disease single nucleotide polymorphisms are frequently positioned within enhancer elements specifically active in relevant cell types, and in some cases affect a motif instance for a predicted regulator, thus suggesting a mechanism for the association. Our study presents a general framework for deciphering cis-regulatory connections and their roles in disease.
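The step of using correlations between multicell activity profiles to link enhancers to putative target genes can be illustrated with a small sketch. The abstract does not specify the correlation measure or any threshold, so the Pearson correlation, the 0.8 cutoff, the function names, and the activity values across nine cell types below are assumptions chosen purely for illustration. A real analysis would also restrict candidate genes to those near each enhancer, which is omitted here.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def link_enhancers(enhancer_activity, gene_expression, min_r=0.8):
    """Pair each enhancer with genes whose expression tracks its activity.

    Both arguments map a name to a profile with one value per cell type,
    in the same cell-type order. min_r is a hypothetical cutoff.
    """
    links = []
    for enh, activity in enhancer_activity.items():
        for gene, expr in gene_expression.items():
            r = pearson(activity, expr)
            if r >= min_r:
                links.append((enh, gene, round(r, 3)))
    return links


# Hypothetical profiles across nine cell types.
enhancer_activity = {"enhA": [1, 0, 0, 1, 0, 0, 1, 0, 0]}
gene_expression = {
    "gene1": [9, 1, 2, 8, 1, 1, 9, 2, 1],  # high where enhA is active
    "gene2": [1, 5, 5, 1, 5, 5, 1, 5, 5],  # high where enhA is inactive
}

links = link_enhancers(enhancer_activity, gene_expression)
print(links)
```

Only the gene whose expression rises and falls with the enhancer's activity is linked; the anticorrelated gene is rejected by the one-sided cutoff.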
Subjects
Cell Physiological Phenomena , Chromatin/genetics , Chromatin/metabolism , Chromosome Mapping , Binding Sites , Cell Line , Tumor Cell Line , Cultured Cells , Gene Expression Regulation , Human Genome/genetics , Hep G2 Cells , Humans , Genetic Promoter Regions/genetics , Reproducibility of Results , Transcription Factors/genetics
ABSTRACT
Chromatin is composed of DNA and a variety of modified histones and non-histone proteins, which have an impact on cell differentiation, gene regulation and other key cellular processes. Here we present a genome-wide chromatin landscape for Drosophila melanogaster based on eighteen histone modifications, summarized by nine prevalent combinatorial patterns. Integrative analysis with other data (non-histone chromatin proteins, DNase I hypersensitivity, GRO-Seq reads produced by engaged polymerase, short/long RNA products) reveals discrete characteristics of chromosomes, genes, regulatory elements and other functional domains. We find that active genes display distinct chromatin signatures that are correlated with disparate gene lengths, exon patterns, regulatory functions and genomic contexts. We also demonstrate a diversity of signatures among Polycomb targets that include a subset with paused polymerase. This systematic profiling and integrative analysis of chromatin signatures provides insights into how genomic elements are regulated, and will serve as a resource for future experimental investigations of genome structure and function.
Subjects
Chromatin/genetics , Chromatin/metabolism , Drosophila melanogaster/genetics , Animals , Cell Line , Chromatin Immunoprecipitation , Non-Histone Chromosomal Proteins/analysis , Non-Histone Chromosomal Proteins/metabolism , Deoxyribonuclease I/metabolism , Drosophila Proteins/genetics , Drosophila melanogaster/embryology , Drosophila melanogaster/growth & development , Exons/genetics , Gene Expression Regulation/genetics , Insect Genes/genetics , Insect Genome/genetics , Histones/chemistry , Histones/metabolism , Male , Molecular Sequence Annotation , Oligonucleotide Array Sequence Analysis , Polycomb Repressive Complex 1 , RNA/analysis , RNA/genetics , Sequence Analysis , Genetic Transcription/genetics
ABSTRACT
The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ~4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ~60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.
Subjects
Molecular Evolution , Human Genome/genetics , Genome/genetics , Mammals/genetics , Animals , Disease , Exons/genetics , Genomics , Health , Humans , Molecular Sequence Annotation , Phylogeny , RNA/classification , RNA/genetics , Genetic Selection/genetics , Sequence Alignment , DNA Sequence Analysis
ABSTRACT
Pancreatic islet failure is a characteristic feature of impaired glucose control in diabetes mellitus. Circadian control of islet function is essential for maintaining proper glucose homeostasis. Circadian variations in transcriptional pathways have been described in diverse cell types and shown to be critical for optimization of cellular function in vivo. In the current study, we utilized Short Time Series Expression Miner (STEM) analysis to identify diurnally expressed transcripts and biological pathways from mouse islets isolated at 4 h intervals throughout the 24 h light-dark cycle. STEM analysis identified 19 distinct chronological model profiles, and genes belonging to each profile were subsequently annotated to significantly enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) biological pathways. Several transcriptional pathways essential for proper islet function (e.g., insulin secretion, oxidative phosphorylation), cell survival (e.g., insulin signaling, apoptosis) and cell proliferation (e.g., DNA replication, homologous recombination) demonstrated significant time-dependent variations. Notably, KEGG pathway analysis revealed "protein processing in endoplasmic reticulum - mmu04141" as one of the most enriched time-dependent pathways in islets. This study provides a unique data set on time-dependent diurnal profiles of islet gene expression and biological pathways, and suggests that diurnal variation of the islet transcriptome is an important feature of islet homeostasis, which should be taken into consideration for optimal experimental design and interpretation of future islet studies.
Subjects
Circadian Clocks/physiology , Circadian Rhythm/physiology , Islets of Langerhans/physiology , Transcriptome/physiology , Animals , Blood Glucose/metabolism , Cell Proliferation/physiology , Gene Expression/physiology , Homeostasis/physiology , Islets of Langerhans/metabolism , Male , Mice , Inbred C57BL Mice , Signal Transduction/physiology
ABSTRACT
The regions bound by sequence-specific transcription factors can be highly variable across different cell types despite the static nature of the underlying genome sequence. This has been partly attributed to changes in chromatin accessibility, but a systematic picture has been hindered by the lack of large-scale data sets. Here, we use 456 binding experiments for 119 regulators and 84 chromatin maps generated by the ENCODE in six human cell types, and relate those to a global map of regulatory motif instances for these factors. We find specific and robust chromatin state preferences for each regulator beyond the previously reported open-chromatin association, suggesting a much richer chromatin landscape beyond simple accessibility. The preferentially bound chromatin states of regulators were enriched for sequence motifs of regulators relative to all states, suggesting that these preferences are at least partly encoded by the genomic sequence. Relative to all regions bound by a regulator, however, regulatory motifs were surprisingly depleted in the regulator's preferentially bound states, suggesting additional non-sequence-specific binding beyond the level predicted by the regulatory motifs. Such permissive binding was largely restricted to open-chromatin regions showing histone modification marks characteristic of active enhancer and promoter regions, whereas open-chromatin regions lacking such marks did not show permissive binding. Lastly, the vast majority of cobinding of regulator pairs is predicted by the chromatin state preferences of individual regulators. Overall, our results suggest a joint role of sequence motifs and specific chromatin states beyond mere accessibility in mediating regulator binding dynamics across different cell types.
Subjects
Chromatin/metabolism , Nucleotide Motifs , Nucleic Acid Regulatory Sequences , Transcription Factors/metabolism , Binding Sites , Cell Line , Chromatin Assembly and Disassembly , Cluster Analysis , Genetic Enhancer Elements , Humans , Organ Specificity/genetics , Genetic Promoter Regions , Protein Binding
ABSTRACT
Genome-wide chromatin annotations have permitted the mapping of putative regulatory elements across multiple human cell types. However, their experimental dissection by directed regulatory motif disruption has remained unfeasible at the genome scale. Here, we use a massively parallel reporter assay (MPRA) to measure the transcriptional levels induced by 145-bp DNA segments centered on evolutionarily conserved regulatory motif instances within enhancer chromatin states. We select five predicted activators (HNF1, HNF4, FOXA, GATA, NFE2L2) and two predicted repressors (GFI1, ZFP161) and measure reporter expression in erythroleukemia (K562) and liver carcinoma (HepG2) cell lines. We test 2104 wild-type sequences and 3314 engineered enhancer variants containing targeted motif disruptions, each using 10 barcode tags and two replicates. The resulting data strongly confirm the enhancer activity and cell-type specificity of enhancer chromatin states, the ability of 145-bp segments to recapitulate both, the necessary role of regulatory motifs in enhancer function, and the complementary roles of activator and repressor motifs. We find statistically robust evidence that (1) disrupting the predicted activator motifs abolishes enhancer function, while silent or motif-improving changes maintain enhancer activity; (2) evolutionary conservation, nucleosome exclusion, binding of other factors, and strength of the motif match are predictive of enhancer activity; (3) scrambling repressor motifs leads to aberrant reporter expression in cell lines where the enhancers are usually inactive. Our results suggest a general strategy for deciphering cis-regulatory elements by systematic large-scale manipulation and provide quantitative enhancer activity measurements across thousands of constructs that can be mined to develop predictive models of gene expression.
Subjects
Chromatin/genetics , Genetic Enhancer Elements , Nucleotide Motifs/genetics , Genetic Transcription , Base Sequence , Binding Sites , Cells/classification , Cells/metabolism , Chromosome Mapping , Conserved Sequence , Gene Expression Regulation , Reporter Genes , Human Genome , Hep G2 Cells , Humans , Genetic Promoter Regions
ABSTRACT
We tested whether self-organizing maps (SOMs) could be used to effectively integrate, visualize, and mine diverse genomics data types, including complex chromatin signatures. A fine-grained SOM was trained on 72 ChIP-seq histone modifications and DNase-seq data sets from six biologically diverse cell lines studied by The ENCODE Project Consortium. We mined the resulting SOM to identify chromatin signatures related to sequence-specific transcription factor occupancy, sequence motif enrichment, and biological functions. To highlight clusters enriched for specific functions such as transcriptional promoters or enhancers, we overlaid onto the map additional data sets not used during training, such as ChIP-seq, RNA-seq, CAGE, and information on cis-acting regulatory modules from the literature. We used the SOM to parse known transcriptional enhancers according to the cell-type-specific chromatin signature, and we further corroborated this pattern on the map by EP300 (also known as p300) occupancy. New candidate cell-type-specific enhancers were identified for multiple ENCODE cell types in this way, along with new candidates for ubiquitous enhancer activity. An interactive web interface was developed to allow users to visualize and custom-mine the ENCODE SOM. We conclude that large SOMs trained on chromatin data from multiple cell types provide a powerful way to identify complex relationships in genomic data at user-selected levels of granularity.
Subjects
Chromatin/genetics , Chromatin/metabolism , Histones/genetics , Histones/metabolism , Transcription Factors/genetics , Algorithms , Cell Line , Chromosome Mapping , Computational Biology , Data Mining , Gene Ontology , Human Umbilical Vein Endothelial Cells , Humans , K562 Cells , Genetic Promoter Regions , User-Computer Interface
ABSTRACT
The ENCODE Project has generated a wealth of experimental information mapping diverse chromatin properties in several human cell lines. Although each such data track is independently informative toward the annotation of regulatory elements, their interrelations contain much richer information for the systematic annotation of regulatory elements. To uncover these interrelations and to generate an interpretable summary of the massive datasets of the ENCODE Project, we apply unsupervised learning methodologies, converting dozens of chromatin datasets into discrete annotation maps of regulatory regions and other chromatin elements across the human genome. These methods rediscover and summarize diverse aspects of chromatin architecture, elucidate the interplay between chromatin activity and RNA transcription, and reveal that a large proportion of the genome lies in a quiescent state, even across multiple cell types. The resulting annotations of non-coding regulatory elements correlate strongly with mammalian evolutionary constraint, and provide an unbiased approach for evaluating metrics of evolutionary constraint in human. Lastly, we use the regulatory annotations to revisit previously uncharacterized disease-associated loci, resulting in focused, testable hypotheses through the lens of the chromatin landscape.
Subjects
Chromatin/chemistry , Human Genome , Molecular Sequence Annotation , Transcriptional Regulatory Elements , Genetic Enhancer Elements , Genome-Wide Association Study , Humans , Insulator Elements , Genetic Promoter Regions , Proteins/genetics , Genetic Terminator Regions , Genetic Transcription
ABSTRACT
Optic nerve degeneration caused by glaucoma is a leading cause of blindness worldwide. Patients affected by the normal-pressure form of glaucoma are more likely to harbor risk alleles for glaucoma-related optic nerve disease. We have performed a meta-analysis of two independent genome-wide association studies for primary open angle glaucoma (POAG) followed by a normal-pressure glaucoma (NPG, defined by intraocular pressure (IOP) less than 22 mmHg) subgroup analysis. The single-nucleotide polymorphisms that showed the most significant associations were tested for association with a second form of glaucoma, exfoliation-syndrome glaucoma. The overall meta-analysis of the GLAUGEN and NEIGHBOR dataset results (3,146 cases and 3,487 controls) identified significant associations between two loci and POAG: the CDKN2BAS region on 9p21 (rs2157719 [G], OR = 0.69 [95% CI 0.63-0.75], p = 1.86×10⁻¹⁸), and the SIX1/SIX6 region on chromosome 14q23 (rs10483727 [A], OR = 1.32 [95% CI 1.21-1.43], p = 3.87×10⁻¹¹). In sub-group analysis two loci were significantly associated with NPG: 9p21 containing the CDKN2BAS gene (rs2157719 [G], OR = 0.58 [95% CI 0.50-0.67], p = 1.17×10⁻¹²) and a probable regulatory region on 8q22 (rs284489 [G], OR = 0.62 [95% CI 0.53-0.72], p = 8.88×10⁻¹⁰). Both NPG loci were also nominally associated with a second type of glaucoma, exfoliation syndrome glaucoma (rs2157719 [G], OR = 0.59 [95% CI 0.41-0.87], p = 0.004 and rs284489 [G], OR = 0.76 [95% CI 0.54-1.06], p = 0.021), suggesting that these loci might contribute more generally to optic nerve degeneration in glaucoma. Because both loci influence transforming growth factor beta (TGF-beta) signaling, we performed a genomic pathway analysis that showed an association between the TGF-beta pathway and NPG (permuted p = 0.009). These results suggest that neuro-protective therapies targeting TGF-beta signaling could be effective for multiple forms of glaucoma.
Subjects
Exfoliation Syndrome/genetics , Genome-Wide Association Study , Open-Angle Glaucoma/genetics , Nerve Degeneration , Transforming Growth Factor beta , Alleles , Human Chromosome Pair 8 , Human Chromosome Pair 9 , Homeodomain Proteins/genetics , Humans , Nerve Degeneration/genetics , Nerve Degeneration/pathology , Optic Nerve/pathology , Single Nucleotide Polymorphism , Long Noncoding RNA , Untranslated RNA/genetics , Transforming Growth Factor beta/genetics , Transforming Growth Factor beta/metabolism
ABSTRACT
Interplays among lineage-specific nuclear proteins, chromatin modifying enzymes, and the basal transcription machinery govern cellular differentiation, but their dynamics of action and coordination with transcriptional control are not fully understood. Alterations in chromatin structure appear to establish a permissive state for gene activation at some loci, but they play an integral role in activation at other loci. To determine the predominant roles of chromatin states and factor occupancy in directing gene regulation during differentiation, we mapped chromatin accessibility, histone modifications, and nuclear factor occupancy genome-wide during mouse erythroid differentiation dependent on the master regulatory transcription factor GATA1. Notably, despite extensive changes in gene expression, the chromatin state profiles (proportions of a gene in a chromatin state dominated by activating or repressive histone modifications) and accessibility remain largely unchanged during GATA1-induced erythroid differentiation. In contrast, gene induction and repression are strongly associated with changes in patterns of transcription factor occupancy. Our results indicate that during erythroid differentiation, the broad features of chromatin states are established at the stage of lineage commitment, largely independently of GATA1. These determine permissiveness for expression, with subsequent induction or repression mediated by distinctive combinations of transcription factors.
Subjects
Cell Differentiation/genetics, Genetic Epigenesis, Erythropoiesis/genetics, GATA1 Transcription Factor/metabolism, Animals, Basic Helix-Loop-Helix Transcription Factors/metabolism, Cell Line, Chromatin Assembly and Disassembly, Chromatin Immunoprecipitation, Estradiol/pharmacology, Estradiol/physiology, GATA1 Transcription Factor/genetics, GATA2 Transcription Factor/metabolism, Gene Expression Profiling, Gene Silencing, Mice, Multivariate Analysis, Peptide Hydrolases/metabolism, Protein Binding, Proto-Oncogene Proteins/metabolism, Estrogen Receptors/genetics, Recombinant Proteins/genetics, Recombinant Proteins/metabolism, Nucleic Acid Regulatory Sequences, T-Cell Acute Lymphocytic Leukemia Protein 1
ABSTRACT
Transcriptional enhancers play critical roles in regulation of gene expression, but their identification in the eukaryotic genome has been challenging. Recently, it was shown that enhancers in the mammalian genome are associated with characteristic histone modification patterns, which have been increasingly exploited for enhancer identification. However, only a limited number of cell types or chromatin marks have previously been investigated for this purpose, leaving the question unanswered whether there exists an optimal set of histone modifications for enhancer prediction in different cell types. Here, we address this issue by exploring genome-wide profiles of 24 histone modifications in two distinct human cell types, embryonic stem cells and lung fibroblasts. We developed a Random-Forest based algorithm, RFECS (Random Forest based Enhancer identification from Chromatin States) to integrate histone modification profiles for identification of enhancers, and used it to identify enhancers in a number of cell-types. We show that RFECS not only leads to more accurate and precise prediction of enhancers than previous methods, but also helps identify the most informative and robust set of three chromatin marks for enhancer prediction.
Subjects
Algorithms, Chromatin/genetics, Computational Biology/methods, Genetic Enhancer Elements/genetics, Area Under Curve, Binding Sites, Cell Line, Chromatin/chemistry, Chromatin/metabolism, Cluster Analysis, Genetic Databases, Decision Trees, Histones/genetics, Histones/metabolism, Humans, Reproducibility of Results, Transcription Factors/genetics, Transcription Factors/metabolism
ABSTRACT
Whole-genome sequencing (WGS) data is facilitating genome-wide identification of rare noncoding variants, but elucidating their roles in disease remains challenging. Towards this end, we first revisit a reported significant brain-related association signal of autism spectrum disorder (ASD) detected from de novo noncoding variants attributed to deep learning and show that local GC content can capture similar association signals. We further show that the association signal appears to be driven by variants from male proband-female sibling pairs that are upstream of assigned genes. We then develop Expression Neighborhood Sequence Association Study (ENSAS), which utilizes gene expression correlations and sequence information, to more systematically identify phenotype-associated variant sets. Applying ENSAS to the same set of de novo variants, we identify gene expression-based neighborhoods showing a significant ASD association signal, enriched for synapse-related gene ontology terms. For these top neighborhoods, we also identify chromatin state annotations of variants that are predictive of the proband-sibling local GC content differences. Our work provides new insights into associations of noncoding de novo mutations in ASD and presents an analytical framework applicable to other phenotypes.
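
The local GC content feature mentioned in this abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `local_gc_content`, the `flank` window size, and the toy sequence are all assumptions made here for illustration, and the abstract does not state how the window around each variant was defined.

```python
def local_gc_content(sequence: str, position: int, flank: int = 50) -> float:
    """Return the G/C fraction in a window around a 0-based variant position.

    Hypothetical helper: the window spans up to `flank` bases on each side
    of `position`, truncated at the sequence ends. Bases other than A, C, G,
    and T (for example N) are ignored.
    """
    if not 0 <= position < len(sequence):
        raise ValueError("position is outside the sequence")
    start = max(0, position - flank)
    end = min(len(sequence), position + flank + 1)
    window = sequence[start:end].upper()
    called = [base for base in window if base in "ACGT"]
    if not called:
        return float("nan")  # no callable bases in the window
    gc_count = sum(1 for base in called if base in "GC")
    return gc_count / len(called)


# Toy sequence (made up for illustration), with a GC-rich and an AT-rich region.
toy_sequence = "ATGCGCGCTATATATGGCCN"
gc_near_start = local_gc_content(toy_sequence, position=4, flank=3)
gc_near_end = local_gc_content(toy_sequence, position=12, flank=3)
print(gc_near_start, gc_near_end)
```

Under these assumptions, a proband-sibling comparison would compute this value at each de novo variant and compare the two groups' distributions. The choice of window size is a free parameter that the abstract does not specify.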