Results 1 - 20 of 76
1.
Cell ; 180(6): 1262-1271.e15, 2020 03 19.
Article in English | MEDLINE | ID: mdl-32169219

ABSTRACT

Establishing causal links between non-coding variants and human phenotypes is an increasing challenge. Here, we introduce a high-throughput mouse reporter assay for assessing the pathogenic potential of human enhancer variants in vivo and examine nearly a thousand variants in an enhancer repeatedly linked to polydactyly. We show that 71% of all rare non-coding variants previously proposed as causal lead to reporter gene expression in a pattern consistent with their pathogenic role. Variants observed to alter enhancer activity were further confirmed to cause polydactyly in knockin mice. We also used combinatorial and single-nucleotide mutagenesis to evaluate the in vivo impact of mutations affecting all positions of the enhancer and identified additional functional substitutions, including potentially pathogenic variants hitherto not observed in humans. Our results uncover the functional consequences of hundreds of mutations in a phenotype-associated enhancer and establish a widely applicable strategy for systematic in vivo evaluation of human enhancer variants.


Subject(s)
Enhancer Elements, Genetic/genetics , High-Throughput Screening Assays/methods , Polydactyly/genetics , Animals , Enhancer Elements, Genetic/physiology , Gene Expression Regulation, Developmental/genetics , Gene Knock-In Techniques/methods , Hedgehog Proteins/genetics , Hedgehog Proteins/metabolism , Humans , Mice , Mutation , Phenotype , Polydactyly/metabolism , RNA, Untranslated/genetics
2.
Cell ; 152(4): 895-908, 2013 Feb 14.
Article in English | MEDLINE | ID: mdl-23375746

ABSTRACT

The mammalian telencephalon plays critical roles in cognition, motor function, and emotion. Though many of the genes required for its development have been identified, the distant-acting regulatory sequences orchestrating their in vivo expression are mostly unknown. Here, we describe a digital atlas of in vivo enhancers active in subregions of the developing telencephalon. We identified more than 4,600 candidate embryonic forebrain enhancers and studied the in vivo activity of 329 of these sequences in transgenic mouse embryos. We generated serial sets of histological brain sections for 145 reproducible forebrain enhancers, resulting in a publicly accessible web-based data collection comprising more than 32,000 sections. We also used epigenomic analysis of human and mouse cortex tissue to directly compare the genome-wide enhancer architecture in these species. These data provide a primary resource for investigating gene regulatory mechanisms of telencephalon development and enable studies of the role of distant-acting enhancers in neurodevelopmental disorders.


Subject(s)
Enhancer Elements, Genetic , Telencephalon/metabolism , Animals , Embryo, Mammalian/metabolism , Fetus/metabolism , Genome-Wide Association Study , Humans , Mice , Telencephalon/embryology , Transcriptome , p300-CBP Transcription Factors/metabolism
3.
Proc Natl Acad Sci U S A ; 120(35): e2206612120, 2023 08 29.
Article in English | MEDLINE | ID: mdl-37603758

ABSTRACT

Genetic association studies have identified hundreds of independent signals associated with type 2 diabetes (T2D) and related traits. Despite these successes, the identification of specific causal variants underlying a genetic association signal remains challenging. In this study, we describe a deep learning (DL) method to analyze the impact of sequence variants on enhancers. Focusing on pancreatic islets, a T2D relevant tissue, we show that our model learns islet-specific transcription factor (TF) regulatory patterns and can be used to prioritize candidate causal variants. At 101 genetic signals associated with T2D and related glycemic traits where multiple variants occur in linkage disequilibrium, our method nominates a single causal variant for each association signal, including three variants previously shown to alter reporter activity in islet-relevant cell types. For another signal associated with blood glucose levels, we biochemically test all candidate causal variants from statistical fine-mapping using a pancreatic islet beta cell line and show biochemical evidence of allelic effects on TF binding for the model-prioritized variant. To aid in future research, we publicly distribute our model and islet enhancer perturbation scores across ~67 million genetic variants. We anticipate that DL methods like the one presented in this study will enhance the prioritization of candidate causal variants for functional studies.


Subject(s)
Deep Learning , Diabetes Mellitus, Type 2 , Enhancer Elements, Genetic , Islets of Langerhans , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/metabolism , Diabetes Mellitus, Type 2/pathology , Islets of Langerhans/metabolism , Islets of Langerhans/pathology , Genetic Variation , Humans , Computer Simulation
4.
Genome Res ; 32(3): 437-448, 2022 03.
Article in English | MEDLINE | ID: mdl-35105669

ABSTRACT

Dual-function regulatory elements (REs), acting as enhancers in some cellular contexts and as silencers in others, have been reported to facilitate the precise gene regulatory response to developmental signals in Drosophila melanogaster. However, with few isolated examples detected, dual-function REs in mammals have yet to be systematically studied. We herein investigated this class of REs in the human genome and profiled their activity across multiple cell types. Focusing on enhancer-silencer transitions specific to the development of T cells, we built an accurate deep learning classifier of REs and identified about 12,000 silencers active in primary peripheral blood T cells that act as enhancers in embryonic stem cells. Compared with regular silencers, these dual-function REs are evolving under stronger purifying selection and are enriched for mutations associated with disease phenotypes and altered gene expression. In addition, they are enriched in the loci of transcriptional regulators, such as transcription factors (TFs) and chromatin remodeling genes. Dual-function REs consist of two intertwined but largely distinct sets of binding sites bound by either activating or repressing TFs, depending on the type of RE function in a given cell line. This indicates the recruitment of different TFs for different regulatory modes and a complex DNA sequence composition of these REs with dual activating and repressive encoding. With an estimated >6% of cell type-specific human silencers acting as dual-function REs, this overlooked class of REs requires a specific investigation into how their inherent functional plasticity might be a contributing factor to human diseases.


Subject(s)
Enhancer Elements, Genetic , Genome, Human , Animals , Drosophila melanogaster/genetics , Gene Expression Regulation , Humans , Transcription Factors/genetics , Transcription Factors/metabolism
5.
Bioinformatics ; 39(39 Suppl 1): i377-i385, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387183

ABSTRACT

MOTIVATION: Predicting the regulatory function of non-coding DNA using only the DNA sequence continues to be a major challenge in genomics. With the advent of improved optimization algorithms, faster GPU speeds, and more intricate machine-learning libraries, hybrid convolutional and recurrent neural network architectures can be constructed and applied to extract crucial information from non-coding DNA. RESULTS: Using a comparative analysis of the performance of thousands of Deep Learning architectures, we developed ChromDL, a neural network architecture combining bidirectional gated recurrent units, convolutional neural networks, and bidirectional long short-term memory units, which significantly improves upon a range of prediction metrics compared to its predecessors in transcription factor binding site, histone modification, and DNase-I hyper-sensitive site detection. Combined with a secondary model, it can be utilized for accurate classification of gene regulatory elements. The model can also detect weak transcription factor binding as compared to previously developed methods and has the potential to help delineate transcription factor binding motif specificities. AVAILABILITY AND IMPLEMENTATION: The ChromDL source code can be found at https://github.com/chrishil1/ChromDL.
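Sequence-based neural networks of the kind described above typically take DNA as a numeric matrix rather than as text. The following is a minimal sketch of one common preprocessing step, one-hot encoding, assuming an A, C, G, T column order and all-zero rows for ambiguous bases. The helper names `one_hot` and `reverse_complement` are hypothetical and are not taken from the ChromDL source code.

```python
# Illustrative one-hot encoding of DNA; not taken from the ChromDL source code.
BASES = "ACGT"  # assumed column order


def one_hot(seq):
    """Return one 4-element row per nucleotide, in A, C, G, T order.

    Unknown bases such as N are encoded as all zeros.
    """
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = BASES.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


def reverse_complement(seq):
    """Reverse-complement a DNA string; unknown bases become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(b, "N") for b in reversed(seq.upper()))


encoded = one_hot("ACGTN")
```

Encoding both a sequence and its reverse complement is a frequent choice for such models, since regulatory signals can sit on either strand.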


Subject(s)
Algorithms , Benchmarking , DNA , Deoxyribonuclease I , Transcription Factors
6.
Nucleic Acids Res ; 49(8): 4493-4505, 2021 05 07.
Article in English | MEDLINE | ID: mdl-33872375

ABSTRACT

An essential question of gene regulation is how large numbers of enhancers and promoters organize into gene regulatory loops. Using transcription-factor binding enrichment as an indicator of enhancer strength, we identified a portion of H3K27ac peaks as potentially strong enhancers and found a universal pattern of promoter and enhancer distribution: At actively transcribed regions of length of ∼200-300 kb, the numbers of active promoters and enhancers are inversely related. Enhancer clusters are associated with isolated active promoters, regardless of the gene's cell-type specificity. As the number of nearby active promoters increases, the number of enhancers decreases. At regions where multiple active genes are closely located, there are few distant enhancers. With Hi-C analysis, we demonstrate that the interactions among the regulatory elements (active promoters and enhancers) occur predominantly in clusters and multiway among linearly close elements and the distance between adjacent elements shows a preference of ∼30 kb. We propose a simple rule of spatial organization of active promoters and enhancers: Gene transcription and regulation mainly occur at local active transcription hubs contributed dynamically by multiple elements from linearly close enhancers and/or active promoters. The hub model can be represented with a flower-shaped structure and implies an enhancer-like role of active promoters.


Subject(s)
Chromosomes/metabolism , Enhancer Elements, Genetic , Gene Expression Regulation/genetics , Histones/metabolism , Promoter Regions, Genetic , Acetylation , Chromatin Immunoprecipitation Sequencing , Chromosomes/genetics , Databases, Genetic , Genome, Human , Humans , Models, Genetic , Multigene Family , Murine hepatitis virus , RNA-Seq , Transcriptional Activation/genetics
7.
Genome Res ; 29(4): 657-667, 2019 04.
Article in English | MEDLINE | ID: mdl-30886051

ABSTRACT

Compared to enhancers, silencers are notably difficult to identify and validate experimentally. In search for human silencers, we utilized H3K27me3-DNase I hypersensitive site (DHS) peaks with tissue specificity negatively correlated with the expression of nearby genes across 25 diverse cell lines. These regions are predicted to be silencers since they are physically linked, using Hi-C loops, or associated, using expression quantitative trait loci (eQTL) results, with a decrease in gene expression much more frequently than general H3K27me3-DHSs. Also, these regions are enriched for the binding sites of transcriptional repressors (such as CTCF, MECOM, SMAD4, and SNAI3) and depleted of the binding sites of transcriptional activators. Using sequence signatures of these regions, we constructed a computational model and predicted approximately 10,000 additional silencers per cell line and demonstrated that the majority of genes linked to these silencers are expressed at a decreased level. Furthermore, single nucleotide polymorphisms (SNPs) in predicted silencers are significantly associated with disease phenotypes. Finally, our results show that silencers commonly interact with enhancers to affect the transcriptional dynamics of tissue-specific genes and to facilitate fine-tuning of transcription in the human genome.


Subject(s)
Epigenesis, Genetic , Silencer Elements, Transcriptional , Transcriptome , Cell Line , Genetic Predisposition to Disease , Histones/metabolism , Humans , Phenotype , Polymorphism, Single Nucleotide , Transcription Factors/metabolism
8.
Genomics ; 112(3): 2261-2270, 2020 05.
Article in English | MEDLINE | ID: mdl-31887344

ABSTRACT

An increasing number of studies suggest that functionally redundant enhancers safeguard development via buffering gene expression against environmental and genetic perturbations. Here, we identified over-represented clusters of enhancers (enhancer jungles or EJs) in human B lymphoblastoid cells. We found that EJs tend to associate with genes involved in the activation of the immune system response. Although spanning multiple genes, the enhancers within an EJ tend to collaborate with each other on regulating a single gene. The employment of homotypic transcription factor binding sites (TFBSs) in EJ enhancers and heterotypic TFBSs between constituent enhancers within an EJ may safeguard a robust transcriptional output of the target gene. EJ enhancers evolve under a weaker selective pressure compared to regular enhancers (REs), and approximately 35% of EJs do not have orthologues in the mouse genome. In GM12878, these human-specific EJs appear to regulate genes associated with the adaptive immune system response, while the conserved EJs are associated with innate immunity. Recently acquired human EJs are associated with a higher level of target gene expression compared with conserved EJs, thus facilitating the environmental adaptation of the organism during evolution. In short, the existence of EJs is a common regulatory architecture conferring a robust regulatory control for key lineage genes.


Subject(s)
Enhancer Elements, Genetic , Gene Expression Regulation , Genome, Human , Lymphocyte Activation/genetics , B-Lymphocytes/immunology , Humans , Organ Specificity
9.
Bioinformatics ; 34(2): 289-291, 2018 Jan 15.
Article in English | MEDLINE | ID: mdl-28968739

ABSTRACT

SUMMARY: Addressing deleterious effects of noncoding mutations is an essential step towards the identification of disease-causal mutations of gene regulatory elements. Several methods for quantifying the deleteriousness of noncoding mutations using artificial intelligence, deep learning and other approaches have been recently proposed. Although the majority of the proposed methods have demonstrated excellent accuracy on different test sets, there is rarely a consensus. In addition, advanced statistical and artificial learning approaches used by these methods make it difficult to port these methods outside of the labs that have developed them. To address these challenges and to transform the methodological advances in predicting deleterious noncoding mutations into a practical resource available for the broader functional genomics and population genetics communities, we developed SNPDelScore, which uses a panel of proposed methods for quantifying deleterious effects of noncoding mutations to precompute and compare the deleteriousness scores of all common SNPs in the human genome in 44 cell lines. The panel of deleteriousness scores of a SNP computed using different methods is supplemented by functional information from the GWAS Catalog, libraries of transcription factor-binding sites, and genic characteristics of mutations. SNPDelScore comes with a genome browser capable of displaying and comparing large sets of SNPs in a genomic locus and rapidly identifying consensus SNPs with the highest deleteriousness scores making those prime candidates for phenotype-causal polymorphisms. AVAILABILITY AND IMPLEMENTATION: https://www.ncbi.nlm.nih.gov/research/snpdelscore/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

10.
Nucleic Acids Res ; 45(5): 2307-2317, 2017 03 17.
Article in English | MEDLINE | ID: mdl-27980060

ABSTRACT

The majority of genome-wide association study (GWAS) risk variants reside in non-coding DNA sequences. Understanding how these sequence modifications lead to transcriptional alterations and cell-to-cell variability can help unravel genotype-phenotype relationships. Here, we describe a computational method, dubbed CAPE, which calculates the likelihood of a genetic variant deactivating enhancers by disrupting the binding of transcription factors (TFs) in a given cellular context. CAPE learns sequence signatures associated with putative enhancers originating from large-scale sequencing experiments (such as ChIP-seq or DNase-seq) and models the change in enhancer signature upon a single nucleotide substitution. CAPE accurately identifies causative cis-regulatory variation including expression quantitative trait loci (eQTLs) and DNase I sensitivity quantitative trait loci (dsQTLs) in a tissue-specific manner with precision superior to several currently available methods. The presented method can be trained on any tissue-specific dataset of enhancers and known functional variants and applied to prioritize disease-associated variants in the corresponding tissue.
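The idea of modeling the change in a sequence signature upon a single nucleotide substitution can be illustrated with a toy k-mer scoring scheme. This sketch is an assumption for illustration only and is not CAPE's actual model: the weights are made up, and the function names `signature_score` and `substitution_effect` are hypothetical.

```python
# Hypothetical k-mer based variant effect score; not CAPE's actual model.

def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def signature_score(seq, weights, k):
    """Sum the weights of all k-mers in seq (unlisted k-mers count as 0)."""
    return sum(weights.get(km, 0.0) for km in kmers(seq, k))


def substitution_effect(seq, pos, alt, weights, k):
    """Score change caused by replacing seq[pos] with alt (0-based position).

    Only k-mers overlapping pos can change, so scoring is limited to that window.
    """
    start = max(0, pos - k + 1)
    end = min(len(seq), pos + k)
    ref_window = seq[start:end]
    alt_window = seq[start:pos] + alt + seq[pos + 1:end]
    return signature_score(alt_window, weights, k) - signature_score(ref_window, weights, k)


# Toy weights (made up): positive values mark k-mers typical of active enhancers.
toy_weights = {"TGAC": 2.0, "GACT": 1.0, "TGCC": -0.5}
delta = substitution_effect("AATGACTCA", 4, "C", toy_weights, 4)
```

A negative `delta` means the substitution removes enhancer-like k-mers or introduces disfavored ones. In a trained model the weights would come from a classifier fitted to tissue-specific enhancer data.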


Subject(s)
Enhancer Elements, Genetic , Genetic Association Studies , Genome, Human , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Transcription Factors/metabolism , B-Lymphocytes/cytology , B-Lymphocytes/metabolism , Base Sequence , Deoxyribonuclease I/metabolism , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Humans , Likelihood Functions , Machine Learning , Organ Specificity , Protein Binding , Transcription Factors/genetics , Transcription, Genetic
11.
BMC Bioinformatics ; 19(1): 316, 2018 Sep 10.
Article in English | MEDLINE | ID: mdl-30200877

ABSTRACT

BACKGROUND: Transcription factor binding site (TFBS) loss, gain, and reshuffling within the sequence of a regulatory element could alter the function of that regulatory element. Some of the changes will be detrimental to the fitness of the species and will result in gradual removal from a population, while other changes would be either beneficial or just a part of genetic drift and end up being fixed in a population. This "reprogramming" of regulatory elements results in modification of the gene regulatory landscape during evolution. RESULTS: We identified reprogrammed enhancers (RPEs) by comparing the distribution of tissue-specific enhancers in the human and mouse genomes. We observed that around 30% of mammalian enhancers have been reprogrammed after the human-mouse speciation. In 79% of cases, the reprogramming of an enhancer resulted in a quantifiably different expression of a flanking gene. In the case of the Thy-1 cell surface antigen gene, for example, enhancer reprogramming is associated with cortex to thymus change in gene expression. To understand the mechanisms of enhancer reprogramming, we profiled the evolutionary changes in the TFBS enhancer content and found that enhancer reprogramming took place through the acquisition of new TFBSs in 72% of reprogramming events. CONCLUSIONS: Our results suggest that enhancer reprogramming takes place within well-established regulatory loci with RPEs contributing additively to fine-tuning of the gene regulatory program in mammals. We also found evidence for acquisition of novel gene function through enhancer reprogramming, which allows expansion of gene regulatory landscapes into new regulatory domains.


Subject(s)
Enhancer Elements, Genetic , Gene Expression Regulation , Genome , Animals , Humans , Mice , Phenotype
12.
BMC Genomics ; 19(1): 947, 2018 Dec 18.
Article in English | MEDLINE | ID: mdl-30563465

ABSTRACT

BACKGROUND: The regulatory landscape of a gene locus often consists of several functionally redundant enhancers establishing phenotypic robustness and evolutionary stability of its regulatory program. However, it is unclear what mechanisms are employed by redundant enhancers to cooperatively orchestrate gene expression. RESULTS: By comparing redundant enhancers to single enhancers (enhancers present in a single copy in a gene locus), we observed that the DNA sequence encryption differs between these two classes of enhancers, suggesting a difference in their regulatory mechanisms. Initiator enhancers, which are a subset of redundant enhancers and show similar sequence encryption to single enhancers, differ from the rest of redundant enhancers in their sequence encryption, evolutionary conservation and proximity to target genes. Genes hosting initiator enhancers in their loci feature elevated levels of expression. Initiator enhancers show a high level of 3D chromatin contacts with both transcription start sites and regular enhancers, suggesting their roles as primary activators and intermediate catalysts of gene expression, through which the regulatory signals of redundant enhancers are propagated to the target genes. In addition, GWAS and eQTL variants are significantly enriched in initiator enhancers compared to redundant enhancers, suggesting a key functional role these sequences play in gene regulation. CONCLUSIONS: The specific characteristics and widespread abundance of initiator enhancers advocate for a possible universal hierarchical mechanism of tissue-specific gene regulation involving multiple redundant enhancers acting through initiator enhancers.


Subject(s)
Enhancer Elements, Genetic , Gene Expression Regulation , Neoplasms/classification , Neoplasms/genetics , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Biomarkers/analysis , Cells, Cultured , Chromatin , Genome-Wide Association Study , Humans
13.
BMC Genomics ; 18(1): 236, 2017 03 16.
Article in English | MEDLINE | ID: mdl-28302063

ABSTRACT

BACKGROUND: To understand the changes of gene regulation in carcinogenesis, we explored signals of DNA methylation - a stable epigenetic mark of gene regulatory elements - and designed a computational model to profile loss and gain of regulatory elements (REs) during carcinogenesis. We also utilized sequencing data to analyze the allele frequency of single nucleotide polymorphisms (SNPs) and detected the cancer-associated SNPs, i.e., the SNPs displaying the significant allele frequency difference between cancer and normal samples. RESULTS: After applying this model to chronic lymphocytic leukemia (CLL) data, we identified REs differentially activated (dREs) between normal and CLL cells, consisting of 6,802 dREs gained and 4,606 dREs lost in CLL. The identified regulatory perturbations coincide with changes in the expression of target genes. In particular, the genes encoding DNA methyltransferases harbor multiple lost-in-cancer dREs and zero gained-in-cancer dREs, indicating that the damaged regulation of these genes might be one of the key causes of tumor formation. dREs display a significantly elevated density of the genome-wide association study (GWAS) SNPs associated with CLL and CLL-related traits. We observed that most of dRE GWAS SNPs associated with CLL and CLL-related traits (83%) display a significant haplotype association among the identified cancer-associated alleles and the risk alleles that have been reported in GWAS. Also dREs are enriched for the binding sites of the well-established B-cell and CLL transcription factors (TFs) NF-kB, AP2, P53, E2F1, PAX5, and SP1. We also identified CLL-associated SNPs and demonstrated that the mutations at these SNPs change the binding sites of key TFs much more frequently than expected. 
CONCLUSIONS: Through exploring sequencing data measuring DNA methylation, we identified the epigenetic alterations (more specifically, DNA methylation) and genetic mutations along non-coding genomic regions in CLL, and demonstrated that these changes play a critical role in carcinogenesis through damaging the regulation of key genes and altering the binding of key TFs in B and CLL cells.
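The abstract above refers to SNPs with a significant allele frequency difference between cancer and normal samples but does not name the statistical test. One plausible way to check such a difference is sketched below, assuming a two-sided Fisher's exact test on a 2x2 table of allele counts. The choice of test, the function name `fisher_exact_two_sided`, and the example counts are all assumptions for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: cancer, normal. Columns: alt-allele count, ref-allele count.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x alt alleles in the cancer row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)


# Made-up allele counts: 30 alt / 70 ref in cancer, 10 alt / 90 ref in normal.
p_value = fisher_exact_two_sided(30, 70, 10, 90)
```

In a genome-wide setting the resulting p-values would also need correction for multiple testing before any SNP is called cancer-associated.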


Subject(s)
Epigenesis, Genetic , Gene Expression Regulation, Leukemic , Genetic Variation , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Alleles , Animals , Binding Sites , Chromosome Mapping , Cluster Analysis , Computational Biology/methods , CpG Islands , DNA Methylation , Evolution, Molecular , Gene Expression Profiling , Genome , Genome-Wide Association Study , Genomics/methods , Haplotypes , High-Throughput Nucleotide Sequencing , Humans , Mice , Molecular Sequence Annotation , Polymorphism, Single Nucleotide , Protein Binding , Transcription Factors/metabolism , Transcriptome
14.
Development ; 141(4): 878-88, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24496624

ABSTRACT

The Drosophila heart is composed of two distinct cell types, the contractile cardial cells (CCs) and the surrounding non-muscle pericardial cells (PCs), development of which is regulated by a network of conserved signaling molecules and transcription factors (TFs). Here, we used machine learning with array-based chromatin immunoprecipitation (ChIP) data and TF sequence motifs to computationally classify cell type-specific cardiac enhancers. Extensive testing of predicted enhancers at single-cell resolution revealed the added value of ChIP data for modeling cell type-specific activities. Furthermore, clustering the top-scoring classifier sequence features identified novel cardiac and cell type-specific regulatory motifs. For example, we found that the Myb motif learned by the classifier is crucial for CC activity, and the Myb TF acts in concert with two forkhead domain TFs and Polo kinase to regulate cardiac progenitor cell divisions. In addition, differential motif enrichment and cis-trans genetic studies revealed that the Notch signaling pathway TF Suppressor of Hairless [Su(H)] discriminates PC from CC enhancer activities. Collectively, these studies elucidate molecular pathways used in the regulatory decisions for proliferation and differentiation of cardiac progenitor cells, implicate Su(H) in regulating cell fate decisions of these progenitors, and document the utility of enhancer modeling in uncovering developmental regulatory subnetworks.


Subject(s)
Cell Differentiation/physiology , Cell Division/physiology , Drosophila/growth & development , Enhancer Elements, Genetic/genetics , Gene Expression Regulation, Developmental/physiology , Heart/growth & development , Stem Cells/physiology , Animals , Artificial Intelligence , Chromatin Immunoprecipitation , Classification/methods , Drosophila/cytology , Gene Expression Regulation, Developmental/genetics , Mutagenesis , Myoblasts, Cardiac/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism
15.
Nucleic Acids Res ; 43(1): 225-36, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25520196

ABSTRACT

Thousands of non-coding SNPs have been linked to human diseases in the past. The identification of causal alleles within this pool of disease-associated non-coding SNPs is largely impossible due to the inability to accurately quantify the impact of non-coding variation. To overcome this challenge, we developed a computational model that uses ChIP-seq intensity variation in response to non-coding allelic change as a proxy for quantifying the biological role of non-coding SNPs. We applied this model to HepG2 enhancers and detected 4796 enhancer SNPs capable of disrupting enhancer activity upon allelic change. These SNPs are significantly over-represented in the binding sites of HNF4 and FOXA families of liver transcription factors and liver eQTLs. In addition, these SNPs are strongly associated with liver GWAS traits, including type I diabetes, and are linked to the abnormal levels of HDL and LDL cholesterol. Our model is directly applicable to any enhancer set for mapping causal regulatory SNPs.


Subject(s)
Enhancer Elements, Genetic , Polymorphism, Single Nucleotide , Alleles , Binding Sites , Cell Line , Cell Line, Tumor , Chromatin Immunoprecipitation , Genome-Wide Association Study , Humans , Liver/metabolism , Quantitative Trait Loci , Sequence Analysis, DNA , Transcription Factors/metabolism
16.
Nucleic Acids Res ; 43(3): 1726-39, 2015 Feb 18.
Article in English | MEDLINE | ID: mdl-25609699

ABSTRACT

Here we used discriminative training methods to uncover the chromatin, transcription factor (TF) binding and sequence features of enhancers underlying gene expression in individual cardiac cells. We used machine learning with TF motifs and ChIP data for a core set of cardiogenic TFs and histone modifications to classify Drosophila cell-type-specific cardiac enhancer activity. We show that the classifier models can be used to predict cardiac cell subtype cis-regulatory activities. Associating the predicted enhancers with an expression atlas of cardiac genes further uncovered clusters of genes with transcription and function limited to individual cardiac cell subtypes. Further, the cell-specific enhancer models revealed chromatin, TF binding and sequence features that distinguish enhancer activities in distinct subsets of heart cells. Collectively, our results show that computational modeling combined with empirical testing provides a powerful platform to uncover the enhancers, TF motifs and gene expression profiles which characterize individual cardiac cell fates.


Subject(s)
Drosophila/genetics , Enhancer Elements, Genetic , Myocardium/metabolism , Transcription, Genetic , Animals , Animals, Genetically Modified , Drosophila/cytology , Gene Expression Regulation , Myocardium/cytology
17.
Mol Biol Evol ; 32(8): 2161-80, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25976354

ABSTRACT

To explore the underlying mechanisms whereby noncoding variants affect transcriptional regulation, we identified nucleotides capable of disrupting binding of transcription factors and deactivating enhancers if mutated (dubbed candidate killer mutations or KMs) in HepG2 enhancers. On average, approximately 11% of enhancer positions are prone to KMs. A comparable number of enhancer positions are capable of creating de novo binding sites via a single-nucleotide mutation (dubbed candidate restoration mutations or RSs). Both KM and RS positions are evolutionarily conserved and tend to form clusters within an enhancer. We observed that KMs have the most deleterious effect on enhancer activity. In contrast, RSs have a smaller effect in increasing enhancer activity. Additionally, the KMs are strongly associated with liver-related Genome Wide Association Study traits compared with other HepG2 enhancer regions. By applying our framework to lymphoblastoid cell lines, we found that KMs underlie differential binding of transcription factors and differential local chromatin accessibility. The gene expression quantitative trait loci associated with the tissue-specific genes are strongly enriched in KM positions. In summary, we conclude that the KMs have the greatest impact on the level of gene expression and are likely to be the causal variants of tissue-specific gene expression and disease predisposition.


Subject(s)
Gene Expression Regulation , Genetic Predisposition to Disease , Mutation , Response Elements , Genome-Wide Association Study , Hep G2 Cells , Humans , Organ Specificity/genetics , Transcription Factors/genetics , Transcription Factors/metabolism
18.
Genome Res ; 22(11): 2278-89, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22759862

ABSTRACT

Illuminating the primary sequence encryption of enhancers is central to understanding the regulatory architecture of genomes. We have developed a machine learning approach to decipher motif patterns of hindbrain enhancers and identify 40,000 sequences in the human genome that we predict display regulatory control that includes the hindbrain. Consistent with their roles in hindbrain patterning, MEIS1, NKX6-1, as well as HOX and POU family binding motifs contributed strongly to this enhancer model. Predicted hindbrain enhancers are overrepresented at genes expressed in hindbrain and associated with nervous system development, and primarily reside in the areas of open chromatin. In addition, 77 (0.2%) of these predictions are identified as hindbrain enhancers on the VISTA Enhancer Browser, and 26,000 (60%) overlap enhancer marks (H3K4me1 or H3K27ac). To validate these putative hindbrain enhancers, we selected 55 elements distributed throughout our predictions and six low scoring controls for evaluation in a zebrafish transgenic assay. When assayed in mosaic transgenic embryos, 51/55 elements directed expression in the central nervous system. Furthermore, 30/34 (88%) predicted enhancers analyzed in stable zebrafish transgenic lines directed expression in the larval zebrafish hindbrain. Subsequent analysis of sequence fragments selected based upon motif clustering further confirmed the critical role of the motifs contributing to the classifier. Our results demonstrate the existence of a primary sequence code characteristic to hindbrain enhancers. This code can be accurately extracted using machine-learning approaches and applied successfully for de novo identification of hindbrain enhancers. This study represents a critical step toward the dissection of regulatory control in specific neuronal subtypes.


Subject(s)
Enhancer Elements, Genetic , Rhombencephalon/metabolism , Sequence Analysis, DNA/methods , Transcription, Genetic , Algorithms , Animals , Chromatin/metabolism , Gene Expression Regulation, Developmental , Genome, Human , Homeodomain Proteins/genetics , Homeodomain Proteins/metabolism , Humans , POU Domain Factors/genetics , POU Domain Factors/metabolism , Rhombencephalon/growth & development , Transcription Factors/genetics , Transcription Factors/metabolism , Zebrafish
19.
PLoS Genet ; 8(3): e1002531, 2012.
Article in English | MEDLINE | ID: mdl-22412381

ABSTRACT

Transcriptional enhancers integrate the contributions of multiple classes of transcription factors (TFs) to orchestrate the myriad spatio-temporal gene expression programs that occur during development. A molecular understanding of enhancers with similar activities requires the identification of both their unique and their shared sequence features. To address this problem, we combined phylogenetic profiling with a DNA-based enhancer sequence classifier that analyzes the TF binding sites (TFBSs) governing the transcription of a co-expressed gene set. We first assembled a small number of enhancers that are active in Drosophila melanogaster muscle founder cells (FCs) and other mesodermal cell types. Using phylogenetic profiling, we increased the number of enhancers by incorporating orthologous but divergent sequences from other Drosophila species. Functional assays revealed that the diverged enhancer orthologs were active in largely similar patterns as their D. melanogaster counterparts, although there was extensive evolutionary shuffling of known TFBSs. We then built and trained a classifier using this enhancer set and identified additional related enhancers based on the presence or absence of known and putative TFBSs. Predicted FC enhancers were over-represented in proximity to known FC genes; and many of the TFBSs learned by the classifier were found to be critical for enhancer activity, including POU homeodomain, Myb, Ets, Forkhead, and T-box motifs. Empirical testing also revealed that the T-box TF encoded by org-1 is a previously uncharacterized regulator of muscle cell identity. Finally, we found extensive diversity in the composition of TFBSs within known FC enhancers, suggesting that motif combinatorics plays an essential role in the cellular specificity exhibited by such enhancers. In summary, machine learning combined with evolutionary sequence analysis is useful for recognizing novel TFBSs and for facilitating the identification of cognate TFs that coordinate cell type-specific developmental gene expression patterns.


Subject(s)
Artificial Intelligence , Binding Sites , Drosophila melanogaster , Enhancer Elements, Genetic , Transcription Factors/genetics , Animals , Cell Lineage , Drosophila melanogaster/cytology , Drosophila melanogaster/genetics , Drosophila melanogaster/growth & development , Evolution, Molecular , Gene Expression Regulation, Developmental , Mesoderm/cytology , Mesoderm/growth & development , Muscles/cytology , Phylogeny , Transcription, Genetic
20.
Proc Natl Acad Sci U S A ; 109(50): 20768-73, 2012 Dec 11.
Article in English | MEDLINE | ID: mdl-23184988

ABSTRACT

Contemporary high-throughput technologies permit the rapid identification of transcription factor (TF) target genes on a genome-wide scale, yet the functional significance of TFs requires knowledge of target gene expression patterns, cooperating TFs, and cis-regulatory element (CRE) structures. Here we investigated the myogenic regulatory network downstream of the Drosophila zinc finger TF Lame duck (Lmd) by combining both previously published and newly performed genomic data sets, including ChIP sequencing (ChIP-seq), genome-wide mRNA profiling, cell-specific expression patterns of putative transcriptional targets, analysis of histone mark signatures, studies of TF cooccupancy by additional mesodermal regulators, TF binding site determination using protein binding microarrays (PBMs), and machine learning of candidate CRE motif compositions. Our findings suggest that Lmd orchestrates an extensive myogenic regulatory network, a conclusion supported by the identification of Lmd-dependent genes, histone signatures of Lmd-bound genomic regions, and the relationship of these features to cell-specific gene expression patterns. The heterogeneous cooccupancy of Lmd-bound regions with additional mesodermal regulators revealed that different transcriptional inputs are used to mediate similar myogenic gene expression patterns. Machine learning further demonstrated diverse combinatorial motif patterns within tissue-specific Lmd-bound regions. PBM analysis established the complete spectrum of Lmd DNA binding specificities, and site-directed mutagenesis of Lmd and additional newly discovered motifs in known enhancers demonstrated the critical role of these TF binding sites in supporting full enhancer activity. Collectively, these findings provide insights into the transcriptional codes regulating muscle gene expression and offer a generalizable approach for similar studies in other systems.


Subject(s)
Drosophila Proteins/genetics , Drosophila melanogaster/growth & development , Drosophila melanogaster/genetics , Gene Regulatory Networks , Genome, Insect , Muscle Development/genetics , Myogenic Regulatory Factors/genetics , Animals , Animals, Genetically Modified , Artificial Intelligence , Base Sequence , Binding Sites/genetics , DNA/genetics , DNA/metabolism , Drosophila Proteins/metabolism , Drosophila melanogaster/cytology , Drosophila melanogaster/metabolism , Enhancer Elements, Genetic , Gene Expression Regulation, Developmental , Mesoderm/cytology , Mesoderm/growth & development , Mesoderm/metabolism , Molecular Sequence Data , Myoblasts/cytology , Myoblasts/metabolism , Myogenic Regulatory Factors/metabolism , Systems Biology , Transcriptome