Results 1 - 20 of 81
1.
Bioinformatics ; 37(18): 2834-2840, 2021 09 29.
Article in English | MEDLINE | ID: mdl-33760053

ABSTRACT

MOTIVATION: Sequence motif discovery algorithms can identify novel sequence patterns that perform biological functions in DNA, RNA and protein sequences, for example the binding site motifs of DNA- and RNA-binding proteins. RESULTS: The STREME algorithm presented here advances the state-of-the-art in ab initio motif discovery in terms of both accuracy and versatility. Using in vivo DNA (ChIP-seq) and RNA (CLIP-seq) data, and validating motifs with reference motifs derived from in vitro data, we show that STREME is more accurate, sensitive and thorough than several widely used algorithms (DREME, HOMER, MEME, Peak-motifs) and two other representative algorithms (ProSampler and Weeder). STREME's capabilities include the ability to find motifs in datasets with hundreds of thousands of sequences, to find both short and long motifs (from 3 to 30 positions), to perform differential motif discovery in pairs of sequence datasets, and to find motifs in sequences over virtually any alphabet (DNA, RNA, protein and user-defined alphabets). Unlike most motif discovery algorithms, STREME reports a useful estimate of the statistical significance of each motif it discovers. STREME is easy to use individually via its web server or via the command line, and is completely integrated with the widely used MEME Suite of sequence analysis tools. The name STREME stands for 'Simple, Thorough, Rapid, Enriched Motif Elicitation'. AVAILABILITY AND IMPLEMENTATION: The STREME web server and source code are provided freely for non-commercial use at http://meme-suite.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
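
The idea of differential motif discovery in a pair of sequence datasets can be illustrated with a small sketch. This is not STREME's algorithm, which the abstract does not describe in detail; it is a minimal illustration that assumes a fixed candidate k-mer and uses a one-sided Fisher exact test to ask whether that k-mer occurs in more primary sequences than control sequences. The function names `count_with_kmer` and `enrichment_p_value` are hypothetical.

```python
from math import comb


def count_with_kmer(seqs, kmer):
    """Return how many sequences contain the k-mer at least once."""
    return sum(1 for s in seqs if kmer in s)


def enrichment_p_value(primary, control, kmer):
    """One-sided Fisher exact test for over-representation in `primary`.

    Illustrative assumption: each sequence counts once, regardless of how
    many times the k-mer occurs in it.
    """
    a = count_with_kmer(primary, kmer)   # primary sequences with the k-mer
    c = count_with_kmer(control, kmer)   # control sequences with the k-mer
    n1, n2 = len(primary), len(control)
    hits = a + c
    denom = comb(n1 + n2, hits)
    # Hypergeometric upper tail: probability that at least `a` of the
    # `hits` sequences fall in the primary set by chance.
    upper = min(n1, hits)
    tail = sum(comb(n1, x) * comb(n2, hits - x) for x in range(a, upper + 1))
    return tail / denom


primary = ["ACGTGA", "TTACGT", "ACGTAC", "GGGACG"]
control = ["TTTTTT", "GGGGGG", "CCCCCC", "AAACGT"]
p = enrichment_p_value(primary, control, "ACGT")
print(f"p-value for ACGT: {p:.4f}")
```

A real tool would search over many candidate motifs and correct for that search; the sketch only shows the single-motif comparison between two datasets.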


Subject(s)
Algorithms , Software , Chromatin Immunoprecipitation Sequencing , Binding Sites , Sequence Analysis, DNA , DNA , Nucleotide Motifs
2.
Bioinformatics ; 36(12): 3902-3904, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32246829

ABSTRACT

MOTIVATION: Identifying the genes regulated by a given transcription factor (TF) (its 'target genes') is a key step in developing a comprehensive understanding of gene regulation. Previously, we developed a method (CisMapper) for predicting the target genes of a TF based solely on the correlation between a histone modification at the TF's binding site and the expression of the gene across a set of tissues or cell lines. That approach is limited to organisms for which extensive histone and expression data are available, and does not explicitly incorporate the genomic distance between the TF and the gene. RESULTS: We present the T-Gene algorithm, which overcomes these limitations. It can be used to predict which genes are most likely to be regulated by a TF, and which of the TF's binding sites are most likely involved in regulating particular genes. T-Gene calculates a novel score that combines distance and histone/expression correlation, and we show that this score accurately predicts when a regulatory element bound by a TF is in contact with a gene's promoter, achieving median precision above 60%. T-Gene is easy to use via its web server or as a command-line tool, and can also make accurate predictions (median precision above 40%) based on distance alone when extensive histone/expression data is not available for the organism. T-Gene provides an estimate of the statistical significance of each of its predictions. AVAILABILITY AND IMPLEMENTATION: The T-Gene web server, source code, histone/expression data and genome annotation files are provided at http://meme-suite.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Software , Binding Sites , Chromatin Immunoprecipitation , Gene Expression Regulation , Transcription Factors/genetics , Transcription Factors/metabolism
3.
Mol Cell ; 50(5): 613-23, 2013 Jun 06.
Article in English | MEDLINE | ID: mdl-23746349

ABSTRACT

Motifs rich in arginines and glycines were recognized several decades ago to play functional roles and were termed glycine-arginine-rich (GAR) domains and/or RGG boxes. We review here the evolving functions of the RGG box along with several sequence variations that we collectively term the RGG/RG motif. Greater than 1,000 human proteins harbor the RGG/RG motif, and these proteins influence numerous physiological processes such as transcription, pre-mRNA splicing, DNA damage signaling, mRNA translation, and the regulation of apoptosis. In particular, we discuss the role of the RGG/RG motif in mediating nucleic acid and protein interactions, a function that is often regulated by arginine methylation and partner-binding proteins. The physiological relevance of the RGG/RG motif is highlighted by its association with several diseases including neurological and neuromuscular diseases and cancer. Herein, we discuss the evidence for the emerging diverse functionality of this important motif.


Subject(s)
Protein Interaction Domains and Motifs , Proteins/chemistry , Proteins/metabolism , Alternative Splicing , Amino Acid Motifs , Amino Acid Sequence , Amyotrophic Lateral Sclerosis/metabolism , Apoptosis/physiology , Arginine/metabolism , DNA Damage , Fragile X Syndrome/metabolism , Humans , Methylation , Molecular Sequence Data , Neoplasms/metabolism , Neuromuscular Diseases/metabolism , Protein Biosynthesis
4.
Bioinformatics ; 35(16): 2774-2782, 2019 08 15.
Article in English | MEDLINE | ID: mdl-30596994

ABSTRACT

MOTIVATION: Post-translational modifications (PTMs) of proteins are associated with many significant biological functions and can be identified in high throughput using tandem mass spectrometry. Many PTMs are associated with short sequence patterns called 'motifs' that help localize the modifying enzyme. Accordingly, many algorithms have been designed to identify these motifs from mass spectrometry data. Accurate statistical confidence estimates for discovered motifs are critically important for proper interpretation and in the design of downstream experimental validation. RESULTS: We describe a method for assigning statistical confidence estimates to PTM motifs, and we demonstrate that this method provides accurate P-values on both simulated and real data. Our methods are implemented in MoMo, a software tool for discovering motifs among sets of PTMs that we make available as a web server and as downloadable source code. MoMo re-implements the two most widely used PTM motif discovery algorithms, motif-x and MoDL, while offering many enhancements. Relative to motif-x, MoMo offers improved statistical confidence estimates and more accurate calculation of motif scores. The MoMo web server offers more proteome databases, more input formats, larger inputs and longer running times than the motif-x web server. Finally, our study demonstrates that the confidence estimates produced by motif-x are inaccurate. This inaccuracy stems in part from the common practice of drawing 'background' peptides from an unshuffled proteome database. Our results thus suggest that many of the papers that use motif-x to find motifs may be reporting results that lack statistical support. AVAILABILITY AND IMPLEMENTATION: The MoMo web server and source code are provided at http://meme-suite.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
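
The point about drawing background peptides from an unshuffled proteome can be made concrete with a short sketch. The abstract does not say how MoMo builds its background, so the following only illustrates one plausible approach under stated assumptions: each protein's residues are permuted (preserving its amino acid composition) before fixed-width peptides are cut from it. The helper names `shuffle_protein` and `background_peptides` are hypothetical.

```python
import random


def shuffle_protein(seq, rng):
    """Return a random permutation of the residues in one protein sequence."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def background_peptides(proteome, width, rng):
    """Cut every window of `width` residues from each shuffled protein.

    Shuffling keeps each protein's composition but destroys real motifs,
    so the background no longer contains the signal being tested.
    """
    peptides = []
    for seq in proteome:
        shuffled = shuffle_protein(seq, rng)
        for i in range(len(shuffled) - width + 1):
            peptides.append(shuffled[i:i + width])
    return peptides


rng = random.Random(0)  # seeded so the sketch is reproducible
proteome = ["MKTAYIAKQR", "GSSHHHHHH"]  # toy sequences, not real proteins
peptides = background_peptides(proteome, 7, rng)
print(len(peptides), "background peptides")
```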


Subject(s)
Protein Processing, Post-Translational , Software , Algorithms , Amino Acid Motifs , Proteome , Tandem Mass Spectrometry
5.
Nucleic Acids Res ; 46(21): 11381-11395, 2018 11 30.
Article in English | MEDLINE | ID: mdl-30335167

ABSTRACT

During embryogenesis, vascular development relies on a handful of transcription factors that instruct cell fate in a distinct sub-population of the endothelium (1). The SOXF proteins that comprise SOX7, 17 and 18, are molecular switches modulating arterio-venous and lymphatic endothelial differentiation (2,3). Here, we show that, in the SOX-F family, only SOX18 has the ability to switch between a monomeric and a dimeric form. We characterized the SOX18 dimer in binding assays in vitro, and using a split-GFP reporter assay in a zebrafish model system in vivo. We show that SOX18 dimerization is driven by a novel motif located in the vicinity of the C-terminus of the DNA binding region. Insertion of this motif in a SOX7 monomer forced its assembly into a dimer. Genome-wide analysis of SOX18 binding locations on the chromatin revealed enrichment for a SOX dimer binding motif, correlating with genes with a strong endothelial signature. Using a SOX18 small molecule inhibitor that disrupts dimerization, we revealed that dimerization is important for transcription. Overall, we show that dimerization is a specific feature of SOX18 that enables the recruitment of key endothelial transcription factors, and refines the selectivity of the binding to discrete genomic locations assigned to endothelial specific genes.


Subject(s)
SOXF Transcription Factors/chemistry , Amino Acid Motifs , Animals , Biosensing Techniques , DNA-Binding Proteins/chemistry , Endothelial Cells/metabolism , Endothelium/metabolism , Gene Expression Regulation, Developmental , Green Fluorescent Proteins/chemistry , Humans , Mice , Mutation , Open Reading Frames , Protein Domains , Protein Multimerization , Zebrafish , Zebrafish Proteins/chemistry
6.
Genes Dev ; 26(24): 2802-16, 2012 Dec 15.
Article in English | MEDLINE | ID: mdl-23249739

ABSTRACT

In the vertebrate neural tube, regional Sonic hedgehog (Shh) signaling invokes a time- and concentration-dependent induction of six different cell populations mediated through Gli transcriptional regulators. Elsewhere in the embryo, Shh/Gli responses invoke different tissue-appropriate regulatory programs. A genome-scale analysis of DNA binding by Gli1 and Sox2, a pan-neural determinant, identified a set of shared regulatory regions associated with key factors central to cell fate determination and neural tube patterning. Functional analysis in transgenic mice validates core enhancers for each of these factors and demonstrates the dual requirement for Gli1 and Sox2 inputs for neural enhancer activity. Furthermore, through an unbiased determination of Gli-binding site preferences and analysis of binding site variants in the developing mammalian CNS, we demonstrate that differential Gli-binding affinity underlies threshold-level activator responses to Shh input. In summary, our results highlight Sox2 input as a context-specific determinant of the neural-specific Shh response and differential Gli-binding site affinity as an important cis-regulatory property critical for interpreting Shh morphogen action in the mammalian neural tube.


Subject(s)
Body Patterning/physiology , Hedgehog Proteins/metabolism , Kruppel-Like Transcription Factors/metabolism , SOXB1 Transcription Factors/metabolism , Animals , Body Patterning/genetics , Mice , Mice, Transgenic , Neural Tube/embryology , Neural Tube/metabolism , Protein Binding , Zinc Finger Protein GLI1
7.
Nucleic Acids Res ; 45(4): e19, 2017 02 28.
Article in English | MEDLINE | ID: mdl-28204599

ABSTRACT

Identifying the genomic regions and regulatory factors that control the transcription of genes is an important, unsolved problem. The current method of choice predicts transcription factor (TF) binding sites using chromatin immunoprecipitation followed by sequencing (ChIP-seq), and then links the binding sites to putative target genes solely on the basis of the genomic distance between them. Evidence from chromatin conformation capture experiments shows that this approach is inadequate due to long-distance regulation via chromatin looping. We present CisMapper, which predicts the regulatory targets of a TF using the correlation between a histone mark at the TF's bound sites and the expression of each gene across a panel of tissues. Using both chromatin conformation capture and differential expression data, we show that CisMapper is more accurate at predicting the target genes of a TF than the distance-based approaches currently used, and is particularly advantageous for predicting the long-range regulatory interactions typical of tissue-specific gene expression. CisMapper also predicts which TF binding sites regulate a given gene more accurately than using genomic distance. Unlike distance-based methods, CisMapper can predict which transcription start site of a gene is regulated by a particular binding site of the TF.
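
The core signal described above, the correlation between a histone mark at a bound site and each gene's expression across tissues, can be sketched as follows. This is an illustration rather than CisMapper's implementation: it assumes Pearson correlation (the abstract says only "correlation"), and the names `pearson` and `rank_genes` and the toy data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / sqrt(sxx * syy)


def rank_genes(histone_signal, expression_by_gene):
    """Rank candidate genes by correlation with the site's histone signal."""
    scores = {g: pearson(histone_signal, e) for g, e in expression_by_gene.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# One value per tissue, in the same tissue order for every vector.
histone = [1.0, 2.0, 3.0, 4.0]
expression = {
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [8.0, 6.0, 4.0, 2.0],
    "geneC": [5.0, 5.0, 5.0, 5.0],
}
ranked = rank_genes(histone, expression)
print(ranked)
```

Because the score does not depend on genomic distance, a distal gene can outrank the nearest one, which is the property the abstract contrasts with distance-based linking.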


Subject(s)
Chromatin Immunoprecipitation/methods , Regulatory Elements, Transcriptional , Sequence Analysis, DNA/methods , Software , Transcription Factors/metabolism , Algorithms , Binding Sites , Histone Code , Promoter Regions, Genetic , Transcription Initiation Site
8.
Nucleic Acids Res ; 45(11): 6572-6588, 2017 Jun 20.
Article in English | MEDLINE | ID: mdl-28541545

ABSTRACT

Krüppel-like factors (KLFs) are a family of 17 transcription factors characterized by a conserved DNA-binding domain of three zinc fingers and a variable N-terminal domain responsible for recruiting cofactors. KLFs have diverse functions in stem cell biology, embryo patterning, and tissue homoeostasis. KLF1 and related family members function as transcriptional activators via recruitment of co-activators such as EP300, whereas KLF3 and related members act as transcriptional repressors via recruitment of C-terminal Binding Proteins. KLF1 directly activates the Klf3 gene via an erythroid-specific promoter. Herein, we show KLF1 and KLF3 bind common as well as unique sites within the erythroid cell genome by ChIP-seq. We show KLF3 can displace KLF1 from key erythroid gene promoters and enhancers in vivo. Using 4sU RNA labelling and RNA-seq, we show this competition results in reciprocal transcriptional outputs for >50 important genes. Furthermore, Klf3-/- mice displayed exaggerated recovery from anemic stress and persistent cell cycling consistent with a role for KLF3 in dampening KLF1-driven proliferation. We suggest this study provides a paradigm for how KLFs work in incoherent feed-forward loops or networks to fine-tune transcription and thereby control diverse biological processes such as cell proliferation.


Subject(s)
Enhancer Elements, Genetic , Kruppel-Like Transcription Factors/metabolism , Promoter Regions, Genetic , Transcriptional Activation , Animals , Cell Line , Coculture Techniques , Erythroid Cells/metabolism , Erythropoiesis , Mice , Transcription, Genetic
9.
Development ; 142(21): 3746-57, 2015 Nov 01.
Article in English | MEDLINE | ID: mdl-26534986

ABSTRACT

Transcription factors act during cortical development as master regulatory genes that specify cortical arealization and cellular identities. Although numerous transcription factors have been identified as being crucial for cortical development, little is known about their downstream targets and how they mediate the emergence of specific neuronal connections via selective axon guidance. The EMX transcription factors are essential for early patterning of the cerebral cortex, but whether EMX1 mediates interhemispheric connectivity by controlling corpus callosum formation remains unclear. Here, we demonstrate that in mice on the C57Bl/6 background EMX1 plays an essential role in the midline crossing of an axonal subpopulation of the corpus callosum derived from the anterior cingulate cortex. In the absence of EMX1, cingulate axons display reduced expression of the axon guidance receptor NRP1 and form aberrant axonal bundles within the rostral corpus callosum. EMX1 also functions as a transcriptional activator of Nrp1 expression in vitro, and overexpression of this protein in Emx1 knockout mice rescues the midline-crossing phenotype. These findings reveal a novel role for the EMX1 transcription factor in establishing cortical connectivity by regulating the interhemispheric wiring of a subpopulation of neurons within the mouse anterior cingulate cortex.


Subject(s)
Gyrus Cinguli/metabolism , Homeodomain Proteins/metabolism , Neuropilin-1/metabolism , Transcription Factors/metabolism , Agenesis of Corpus Callosum/embryology , Agenesis of Corpus Callosum/genetics , Animals , Axons/metabolism , Mice, Inbred C57BL , Mice, Knockout , Semaphorins/metabolism
10.
Genome Res ; 24(6): 999-1011, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24501021

ABSTRACT

Our current understanding of how DNA is packed in the nucleus is most accurate at the fine scale of individual nucleosomes and at the large scale of chromosome territories. However, accurate modeling of DNA architecture at the intermediate scale of ∼50 kb-10 Mb is crucial for identifying functional interactions among regulatory elements and their target promoters. We describe a method, Fit-Hi-C, that assigns statistical confidence estimates to mid-range intra-chromosomal contacts by jointly modeling the random polymer looping effect and previously observed technical biases in Hi-C data sets. We demonstrate that our proposed approach computes accurate empirical null models of contact probability without any distribution assumption, corrects for binning artifacts, and provides improved statistical power relative to a previously described method. High-confidence contacts identified by Fit-Hi-C preferentially link expressed gene promoters to active enhancers identified by chromatin signatures in human embryonic stem cells (ESCs), capture 77% of RNA polymerase II-mediated enhancer-promoter interactions identified using ChIA-PET in mouse ESCs, and confirm previously validated, cell line-specific interactions in mouse cortex cells. We observe that insulators and heterochromatin regions are hubs for high-confidence contacts, while promoters and strong enhancers are involved in fewer contacts. We also observe that binding peaks of master pluripotency factors such as NANOG and POU5F1 are highly enriched in high-confidence contacts for human ESCs. Furthermore, we show that pairs of loci linked by high-confidence contacts exhibit similar replication timing in human and mouse ESCs and preferentially lie within the boundaries of topological domains for human and mouse cell lines.
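
Why contact counts must be judged against genomic distance can be shown with a deliberately naive sketch. It is not the Fit-Hi-C method, which the abstract says also models technical biases and corrects for binning artifacts; it assumes fixed-width distance bins and simply compares each observed count with the mean count of locus pairs at a similar distance. The function names and toy data are hypothetical.

```python
from collections import defaultdict


def expected_by_distance(contacts, bin_size):
    """Mean contact count per distance bin.

    `contacts` is an iterable of (distance_bp, count) for locus pairs on
    the same chromosome.
    """
    totals = defaultdict(int)
    pairs = defaultdict(int)
    for distance, count in contacts:
        b = distance // bin_size
        totals[b] += count
        pairs[b] += 1
    return {b: totals[b] / pairs[b] for b in totals}


def observed_over_expected(contacts, bin_size):
    """Attach an observed/expected ratio to every locus pair."""
    expected = expected_by_distance(contacts, bin_size)
    result = []
    for distance, count in contacts:
        e = expected[distance // bin_size]
        ratio = count / e if e > 0 else 0.0
        result.append((distance, count, ratio))
    return result


contacts = [(50_000, 10), (60_000, 30), (150_000, 4), (160_000, 4)]
for row in observed_over_expected(contacts, 100_000):
    print(row)
```

A count of 10 at 50 kb is below expectation here, while a count of 4 at 150 kb is exactly at expectation, which is why raw counts alone cannot rank contacts.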


Subject(s)
Chromatin Assembly and Disassembly , Chromatin/genetics , Models, Genetic , Regulatory Sequences, Nucleic Acid , Animals , Chromatin/chemistry , Confidence Intervals , Embryonic Stem Cells/metabolism , Histone Code , Homeodomain Proteins/genetics , Homeodomain Proteins/metabolism , Humans , Mice , Nanog Homeobox Protein , Neurons/metabolism , Octamer Transcription Factor-3/genetics , Octamer Transcription Factor-3/metabolism , Protein Binding , Species Specificity , Yeasts/genetics
11.
Development ; 141(11): 2195-205, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24866114

ABSTRACT

Mammalian sex determination hinges on the development of ovaries or testes, with testis fate being triggered by the expression of the transcription factor sex-determining region Y (Sry). Reduced or delayed Sry expression impairs testis development, highlighting the importance of its accurate spatiotemporal regulation and implying a potential role for SRY dysregulation in human intersex disorders. Several epigenetic modifiers, transcription factors and kinases are implicated in regulating Sry transcription, but it remains unclear whether or how this farrago of factors acts co-ordinately. Here we review our current understanding of Sry regulation and provide a model that assembles all known regulators into three modules, each converging on a single transcription factor that binds to the Sry promoter. We also discuss potential future avenues for discovering the cis-elements and trans-factors required for Sry regulation.


Subject(s)
Gene Expression Regulation, Developmental , Ovary/embryology , Sex-Determining Region Y Protein/physiology , Testis/embryology , Animals , Cell Lineage , Epigenesis, Genetic , Female , GATA4 Transcription Factor/metabolism , Humans , Male , Mice , Promoter Regions, Genetic , Steroidogenic Factor 1/metabolism , Transcription, Genetic , WT1 Proteins/metabolism , Y Chromosome
12.
Bioinformatics ; 32(8): 1217-9, 2016 04 15.
Article in English | MEDLINE | ID: mdl-26704599

ABSTRACT

UNLABELLED: Precise regulatory control of genes, particularly in eukaryotes, frequently requires the joint action of multiple sequence-specific transcription factors. A cis-regulatory module (CRM) is a genomic locus that is responsible for gene regulation and that contains multiple transcription factor binding sites in close proximity. Given a collection of known transcription factor binding motifs, many bioinformatics methods have been proposed over the past 15 years for identifying within a genomic sequence candidate CRMs consisting of clusters of those motifs. RESULTS: The MCAST algorithm uses a hidden Markov model with a P-value-based scoring scheme to identify candidate CRMs. Here, we introduce a new version of MCAST that offers improved graphical output, a dynamic background model, statistical confidence estimates based on false discovery rate estimation and, most significantly, the ability to predict CRMs while taking into account epigenomic data such as DNase I sensitivity or histone modification data. We demonstrate the validity of MCAST's statistical confidence estimates and the utility of epigenomic priors in identifying CRMs. AVAILABILITY AND IMPLEMENTATION: MCAST is part of the MEME Suite software toolkit. A web server and source code are available at http://meme-suite.org and http://alternate.meme-suite.org. CONTACT: t.bailey@imb.uq.edu.au or william-noble@uw.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Binding Sites , Sequence Analysis, DNA , Genome , Humans , Regulatory Elements, Transcriptional , Software , Transcription Factors
13.
Nucleic Acids Res ; 43(W1): W39-49, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25953851

ABSTRACT

The MEME Suite is a powerful, integrated set of web-based tools for studying sequence motifs in proteins, DNA and RNA. Such motifs encode many biological functions, and their detection and characterization is important in the study of molecular interactions in the cell, including the regulation of gene expression. Since the previous description of the MEME Suite in the 2009 Nucleic Acids Research Web Server Issue, we have added six new tools. Here we describe the capabilities of all the tools within the suite, give advice on their best use and provide several case studies to illustrate how to combine the results of various MEME Suite tools for successful motif-based analyses. The MEME Suite is freely available for academic use at http://meme-suite.org, and source code is also available for download and local installation.


Subject(s)
Amino Acid Motifs , Nucleotide Motifs , Software , DNA/chemistry , Internet , Plasmodium falciparum , Protein Interaction Domains and Motifs , Protein Sorting Signals , Protozoan Proteins/chemistry , Receptors, Calcitriol/chemistry , Sequence Analysis, DNA , Sequence Analysis, Protein , Sequence Analysis, RNA
14.
Cereb Cortex ; 25(10): 3758-78, 2015 Oct.
Article in English | MEDLINE | ID: mdl-25331604

ABSTRACT

Transcription factors of the nuclear factor one (NFI) family play a pivotal role in the development of the nervous system. One member, NFIX, regulates the development of the neocortex, hippocampus, and cerebellum. Postnatal Nfix(-/-) mice also display abnormalities within the subventricular zone (SVZ) lining the lateral ventricles, a region of the brain comprising a neurogenic niche that provides ongoing neurogenesis throughout life. Specifically, Nfix(-/-) mice exhibit more PAX6-expressing progenitor cells within the SVZ. However, the mechanism underlying the development of this phenotype remains undefined. Here, we reveal that NFIX contributes to multiple facets of SVZ development. Postnatal Nfix(-/-) mice exhibit increased levels of proliferation within the SVZ, both in vivo and in vitro as assessed by a neurosphere assay. Furthermore, we show that the migration of SVZ-derived neuroblasts to the olfactory bulb is impaired, and that the olfactory bulbs of postnatal Nfix(-/-) mice are smaller. We also demonstrate that gliogenesis within the rostral migratory stream is delayed in the absence of Nfix, and reveal that Gdnf (glial-derived neurotrophic factor), a known attractant for SVZ-derived neuroblasts, is a target for transcriptional activation by NFIX. Collectively, these findings suggest that NFIX regulates both proliferation and migration during the development of the SVZ neurogenic niche.


Subject(s)
Cell Movement , Cell Proliferation , Lateral Ventricles/embryology , NFI Transcription Factors/physiology , Neural Stem Cells/physiology , Neurogenesis , Animals , Female , Glial Cell Line-Derived Neurotrophic Factor/metabolism , Interneurons/physiology , Lateral Ventricles/metabolism , Male , Mice , Mice, Inbred C57BL , Mice, Knockout , NFI Transcription Factors/genetics , NFI Transcription Factors/metabolism , Neuroglia/physiology , Olfactory Bulb/embryology , Olfactory Bulb/metabolism , Stem Cell Niche
15.
Mol Cell Proteomics ; 13(5): 1330-40, 2014 May.
Article in English | MEDLINE | ID: mdl-24532840

ABSTRACT

Protein synthesis is finely regulated across all organisms, from bacteria to humans, and its integrity underpins many important processes. Emerging evidence suggests that the dynamic range of protein abundance is greater than that observed at the transcript level. Technological breakthroughs now mean that sequencing-based measurement of mRNA levels is routine, but protocols for measuring protein abundance remain both complex and expensive. This paper introduces a Bayesian network that integrates transcriptomic and proteomic data to predict protein abundance and to model the effects of its determinants. We aim to use this model to follow a molecular response over time, from condition-specific data, in order to understand adaptation during processes such as the cell cycle. With microarray data now available for many conditions, the general utility of a protein abundance predictor is broad. Whereas most quantitative proteomics studies have focused on higher organisms, we developed a predictive model of protein abundance for both Saccharomyces cerevisiae and Schizosaccharomyces pombe to explore the latitude at the protein level. Our predictor primarily relies on mRNA level, mRNA-protein interaction, mRNA folding energy and half-life, and tRNA adaptation. The combination of key features, allowing for the low certainty and uneven coverage of experimental observations, gives comparatively minor but robust prediction accuracy. The model substantially improved the analysis of protein regulation during the cell cycle: predicted protein abundance identified twice as many cell-cycle-associated proteins as experimental mRNA levels. Predicted protein abundance was more dynamic than observed mRNA expression, agreeing with experimental protein abundance from a human cell line. We illustrate how the same model can be used to predict the folding energy of mRNA when protein abundance is available, lending credence to the emerging view that mRNA folding affects translation efficiency. 
The software and data used in this research are available at http://bioinf.scmb.uq.edu.au/proteinabundance/.


Subject(s)
Bayes Theorem , Cell Cycle Proteins/metabolism , RNA, Messenger/chemistry , RNA, Messenger/metabolism , Software , Transcriptome , Cell Cycle Proteins/genetics , Humans , Models, Molecular , Proteomics , RNA Folding , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Schizosaccharomyces pombe Proteins/genetics , Schizosaccharomyces pombe Proteins/metabolism
16.
Nucleic Acids Res ; 42(17): 11000-10, 2014.
Article in English | MEDLINE | ID: mdl-25200088

ABSTRACT

Predicting which genomic regions control the transcription of a given gene is a challenge. We present a novel computational approach for creating and validating maps that associate genomic regions (cis-regulatory modules, CRMs) with genes. The method infers regulatory relationships that explain gene expression observed in a test tissue using widely available genomic data for 'other' tissues. To predict the regulatory targets of a CRM, we use cross-tissue correlation between histone modifications present at the CRM and expression at genes within 1 Mbp of it. To validate cis-regulatory maps, we show that they yield more accurate models of gene expression than carefully constructed control maps. These gene expression models predict observed gene expression from transcription factor binding in the CRMs linked to that gene. We show that our maps are able to identify long-range regulatory interactions and improve substantially over maps linking genes and CRMs based on either the control maps or a 'nearest neighbor' heuristic. Our results also show that it is essential to include CRMs predicted in multiple tissues during map-building, that H3K27ac is the most informative histone modification, and that CAGE is the most informative measure of gene expression for creating cis-regulatory maps.
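
The candidate-selection step mentioned above, restricting attention to genes within 1 Mbp of a CRM, can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: a CRM is reduced to a single (chromosome, position) point, each gene to a single transcription start site, and the name `candidate_genes` and the coordinates are hypothetical.

```python
def candidate_genes(crm, genes, window=1_000_000):
    """Return names of genes whose TSS lies within `window` bp of the CRM.

    `crm` is (chromosome, position); `genes` maps a gene name to
    (chromosome, tss_position). Genes on other chromosomes are excluded.
    """
    crm_chrom, crm_pos = crm
    return sorted(
        name
        for name, (chrom, tss) in genes.items()
        if chrom == crm_chrom and abs(tss - crm_pos) <= window
    )


crm = ("chr1", 5_000_000)
genes = {
    "A": ("chr1", 5_900_000),   # 900 kb away: kept
    "B": ("chr1", 6_000_001),   # just over 1 Mbp: dropped
    "C": ("chr2", 5_000_000),   # different chromosome: dropped
    "D": ("chr1", 4_000_000),   # exactly 1 Mbp: kept
}
print(candidate_genes(crm, genes))
```

Only the genes that pass this filter would then be scored by cross-tissue correlation, so the window bounds the number of CRM-gene pairs that need to be evaluated.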


Subject(s)
Gene Expression Regulation , Models, Genetic , Cell Line , Genomics/methods , Histones/analysis , Humans , Linear Models , Organ Specificity , Transcription Factors/metabolism , Transcription Initiation Site
17.
J Neurosci ; 34(8): 2921-30, 2014 Feb 19.
Article in English | MEDLINE | ID: mdl-24553933

ABSTRACT

Epigenetic mechanisms are essential in regulating neural progenitor cell self-renewal, with the chromatin-modifying protein Enhancer of zeste homolog 2 (EZH2) emerging as a central player in promoting progenitor cell self-renewal during cortical development. Despite this, how Ezh2 is itself regulated remains unclear. Here, we demonstrate that the transcription factor nuclear factor IB (NFIB) plays a key role in this process. Nfib(-/-) mice exhibit an increased number of proliferative ventricular zone cells that express progenitor cell markers and upregulation of EZH2 expression within the neocortex and hippocampus. NFIB binds to the Ezh2 promoter and overexpression of NFIB represses Ezh2 transcription. Finally, key downstream targets of EZH2-mediated epigenetic repression are misregulated in Nfib(-/-) mice. Collectively, these results suggest that the downregulation of Ezh2 transcription by NFIB is an important component of the process of neural progenitor cell differentiation during cortical development.


Subject(s)
Cerebral Cortex/growth & development , Epigenesis, Genetic/physiology , NFI Transcription Factors/genetics , NFI Transcription Factors/physiology , Polycomb Repressive Complex 2/genetics , Polycomb Repressive Complex 2/physiology , Animals , Cell Count , Cerebral Cortex/cytology , Cerebral Cortex/physiology , Electrophoretic Mobility Shift Assay , Enhancer of Zeste Homolog 2 Protein , Female , Hippocampus/cytology , Hippocampus/growth & development , Immunohistochemistry , Male , Mice , Mice, Knockout , Microarray Analysis , Mutation/genetics , Mutation/physiology , Neural Stem Cells/physiology , Primary Cell Culture , Promoter Regions, Genetic/genetics , Real-Time Polymerase Chain Reaction
18.
BMC Dev Biol ; 15: 34, 2015 Oct 06.
Article in English | MEDLINE | ID: mdl-26444262

ABSTRACT

BACKGROUND: Sex determination in mammals requires expression of the Y-linked gene Sry in the bipotential genital ridges of the XY embryo. Even minor delay of the onset of Sry expression can result in XY sex reversal, highlighting the need for accurate gene regulation during sex determination. However, the location of critical regulatory elements remains unknown. Here, we analysed Sry flanking sequences across many species, using newly available genome sequences and computational tools, to better understand Sry's genomic context and to identify conserved regions predictive of functional roles. METHODS: Flanking sequences from 17 species were analysed using both global and local sequence alignment methods. Multiple motif searches were employed to characterise common motifs in otherwise unconserved sequence. RESULTS: We identified position-specific conservation of binding motifs for multiple transcription factor families, including GATA binding factors and Oct/Sox dimers. In contrast with the landscape of extremely low sequence conservation around the Sry coding region, our analysis highlighted a strongly conserved interval of ~106 bp within the Sry promoter (which we term the Sry Proximal Conserved Interval, SPCI). We further report that inverted repeats flanking murine Sry are much larger than previously recognised. CONCLUSIONS: The unusually fast pace of sequence drift on the Y chromosome sharpens the likely functional significance of both the SPCI and the identified binding motifs, providing a basis for future studies of the role(s) of these elements in Sry regulation.


Subject(s)
Mammals/genetics , Sex-Determining Region Y Protein/genetics , Animals , Base Sequence , Conserved Sequence , Evolution, Molecular , Humans , Mammals/classification , Molecular Sequence Data , Sequence Alignment , Transcription Factors/metabolism
19.
Genome Res ; 22(7): 1372-81, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22550012

ABSTRACT

Double-stranded DNA is able to form triple-helical structures by accommodating a third nucleotide strand in its major groove. This sequence-specific process offers a potent mechanism for targeting genomic loci of interest that is of great value for biotechnological and gene-therapeutic applications. It is likely that nature has leveraged this addressing system for gene regulation, because computational studies have uncovered an abundance of putative triplex target sites in various genomes, with enrichment particularly in gene promoters. However, to draw a more complete picture of the in vivo role of triplexes, not only the putative targets but also the sequences acting as the third strand and their capability to pair with the predicted target sites need to be studied. Here we present Triplexator, the first computational framework that integrates all aspects of triplex formation, and showcase its potential by discussing research examples for which the different aspects of triplex formation are important. We find that chromatin-associated RNAs have a significantly higher fraction of sequence features able to form triplexes than expected at random, suggesting their involvement in gene regulation. We furthermore identify hundreds of human genes that contain sequence features in their promoter predicted to be able to form a triplex with a target within the same promoter, suggesting the involvement of triplexes in feedback-based gene regulation. With focus on biotechnological applications, we screen mammalian genomes for high-affinity triplex target sites that can be used to target genomic loci specifically and find that triplex formation offers a resolution of ~1300 nt.


Subject(s)
Algorithms , Gene Expression Profiling/methods , Genomics/methods , Oligonucleotides/chemistry , RNA-Binding Proteins/chemistry , Animals , Chromatin/chemistry , Chromatin/genetics , Circular Dichroism , Computational Biology/methods , DNA/chemistry , DNA/genetics , Genetic Loci , Genome, Human , Humans , Hydrogen Bonding , Nucleic Acid Conformation , Oligonucleotides/genetics , Promoter Regions, Genetic , RNA Stability , RNA-Binding Proteins/genetics , Time Factors
20.
Genome Res ; 22(12): 2385-98, 2012 Dec.
Article in English | MEDLINE | ID: mdl-22835905

ABSTRACT

KLF1 (formerly known as EKLF) regulates the development of erythroid cells from bi-potent progenitor cells via the transcriptional activation of a diverse set of genes. Mice lacking Klf1 die in utero prior to E15 from severe anemia due to the inadequate expression of genes controlling hemoglobin production, cell membrane and cytoskeletal integrity, and the cell cycle. We have recently described the full repertoire of KLF1 binding sites in vivo by performing KLF1 ChIP-seq in primary erythroid tissue (E14.5 fetal liver). Here we describe the KLF1-dependent erythroid transcriptome by comparing mRNA-seq from Klf1(+/+) and Klf1(-/-) erythroid tissue. This has revealed novel target genes not previously obtainable by traditional microarray technology, and provided novel insights into the function of KLF1 as a transcriptional activator. We define a cis-regulatory module bound by KLF1, GATA1, TAL1, and EP300 that coordinates a core set of erythroid genes. We also describe a novel set of erythroid-specific promoters that drive high-level expression of otherwise ubiquitously expressed genes in erythroid cells. Our study has identified two novel lncRNAs that are dynamically expressed during erythroid differentiation, and discovered a role for KLF1 in directing apoptotic gene expression to drive the terminal stages of erythroid maturation.


Subject(s)
Erythropoiesis/genetics , Gene Expression Regulation, Developmental , Kruppel-Like Transcription Factors/genetics , RNA, Messenger/genetics , Transcriptome , Animals , Apoptosis , Basic Helix-Loop-Helix Transcription Factors/genetics , Basic Helix-Loop-Helix Transcription Factors/metabolism , Blotting, Western , Cell Differentiation , Chromosome Mapping , E1A-Associated p300 Protein/genetics , E1A-Associated p300 Protein/metabolism , Erythroid Cells/cytology , Erythroid Cells/metabolism , GATA1 Transcription Factor/genetics , GATA1 Transcription Factor/metabolism , Gene Expression Profiling , In Situ Nick-End Labeling , Kruppel-Like Transcription Factors/metabolism , Liver/metabolism , Mice , Mice, Inbred BALB C , Promoter Regions, Genetic , Proto-Oncogene Proteins/genetics , Proto-Oncogene Proteins/metabolism , RNA, Messenger/metabolism , Sequence Analysis, RNA/methods , T-Cell Acute Lymphocytic Leukemia Protein 1