1.
Genome Biol ; 25(1): 11, 2024 01 08.
Article in English | MEDLINE | ID: mdl-38191487

ABSTRACT

BACKGROUND: Transcription factors bind DNA in specific sequence contexts. In addition to distinguishing one nucleobase from another, some transcription factors can distinguish between unmodified and modified bases. Current models of transcription factor binding tend not to take DNA modifications into account, while the recent few that do often have limitations. This makes a comprehensive and accurate profiling of transcription factor affinities difficult. RESULTS: Here, we develop methods to identify transcription factor binding sites in modified DNA. Our models expand the standard A/C/G/T DNA alphabet to include cytosine modifications. We develop Cytomod to create modified genomic sequences and we also enhance the MEME Suite, adding the capacity to handle custom alphabets. We adapt the well-established position weight matrix (PWM) model of transcription factor binding affinity to this expanded DNA alphabet. Using these methods, we identify modification-sensitive transcription factor binding motifs. We confirm established binding preferences, such as the preference of ZFP57 and C/EBPβ for methylated motifs and the preference of c-Myc for unmethylated E-box motifs. CONCLUSIONS: Using known binding preferences to tune model parameters, we discover novel modified motifs for a wide array of transcription factors. Finally, we validate our binding preference predictions for OCT4 using cleavage under targets and release using nuclease (CUT&RUN) experiments across conventional, methylation-, and hydroxymethylation-enriched sequences. Our approach readily extends to other DNA modifications. As more genome-wide single-base resolution modification data becomes available, we expect that our method will yield insights into altered transcription factor binding affinities across many different modifications.
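
The idea of applying a position weight matrix to an expanded alphabet can be illustrated with a small sketch. This is not the authors' code: the symbols 'm' and 'h' for methylated and hydroxymethylated cytosine, the toy counts, the uniform background and all function names are assumptions made for illustration only.

```python
import math

# Assumed expanded alphabet: the four standard bases plus
# 'm' (5-methylcytosine) and 'h' (5-hydroxymethylcytosine).
ALPHABET = "ACGTmh"


def log_odds_pwm(counts, background, pseudocount=1.0):
    """Turn per-position symbol counts into log2-odds scores."""
    pwm = []
    for column in counts:
        total = sum(column.get(s, 0) for s in ALPHABET) + pseudocount * len(ALPHABET)
        pwm.append({
            s: math.log2(((column.get(s, 0) + pseudocount) / total) / background[s])
            for s in ALPHABET
        })
    return pwm


def score(pwm, site):
    """Sum the per-position log-odds scores of one candidate site."""
    return sum(column[symbol] for column, symbol in zip(pwm, site))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    hits = [(score(pwm, sequence[i:i + width]), i)
            for i in range(len(sequence) - width + 1)]
    return max(hits)


# Toy 3-position motif whose first position prefers methylated cytosine.
toy_counts = [{"m": 8, "C": 2}, {"G": 10}, {"T": 10}]
# Uniform background chosen only to keep the example short; modified
# bases are much rarer than this in real genomes.
toy_background = {s: 1.0 / len(ALPHABET) for s in ALPHABET}
toy_pwm = log_odds_pwm(toy_counts, toy_background)
```

With such a matrix, the same site scores differently depending on whether its cytosine is written as 'C' or 'm', which is the property a modification-sensitive motif model needs.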


Subject(s)
Gene Expression Regulation , Transcription Factors , Epigenomics , DNA , Epigenesis, Genetic
3.
Bioinformatics ; 37(18): 2834-2840, 2021 09 29.
Article in English | MEDLINE | ID: mdl-33760053

ABSTRACT

MOTIVATION: Sequence motif discovery algorithms can identify novel sequence patterns that perform biological functions in DNA, RNA and protein sequences (for example, the binding site motifs of DNA- and RNA-binding proteins). RESULTS: The STREME algorithm presented here advances the state-of-the-art in ab initio motif discovery in terms of both accuracy and versatility. Using in vivo DNA (ChIP-seq) and RNA (CLIP-seq) data, and validating motifs with reference motifs derived from in vitro data, we show that STREME is more accurate, sensitive and thorough than several widely used algorithms (DREME, HOMER, MEME, Peak-motifs) and two other representative algorithms (ProSampler and Weeder). STREME's capabilities include the ability to find motifs in datasets with hundreds of thousands of sequences, to find both short and long motifs (from 3 to 30 positions), to perform differential motif discovery in pairs of sequence datasets, and to find motifs in sequences over virtually any alphabet (DNA, RNA, protein and user-defined alphabets). Unlike most motif discovery algorithms, STREME reports a useful estimate of the statistical significance of each motif it discovers. STREME is easy to use individually via its web server or via the command line, and is completely integrated with the widely used MEME Suite of sequence analysis tools. The name STREME stands for 'Simple, Thorough, Rapid, Enriched Motif Elicitation'. AVAILABILITY AND IMPLEMENTATION: The STREME web server and source code are provided freely for non-commercial use at http://meme-suite.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Software , Chromatin Immunoprecipitation Sequencing , Binding Sites , Sequence Analysis, DNA , DNA , Nucleotide Motifs
4.
Bioinformatics ; 36(12): 3902-3904, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32246829

ABSTRACT

MOTIVATION: Identifying the genes regulated by a given transcription factor (TF) (its 'target genes') is a key step in developing a comprehensive understanding of gene regulation. Previously, we developed a method (CisMapper) for predicting the target genes of a TF based solely on the correlation between a histone modification at the TF's binding site and the expression of the gene across a set of tissues or cell lines. That approach is limited to organisms for which extensive histone and expression data are available, and does not explicitly incorporate the genomic distance between the TF and the gene. RESULTS: We present the T-Gene algorithm, which overcomes these limitations. It can be used to predict which genes are most likely to be regulated by a TF, and which of the TF's binding sites are most likely involved in regulating particular genes. T-Gene calculates a novel score that combines distance and histone/expression correlation, and we show that this score accurately predicts when a regulatory element bound by a TF is in contact with a gene's promoter, achieving median precision above 60%. T-Gene is easy to use via its web server or as a command-line tool, and can also make accurate predictions (median precision above 40%) based on distance alone when extensive histone/expression data is not available for the organism. T-Gene provides an estimate of the statistical significance of each of its predictions. AVAILABILITY AND IMPLEMENTATION: The T-Gene web server, source code, histone/expression data and genome annotation files are provided at http://meme-suite.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Software , Binding Sites , Chromatin Immunoprecipitation , Gene Expression Regulation , Transcription Factors/genetics , Transcription Factors/metabolism
5.
Bioinformatics ; 35(16): 2774-2782, 2019 08 15.
Article in English | MEDLINE | ID: mdl-30596994

ABSTRACT

MOTIVATION: Post-translational modifications (PTMs) of proteins are associated with many significant biological functions and can be identified in high throughput using tandem mass spectrometry. Many PTMs are associated with short sequence patterns called 'motifs' that help localize the modifying enzyme. Accordingly, many algorithms have been designed to identify these motifs from mass spectrometry data. Accurate statistical confidence estimates for discovered motifs are critically important for proper interpretation and in the design of downstream experimental validation. RESULTS: We describe a method for assigning statistical confidence estimates to PTM motifs, and we demonstrate that this method provides accurate P-values on both simulated and real data. Our methods are implemented in MoMo, a software tool for discovering motifs among sets of PTMs that we make available as a web server and as downloadable source code. MoMo re-implements the two most widely used PTM motif discovery algorithms, motif-x and MoDL, while offering many enhancements. Relative to motif-x, MoMo offers improved statistical confidence estimates and more accurate calculation of motif scores. The MoMo web server offers more proteome databases, more input formats, larger inputs and longer running times than the motif-x web server. Finally, our study demonstrates that the confidence estimates produced by motif-x are inaccurate. This inaccuracy stems in part from the common practice of drawing 'background' peptides from an unshuffled proteome database. Our results thus suggest that many of the papers that use motif-x to find motifs may be reporting results that lack statistical support. AVAILABILITY AND IMPLEMENTATION: The MoMo web server and source code are provided at http://meme-suite.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Protein Processing, Post-Translational , Software , Algorithms , Amino Acid Motifs , Proteome , Tandem Mass Spectrometry
6.
Nucleic Acids Res ; 46(21): 11381-11395, 2018 11 30.
Article in English | MEDLINE | ID: mdl-30335167

ABSTRACT

During embryogenesis, vascular development relies on a handful of transcription factors that instruct cell fate in a distinct sub-population of the endothelium (1). The SOXF proteins that comprise SOX7, 17 and 18, are molecular switches modulating arterio-venous and lymphatic endothelial differentiation (2,3). Here, we show that, in the SOX-F family, only SOX18 has the ability to switch between a monomeric and a dimeric form. We characterized the SOX18 dimer in binding assays in vitro, and using a split-GFP reporter assay in a zebrafish model system in vivo. We show that SOX18 dimerization is driven by a novel motif located in the vicinity of the C-terminus of the DNA binding region. Insertion of this motif in a SOX7 monomer forced its assembly into a dimer. Genome-wide analysis of SOX18 binding locations on the chromatin revealed enrichment for a SOX dimer binding motif, correlating with genes with a strong endothelial signature. Using a SOX18 small molecule inhibitor that disrupts dimerization, we revealed that dimerization is important for transcription. Overall, we show that dimerization is a specific feature of SOX18 that enables the recruitment of key endothelial transcription factors, and refines the selectivity of the binding to discrete genomic locations assigned to endothelial specific genes.


Subject(s)
SOXF Transcription Factors/chemistry , Amino Acid Motifs , Animals , Biosensing Techniques , DNA-Binding Proteins/chemistry , Endothelial Cells/metabolism , Endothelium/metabolism , Gene Expression Regulation, Developmental , Green Fluorescent Proteins/chemistry , Humans , Mice , Mutation , Open Reading Frames , Protein Domains , Protein Multimerization , Zebrafish , Zebrafish Proteins/chemistry
7.
Nucleic Acids Res ; 45(11): 6572-6588, 2017 Jun 20.
Article in English | MEDLINE | ID: mdl-28541545

ABSTRACT

Krüppel-like factors (KLFs) are a family of 17 transcription factors characterized by a conserved DNA-binding domain of three zinc fingers and a variable N-terminal domain responsible for recruiting cofactors. KLFs have diverse functions in stem cell biology, embryo patterning, and tissue homoeostasis. KLF1 and related family members function as transcriptional activators via recruitment of co-activators such as EP300, whereas KLF3 and related members act as transcriptional repressors via recruitment of C-terminal Binding Proteins. KLF1 directly activates the Klf3 gene via an erythroid-specific promoter. Herein, we show KLF1 and KLF3 bind common as well as unique sites within the erythroid cell genome by ChIP-seq. We show KLF3 can displace KLF1 from key erythroid gene promoters and enhancers in vivo. Using 4sU RNA labelling and RNA-seq, we show this competition results in reciprocal transcriptional outputs for >50 important genes. Furthermore, Klf3-/- mice displayed exaggerated recovery from anemic stress and persistent cell cycling consistent with a role for KLF3 in dampening KLF1-driven proliferation. We suggest this study provides a paradigm for how KLFs work in incoherent feed-forward loops or networks to fine-tune transcription and thereby control diverse biological processes such as cell proliferation.


Subject(s)
Enhancer Elements, Genetic , Kruppel-Like Transcription Factors/metabolism , Promoter Regions, Genetic , Transcriptional Activation , Animals , Cell Line , Coculture Techniques , Erythroid Cells/metabolism , Erythropoiesis , Mice , Transcription, Genetic
8.
Nucleic Acids Res ; 45(4): e19, 2017 02 28.
Article in English | MEDLINE | ID: mdl-28204599

ABSTRACT

Identifying the genomic regions and regulatory factors that control the transcription of genes is an important, unsolved problem. The current method of choice predicts transcription factor (TF) binding sites using chromatin immunoprecipitation followed by sequencing (ChIP-seq), and then links the binding sites to putative target genes solely on the basis of the genomic distance between them. Evidence from chromatin conformation capture experiments shows that this approach is inadequate due to long-distance regulation via chromatin looping. We present CisMapper, which predicts the regulatory targets of a TF using the correlation between a histone mark at the TF's bound sites and the expression of each gene across a panel of tissues. Using both chromatin conformation capture and differential expression data, we show that CisMapper is more accurate at predicting the target genes of a TF than the distance-based approaches currently used, and is particularly advantageous for predicting the long-range regulatory interactions typical of tissue-specific gene expression. CisMapper also predicts which TF binding sites regulate a given gene more accurately than using genomic distance. Unlike distance-based methods, CisMapper can predict which transcription start site of a gene is regulated by a particular binding site of the TF.
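
The correlation-based linking described in this abstract can be sketched roughly as follows. The abstract does not say which correlation measure CisMapper uses, so Pearson correlation is an assumption here, as are the function names and the toy numbers; this is an illustration of the idea, not the tool's implementation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sd_x * sd_y)


def rank_targets(histone_signal, expression_by_gene):
    """Rank genes by correlation between the histone signal at one bound
    site and each gene's expression across the same panel of tissues."""
    scored = [(pearson(histone_signal, expression), gene)
              for gene, expression in expression_by_gene.items()]
    return sorted(scored, reverse=True)


# Toy panel of four tissues (values are invented for illustration).
site_signal = [1.0, 2.0, 3.0, 4.0]
genes = {
    "geneA": [2.0, 4.0, 6.0, 8.0],   # tracks the histone signal
    "geneB": [4.0, 3.0, 2.0, 1.0],   # anti-correlated
    "geneC": [1.0, 1.0, 1.0, 1.0],   # constant expression
}
ranking = rank_targets(site_signal, genes)
```

Because the ranking depends on the cross-tissue profile rather than on genomic distance, a distant gene can outrank a nearby one, which is the behaviour the abstract contrasts with distance-based linking.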


Subject(s)
Chromatin Immunoprecipitation/methods , Regulatory Elements, Transcriptional , Sequence Analysis, DNA/methods , Software , Transcription Factors/metabolism , Algorithms , Binding Sites , Histone Code , Promoter Regions, Genetic , Transcription Initiation Site
9.
Elife ; 6, 2017 01 31.
Article in English | MEDLINE | ID: mdl-28137359

ABSTRACT

Pharmacological targeting of transcription factors holds great promise for the development of new therapeutics, but strategies based on blockade of DNA binding, nuclear shuttling, or individual protein partner recruitment have yielded limited success to date. Transcription factors typically engage in complex interaction networks, likely masking the effects of specifically inhibiting single protein-protein interactions. Here, we used a combination of genomic, proteomic and biophysical methods to discover a suite of protein-protein interactions involving the SOX18 transcription factor, a known regulator of vascular development and disease. We describe a small-molecule that is able to disrupt a discrete subset of SOX18-dependent interactions. This compound selectively suppressed SOX18 transcriptional outputs in vitro and interfered with vascular development in zebrafish larvae. In a mouse pre-clinical model of breast cancer, treatment with this inhibitor significantly improved survival by reducing tumour vascular density and metastatic spread. Our studies validate an interactome-based molecular strategy to interfere with transcription factor activity, for the development of novel disease therapeutics.


Subject(s)
Antineoplastic Agents/metabolism , Breast Neoplasms/prevention & control , SOXF Transcription Factors/antagonists & inhibitors , Transcription, Genetic/drug effects , Animals , Biophysical Phenomena , Blood Vessels/embryology , Disease Models, Animal , Genomics , Mice , Proteomics , Treatment Outcome , Zebrafish/embryology , Zebrafish Proteins/antagonists & inhibitors
10.
Bioinformatics ; 32(8): 1217-9, 2016 04 15.
Article in English | MEDLINE | ID: mdl-26704599

ABSTRACT

UNLABELLED: Precise regulatory control of genes, particularly in eukaryotes, frequently requires the joint action of multiple sequence-specific transcription factors. A cis-regulatory module (CRM) is a genomic locus that is responsible for gene regulation and that contains multiple transcription factor binding sites in close proximity. Given a collection of known transcription factor binding motifs, many bioinformatics methods have been proposed over the past 15 years for identifying within a genomic sequence candidate CRMs consisting of clusters of those motifs. RESULTS: The MCAST algorithm uses a hidden Markov model with a P-value-based scoring scheme to identify candidate CRMs. Here, we introduce a new version of MCAST that offers improved graphical output, a dynamic background model, statistical confidence estimates based on false discovery rate estimation and, most significantly, the ability to predict CRMs while taking into account epigenomic data such as DNase I sensitivity or histone modification data. We demonstrate the validity of MCAST's statistical confidence estimates and the utility of epigenomic priors in identifying CRMs. AVAILABILITY AND IMPLEMENTATION: MCAST is part of the MEME Suite software toolkit. A web server and source code are available at http://meme-suite.org and http://alternate.meme-suite.org. CONTACT: t.bailey@imb.uq.edu.au or william-noble@uw.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Binding Sites , Sequence Analysis, DNA , Genome , Humans , Regulatory Elements, Transcriptional , Software , Transcription Factors
11.
Development ; 142(21): 3746-57, 2015 Nov 01.
Article in English | MEDLINE | ID: mdl-26534986

ABSTRACT

Transcription factors act during cortical development as master regulatory genes that specify cortical arealization and cellular identities. Although numerous transcription factors have been identified as being crucial for cortical development, little is known about their downstream targets and how they mediate the emergence of specific neuronal connections via selective axon guidance. The EMX transcription factors are essential for early patterning of the cerebral cortex, but whether EMX1 mediates interhemispheric connectivity by controlling corpus callosum formation remains unclear. Here, we demonstrate that in mice on the C57Bl/6 background EMX1 plays an essential role in the midline crossing of an axonal subpopulation of the corpus callosum derived from the anterior cingulate cortex. In the absence of EMX1, cingulate axons display reduced expression of the axon guidance receptor NRP1 and form aberrant axonal bundles within the rostral corpus callosum. EMX1 also functions as a transcriptional activator of Nrp1 expression in vitro, and overexpression of this protein in Emx1 knockout mice rescues the midline-crossing phenotype. These findings reveal a novel role for the EMX1 transcription factor in establishing cortical connectivity by regulating the interhemispheric wiring of a subpopulation of neurons within the mouse anterior cingulate cortex.


Subject(s)
Gyrus Cinguli/metabolism , Homeodomain Proteins/metabolism , Neuropilin-1/metabolism , Transcription Factors/metabolism , Agenesis of Corpus Callosum/embryology , Agenesis of Corpus Callosum/genetics , Animals , Axons/metabolism , Mice, Inbred C57BL , Mice, Knockout , Semaphorins/metabolism
12.
BMC Dev Biol ; 15: 34, 2015 Oct 06.
Article in English | MEDLINE | ID: mdl-26444262

ABSTRACT

BACKGROUND: Sex determination in mammals requires expression of the Y-linked gene Sry in the bipotential genital ridges of the XY embryo. Even minor delay of the onset of Sry expression can result in XY sex reversal, highlighting the need for accurate gene regulation during sex determination. However, the location of critical regulatory elements remains unknown. Here, we analysed Sry flanking sequences across many species, using newly available genome sequences and computational tools, to better understand Sry's genomic context and to identify conserved regions predictive of functional roles. METHODS: Flanking sequences from 17 species were analysed using both global and local sequence alignment methods. Multiple motif searches were employed to characterise common motifs in otherwise unconserved sequence. RESULTS: We identified position-specific conservation of binding motifs for multiple transcription factor families, including GATA binding factors and Oct/Sox dimers. In contrast with the landscape of extremely low sequence conservation around the Sry coding region, our analysis highlighted a strongly conserved interval of ~106 bp within the Sry promoter (which we term the Sry Proximal Conserved Interval, SPCI). We further report that inverted repeats flanking murine Sry are much larger than previously recognised. CONCLUSIONS: The unusually fast pace of sequence drift on the Y chromosome sharpens the likely functional significance of both the SPCI and the identified binding motifs, providing a basis for future studies of the role(s) of these elements in Sry regulation.


Subject(s)
Mammals/genetics , Sex-Determining Region Y Protein/genetics , Animals , Base Sequence , Conserved Sequence , Evolution, Molecular , Humans , Mammals/classification , Molecular Sequence Data , Sequence Alignment , Transcription Factors/metabolism
13.
Brain Res ; 1616: 71-87, 2015 Aug 07.
Article in English | MEDLINE | ID: mdl-25960350

ABSTRACT

Nuclear factor one X (NFIX) has been shown to play a pivotal role during the development of many regions of the brain, including the neocortex, the hippocampus and the cerebellum. Mechanistically, NFIX has been shown to promote neural stem cell differentiation through the activation of astrocyte-specific genes and via the repression of genes central to progenitor cell self-renewal. Interestingly, mice lacking Nfix also exhibit other phenotypes with respect to development of the central nervous system, and whose underlying causes have yet to be determined. Here we examine one of the phenotypes displayed by Nfix(-/-) mice, namely hydrocephalus. Through the examination of embryonic and postnatal Nfix(-/-) mice we reveal that hydrocephalus is first seen at around postnatal day (P) 10 in mice lacking Nfix, and is fully penetrant by P20. Furthermore, we examined the subcommissural organ (SCO), the Sylvian aqueduct and the ependymal layer of the lateral ventricles, regions that when malformed and functionally perturbed have previously been implicated in the development of hydrocephalus. SOX3 is a factor known to regulate SCO development. Although we revealed that NFIX could repress Sox3-promoter-driven transcriptional activity in vitro, SOX3 expression within the SCO was normal within Nfix(-/-) mice, and Nfix mutant mice showed no abnormalities in the structure or function of the SCO. Moreover, these mutant mice exhibited no overt blockage of the Sylvian aqueduct. However, the ependymal layer of the lateral ventricles was frequently absent in Nfix(-/-) mice, suggesting that this phenotype may underlie the development of hydrocephalus within these knockout mice.


Subject(s)
Ependyma/pathology , Gene Expression Regulation, Developmental/genetics , Hydrocephalus/pathology , Lateral Ventricles/pathology , NFI Transcription Factors/deficiency , Age Factors , Animals , Animals, Newborn , Computational Biology , Disease Models, Animal , Embryo, Mammalian , Ependyma/embryology , Ependyma/growth & development , Hydrocephalus/genetics , Lateral Ventricles/embryology , Lateral Ventricles/growth & development , Mice , Mice, Inbred C57BL , Mice, Transgenic , NFI Transcription Factors/genetics , Nerve Tissue Proteins/genetics , Nerve Tissue Proteins/metabolism , SOXB1 Transcription Factors/genetics , SOXB1 Transcription Factors/metabolism
14.
Nucleic Acids Res ; 43(W1): W39-49, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25953851

ABSTRACT

The MEME Suite is a powerful, integrated set of web-based tools for studying sequence motifs in proteins, DNA and RNA. Such motifs encode many biological functions, and their detection and characterization is important in the study of molecular interactions in the cell, including the regulation of gene expression. Since the previous description of the MEME Suite in the 2009 Nucleic Acids Research Web Server Issue, we have added six new tools. Here we describe the capabilities of all the tools within the suite, give advice on their best use and provide several case studies to illustrate how to combine the results of various MEME Suite tools for successful motif-based analyses. The MEME Suite is freely available for academic use at http://meme-suite.org, and source code is also available for download and local installation.


Subject(s)
Amino Acid Motifs , Nucleotide Motifs , Software , DNA/chemistry , Internet , Plasmodium falciparum , Protein Interaction Domains and Motifs , Protein Sorting Signals , Protozoan Proteins/chemistry , Receptors, Calcitriol/chemistry , Sequence Analysis, DNA , Sequence Analysis, Protein , Sequence Analysis, RNA
15.
Cereb Cortex ; 25(10): 3758-78, 2015 Oct.
Article in English | MEDLINE | ID: mdl-25331604

ABSTRACT

Transcription factors of the nuclear factor one (NFI) family play a pivotal role in the development of the nervous system. One member, NFIX, regulates the development of the neocortex, hippocampus, and cerebellum. Postnatal Nfix(-/-) mice also display abnormalities within the subventricular zone (SVZ) lining the lateral ventricles, a region of the brain comprising a neurogenic niche that provides ongoing neurogenesis throughout life. Specifically, Nfix(-/-) mice exhibit more PAX6-expressing progenitor cells within the SVZ. However, the mechanism underlying the development of this phenotype remains undefined. Here, we reveal that NFIX contributes to multiple facets of SVZ development. Postnatal Nfix(-/-) mice exhibit increased levels of proliferation within the SVZ, both in vivo and in vitro as assessed by a neurosphere assay. Furthermore, we show that the migration of SVZ-derived neuroblasts to the olfactory bulb is impaired, and that the olfactory bulbs of postnatal Nfix(-/-) mice are smaller. We also demonstrate that gliogenesis within the rostral migratory stream is delayed in the absence of Nfix, and reveal that Gdnf (glial-derived neurotrophic factor), a known attractant for SVZ-derived neuroblasts, is a target for transcriptional activation by NFIX. Collectively, these findings suggest that NFIX regulates both proliferation and migration during the development of the SVZ neurogenic niche.


Subject(s)
Cell Movement , Cell Proliferation , Lateral Ventricles/embryology , NFI Transcription Factors/physiology , Neural Stem Cells/physiology , Neurogenesis , Animals , Female , Glial Cell Line-Derived Neurotrophic Factor/metabolism , Interneurons/physiology , Lateral Ventricles/metabolism , Male , Mice , Mice, Inbred C57BL , Mice, Knockout , NFI Transcription Factors/genetics , NFI Transcription Factors/metabolism , Neuroglia/physiology , Olfactory Bulb/embryology , Olfactory Bulb/metabolism , Stem Cell Niche
16.
BMC Genomics ; 15: 752, 2014 Sep 02.
Article in English | MEDLINE | ID: mdl-25179504

ABSTRACT

BACKGROUND: Motif enrichment analysis of transcription factor ChIP-seq data can help identify transcription factors that cooperate or compete. Previously, little attention has been given to comparative motif enrichment analysis of pairs of ChIP-seq experiments, where the binding of the same transcription factor is assayed under different conditions. Such comparative analysis could potentially identify the distinct regulatory partners/competitors of the assayed transcription factor under different conditions or at different stages of development. RESULTS: We describe a new methodology for identifying sequence motifs that are differentially enriched in one set of DNA or RNA sequences relative to another set, and apply it to paired ChIP-seq experiments. We show that, using paired ChIP-seq data for a single transcription factor, differential motif enrichment analysis identifies all the known key transcription factors involved in the transformation of non-cancerous immortalized breast cells (MCF10A-ER-Src cells) into cancer stem cells whereas non-differential motif enrichment analysis does not. We also show that differential motif enrichment analysis identifies regulatory motifs that are significantly enriched at constrained locations within the bound promoters, and that these motifs are not identified by non-differential motif enrichment analysis. Our methodology differs from other approaches in that it leverages both comparative enrichment and positional enrichment of motifs in ChIP-seq peak regions or in the promoters of genes bound by the transcription factor. CONCLUSIONS: We show that differential motif enrichment analysis of paired ChIP-seq experiments offers biological insights not available from non-differential analysis. In contrast to previous approaches, our method detects motifs that are enriched in a constrained region in one set of sequences, but not enriched in the same region in the comparative set. 
We have enhanced the web-based CentriMo algorithm to allow it to perform the constrained differential motif enrichment analysis described in this paper, and CentriMo's on-line interface (http://meme.ebi.edu.au) provides dozens of databases of DNA- and RNA-binding motifs from a full range of organisms. All data and output files presented here are available at http://research.imb.uq.edu.au/t.bailey/supplementary_data/Lesluyes2014.
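
A minimal sketch of what "differentially enriched in one set relative to another" can mean in practice: count the sequences in each set that carry a motif match inside the constrained region, then ask whether the first set has more than chance would explain. A one-sided Fisher's exact test is used here as one common choice; the abstract does not state which statistic the method uses, so the test and the function name are assumptions.

```python
from math import comb


def fisher_enrichment_pvalue(hits_a, total_a, hits_b, total_b):
    """One-sided Fisher's exact p-value that motif hits are
    over-represented in set A relative to set B.

    hits_x  = sequences in set X with a motif match in the region
    total_x = all sequences in set X
    """
    total = total_a + total_b
    hits = hits_a + hits_b
    denominator = comb(total, total_a)
    upper = min(hits, total_a)
    # Sum the hypergeometric probabilities of seeing hits_a or more hits in A.
    numerator = sum(comb(hits, k) * comb(total - hits, total_a - k)
                    for k in range(hits_a, upper + 1))
    return numerator / denominator


# Invented counts: 40 of 100 sequences hit in condition A, 10 of 100 in B.
p_differential = fisher_enrichment_pvalue(40, 100, 10, 100)
```

A motif that is equally common in both sets gets a large p-value under this test even if it is enriched in each set individually, which is why a differential analysis can report different motifs than a non-differential one.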


Subject(s)
Chromatin Immunoprecipitation , Computational Biology/methods , High-Throughput Nucleotide Sequencing , Nucleotide Motifs , Binding Sites , Cell Line , Humans , Position-Specific Scoring Matrices , Promoter Regions, Genetic , Protein Binding , Tamoxifen/pharmacology , Time Factors , Transcription Factors/metabolism
17.
Nucleic Acids Res ; 42(17): 11000-10, 2014.
Article in English | MEDLINE | ID: mdl-25200088

ABSTRACT

Predicting which genomic regions control the transcription of a given gene is a challenge. We present a novel computational approach for creating and validating maps that associate genomic regions (cis-regulatory modules, CRMs) with genes. The method infers regulatory relationships that explain gene expression observed in a test tissue using widely available genomic data for 'other' tissues. To predict the regulatory targets of a CRM, we use cross-tissue correlation between histone modifications present at the CRM and expression at genes within 1 Mbp of it. To validate cis-regulatory maps, we show that they yield more accurate models of gene expression than carefully constructed control maps. These gene expression models predict observed gene expression from transcription factor binding in the CRMs linked to that gene. We show that our maps are able to identify long-range regulatory interactions and improve substantially over maps linking genes and CRMs based on either the control maps or a 'nearest neighbor' heuristic. Our results also show that it is essential to include CRMs predicted in multiple tissues during map-building, that H3K27ac is the most informative histone modification, and that CAGE is the most informative measure of gene expression for creating cis-regulatory maps.


Subject(s)
Gene Expression Regulation , Models, Genetic , Cell Line , Genomics/methods , Histones/analysis , Humans , Linear Models , Organ Specificity , Transcription Factors/metabolism , Transcription Initiation Site
18.
Bioinformatics ; 30(18): 2673-5, 2014 Sep 15.
Article in English | MEDLINE | ID: mdl-24860161

ABSTRACT

UNLABELLED: A number of technologies, including CRISPR/Cas, transcription activator-like effector nucleases and zinc-finger nucleases, allow the user to target a chosen locus for genome editing or regulatory interference. Specificity, however, is a major problem, and the targeted locus must be chosen with care to avoid inadvertently affecting other loci ('off-targets') in the genome. To address this we have created 'Genome Target Scan' (GT-Scan), a flexible web-based tool that ranks all potential targets in a user-selected region of a genome in terms of how many off-targets they have. GT-Scan gives the user flexibility to define the desired characteristics of targets and off-targets via a simple 'target rule', and its interactive output allows detailed inspection of each of the most promising candidate targets. GT-Scan can be used to identify optimal targets for CRISPR/Cas systems, but its flexibility gives it potential to be adapted to other genome-targeting technologies as well. AVAILABILITY AND IMPLEMENTATION: GT-Scan can be run via the web at: http://gt-scan.braembl.org.au.
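
Ranking candidate targets by their number of off-targets can be sketched in a few lines. This is a deliberately simplified illustration, not GT-Scan's algorithm: it scans one strand only, treats every position equally, and replaces the tool's 'target rule' with a single hypothetical max_mismatches parameter.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_off_targets(genome, start, width, max_mismatches):
    """Count genome windows, other than the candidate itself, that lie
    within max_mismatches of the candidate starting at `start`."""
    candidate = genome[start:start + width]
    count = 0
    for i in range(len(genome) - width + 1):
        if i != start and mismatches(candidate, genome[i:i + width]) <= max_mismatches:
            count += 1
    return count


def rank_targets(genome, starts, width, max_mismatches):
    """Return (off_target_count, start, sequence) tuples, fewest off-targets first."""
    return sorted(
        (count_off_targets(genome, s, width, max_mismatches), s, genome[s:s + width])
        for s in starts
    )


# Toy genome; candidate targets begin at positions 0 and 12.
toy_genome = "ACGTACGTTTTTGGGG"
ranked = rank_targets(toy_genome, [0, 12], width=4, max_mismatches=0)
```

The brute-force scan is quadratic in the worst case and only suitable for toy inputs; a genome-scale tool would need an indexed search instead.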


Subject(s)
Computational Biology/methods , Genetic Engineering/methods , Genome, Human/genetics , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Endonucleases/metabolism , Humans , Internet
19.
Development ; 141(11): 2195-205, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24866114

ABSTRACT

Mammalian sex determination hinges on the development of ovaries or testes, with testis fate being triggered by the expression of the transcription factor sex-determining region Y (Sry). Reduced or delayed Sry expression impairs testis development, highlighting the importance of its accurate spatiotemporal regulation and implying a potential role for SRY dysregulation in human intersex disorders. Several epigenetic modifiers, transcription factors and kinases are implicated in regulating Sry transcription, but it remains unclear whether or how this farrago of factors acts co-ordinately. Here we review our current understanding of Sry regulation and provide a model that assembles all known regulators into three modules, each converging on a single transcription factor that binds to the Sry promoter. We also discuss potential future avenues for discovering the cis-elements and trans-factors required for Sry regulation.


Subject(s)
Gene Expression Regulation, Developmental , Ovary/embryology , Sex-Determining Region Y Protein/physiology , Testis/embryology , Animals , Cell Lineage , Epigenesis, Genetic , Female , GATA4 Transcription Factor/metabolism , Humans , Male , Mice , Promoter Regions, Genetic , Steroidogenic Factor 1/metabolism , Transcription, Genetic , WT1 Proteins/metabolism , Y Chromosome
20.
Nat Protoc ; 9(6): 1428-50, 2014.
Article in English | MEDLINE | ID: mdl-24853928

ABSTRACT

MEME-ChIP is a web-based tool for analyzing motifs in large DNA or RNA data sets. It can analyze peak regions identified by ChIP-seq, cross-linking sites identified by CLIP-seq and related assays, as well as sets of genomic regions selected using other criteria. MEME-ChIP performs de novo motif discovery, motif enrichment analysis, motif location analysis and motif clustering, providing a comprehensive picture of the DNA or RNA motifs that are enriched in the input sequences. MEME-ChIP performs two complementary types of de novo motif discovery: weight matrix-based discovery for high accuracy; and word-based discovery for high sensitivity. Motif enrichment analysis using DNA or RNA motifs from human, mouse, worm, fly and other model organisms provides even greater sensitivity. MEME-ChIP's interactive HTML output groups and aligns significant motifs to ease interpretation. This protocol takes less than 3 h, and it provides motif discovery approaches that are distinct and complementary to other online methods.


Subject(s)
Algorithms , Binding Sites/genetics , Chromatin Immunoprecipitation/methods , High-Throughput Nucleotide Sequencing/methods , Nucleotide Motifs/genetics , Software , Cluster Analysis