ABSTRACT
Transcription factor (TF) binding to DNA is critical to transcription regulation. Although the binding properties of numerous individual TFs are well documented, a more detailed understanding of how TFs interact cooperatively with DNA is required. We present COBIND, a novel method based on non-negative matrix factorization (NMF) to identify TF co-binding patterns automatically. COBIND applies NMF to one-hot encoded regions flanking known TF binding sites (TFBSs) to pinpoint enriched DNA patterns at fixed distances. We applied COBIND to 5699 TFBS datasets from UniBind for 401 TFs in seven species. The method uncovered established co-binding patterns as well as new co-binding configurations that are not yet reported in the literature and were inferred through motif similarity and protein-protein interaction knowledge. Our extensive analyses across species revealed that 67% of the TFs shared a co-binding motif with other TFs from the same structural family. The co-binding patterns captured by COBIND are likely functionally relevant as they harbor higher evolutionary conservation than isolated TFBSs. Open chromatin data from matching human cell lines further supported the co-binding predictions. Finally, we used single-molecule footprinting data from mouse embryonic stem cells to confirm that the COBIND-predicted co-binding events associated with some TFs likely occurred on the same DNA molecules.
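The one-hot encoding step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not COBIND's own code: the function names, the A/C/G/T column order, the handling of N bases, and the example flanks are all assumptions made for illustration. It only shows how equal-length flanking regions become a non-negative 0/1 matrix of the kind NMF can factorize.

```python
from typing import List

ALPHABET = "ACGT"  # column order of the encoding; an arbitrary choice made here


def one_hot_flank(seq: str) -> List[int]:
    """Flatten a DNA sequence into a 4 * len(seq) vector of 0/1 values.

    Bases outside ALPHABET (for example N) are encoded as four zeros.
    """
    vector: List[int] = []
    for base in seq.upper():
        vector.extend(1 if base == letter else 0 for letter in ALPHABET)
    return vector


def build_matrix(flanks: List[str]) -> List[List[int]]:
    """Stack equal-length flanks into a non-negative matrix, one row per site."""
    lengths = {len(flank) for flank in flanks}
    if len(lengths) > 1:
        raise ValueError("all flanking regions must have the same length")
    return [one_hot_flank(flank) for flank in flanks]


# Hypothetical flanks taken at a fixed offset from three anchored TFBSs.
matrix = build_matrix(["ACGT", "ACGA", "NCGT"])
```

Each row has four columns per flank position, so a pattern enriched at a fixed distance from the anchored TFBS shows up as the same columns being set across many rows.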
Subjects
DNA, Protein Binding, Transcription Factors, Transcription Factors/metabolism, Humans, Binding Sites, Animals, Mice, DNA/metabolism, DNA/chemistry, Nucleotide Motifs, Chromatin/metabolism, Algorithms
ABSTRACT
JASPAR (https://jaspar.elixir.no/) is a widely used open-access database presenting manually curated high-quality and non-redundant DNA-binding profiles for transcription factors (TFs) across taxa. In this 10th release and 20th-anniversary update, the CORE collection has expanded with 329 new profiles. We updated three existing profiles and provided orthogonal support for 72 profiles from the previous release's UNVALIDATED collection. Altogether, the JASPAR 2024 update provides a 20% increase in CORE profiles from the previous release. A trimming algorithm enhanced profiles by removing low-information-content flanking base pairs, which were likely uninformative (within the capacity of the PFM models) for TFBS predictions and modelling TF-DNA interactions. This release includes enhanced metadata, featuring a refined classification for plant TFs' structural DNA-binding domains. The new JASPAR collections prompt updates to the genomic tracks of predicted TF binding sites (TFBSs) in 8 organisms, with human and mouse tracks available as native tracks in the UCSC Genome Browser. All data are available through the JASPAR web interface and programmatically through its API and the updated Bioconductor and pyJASPAR packages. Finally, a new TFBS extraction tool enables users to retrieve predicted JASPAR TFBSs intersecting their genomic regions of interest.
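The abstract does not give the details of the trimming algorithm, so the following Python sketch is only one plausible reading of trimming low-information flanks. It assumes a uniform background (a maximum of 2 bits per column), a hypothetical `min_ic` threshold of 0.5 bits, and a PFM stored as a list of `[A, C, G, T]` count columns. None of these choices come from JASPAR.

```python
import math
from typing import List


def column_ic(counts: List[float]) -> float:
    """Information content (bits) of one PFM column under a uniform background."""
    total = sum(counts)
    if total == 0:
        return 0.0
    ic = 2.0  # log2(4): the maximum for a four-letter alphabet
    for count in counts:
        if count > 0:
            p = count / total
            ic += p * math.log2(p)
    return ic


def trim_pfm(pfm: List[List[float]], min_ic: float = 0.5) -> List[List[float]]:
    """Drop leading and trailing columns whose information content is below min_ic.

    Internal low-information columns are kept, since only the flanks are trimmed.
    min_ic is a hypothetical threshold chosen for this example.
    """
    keep = [column_ic(column) >= min_ic for column in pfm]
    if not any(keep):
        return []
    start = keep.index(True)
    end = len(keep) - keep[::-1].index(True)
    return pfm[start:end]


# Toy profile: uninformative flanks around an informative core with one weak position.
toy_pfm = [
    [25, 25, 25, 25],
    [100, 0, 0, 0],
    [25, 25, 25, 25],
    [0, 0, 100, 0],
    [25, 25, 25, 25],
]
trimmed = trim_pfm(toy_pfm)
```

In this toy profile the two outer columns are removed and the weak middle column is kept, because it lies between two informative positions.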
Subjects
Genetic Databases, Protein Binding, Transcription Factors, Animals, Humans, Mice, Genetic Databases/standards, Genetic Databases/trends, Transcription Factors/genetics, Transcription Factors/metabolism, Plants/genetics
ABSTRACT
Since high-throughput techniques became a staple in biological science laboratories, computational algorithms and scientific software have boomed. However, the development of bioinformatics software usually lacks software development quality standards. The resulting software code is hard to test, reuse, and maintain. We believe that the root of inefficiency in implementing the best software development practices in academic settings is the individualistic approach, which has traditionally been the norm for recognizing scientific achievements and, by extension, for developing specialized software. Software development is a collective effort in most software-heavy endeavors. Indeed, the literature suggests teamwork directly impacts code quality through knowledge sharing, collective software development, and established coding standards. In our computational biology research groups, we sustainably involve all group members in learning, sharing, and discussing software development while maintaining the personal ownership of research projects and related software products. We found that group members involved in this endeavor improved their coding skills, became more efficient bioinformaticians, and obtained detailed knowledge about their peers' work, triggering new collaborative projects. We strongly advocate for improving software development culture within bioinformatics through collective effort in computational biology groups or institutes with three or more bioinformaticians.
ABSTRACT
MiniPromoters, or compact promoters, are short DNA sequences that can drive expression in specific cells and tissues. While broadly useful, they are of high relevance to gene therapy due to their role in enabling precise control of where a therapeutic gene will be expressed. Here, we present OnTarget (http://ontarget.cmmt.ubc.ca), a webserver that streamlines the MiniPromoter design process. Users only need to specify a gene of interest or custom genomic coordinates on which to focus the identification of promoters and enhancers, and can also provide relevant cell-type-specific genomic evidence (e.g. accessible chromatin regions, histone modifications, etc.). OnTarget combines the provided data with internal data to identify candidate promoters and enhancers and design MiniPromoters. To illustrate the utility of OnTarget, we designed and characterized two MiniPromoters targeting different cell populations relevant to Parkinson Disease.
Subjects
Computational Biology, Computer Simulation, Genetic Promoter Regions, Software, Genetic Enhancer Elements/genetics, Genome, Genomics, Genetic Promoter Regions/genetics, Internet, Computational Biology/instrumentation, Computational Biology/methods
ABSTRACT
Most cancer alterations occur in the noncoding portion of the human genome, where regulatory regions control gene expression. The discovery of noncoding mutations altering the cells' regulatory programs has been limited to a few examples with high recurrence or high functional impact. Here, we show that transcription factor binding sites (TFBSs) have similar mutation loads to those in protein-coding exons. By combining cancer somatic mutations in TFBSs and expression data for protein-coding and miRNA genes, we evaluate the combined effects of transcriptional and post-transcriptional alterations on the regulatory programs in cancers. The analysis of seven TCGA cohorts culminates with the identification of protein-coding and miRNA genes linked to mutations at TFBSs that are associated with a cascading trans-effect deregulation on the cells' regulatory programs. Our analyses of cis-regulatory mutations associated with miRNAs recurrently predict 12 mature miRNAs (derived from 7 precursors) associated with the deregulation of their target gene networks. The predictions are enriched for cancer-associated protein-coding and miRNA genes and highlight cis-regulatory mutations associated with the dysregulation of key pathways associated with carcinogenesis. By combining transcriptional and post-transcriptional regulation of gene expression, our method predicts cis-regulatory mutations related to the dysregulation of key gene regulatory networks in cancer patients.
Subjects
MicroRNAs, Neoplasms, Humans, Gene Expression Regulation, Neoplasms/genetics, Mutation, MicroRNAs/physiology, Gene Regulatory Networks
ABSTRACT
JASPAR (http://jaspar.genereg.net/) is an open-access database containing manually curated, non-redundant transcription factor (TF) binding profiles for TFs across six taxonomic groups. In this 9th release, we expanded the CORE collection with 341 new profiles (148 for plants, 101 for vertebrates, 85 for urochordates, and 7 for insects), which corresponds to a 19% expansion over the previous release. We added 298 new profiles to the Unvalidated collection when no orthogonal evidence was found in the literature. All the profiles were clustered to provide familial binding profiles for each taxonomic group. Moreover, we revised the structural classification of DNA binding domains to consider plant-specific TFs. This release introduces word clouds to represent the scientific knowledge associated with each TF. We updated the genome tracks of TFBSs predicted with JASPAR profiles in eight organisms; the human and mouse TFBS predictions can be visualized as native tracks in the UCSC Genome Browser. Finally, we provide a new tool to perform JASPAR TFBS enrichment analysis in user-provided genomic regions. All the data is accessible through the JASPAR website, its associated RESTful API, the R/Bioconductor data package, and a new Python package, pyJASPAR, that facilitates serverless access to the data.
Subjects
Genetic Databases, Genomics/classification, Software, Transcription Factors/genetics, Animals, Binding Sites/genetics, Computational Biology, Genome/genetics, Humans, Mice, Plants/genetics, Protein Binding/genetics, Transcription Factors/classification, Vertebrates/genetics
ABSTRACT
The generation and systematic collection of genome-wide data is ever-increasing. This vast amount of data has enabled researchers to study relations between a variety of genomic and epigenomic features, including genetic variation, gene regulation and phenotypic traits. Such relations are typically investigated by comparatively assessing genomic co-occurrence. Technically, this corresponds to assessing the similarity of pairs of genome-wide binary vectors. A variety of similarity measures have been proposed for this problem in other fields like ecology. However, while several of these measures have been employed for assessing genomic co-occurrence, their appropriateness for the genomic setting has never been investigated. We show that the choice of similarity measure may strongly influence results and propose two alternative modelling assumptions that can be used to guide this choice. On both simulated and real genomic data, the Jaccard index is strongly altered by dataset size and should be used with caution. The Forbes coefficient (fold change) and tetrachoric correlation are less influenced by dataset size, but one should be aware of increased variance for small datasets. All results on simulated and real data can be inspected and reproduced at https://hyperbrowser.uio.no/sim-measure.
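Two of the measures named in this abstract can be written down directly from a 2x2 contingency table of a pair of binary vectors. The Python sketch below uses the standard textbook definitions: the Jaccard index as shared positions over positions covered by either vector, and the Forbes coefficient as observed over expected co-occurrence. The function names and toy vectors are illustrative assumptions, and tetrachoric correlation is omitted because it requires numerical estimation.

```python
from typing import Sequence, Tuple


def contingency(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d): both set, only x set, only y set, neither set."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if j and not i)
    d = len(x) - a - b - c
    return a, b, c, d


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Shared positions divided by positions covered by at least one vector."""
    a, b, c, _ = contingency(x, y)
    return a / (a + b + c) if (a + b + c) else 0.0


def forbes(x: Sequence[int], y: Sequence[int]) -> float:
    """Observed co-occurrence divided by the count expected under independence."""
    a, b, c, d = contingency(x, y)
    n = a + b + c + d
    if (a + b) == 0 or (a + c) == 0:
        return float("nan")  # undefined when either vector is empty
    expected = (a + b) * (a + c) / n
    return a / expected


# Toy genome of 8 bins with two features covering 2 bins each, 1 of them shared.
x = [1, 1, 0, 0, 0, 0, 0, 0]
y = [1, 0, 1, 0, 0, 0, 0, 0]
```

For these toy vectors, `jaccard(x, y)` is 1/3 and `forbes(x, y)` is 2.0, meaning the features co-occur twice as often as expected if they were placed independently.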
Subjects
Genomics/methods, Algorithms, Datasets as Topic, Gene Expression Regulation, Genetic Variation, Humans
ABSTRACT
MOTIVATION: Accurate motif enrichment analyses depend on the choice of background DNA sequences used, which should ideally match the sequence composition of the foreground sequences. It is important to avoid false positive enrichment due to sequence biases in the genome, such as GC-bias. Therefore, relying on an appropriate set of background sequences is crucial for enrichment analysis. RESULTS: We developed BiasAway, a command line tool and its dedicated easy-to-use web server to generate synthetic sequences matching any k-mer nucleotide composition or select genomic DNA sequences matching the mononucleotide composition of the foreground sequences through four different models. For genomic sequences, we provide precomputed partitions of genomes from nine species with five different bin sizes to generate appropriate genomic background sequences. AVAILABILITY AND IMPLEMENTATION: BiasAway source code is freely available from Bitbucket (https://bitbucket.org/CBGR/biasaway) and can be easily installed using bioconda or pip. The web server is available at https://biasaway.uio.no and detailed documentation is available at https://biasaway.readthedocs.io. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
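The simplest case of a composition-matched synthetic background, k = 1, can be sketched in a few lines of Python. This is an illustration of the general idea, not BiasAway's implementation or command-line interface. The function names, the seed, and the example sequences are assumptions; preserving higher-order k-mer composition needs a more involved shuffling procedure that is not shown here.

```python
import random
from collections import Counter
from typing import List


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def mononucleotide_shuffle(seq: str, rng: random.Random) -> str:
    """Permute the bases of seq, preserving its exact mononucleotide counts."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def make_background(foreground: List[str], seed: int = 0) -> List[str]:
    """Build one shuffled background sequence per foreground sequence."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    return [mononucleotide_shuffle(seq, rng) for seq in foreground]


# Hypothetical foreground sequences: one GC-rich, one AT-rich.
foreground = ["GGCCGCATGC", "ATATTTAAGC"]
background = make_background(foreground)
```

Because each background sequence is a permutation of its foreground counterpart, length and GC content match exactly, so a composition bias alone cannot produce an apparent motif enrichment against this background.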
Subjects
Genomics, Software, DNA/genetics, Genome, Nucleotides
ABSTRACT
Small non-coding RNAs have gained substantial attention due to their roles in animal development and human disorders. Among them, microRNAs are special because individual gene sequences are conserved across the animal kingdom. In addition, unique and mechanistically well understood features can clearly distinguish bona fide miRNAs from the myriad other small RNAs generated by cells. However, making this distinction is not a common practice and, thus, not surprisingly, the heterogeneous quality of available miRNA complements has become a major concern in microRNA research. We addressed this by extensively expanding our curated microRNA gene database - MirGeneDB - to 45 organisms, encompassing a wide phylogenetic swath of animal evolution. By consistently annotating and naming 10,899 microRNA genes in these organisms, we show that previous microRNA annotations contained not only many false positives, but surprisingly lacked >2000 bona fide microRNAs. Indeed, curated microRNA complements of closely related organisms are very similar and can be used to reconstruct ancestral miRNA repertoires. MirGeneDB represents a robust platform for microRNA-based research, providing deeper and more significant insights into the biology and evolution of miRNAs as well as biomedical and biomarker research. MirGeneDB is publicly and freely available at http://mirgenedb.org/.
Subjects
Computational Biology/methods, Nucleic Acid Databases, MicroRNAs/genetics, Software, Web Browser, Animals, Conserved Sequence, Molecular Evolution, MicroRNAs/classification, Molecular Sequence Annotation, Phylogeny, User-Computer Interface
ABSTRACT
JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) for TFs across multiple species in six taxonomic groups. In this 8th release of JASPAR, the CORE collection has been expanded with 245 new PFMs (169 for vertebrates, 42 for plants, 17 for nematodes, 10 for insects, and 7 for fungi), and 156 PFMs were updated (125 for vertebrates, 28 for plants and 3 for insects). These new profiles represent an 18% expansion compared to the previous release. JASPAR 2020 comes with a novel collection of unvalidated TF-binding profiles for which our curators did not find orthogonal supporting evidence in the literature. This collection has a dedicated web form to engage the community in the curation of unvalidated TF-binding profiles. Moreover, we created a Q&A forum to ease the communication between the user community and JASPAR curators. Finally, we updated the genomic tracks, inference tool, and TF-binding profile similarity clusters. All the data is available through the JASPAR website, its associated RESTful API, and through the JASPAR2020 R/Bioconductor package.
Subjects
Binding Sites, Computational Biology, Genetic Databases, Software, Transcription Factors, Animals, Genomics/methods, Protein Binding, Transcription Factors/metabolism, User-Computer Interface, Web Browser
ABSTRACT
Small and cell-type restricted promoters are important tools for basic and preclinical research, and clinical delivery of gene therapies. In clinical gene therapy, ophthalmic trials have been leading the field, with over 50% of ocular clinical trials using promoters that restrict expression based on cell type. Here, 19 human DNA MiniPromoters were bioinformatically designed for rAAV, tested by neonatal intravenous delivery in mouse, and successful MiniPromoters went on to be tested by intravitreal, subretinal, intrastromal, and/or intravenous delivery in adult mouse. We present promoter development as an overview for each cell type, but only show results in detail for the recommended MiniPromoters: Ple265 and Ple341 (PCP2) ON bipolar, Ple349 (PDE6H) cone, Ple253 (PITX3) corneal stroma, Ple32 (CLDN5) endothelial cells of the blood-retina barrier, Ple316 (NR2E1) Müller glia, and Ple331 (PAX6) PAX6 positive. Overall, we present a resource of new, redesigned, and improved MiniPromoters for ocular gene therapy that range in size from 784 to 2484 bp, and whose strength is weaker than, equal to, or stronger than that of the ubiquitous control promoter smCBA. All MiniPromoters will be useful for therapies involving small regulatory RNA and DNA, and proteins ranging from 517 to 1084 amino acids, representing 62.9-90.2% of human proteins.
Subjects
Endothelial Cells, Animals, Humans, Mice, Neuroglia, PAX6 Transcription Factor/genetics, Genetic Promoter Regions, Retina, Retinal Cone Photoreceptor Cells
ABSTRACT
BACKGROUND: Transcription factors (TFs) bind specifically to TF binding sites (TFBSs) at cis-regulatory regions to control transcription. It is critical to locate these TF-DNA interactions to understand transcriptional regulation. Efforts to predict bona fide TFBSs benefit from the availability of experimental data mapping DNA binding regions of TFs (chromatin immunoprecipitation followed by sequencing, ChIP-seq). RESULTS: In this study, we processed ~10,000 public ChIP-seq datasets from nine species to provide high-quality TFBS predictions. After quality control, this culminated in the prediction of ~56 million TFBSs with experimental and computational support for direct TF-DNA interactions for 644 TFs in >1000 cell lines and tissues. These TFBSs were used to predict >197,000 cis-regulatory modules representing clusters of binding events in the corresponding genomes. The high quality of the TFBSs was reinforced by their evolutionary conservation, enrichment at active cis-regulatory regions, and capacity to predict combinatorial binding of TFs. Further, we confirmed that the cell type and tissue specificity of enhancer activity was correlated with the number of TFs with binding sites predicted in these regions. All the data is provided to the community through the UniBind database that can be accessed through its web interface (https://unibind.uio.no/), a dedicated RESTful API, and as genomic tracks. Finally, we provide an enrichment tool, available as a web service and an R package, for users to find TFs with enriched TFBSs in a set of provided genomic regions. CONCLUSIONS: UniBind is the first resource of its kind, providing the largest collection of high-confidence direct TF-DNA interactions in nine species.
Subjects
Chromatin Immunoprecipitation Sequencing, DNA, Binding Sites, Chromatin Immunoprecipitation, Computational Biology, DNA/metabolism, Protein Binding
ABSTRACT
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is the most popular assay to identify genomic regions, called ChIP-seq peaks, that are bound in vivo by transcription factors (TFs). These regions are derived from direct TF-DNA interactions, indirect binding of the TF to the DNA (through a co-binding partner), nonspecific binding to the DNA, and noise/bias/artifacts. Delineating the bona fide direct TF-DNA interactions within the ChIP-seq peaks remains challenging. We developed dedicated software, ChIP-eat, that combines computational TF binding models and ChIP-seq peaks to automatically predict direct TF-DNA interactions. Our work culminated in predicted interactions covering >4% of the human genome, obtained by uniformly processing 1983 ChIP-seq peak data sets from the ReMap database for 232 unique TFs. The predictions were assessed a posteriori using protein binding microarray and ChIP-exo data, and were predominantly found in high-quality ChIP-seq peaks. The set of predicted direct TF-DNA interactions suggested that high-occupancy target regions are likely not derived from direct binding of the TFs to the DNA. From our predictions, we derived co-binding TFs supported by protein-protein interaction data and defined cis-regulatory modules enriched for disease- and trait-associated SNPs. We provide this collection of direct TF-DNA interactions and cis-regulatory modules through the UniBind web interface (http://unibind.uio.no).
Subjects
Computational Biology, DNA/genetics, Human Genome/genetics, Transcription Factors/genetics, Algorithms, Binding Sites/genetics, Chromatin Immunoprecipitation, Chromosome Mapping/methods, High-Throughput Nucleotide Sequencing/methods, Humans, Protein Binding/genetics, DNA Sequence Analysis/methods
ABSTRACT
With this latest release of ReMap (http://remap.cisreg.eu), we present a unique collection of regulatory regions in human, as a result of a large-scale integrative analysis of ChIP-seq experiments for hundreds of transcriptional regulators (TRs) such as transcription factors, transcriptional co-activators and chromatin regulators. In 2015, we introduced the ReMap database to capture the genome regulatory space by integrating public ChIP-seq datasets, covering 237 TRs across 13 million (M) peaks. In this release, we have extended this catalog to constitute a unique collection of regulatory regions. Specifically, we have collected, analyzed and retained after quality control a total of 2829 ChIP-seq datasets available from public sources, covering a total of 485 TRs with a catalog of 80M peaks. Additionally, the updated database includes new search features for TR names as well as aliases, including cell line names and the ability to navigate the data directly within genome browsers via public track hubs. Finally, full access to this catalog is available online together with a TR binding enrichment analysis tool. ReMap 2018 provides a significant update of the ReMap database, providing an in-depth view of the complexity of the regulatory landscape in human.
Subjects
Chromatin Immunoprecipitation, DNA-Binding Proteins/metabolism, Genetic Databases, Transcriptional Regulatory Elements, DNA Sequence Analysis, High-Throughput Nucleotide Sequencing, Humans, Transcription Factors/metabolism
ABSTRACT
JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) and TF flexible models (TFFMs) for TFs across multiple species in six taxonomic groups. In the 2018 release of JASPAR, the CORE collection has been expanded with 322 new PFMs (60 for vertebrates and 262 for plants) and 33 PFMs were updated (24 for vertebrates, 8 for plants and 1 for insects). These new profiles represent a 30% expansion compared to the 2016 release. In addition, we have introduced 316 TFFMs (95 for vertebrates, 218 for plants and 3 for insects). This release incorporates clusters of similar PFMs in each taxon and each TF class per taxon. The JASPAR 2018 CORE vertebrate collection of PFMs was used to predict TF-binding sites in the human genome. The predictions are made available to the scientific community through a UCSC Genome Browser track data hub. Finally, this update comes with a new web framework with an interactive and responsive user-interface, along with new features. All the underlying data can be retrieved programmatically using a RESTful API and through the JASPAR 2018 R/Bioconductor package.
Subjects
Genetic Databases, Transcription Factors/metabolism, Animals, Binding Sites/genetics, Genomics, Humans, Internet, Plants/genetics, Plants/metabolism, Position-Specific Scoring Matrices, Protein Binding/genetics, User-Computer Interface, Vertebrates/genetics, Vertebrates/metabolism
ABSTRACT
BACKGROUND: The work of the FANTOM5 Consortium has brought forth a new level of understanding of the regulation of gene transcription and the cellular processes involved in creating diversity of cell types. In this study, we extended the analysis of the FANTOM5 Cap Analysis of Gene Expression (CAGE) transcriptome data to focus on understanding the genetic regulators involved in mouse cerebellar development. RESULTS: We used HeliScopeCAGE library sequencing on cerebellar samples at 8 embryonic and 4 early postnatal time points. This study showcases temporal expression pattern changes during cerebellar development. Through a bioinformatics analysis that focused on transcription factors, their promoters and binding sites, we identified genes that appear as strong candidates for involvement in cerebellar development. We selected several candidate transcriptional regulators for validation experiments including qRT-PCR and shRNA transcript knockdown. We observed marked and reproducible developmental defects in Atf4, Rfx3, and Scrt2 knockdown embryos, which support the role of these genes in cerebellar development. CONCLUSIONS: The successful identification of these novel gene regulators in cerebellar development demonstrates that the FANTOM5 cerebellum time series is a high-quality transcriptome database for functional investigation of gene regulatory networks in cerebellar development.
Subjects
Cerebellum/growth & development, Gene Expression Profiling, Nucleotide Motifs/genetics, Genetic Transcription/genetics, Activating Transcription Factor 4/deficiency, Activating Transcription Factor 4/genetics, Activating Transcription Factor 4/metabolism, Animals, Cerebellum/embryology, Cerebellum/metabolism, Developmental Gene Expression Regulation, Gene Knockdown Techniques, Mice, Inbred C57BL Mice, Genetic Promoter Regions/genetics, Regulatory Factor X Transcription Factors/deficiency, Regulatory Factor X Transcription Factors/genetics, Regulatory Factor X Transcription Factors/metabolism, Transcription Factors/deficiency, Transcription Factors/genetics, Transcription Factors/metabolism
ABSTRACT
Summary: JASPAR is a widely used open-access database of curated, non-redundant transcription factor binding profiles. Currently, data from JASPAR can be retrieved as flat files or by using programming language-specific interfaces. Here, we present a programming language-independent application programming interface (API) to access JASPAR data using the Representational State Transfer (REST) architecture. The REST API enables programmatic access to JASPAR by most programming languages and returns data in eight widely used formats. Several endpoints are available to access the data and an endpoint is available to infer the TF binding profile(s) likely bound by a given DNA binding domain protein sequence. Additionally, it provides an interactive browsable interface for bioinformatics tool developers. Availability and implementation: This REST API is implemented in Python using the Django REST Framework. It is accessible at http://jaspar.genereg.net/api/ and the source code is freely available at https://bitbucket.org/CBGR/jaspar under the GPL v3 license. Contact: aziz.khan@ncmm.uio.no or anthony.mathelier@ncmm.uio.no. Supplementary information: Supplementary data are available at Bioinformatics online.
Subjects
Programming Languages, Transcription Factors/metabolism, Factual Databases, Protein Binding, Software
ABSTRACT
Upon the first publication of the fifth iteration of the Functional Annotation of Mammalian Genomes collaborative project, FANTOM5, we gathered a series of primary data and database systems into the FANTOM web resource (http://fantom.gsc.riken.jp) to help researchers explore transcriptional regulation and cellular states. In the course of the collaboration, primary data and analysis results have been expanded, and functionalities of the database systems enhanced. We believe that our data and web systems are invaluable resources, and we think the scientific community will benefit from this recent update to deepen their understanding of mammalian cellular organization. We introduce the contents of FANTOM5 here, report recent updates in the web resource and provide future perspectives.
Subjects
Genetic Databases, Gene Expression Profiling/methods, Genomics/methods, Mammals/genetics, Software, Web Browser, Animals, Computational Biology, Humans, Search Engine
ABSTRACT
BACKGROUND: Evolutionarily conserved RFX transcription factors (TFs) regulate their target genes through a DNA sequence motif called the X-box. Thereby they regulate cellular specialization and terminal differentiation. Here, we provide a comprehensive analysis of all eight human RFX genes (RFX1-8), their spatial and temporal expression profiles, potential upstream regulators and target genes. RESULTS: We extracted all known human RFX1-8 gene expression profiles from the FANTOM5 database derived from transcription start site (TSS) activity as captured by Cap Analysis of Gene Expression (CAGE) technology. RFX genes are broadly (RFX1-3, RFX5, RFX7) and specifically (RFX4, RFX6) expressed in different cell types, with high expression in four organ systems: immune system, gastrointestinal tract, reproductive system and nervous system. Tissue type specific expression profiles link defined RFX family members with the target gene batteries they regulate. We experimentally confirmed novel TSS locations and characterized the previously undescribed RFX8 to be lowly expressed. RFX tissue and cell type specificity arises mainly from differences in TSS architecture. RFX transcript isoforms lacking a DNA binding domain (DBD) open up new possibilities for combinatorial target gene regulation. Our results favor a new grouping of the RFX family based on protein domain composition. We uncovered and experimentally confirmed the TFs SP2 and ESR1 as upstream regulators of specific RFX genes. Using TF binding profiles from the JASPAR database, we determined relevant patterns of X-box motif positioning with respect to gene TSS locations of human RFX target genes. CONCLUSIONS: The wealth of data we provide will serve as the basis for precisely determining the roles RFX TFs play in human development and disease.
Subjects
Gene Expression Regulation, Human Genome, Genetic Promoter Regions, Regulatory Factor X Transcription Factors/genetics, Nucleic Acid Regulatory Sequences, DNA-Binding Proteins/genetics, DNA-Binding Proteins/metabolism, Humans, Transcription Initiation Site
ABSTRACT
Despite extensive progress in Huntington's disease (HD) research, very little is known about the association of epigenetic variation and HD pathogenesis in human brain tissues. Moreover, its contribution to the tissue-specific transcriptional regulation of the huntingtin gene (HTT), in which HTT expression levels are highest in brain and testes, is currently unknown. To investigate the role of DNA methylation in HD pathogenesis and tissue-specific expression of HTT, we utilized the Illumina HumanMethylation450K BeadChip array to measure DNA methylation in a cohort of age-matched HD and control human cortex and liver tissues. In cortex samples, we found minimal evidence of HD-associated DNA methylation at probed sites after correction for cell heterogeneity but did observe an association with the age of disease onset. In contrast, comparison of matched cortex and liver samples revealed tissue-specific DNA methylation of the HTT gene region at 38 sites (FDR < 0.05). Importantly, we identified a novel differentially methylated binding site in the HTT proximal promoter for the transcription factor CTCF. This CTCF site displayed increased occupancy in cortex, where HTT expression is higher, compared with the liver. Additionally, CTCF silencing reduced the activity of an HTT promoter-reporter construct, suggesting that CTCF plays a role in regulating HTT promoter function. Overall, although we were unable to detect HD-associated DNA methylation alterations at queried sites, we found that DNA methylation may be correlated with the age of disease onset in cortex tissues. Moreover, our data suggest that DNA methylation may, in part, contribute to tissue-specific HTT transcription through differential CTCF occupancy.