ABSTRACT
We present a major update of the HOCOMOCO collection that provides DNA binding specificity patterns of 949 human transcription factors and 720 mouse orthologs. To make this release, we performed motif discovery in peak sets that originated from 14 183 ChIP-Seq experiments and reads from 2554 HT-SELEX experiments yielding more than 400 thousand candidate motifs. The candidate motifs were annotated according to their similarity to known motifs and the hierarchy of DNA-binding domains of the respective transcription factors. Next, the motifs underwent human expert curation to stratify distinct motif subtypes and remove non-informative patterns and common artifacts. Finally, the curated subset of 100 thousand motifs was supplied to the automated benchmarking to select the best-performing motifs for each transcription factor. The resulting HOCOMOCO v12 core collection contains 1443 verified position weight matrices, including distinct subtypes of DNA binding motifs for particular transcription factors. In addition to the core collection, HOCOMOCO v12 provides motif sets optimized for the recognition of binding sites in vivo and in vitro, and for annotation of regulatory sequence variants. HOCOMOCO is available at https://hocomoco12.autosome.org and https://hocomoco.autosome.org.
Subject(s)
Genetic Databases , Gene Expression Regulation , Protein Interaction Domains and Motifs , Transcription Factors , Animals , Humans , Mice , Binding Sites/genetics , Nucleotide Motifs , Transcription Factors/genetics , Transcription Factors/metabolism , Internet , Protein Interaction Domains and Motifs/genetics
ABSTRACT
Single-cell ATAC-seq (scATAC-seq) is a recently developed approach for investigating open chromatin at the single-cell level and for assessing epigenetic regulation and transcription factor binding landscapes. The sparsity of scATAC-seq data calls for imputation. Similarly, preprocessing (filtering) may be required to reduce the computational load caused by the large number of open regions. However, optimal strategies for imputation and preprocessing have not yet been evaluated together. We present SAPIEnS (scATAC-seq Preprocessing and Imputation Evaluation System), a benchmark for scATAC-seq imputation frameworks, that is, combinations of state-of-the-art imputation methods with commonly used preprocessing techniques. We assess different types of scATAC-seq analysis, i.e. clustering, visualization and digital genomic footprinting, and identify optimal preprocessing-imputation strategies. We discuss the benefits of the imputation framework depending on the task and the number of dataset features (peaks). We conclude that preprocessing with the Boruta method is beneficial for the majority of tasks, while imputation is helpful mostly for small datasets. We also implement a SAPIEnS database with pre-computed transcription factor footprints based on imputed data, together with their activity scores in a specific cell type. SAPIEnS is published at: https://github.com/lab-medvedeva/SAPIEnS. The SAPIEnS database is available at: https://sapiensdb.com.
Subject(s)
Genetic Epigenesis , Genomics , Genomics/methods , Transcription Factors/genetics , Transcription Factors/metabolism , Gene Expression Regulation , Cluster Analysis
ABSTRACT
We present an update of EpiFactors, a manually curated database providing information about epigenetic regulators, their complexes, targets, and products, which is openly accessible at http://epifactors.autosome.org. The updated version of EpiFactors contains information on 902 proteins, including 101 histones and protamines, and, as the main update, a newly curated collection of 124 lncRNAs involved in epigenetic regulation. The number of publications concerning the role of lncRNAs in epigenetics is rapidly growing. Yet a resource that compiles, integrates, organizes, and presents curated information on lncRNAs in epigenetics is missing. EpiFactors fills this gap and provides data on epigenetic regulators in an accessible and user-friendly form. For 820 of the genes in EpiFactors, we include expression estimates across multiple cell types assessed by CAGE-Seq in the FANTOM5 project. In addition, the updated EpiFactors contains information on 73 protein complexes involved in epigenetic regulation. Our resource is practical for a wide range of users, including biologists, bioinformaticians and molecular/systems biologists.
Subject(s)
Genetic Databases , Genetic Epigenesis , Humans , Histones/genetics , Histones/metabolism , Protamines , Long Noncoding RNA/genetics , Long Noncoding RNA/metabolism
ABSTRACT
Long non-coding RNAs (lncRNAs) are largely heterogeneous and functionally uncharacterized. Here, using FANTOM5 cap analysis of gene expression (CAGE) data, we integrate multiple transcript collections to generate a comprehensive atlas of 27,919 human lncRNA genes with high-confidence 5' ends and expression profiles across 1,829 samples from the major human primary cell types and tissues. Genomic and epigenomic classification of these lncRNAs reveals that most intergenic lncRNAs originate from enhancers rather than from promoters. Incorporating genetic and expression data, we show that lncRNAs overlapping trait-associated single nucleotide polymorphisms are specifically expressed in cell types relevant to the traits, implicating these lncRNAs in multiple diseases. We further demonstrate that lncRNAs overlapping expression quantitative trait loci (eQTL)-associated single nucleotide polymorphisms of messenger RNAs are co-expressed with the corresponding messenger RNAs, suggesting their potential roles in transcriptional regulation. Combining these findings with conservation data, we identify 19,175 potentially functional lncRNAs in the human genome.
Subject(s)
Genetic Databases , Long Noncoding RNA/chemistry , Long Noncoding RNA/genetics , Transcriptome/genetics , Cultured Cells , Conserved Sequence/genetics , Datasets as Topic , Genetic Enhancer Elements/genetics , Genetic Epigenesis , Gene Expression Profiling , Gene Expression Regulation , Human Genome/genetics , Genome-Wide Association Study , Genomics , Humans , Internet , Molecular Sequence Annotation , Organ Specificity/genetics , Single Nucleotide Polymorphism , Genetic Promoter Regions/genetics , Quantitative Trait Loci/genetics , RNA Stability , Messenger RNA/genetics
ABSTRACT
Differential methylation (DM) is actively used in different types of fundamental and translational studies. Currently, microarray- and NGS-based approaches for methylation analysis are the most widely used, with multiple statistical models designed to extract differential methylation signatures. The benchmarking of DM models is challenging due to the absence of gold standard data. In this study, we analyze an extensive number of publicly available NGS and microarray datasets with divergent and widely utilized statistical models and apply the recently suggested and validated rank-statistic-based approach Hobotnica to evaluate the quality of their results. Overall, microarray-based methods demonstrate more robust and convergent results, while NGS-based models are highly dissimilar. Tests on simulated NGS data tend to overestimate the quality of the DM methods and should therefore be used with caution. Evaluation of the top 10 DMC and top 100 DMC, in addition to the full (non-subset) signature, also shows more stable results for microarray data. In summary, given the observed heterogeneity in NGS methylation data, the evaluation of newly generated methylation signatures is a crucial step in DM analysis. The Hobotnica metric is consistent with previously developed quality metrics and provides a robust, sensitive, and informative estimate of methods' performance and DM signatures' quality in the absence of gold standard data, addressing a long-standing problem in DM analysis.
Subject(s)
DNA Methylation , Statistical Models , Microarray Analysis
ABSTRACT
Single-cell RNA-seq data contains a lot of dropouts hampering downstream analyses due to the low number and inefficient capture of mRNAs in individual cells. Here, we present Epi-Impute, a computational method for dropout imputation by reconciling expression and epigenomic data. Epi-Impute leverages single-cell ATAC-seq data as an additional source of information about gene activity to reduce the number of dropouts. We demonstrate that Epi-Impute outperforms existing methods, especially for very sparse single-cell RNA-seq data sets, significantly reducing imputation error. At the same time, Epi-Impute accurately captures the primary distribution of gene expression across cells while preserving the gene-gene and cell-cell relationship in the data. Moreover, Epi-Impute allows for the discovery of functionally relevant cell clusters as a result of the increased resolution of scRNA-seq data due to imputation.
Subject(s)
Chromatin Immunoprecipitation Sequencing , Software , RNA Sequence Analysis/methods , Single-Cell Gene Expression Analysis , Single-Cell Analysis/methods , Gene Expression Profiling
ABSTRACT
Acute myeloid leukemia (AML) is a rapidly progressing heterogeneous disease with a high mortality rate, which is characterized by hyperproliferation of atypical immature myeloid cells. The number of AML patients is expected to increase in the near future, due to the old-age-associated nature of AML and increased longevity in the human population. RUNX1 and CEBPA, key transcription factors (TFs) of hematopoiesis, are frequently and independently mutated in AML. RUNX1 and CEBPA can bind TET2 demethylase and attract it to their binding sites (TFBS) in cell lines, leading to DNA demethylation of the regions nearby. Since TET2 does not have a DNA-binding domain, TFs are crucial for its guidance to target genomic locations. In this paper, we show that RUNX1 and CEBPA mutations in AML patients affect the methylation of important regulatory sites, resulting in the silencing of several RUNX1 and CEBPA target genes, most likely in a TET2-dependent manner. We demonstrate that hypermethylation of TFBS in AML cells with RUNX1 mutations is associated with resistance to anticancer chemotherapy. Demethylation therapy restored expression of the RUNX1 target gene, BIK, and increased sensitivity of AML cells to chemotherapy. If our results are confirmed, mutations in RUNX1 could be an indication for prescribing the combination of cytotoxic and demethylation therapies.
Subject(s)
CCAAT-Enhancer-Binding Proteins , Core Binding Factor Alpha 2 Subunit , Acute Myeloid Leukemia , CCAAT-Enhancer-Binding Proteins/genetics , CCAAT-Enhancer-Binding Proteins/metabolism , Core Binding Factor Alpha 2 Subunit/genetics , Core Binding Factor Alpha 2 Subunit/metabolism , DNA/genetics , DNA/metabolism , DNA Methylation/genetics , Demethylation/drug effects , Humans , Acute Myeloid Leukemia/drug therapy , Acute Myeloid Leukemia/genetics , Acute Myeloid Leukemia/metabolism , Mutation
ABSTRACT
The genomes of mammalian species are pervasively transcribed, producing as many noncoding as protein-coding RNAs. There is a growing body of evidence supporting the functional role of noncoding RNAs. Long noncoding RNA (lncRNA) can bind both nucleic acids and proteins through several mechanisms. A reliable computational prediction of the most probable mechanism of lncRNA interaction can facilitate experimental validation of its function. In this study, we benchmarked computational tools capable of discriminating lncRNA from mRNA and predicting lncRNA interactions with other nucleic acids. We assessed the performance of 9 tools for distinguishing protein-coding from noncoding RNAs, as well as 19 tools for prediction of RNA-RNA and RNA-DNA interactions. Our conclusions about the considered tools were based on their performance at the whole-genome/transcriptome level, as this is currently the most common task. We found that FEELnc and CPAT distinguish between coding and noncoding mammalian transcripts most accurately. ASSA, RIBlast and LASTAL, as well as Triplexator, turned out to be the best predictors of RNA-RNA and RNA-DNA interactions, respectively. We showed that normalizing the predicted interaction strength to the transcript length and GC content may improve the accuracy of inferring RNA interactions. Yet, all the current tools have difficulty making accurate predictions of short trans RNA-RNA interactions (stretches of sparse contacts). Overall, there is still room for improvement in each category, especially for predictions of RNA interactions.
Subject(s)
Benchmarking , Computational Biology/methods , Long Noncoding RNA/metabolism , Messenger RNA/metabolism , Humans , Long Noncoding RNA/genetics , Messenger RNA/genetics , Transcriptome
ABSTRACT
Regulated transcription controls the diversity, developmental pathways and spatial organization of the hundreds of cell types that make up a mammal. Using single-molecule cDNA sequencing, we mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body. We find that few genes are truly 'housekeeping', whereas many mammalian promoters are composite entities composed of several closely separated TSSs, with independent cell-type-specific expression profiles. TSSs specific to different cell types evolve at different rates, whereas promoters of broadly expressed genes are the most conserved. Promoter-based expression analysis reveals key transcription factors defining cell states and links them to binding-site motifs. The functions of identified novel transcripts can be predicted by coexpression and sample ontology enrichment analyses. The functional annotation of the mammalian genome 5 (FANTOM5) project provides comprehensive expression profiles and functional annotation of mammalian cell-type-specific transcriptomes with wide applications in biomedical research.
Subject(s)
Atlases as Topic , Molecular Sequence Annotation , Genetic Promoter Regions/genetics , Transcriptome/genetics , Animals , Cell Line , Cultured Cells , Cluster Analysis , Conserved Sequence/genetics , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Essential Genes/genetics , Genome/genetics , Humans , Mice , Open Reading Frames/genetics , Organ Specificity , Messenger RNA/analysis , Messenger RNA/genetics , Transcription Factors/metabolism , Transcription Initiation Site , Genetic Transcription/genetics
ABSTRACT
Identifying transcription factor (TF) binding sites (TFBSs) is important in the computational inference of gene regulation. Widely used computational methods of TFBS prediction based on position weight matrices (PWMs) usually have high false positive rates. Moreover, computational studies of transcription regulation in eukaryotes frequently require numerous PWM models of TFBSs due to the large number of TFs involved. To overcome these problems, we developed DRAF, a novel method for TFBS prediction that requires only 14 prediction models for 232 human TFs while significantly improving prediction accuracy. DRAF models use more features than PWM models, as they combine information from TFBS sequences and physicochemical properties of TF DNA-binding domains into machine learning models. Evaluation of DRAF on 98 human ChIP-seq datasets shows on average 1.54-, 1.96- and 5.19-fold reduction of false positives at the same sensitivities compared to models from HOCOMOCO, TRANSFAC and DeepBind, respectively. This observation suggests that one can efficiently replace the PWM models for TFBS prediction with a small number of DRAF models that significantly improve prediction accuracy. The DRAF method is implemented in a web tool and in stand-alone software freely available at http://cbrc.kaust.edu.sa/DRAF.
Subject(s)
DNA Sequence Analysis/methods , Transcription Factors/metabolism , Binding Sites , Chromatin Immunoprecipitation , DNA/chemistry , DNA/metabolism , Humans , Machine Learning , Position-Specific Scoring Matrices
ABSTRACT
We present a major update of the HOCOMOCO collection that consists of patterns describing DNA binding specificities for human and mouse transcription factors. In this release, we profited from a nearly doubled volume of published in vivo experiments on transcription factor (TF) binding to expand the repertoire of binding models, replace low-quality models previously based on in vitro data only and cover more than a hundred TFs with previously unknown binding specificities. This was achieved by systematic motif discovery from more than five thousand ChIP-Seq experiments uniformly processed within the BioUML framework with several ChIP-Seq peak calling tools and aggregated in the GTRD database. HOCOMOCO v11 contains binding models for 453 mouse and 680 human transcription factors and includes 1302 mononucleotide and 576 dinucleotide position weight matrices, which describe primary binding preferences of each transcription factor and reliable alternative binding specificities. An interactive interface and bulk downloads are available on the web: http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco11. In this release, we complement HOCOMOCO by MoLoTool (Motif Location Toolbox, http://molotool.autosome.ru) that applies HOCOMOCO models for visualization of binding sites in short DNA sequences.
Subject(s)
Genetic Databases , Transcription Factors/metabolism , Animals , Binding Sites/genetics , Chromatin Immunoprecipitation , Humans , Mice , Genetic Models , Nucleotide Motifs , DNA Sequence Analysis
ABSTRACT
Long noncoding RNAs (lncRNAs) play a key role in many cellular processes including chromatin regulation. To modify chromatin, lncRNAs often interact with DNA in a sequence-specific manner, forming RNA:DNA triple helices. Computational tools for triple helix search do not always provide genome-wide predictions of sufficient quality. Here, we used four human lncRNAs (MEG3, DACOR1, TERC and HOTAIR) and their experimentally determined binding regions to evaluate the triplex parameters that provide the highest prediction accuracy. Additionally, we combined triplex prediction with the lncRNA secondary structure and demonstrated that considering only single-stranded fragments of lncRNA can further improve prediction of RNA:DNA triplexes.
Subject(s)
Computational Biology/methods , DNA/metabolism , Long Noncoding RNA/chemistry , Long Noncoding RNA/metabolism , Binding Sites , Humans , Molecular Models , Nucleic Acid Conformation , RNA/chemistry , RNA/metabolism , Telomerase/chemistry , Telomerase/metabolism
ABSTRACT
BACKGROUND: DNA methylation is involved in the regulation of gene expression. Although bisulfite-sequencing-based methods profile DNA methylation at single-CpG resolution, methylation levels are usually averaged over genomic regions in the downstream bioinformatic analysis. RESULTS: We demonstrate that, at the genome level, methylation of a single CpG can serve as a more accurate predictor of gene expression than average promoter or gene body methylation. We define CpG traffic lights (CpG TL) as CpG dinucleotides with a significant correlation between methylation and expression of a nearby gene. CpG TL are enriched in all regulatory regions. Among all promoters, CpG TL are especially enriched in poised ones, suggesting involvement of DNA methylation in their regulation. Yet, binding of only a handful of transcription factors, such as NRF1, ETS, STAT and IRF-family members, could be regulated by direct methylation of transcription factor binding sites (TFBS) or their close proximity. For the majority of TFs, an alternative scenario is more likely: methylation and inactivation of the whole regulatory element indirectly represses functional TF binding, with a CpG TL being a reliable marker of such inactivation. CONCLUSIONS: CpG TL provide a promising insight into mechanisms of enhancer activity and gene regulation, linking methylation of a single CpG to gene expression. CpG TL methylation can be used as a reliable marker of enhancer activity and gene expression in applications, e.g. in the clinic, where measuring DNA methylation is easier than directly measuring gene expression because DNA is more stable.
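The CpG traffic light definition above (a CpG whose methylation correlates significantly with the expression of a nearby gene) can be illustrated with a minimal sketch. The abstract does not state which correlation measure or significance test was used, so the Pearson correlation, the permutation test, the `alpha` cutoff and all function names below are assumptions made purely for illustration.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (var_x * var_y) ** 0.5

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation (assumed test)."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)

def is_traffic_light(methylation, expression, alpha=0.05):
    """Hypothetical helper: flag a CpG whose methylation correlates with expression."""
    return permutation_p_value(methylation, expression) < alpha

if __name__ == "__main__":
    # Toy per-sample values for one CpG and one nearby gene (invented data).
    methylation = [0.9, 0.8, 0.7, 0.5, 0.3, 0.2, 0.1, 0.05]
    expression = [1, 2, 3, 5, 8, 9, 11, 12]
    print(pearson(methylation, expression))
    print(is_traffic_light(methylation, expression))
```

A genome-wide analysis would also need multiple-testing correction across all CpG-gene pairs, which this sketch leaves out.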
Subject(s)
CpG Islands , DNA Methylation , Gene Expression Regulation , Human Genome , Nucleic Acid Regulatory Sequences , Transcription Factors/metabolism , Humans , Genetic Promoter Regions , Transcription Factors/genetics , Genetic Transcription
ABSTRACT
Upon the first publication of the fifth iteration of the Functional Annotation of Mammalian Genomes collaborative project, FANTOM5, we gathered a series of primary data and database systems into the FANTOM web resource (http://fantom.gsc.riken.jp) to help researchers explore transcriptional regulation and cellular states. In the course of the collaboration, primary data and analysis results have been expanded, and the functionalities of the database systems have been enhanced. We believe that our data and web systems are invaluable resources, and that the scientific community will benefit from this recent update to deepen its understanding of mammalian cellular organization. We introduce the contents of FANTOM5 here, report recent updates in the web resource and provide future perspectives.
Subject(s)
Genetic Databases , Gene Expression Profiling/methods , Genomics/methods , Mammals/genetics , Software , Web Browser , Animals , Computational Biology , Humans , Search Engine
ABSTRACT
The three-spined stickleback (Gasterosteus aculeatus) represents a convenient model to study microevolution, specifically adaptation to a freshwater environment. Although genetic adaptations to freshwater environments are well-studied, epigenetic adaptations have attracted little attention. In this work, we investigated the role of DNA methylation in the adaptation of the marine stickleback population to freshwater conditions. DNA methylation profiling was performed in marine and freshwater populations of sticklebacks, as well as in marine sticklebacks placed into a freshwater environment and freshwater sticklebacks placed into seawater. We showed that the DNA methylation profile after placing a marine stickleback into fresh water partially converged to that of a freshwater stickleback. For six genes, including the ATP4A ion pump and NELL1, believed to be involved in skeletal ossification, we demonstrated similar changes in DNA methylation in both evolutionary and short-term adaptation. This suggests that an immediate epigenetic response to freshwater conditions can be maintained in the freshwater population. Interestingly, we observed enhanced epigenetic plasticity in freshwater sticklebacks that may serve as a compensatory regulatory mechanism for the lack of genetic variation in the freshwater population. For the first time, we demonstrated that genes encoding ion channels KCND3, CACNA1FB, and ATP4A were differentially methylated between the marine and the freshwater populations. Other genes encoding ion channels were previously reported to be under selection in freshwater populations. Nevertheless, the genes that harbor genetic and epigenetic changes were not the same, suggesting that epigenetic adaptation is a complementary mechanism to selection of genetic variants favorable for the freshwater environment.
Subject(s)
Physiological Adaptation/genetics , Genetic Epigenesis/genetics , Smegmamorpha/genetics , Acclimatization/genetics , Amylopectin , Animals , Biological Evolution , DNA Methylation/genetics , Molecular Evolution , Fresh Water , Genetic Variation/genetics , Genome-Wide Association Study , Genetic Models , Seawater , Genetic Selection/genetics
ABSTRACT
Models of transcription factor (TF) binding sites provide a basis for a wide spectrum of studies in regulatory genomics, from reconstruction of regulatory networks to functional annotation of transcripts and sequence variants. While TFs may recognize different sequence patterns in different conditions, it is pragmatic to have a single generic model for each particular TF as a baseline for practical applications. Here we present the expanded and enhanced version of HOCOMOCO (http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco10), the collection of models of DNA patterns recognized by transcription factors. HOCOMOCO now provides position weight matrix (PWM) models for binding sites of 601 human TFs and, in addition, PWMs for 396 mouse TFs. Furthermore, we introduce the largest collection to date of dinucleotide PWM models, covering 86 human and 52 mouse TFs. The update is based on the analysis of massive ChIP-Seq and HT-SELEX datasets, with validation of the resulting models on in vivo data. To facilitate practical application, all HOCOMOCO models are linked to gene and protein databases (Entrez Gene, HGNC, UniProt) and accompanied by precomputed score thresholds. Finally, we provide command-line tools for PWM and diPWM threshold estimation and motif finding in nucleotide sequences.
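To make the idea of scanning a sequence with a PWM and a score threshold concrete, here is a minimal sketch. It is not the HOCOMOCO command-line tooling: the count matrix, the pseudocount, the uniform background, the threshold value and the function names are all hypothetical, and only the forward strand is scanned.

```python
import math

ALPHABET = "ACGT"

# Hypothetical 4-position count matrix (rows = positions, columns = A, C, G, T).
COUNTS = [
    [8, 1, 0, 1],
    [0, 0, 10, 0],
    [1, 7, 1, 1],
    [0, 0, 0, 10],
]

def counts_to_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2-odds weights against a uniform background (assumed)."""
    pwm = []
    for row in counts:
        total = sum(row) + len(ALPHABET) * pseudocount
        pwm.append([math.log2(((c + pseudocount) / total) / background) for c in row])
    return pwm

def scan(sequence, pwm, threshold):
    """Return (start, site, score) for forward-strand windows scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in ALPHABET for base in window):
            continue  # skip windows containing N or other ambiguous symbols
        score = sum(pwm[i][ALPHABET.index(base)] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits

if __name__ == "__main__":
    pwm = counts_to_pwm(COUNTS)
    for start, site, score in scan("TTAGCTGG", pwm, threshold=4.0):
        print(start, site, round(score, 2))
```

In practice the threshold would come from a precomputed score-to-P-value table for the specific model rather than a hand-picked constant, and both strands would be scanned.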
Subject(s)
Genetic Databases , Transcriptional Regulatory Elements , Transcription Factors/metabolism , Animals , Binding Sites , Chromatin Immunoprecipitation , Humans , Mice , Biological Models , DNA Sequence Analysis
ABSTRACT
Transcription factor (TF) binding site (TFBS) models are crucial for computational reconstruction of transcription regulatory networks. In existing repositories, a TF often has several models (also called binding profiles or motifs), obtained from different experimental data. Having a single TFBS model for a TF is more pragmatic for practical applications. We show that integration of TFBS data from various types of experiments into a single model typically results in improved model quality, probably due to partial correction of source-specific technique bias. We present the Homo sapiens comprehensive model collection (HOCOMOCO, http://autosome.ru/HOCOMOCO/, http://cbrc.kaust.edu.sa/hocomoco/) containing carefully hand-curated TFBS models constructed by integration of binding sequences obtained by both low- and high-throughput methods. To construct position weight matrices to represent these TFBS models, we used ChIPMunk software in four computational modes, including a newly developed periodic positional prior mode associated with DNA helix pitch. We selected only one TFBS model per TF, unless there was clear experimental evidence for two rather distinct TFBS models. We assigned a quality rating to each model. HOCOMOCO contains 426 systematically curated TFBS models for 401 human TFs, where 172 models are based on more than one data source.
Subject(s)
Genetic Databases , Transcriptional Regulatory Elements , Transcription Factors/metabolism , Binding Sites , Humans , Internet , Genetic Models , Position-Specific Scoring Matrices
ABSTRACT
BACKGROUND: DNA methylation in promoters is closely linked to downstream gene repression. However, whether DNA methylation is a cause or a consequence of gene repression remains an open question. If it is a cause, then DNA methylation may affect the affinity of transcription factors (TFs) for their binding sites (TFBSs). If it is a consequence, then gene repression caused by chromatin modification may be stabilized by DNA methylation. Until now, these two possibilities have been supported only by non-systematic evidence and they have not been tested on a wide range of TFs. An average promoter methylation is usually used in studies, whereas recent results suggested that methylation of individual cytosines can also be important. RESULTS: We found that the methylation profiles of 16.6% of cytosines and the expression profiles of neighboring transcriptional start sites (TSSs) were significantly negatively correlated. We called the CpGs corresponding to such cytosines "traffic lights". We observed a strong selection against CpG "traffic lights" within TFBSs. The negative selection was stronger for transcriptional repressors as compared with transcriptional activators or multifunctional TFs as well as for core TFBS positions as compared with flanking TFBS positions. CONCLUSIONS: Our results indicate that direct and selective methylation of certain TFBS that prevents TF binding is restricted to special cases and cannot be considered as a general regulatory mechanism of transcription.
Subject(s)
Cytosine/metabolism , DNA Methylation , Transcription Factors/metabolism , Algorithms , Binding Sites , Computational Biology , CpG Islands , Cytosine/chemistry , Humans , Genetic Promoter Regions , Transcription Factors/chemistry , Transcription Initiation Site
ABSTRACT
Long non-coding RNAs (lncRNAs) play an important role in genome regulation. Specifically, many lncRNAs interact with chromatin, recruit epigenetic complexes and in this way affect large-scale gene expression programs. However, experimental data on lncRNA-chromatin interactions are still limited. The majority of experimental protocols do not provide any insight into the mechanics of lncRNA-based genome-wide epigenetic regulation. Here we present the HiMoRNA (Histone-Modifying RNA) database, a resource containing correlated lncRNA-epigenetic changes in specific genomic locations genome-wide. HiMoRNA integrates a large amount of multi-omics data to characterize the effects of lncRNAs on epigenetic modifications and gene expression. The current release of HiMoRNA includes more than five million associations in humans for ten histone modifications in multiple genomic loci and 4145 lncRNAs. HiMoRNA provides a user-friendly interface to facilitate browsing, searching and retrieving of lncRNAs associated with epigenetic profiles of various chromatin loci. Analysis of the HiMoRNA data suggests that several lncRNAs, including JPX, might be involved not only in regulation of the XIST locus but also in direct establishment or maintenance of X-chromosome inactivation. We believe that HiMoRNA is a convenient resource that can provide valuable biological insights and greatly facilitate functional annotation of lncRNAs.
ABSTRACT
Background: Acute myeloid leukemia (AML) is a hematopoietic malignancy characterized by genetic and epigenetic aberrations that alter the differentiation capacity of myeloid progenitor cells. The transcription factor CEBPα is frequently mutated in AML patients, leading to an increase in DNA methylation in many genomic locations. Previously, it has been shown that ecCEBPα (extra coding CEBPα), a lncRNA transcribed in the same direction as the CEBPα gene, regulates DNA methylation of the CEBPα promoter in cis. Here, we hypothesize that ecCEBPα could participate in the regulation of DNA methylation in trans. Method: First, we retrieved the methylation profile of AML patients with a mutated CEBPα locus from The Cancer Genome Atlas (TCGA). We then predicted the ecCEBPα secondary structure in order to check the potential of ecCEBPα to form triplexes around CpG loci and checked whether triplex formation influenced CpG methylation genome-wide. Results: Using DNA methylation profiles of AML patients with a mutated CEBPα locus, we show that ecCEBPα could interact with DNA by forming DNA:RNA triple helices and protect regions near its binding sites from global DNA methylation. Further analysis revealed that triplex-forming oligonucleotides in ecCEBPα are structurally unpaired, supporting the DNA-binding potential of these regions. ecCEBPα triplexes supported by RNA-chromatin co-localization data are located in the promoters of leukemia-linked transcription factors such as MLF2. Discussion: Overall, these results suggest a novel regulatory mechanism for ecCEBPα as a genome-wide epigenetic modulator through triple-helix formation, which may provide a foundation for sequence-specific engineering of RNA for regulating methylation of specific genes.