ABSTRACT
Transcription factor (TF) binding to DNA is critical to transcription regulation. Although the binding properties of numerous individual TFs are well-documented, a more detailed comprehension of how TFs interact cooperatively with DNA is required. We present COBIND, a novel method based on non-negative matrix factorization (NMF) to identify TF co-binding patterns automatically. COBIND applies NMF to one-hot encoded regions flanking known TF binding sites (TFBSs) to pinpoint enriched DNA patterns at fixed distances. We applied COBIND to 5699 TFBS datasets from UniBind for 401 TFs in seven species. The method uncovered already established co-binding patterns and new co-binding configurations not yet reported in the literature and inferred through motif similarity and protein-protein interaction knowledge. Our extensive analyses across species revealed that 67% of the TFs shared a co-binding motif with other TFs from the same structural family. The co-binding patterns captured by COBIND are likely functionally relevant as they harbor higher evolutionary conservation than isolated TFBSs. Open chromatin data from matching human cell lines further supported the co-binding predictions. Finally, we used single-molecule footprinting data from mouse embryonic stem cells to confirm that the COBIND-predicted co-binding events associated with some TFs likely occurred on the same DNA molecules.
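The core idea named in this abstract (NMF applied to one-hot encoded flanking regions) can be illustrated with a small sketch. This is not the COBIND implementation: the toy flank sequences, the rank of 2, and the use of Lee-Seung multiplicative updates are all assumptions chosen for illustration, and the helper names (`one_hot`, `nmf`) are hypothetical.

```python
import random

ALPHABET = "ACGT"

def one_hot(seq):
    """Flatten a DNA sequence into a 0/1 vector with 4 entries per position (A, C, G, T)."""
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == letter else 0.0 for letter in ALPHABET)
    return vec

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for V ~ WH with W, H >= 0 (assumed solver)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical flanking sequences, all of the same length (toy data).
flanks = ["ACGTGA", "ACGTGC", "TTGTGA", "ACGTCC",
          "GGAATT", "GGAATC", "CGAATT", "GGAAGT"]
V = [one_hot(s) for s in flanks]   # 8 sequences x 24 one-hot features
W, H = nmf(V, rank=2)              # rows of H: position-by-base patterns
```

In this layout each row of `H` can be reshaped to (positions × 4) and read as a pattern over the flank, while `W` gives the weight of each pattern in each sequence.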
Subjects
DNA, Protein Binding, Transcription Factors, Transcription Factors/metabolism, Humans, Binding Sites, Animals, Mice, DNA/metabolism, DNA/chemistry, Nucleotide Motifs, Chromatin/metabolism, Algorithms
ABSTRACT
JASPAR (https://jaspar.elixir.no/) is a widely used open-access database presenting manually curated high-quality and non-redundant DNA-binding profiles for transcription factors (TFs) across taxa. In this 10th release and 20th-anniversary update, the CORE collection has expanded with 329 new profiles. We updated three existing profiles and provided orthogonal support for 72 profiles from the previous release's UNVALIDATED collection. Altogether, the JASPAR 2024 update provides a 20% increase in CORE profiles from the previous release. A trimming algorithm enhanced profiles by removing low-information-content flanking base pairs, which were likely uninformative (within the capacity of the PFM models) for TFBS predictions and modelling TF-DNA interactions. This release includes enhanced metadata, featuring a refined classification for plant TFs' structural DNA-binding domains. The new JASPAR collections prompt updates to the genomic tracks of predicted TF binding sites (TFBSs) in 8 organisms, with human and mouse tracks available as native tracks in the UCSC Genome browser. All data are available through the JASPAR web interface and programmatically through its API and the updated Bioconductor and pyJASPAR packages. Finally, a new TFBS extraction tool enables users to retrieve predicted JASPAR TFBSs intersecting their genomic regions of interest.
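The abstract does not give the trimming rule, so the following is only a sketch of one plausible approach: compute the per-column information content of a PFM against a uniform background and drop flanking columns that fall below a threshold. The 0.5-bit threshold, the pseudocount, and the function names are assumptions, and the PFM is toy data rather than a JASPAR profile.

```python
import math

def column_information(counts, pseudocount=0.01):
    """Information content (bits) of one [A, C, G, T] count column, uniform background."""
    total = sum(counts) + 4 * pseudocount
    ic = 2.0  # maximum for a 4-letter alphabet
    for c in counts:
        p = (c + pseudocount) / total
        ic += p * math.log2(p)  # subtracts the column's Shannon entropy
    return ic

def trim_flanks(pfm, min_ic=0.5):
    """Drop leading/trailing columns below min_ic; internal columns are always kept."""
    ics = [column_information(col) for col in pfm]
    keep = [i for i, ic in enumerate(ics) if ic >= min_ic]
    if not keep:
        return []
    return pfm[keep[0]:keep[-1] + 1]

# Toy PFM: one [A, C, G, T] count column per motif position.
pfm = [
    [25, 25, 25, 25],   # uninformative flank
    [30, 20, 25, 25],   # nearly uninformative flank
    [100, 0, 0, 0],
    [0, 0, 100, 0],
    [25, 25, 25, 25],   # low-information column inside the core: kept
    [0, 100, 0, 0],
    [26, 24, 25, 25],   # uninformative flank
]
trimmed = trim_flanks(pfm)
```

Only the flanks are removed here; a low-information column between two informative ones is preserved so that the spacing of the core motif is unchanged.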
Subjects
Genetic Databases, Protein Binding, Transcription Factors, Animals, Humans, Mice, Genetic Databases/standards, Genetic Databases/trends, Transcription Factors/genetics, Transcription Factors/metabolism, Plants/genetics
ABSTRACT
Most cancer alterations occur in the noncoding portion of the human genome, where regulatory regions control gene expression. The discovery of noncoding mutations altering the cells' regulatory programs has been limited to a few examples with high recurrence or high functional impact. Here, we show that transcription factor binding sites (TFBSs) have similar mutation loads to those in protein-coding exons. By combining cancer somatic mutations in TFBSs and expression data for protein-coding and miRNA genes, we evaluate the combined effects of transcriptional and post-transcriptional alterations on the regulatory programs in cancers. The analysis of seven TCGA cohorts culminates with the identification of protein-coding and miRNA genes linked to mutations at TFBSs that are associated with a cascading trans-effect deregulation on the cells' regulatory programs. Our analyses of cis-regulatory mutations associated with miRNAs recurrently predict 12 mature miRNAs (derived from 7 precursors) associated with the deregulation of their target gene networks. The predictions are enriched for cancer-associated protein-coding and miRNA genes and highlight cis-regulatory mutations associated with the dysregulation of key pathways associated with carcinogenesis. By combining transcriptional and post-transcriptional regulation of gene expression, our method predicts cis-regulatory mutations related to the dysregulation of key gene regulatory networks in cancer patients.
Subjects
MicroRNAs, Neoplasms, Humans, Gene Expression Regulation, Neoplasms/genetics, Mutation, MicroRNAs/physiology, Gene Regulatory Networks
ABSTRACT
JASPAR (http://jaspar.genereg.net/) is an open-access database containing manually curated, non-redundant transcription factor (TF) binding profiles for TFs across six taxonomic groups. In this 9th release, we expanded the CORE collection with 341 new profiles (148 for plants, 101 for vertebrates, 85 for urochordates, and 7 for insects), which corresponds to a 19% expansion over the previous release. We added 298 new profiles to the Unvalidated collection when no orthogonal evidence was found in the literature. All the profiles were clustered to provide familial binding profiles for each taxonomic group. Moreover, we revised the structural classification of DNA binding domains to consider plant-specific TFs. This release introduces word clouds to represent the scientific knowledge associated with each TF. We updated the genome tracks of TFBSs predicted with JASPAR profiles in eight organisms; the human and mouse TFBS predictions can be visualized as native tracks in the UCSC Genome Browser. Finally, we provide a new tool to perform JASPAR TFBS enrichment analysis in user-provided genomic regions. All the data is accessible through the JASPAR website, its associated RESTful API, the R/Bioconductor data package, and a new Python package, pyJASPAR, that facilitates serverless access to the data.
Subjects
Genetic Databases, Genomics/classification, Software, Transcription Factors/genetics, Animals, Binding Sites/genetics, Computational Biology, Genome/genetics, Humans, Mice, Plants/genetics, Protein Binding/genetics, Transcription Factors/classification, Vertebrates/genetics
ABSTRACT
RSAT (Regulatory Sequence Analysis Tools) enables the detection and the analysis of cis-regulatory elements in genomic sequences. This software suite performs (i) de novo motif discovery (including from genome-wide datasets like ChIP-seq/ATAC-seq), (ii) scanning of genomic sequences with known motifs, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations and (v) comparative genomics. RSAT comprises 50 tools. Six public Web servers (including a teaching server) are offered to meet the needs of different biological communities. The RSAT philosophy and originality are: (i) multi-modal access depending on user needs, through web forms, the command line for local installations and programmatic web services, and (ii) support for virtually any genome (animals, bacteria, plants, totaling over 10 000 genomes directly accessible). Since the 2018 NAR Web Software Issue, we have developed a large REST API, extended the support for additional genomes and external motif collections, enhanced some tools and Web forms, and developed a novel tool that builds or refines gene regulatory networks using motif scanning (network-interactions). The RSAT website provides extensive documentation, tutorials and published protocols. RSAT code is under an open-source license and is now hosted on GitHub. RSAT is available at http://www.rsat.eu/.
Subjects
Genomics, Transcription Factors, Animals, Transcription Factors/genetics, Genomics/methods, Software, DNA Sequence Analysis/methods, Gene Regulatory Networks
ABSTRACT
The identification of functional elements encoded in plant genomes is necessary to understand gene regulation. Although much attention has been paid to model species like Arabidopsis (Arabidopsis thaliana), little is known about regulatory motifs in other plants. Here, we describe a bottom-up approach for de novo motif discovery using peach (Prunus persica) as an example. These predictions require pre-computed gene clusters grouped by their expression similarity. After optimizing the boundaries of proximal promoter regions, two motif discovery algorithms from RSAT::Plants (http://plants.rsat.eu) were tested (oligo and dyad analysis). Overall, 18 out of 45 co-expressed modules were enriched in motifs typical of well-known transcription factor (TF) families (bHLH, bZip, BZR, CAMTA, DOF, E2FE, AP2-ERF, Myb-like, NAC, TCP, and WRKY) and a few uncharacterized motifs. Our results indicate that small modules and a promoter window of [-500 bp, +200 bp] relative to the transcription start site (TSS) maximize the number of motifs found and reduce low-complexity signals in peach. The distribution of discovered regulatory sites was unbalanced, as they accumulated around the TSS. This approach was benchmarked by testing two different expression-based clustering algorithms (network-based and hierarchical) and, as control, genes grouped for harboring ChIP-seq peaks of the same Arabidopsis TF. The method was also verified on maize (Zea mays), a species with a large genome. In summary, this article presents a glimpse of the peach regulatory components at genome scale and provides a general protocol that can be applied to other species. A Docker software container is released to facilitate the reproduction of these analyses.
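To make the [-500 bp, +200 bp] promoter window concrete, the sketch below converts a TSS position and strand into genomic coordinates. It is an illustration only: the function name, the 1-based inclusive coordinate convention, and the clipping at the chromosome start are assumptions, not part of the published protocol.

```python
def promoter_window(tss, strand, upstream=500, downstream=200):
    """Return (start, end), 1-based inclusive, for [-upstream, +downstream] around the TSS.

    On the minus strand, "upstream" lies at higher genomic coordinates,
    so the two offsets swap sides.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(1, start), end  # clip at the first base of the chromosome

# Example: the same TSS position on each strand.
forward = promoter_window(1000, "+")   # upstream is to the left
reverse = promoter_window(1000, "-")   # upstream is to the right
```

The strand handling is the part most often gotten wrong in practice: for a minus-strand gene the 500 bp upstream region sits at coordinates above the TSS.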
Subjects
Genetic Promoter Regions/genetics, Prunus persica/genetics, Algorithms, Arabidopsis/genetics, Computational Biology, Plant Gene Expression Regulation/genetics, Plant Gene Expression Regulation/physiology, Multigene Family/genetics, Multigene Family/physiology, Plant Proteins/genetics, Plant Proteins/metabolism, Transcription Factors/genetics, Transcription Factors/metabolism
ABSTRACT
JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) for TFs across multiple species in six taxonomic groups. In this 8th release of JASPAR, the CORE collection has been expanded with 245 new PFMs (169 for vertebrates, 42 for plants, 17 for nematodes, 10 for insects, and 7 for fungi), and 156 PFMs were updated (125 for vertebrates, 28 for plants and 3 for insects). These new profiles represent an 18% expansion compared to the previous release. JASPAR 2020 comes with a novel collection of unvalidated TF-binding profiles for which our curators did not find orthogonal supporting evidence in the literature. This collection has a dedicated web form to engage the community in the curation of unvalidated TF-binding profiles. Moreover, we created a Q&A forum to ease the communication between the user community and JASPAR curators. Finally, we updated the genomic tracks, inference tool, and TF-binding profile similarity clusters. All the data is available through the JASPAR website, its associated RESTful API, and through the JASPAR2020 R/Bioconductor package.
Subjects
Binding Sites, Computational Biology, Genetic Databases, Software, Transcription Factors, Animals, Genomics/methods, Protein Binding, Transcription Factors/metabolism, User-Computer Interface, Web Browser
ABSTRACT
RSAT (Regulatory Sequence Analysis Tools) is a suite of modular tools for the detection and the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, including from genome-wide datasets like ChIP-seq/ATAC-seq, (ii) motif scanning, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations, (v) comparative genomics. Six public servers jointly support 10 000 genomes from all kingdoms. Six novel or refactored programs have been added since the 2015 NAR Web Software Issue, including updated programs to analyse regulatory variants (retrieve-variation-seq, variation-scan, convert-variations), along with tools to extract sequences from a list of coordinates (retrieve-seq-bed), to select motifs from motif collections (retrieve-matrix), and to extract orthologs based on Ensembl Compara (get-orthologs-compara). Three use cases illustrate the integration of new and refactored tools to the suite. This Anniversary update gives a 20-year perspective on the software suite. RSAT is well-documented and available through Web sites, SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services, virtual machines and stand-alone programs at http://www.rsat.eu/.
Subjects
Nucleic Acid Regulatory Sequences, Software, Genetic Variation, Genomics/history, High-Throughput Nucleotide Sequencing/history, 20th Century History, 21st Century History, Internet, Nucleotide Motifs, Software/history
ABSTRACT
JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) and TF flexible models (TFFMs) for TFs across multiple species in six taxonomic groups. In the 2018 release of JASPAR, the CORE collection has been expanded with 322 new PFMs (60 for vertebrates and 262 for plants) and 33 PFMs were updated (24 for vertebrates, 8 for plants and 1 for insects). These new profiles represent a 30% expansion compared to the 2016 release. In addition, we have introduced 316 TFFMs (95 for vertebrates, 218 for plants and 3 for insects). This release incorporates clusters of similar PFMs in each taxon and each TF class per taxon. The JASPAR 2018 CORE vertebrate collection of PFMs was used to predict TF-binding sites in the human genome. The predictions are made available to the scientific community through a UCSC Genome Browser track data hub. Finally, this update comes with a new web framework with an interactive and responsive user-interface, along with new features. All the underlying data can be retrieved programmatically using a RESTful API and through the JASPAR 2018 R/Bioconductor package.
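As a rough illustration of how a PFM can be used to predict binding sites in a sequence, the sketch below converts counts to log-odds scores and scores every forward-strand window. The pseudocount, the uniform background, the toy TATA-like matrix, and the function names are assumptions; this is not the pipeline used to build the JASPAR genome tracks.

```python
import math

BASES = "ACGT"

def pfm_to_pwm(pfm, pseudocount=0.8, background=0.25):
    """Convert per-position count dicts into log2-odds scores against a flat background."""
    pwm = []
    for counts in pfm:
        total = sum(counts[b] for b in BASES) + pseudocount
        pwm.append({
            b: math.log2(((counts[b] + pseudocount * background) / total) / background)
            for b in BASES
        })
    return pwm

def scan(sequence, pwm):
    """Score every forward-strand window; return a list of (offset, score)."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if all(b in BASES for b in window):  # skip windows containing N or other symbols
            hits.append((i, sum(col[b] for col, b in zip(pwm, window))))
    return hits

# Toy 4-bp PFM with consensus TATA (hypothetical counts, not a JASPAR profile).
pfm = [
    {"A": 1, "C": 1, "G": 0, "T": 18},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 2, "C": 0, "G": 1, "T": 17},
    {"A": 18, "C": 1, "G": 1, "T": 0},
]
pwm = pfm_to_pwm(pfm)
hits = scan("GCGTATAGCC", pwm)
best = max(hits, key=lambda h: h[1])   # highest-scoring window
```

A real scan would also score the reverse complement and apply a score or p-value threshold; both are left out to keep the sketch short.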
Subjects
Genetic Databases, Transcription Factors/metabolism, Animals, Binding Sites/genetics, Genomics, Humans, Internet, Plants/genetics, Plants/metabolism, Position-Specific Scoring Matrices, Protein Binding/genetics, User-Computer Interface, Vertebrates/genetics, Vertebrates/metabolism
ABSTRACT
Rhizobium etli CE3 grown in succinate-ammonium minimal medium (MM) excreted outer membrane vesicles (OMVs) with diameters of 40 to 100 nm. Proteins from the OMVs and the periplasmic space were isolated from 6 and 24 h cultures and identified by proteome analysis. A total of 770 proteins were identified: 73.8 and 21.3% of these occurred only in the periplasm and OMVs, respectively, and only 4.9% were found in both locations. The majority of proteins found in either location were present only at 6 or 24 h: in the periplasm and OMVs, only 24 and 9% of proteins, respectively, were present at both sampling times, indicating a time-dependent differential sorting of proteins into the two compartments. The OMVs contained proteins with physiologically varied roles, including Rhizobium adhering proteins (Rap), polysaccharidases, polysaccharide export proteins, auto-aggregation and adherence proteins, glycosyl transferases, peptidoglycan binding and cross-linking enzymes, potential cell wall-modifying enzymes, porins, multidrug efflux RND family proteins, ABC transporter proteins and heat shock proteins. As expected, proteins with known periplasmic localizations (phosphatases, phosphodiesterases, pyrophosphatases) were found only in the periplasm, along with numerous proteins involved in amino acid and carbohydrate metabolism and transport. Nearly one-quarter of the proteins present in the OMVs were also found in our previous analysis of the R. etli total exproteome of MM-grown cells, indicating that these nanoparticles are an important mechanism for protein excretion in this species.
Subjects
Bacterial Proteins/metabolism, Extracellular Vesicles/metabolism, Periplasm/metabolism, Rhizobium etli/growth & development, Culture Media/chemistry, Proteome, Rhizobium etli/metabolism
ABSTRACT
RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), and a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/.
Subjects
Transcriptional Regulatory Elements, Software, Binding Sites, Genetic Variation, Genomics, Humans, Internet, Nucleotide Motifs, Transcription Factors/metabolism
ABSTRACT
RhizoBindingSites is a de novo depurified database of conserved DNA motifs potentially involved in the transcriptional regulation of the Rhizobium, Sinorhizobium, Bradyrhizobium, Azorhizobium, and Mesorhizobium genera covering 9 representative symbiotic species, deduced from the upstream regulatory sequences of orthologous genes (O-matrices) from the Rhizobiales taxon. The sites collected with O-matrices per gene per genome from RhizoBindingSites were used to deduce matrices using the dyad-Regulatory Sequence Analysis Tool (RSAT) method, giving rise to novel S-matrices for the construction of the RhizoBindingSites v2.0 database. A comparison of the S-matrix logos showed a greater frequency and/or re-definition of specific-position nucleotides found in the O-matrices. Moreover, S-matrices were better at detecting genes in the genome, and there was a more significant number of transcription factors (TFs) in the vicinity than O-matrices, corresponding to a more significant genomic coverage for S-matrices. O-matrices of 3187 TFs and S-matrices of 2754 TFs from 9 species were deposited in RhizoBindingSites and RhizoBindingSites v2.0, respectively. The homology between the matrices of TFs from a genome showed inter-regulation between the clustered TFs. In addition, matrices of AraC, ArsR, GntR, and LysR ortholog TFs showed different motifs, suggesting distinct regulation. Benchmarking showed 72%, 68%, and 81% of common genes per regulon for O-matrices and approximately 14% fewer common genes with S-matrices of Rhizobium etli CFN42, Rhizobium leguminosarum bv. viciae 3841, and Sinorhizobium meliloti 1021. These data were deposited in RhizoBindingSites and the RhizoBindingSites v2.0 database (http://rhizobindingsites.ccg.unam.mx/).
ABSTRACT
Deciphering the genetic architecture of human cardiac disorders is of fundamental importance but their underlying complexity is a major hurdle. We investigated the natural variation of cardiac performance in the sequenced inbred lines of the Drosophila Genetic Reference Panel (DGRP). Genome-wide association studies (GWAS) identified genetic networks associated with natural variation of cardiac traits, which were used to gain insights into the molecular and cellular processes affected. Non-coding variants that we identified were used to map potential regulatory non-coding regions, which in turn were employed to predict transcription factor (TF) binding sites. Cognate TFs, many of which themselves bear polymorphisms associated with variations of cardiac performance, were also validated by heart-specific knockdown. Additionally, we showed that the natural variations associated with variability in cardiac performance affect a set of genes overlapping those associated with average traits but through different variants in the same genes. Furthermore, we showed that phenotypic variability was also associated with natural variation of gene regulatory networks. More importantly, we documented correlations between genes associated with cardiac phenotypes in both flies and humans, which supports a conserved genetic architecture regulating adult cardiac function from arthropods to mammals. Specifically, roles for PAX9 and EGR2 in the regulation of the cardiac rhythm were established in both models, illustrating that the characteristics of natural variations in cardiac function identified in Drosophila can accelerate discovery in humans.
Subjects
Drosophila melanogaster, Heart, Quantitative Trait Loci, Animals, Humans, Drosophila melanogaster/physiology, Gene Regulatory Networks, Genetic Variation, Genome-Wide Association Study, Phenotype, Heart/physiology
ABSTRACT
BACKGROUND: Abnormal DNA methylation is observed as an early event in breast carcinogenesis. However, how such alterations arise is still poorly understood. microRNAs (miRNAs) regulate gene expression at the post-transcriptional level and play key roles in various biological processes. Here, we integrate miRNA expression and DNA methylation at CpGs to study how miRNAs may affect the breast cancer methylome and how DNA methylation may regulate miRNA expression. METHODS: miRNA expression and DNA methylation data from two breast cancer cohorts, Oslo2 (n = 297) and The Cancer Genome Atlas (n = 439), were integrated through a correlation approach that we term miRNA-methylation Quantitative Trait Loci (mimQTL) analysis. Hierarchical clustering was used to identify clusters of miRNAs and CpGs that were further characterized through analysis of mRNA/protein expression, clinicopathological features, in silico deconvolution, chromatin state and accessibility, transcription factor binding, and long-range interaction data. RESULTS: Clustering of the significant mimQTLs identified distinct groups of miRNAs and CpGs that reflect important biological processes associated with breast cancer pathogenesis. Notably, two major miRNA clusters were related to immune or fibroblast infiltration, hence identifying miRNAs associated with cells of the tumor microenvironment, while another large cluster was related to estrogen receptor (ER) signaling. Studying the chromatin landscape surrounding CpGs associated with the estrogen signaling cluster, we found that miRNAs from this cluster are likely to be regulated through DNA methylation of enhancers bound by FOXA1, GATA2, and ER-alpha. Further, at the hub of the estrogen cluster, we identified hsa-miR-29c-5p as negatively correlated with the mRNA and protein expression of DNA methyltransferase DNMT3A, a key enzyme regulating DNA methylation. We found deregulation of hsa-miR-29c-5p already present in pre-invasive breast lesions and postulate that hsa-miR-29c-5p may trigger early abnormal DNA methylation events in ER-positive breast cancer. CONCLUSIONS: We describe how miRNA expression and DNA methylation interact and associate with distinct breast cancer phenotypes.
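The correlation step behind a mimQTL-style analysis can be sketched as an all-against-all comparison of miRNA expression and CpG methylation across the same samples. The abstract does not state which correlation measure or significance criterion was used, so the Spearman coefficient, the fixed |rho| cutoff, the toy values, and the identifiers (`miR-a`, `cg-1`, and so on) below are all assumptions made for illustration.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def correlate_all(mirna_expr, cpg_meth, min_abs_rho=0.8):
    """Return (miRNA, CpG, rho) for every pair passing the assumed |rho| cutoff."""
    pairs = []
    for mirna, expr in mirna_expr.items():
        for cpg, meth in cpg_meth.items():
            rho = spearman(expr, meth)
            if abs(rho) >= min_abs_rho:
                pairs.append((mirna, cpg, rho))
    return pairs

# Toy data: six samples, values listed in the same sample order.
mirna_expr = {"miR-a": [1, 2, 3, 4, 5, 6], "miR-b": [3, 1, 4, 1, 5, 9]}
cpg_meth = {"cg-1": [0.9, 0.8, 0.6, 0.5, 0.3, 0.1],
            "cg-2": [0.2, 0.5, 0.1, 0.6, 0.3, 0.4]}
pairs = correlate_all(mirna_expr, cpg_meth)
```

With cohort-sized data a p-value with multiple-testing correction would normally replace the fixed cutoff, and the resulting miRNA-by-CpG matrix is what a hierarchical clustering step would then operate on.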
Subjects
Breast Neoplasms/genetics, Breast Neoplasms/pathology, DNA Methylation/genetics, Neoplastic Gene Expression Regulation, Hormones/pharmacology, MicroRNAs/genetics, Chromatin/metabolism, CpG Islands/genetics, DNA Methyltransferase 3A/metabolism, Genetic Enhancer Elements/genetics, Female, Gene Regulatory Networks, Humans, MicroRNAs/metabolism, Molecular Sequence Annotation, Multigene Family, Phenotype, Quantitative Trait Loci/genetics
ABSTRACT
Gene expression is controlled by the involvement of gene-proximal (promoters) and distal (enhancers) regulatory elements. Our previous results demonstrated that a subset of gene promoters, termed Epromoters, work as bona fide enhancers and regulate distal gene expression. Here, we hypothesized that Epromoters play a key role in the coordination of rapid gene induction during the inflammatory response. Using a high-throughput reporter assay we explored the function of Epromoters in response to type I interferon. We find that clusters of IFNa-induced genes are frequently associated with Epromoters and that these regulatory elements preferentially recruit the STAT1/2 and IRF transcription factors and distally regulate the activation of interferon-response genes. Consistently, we identified and validated the involvement of Epromoter-containing clusters in the regulation of LPS-stimulated macrophages. Our findings suggest that Epromoters function as a local hub recruiting the key TFs required for coordinated regulation of gene clusters during the inflammatory response.
Subjects
Genetic Enhancer Elements/physiology, Inflammation/genetics, Interferon Regulatory Factors/metabolism, Genetic Promoter Regions/physiology, Animals, Genetic Enhancer Elements/drug effects, Gene Expression Regulation, HeLa Cells, Humans, Inflammation/metabolism, Interferon Type I/metabolism, Interferon-alpha/pharmacology, K562 Cells, Lipopolysaccharides/pharmacology, Macrophages/drug effects, Mice, Multigene Family/drug effects, Multigene Family/genetics, Genetic Promoter Regions/drug effects, STAT1 Transcription Factor/metabolism, STAT2 Transcription Factor/metabolism
ABSTRACT
Gene expression in mammals is precisely regulated by the combination of promoters and gene-distal regulatory regions, known as enhancers. Several studies have suggested that some promoters might have enhancer functions. However, the extent of this type of promoter and whether they actually function to regulate the expression of distal genes have remained elusive. Here, by exploiting a high-throughput enhancer reporter assay, we unravel a set of mammalian promoters displaying enhancer activity. These promoters have distinct genomic and epigenomic features and frequently interact with other gene promoters. Extensive CRISPR-Cas9 genomic manipulation demonstrated the involvement of these promoters in the cis regulation of expression of distal genes in their natural loci. Our results have important implications for the understanding of complex gene regulation in normal development and disease.
Subjects
Genetic Enhancer Elements/genetics, Gene Expression Regulation/genetics, Genetic Promoter Regions/genetics, 3T3 Cells, Animals, CRISPR-Cas Systems, Epigenomics, Gene Ontology, HeLa Cells, Humans, Interferon-alpha/pharmacology, K562 Cells, Mammals/genetics, Mice
ABSTRACT
In this protocol, we explain how to run ab initio motif discovery in order to gather putative transcription factor binding motifs (TFBMs) from sets of genomic regions returned by ChIP-seq experiments. The protocol starts from a set of peak coordinates (genomic regions) which can be either downloaded from ChIP-seq databases, or produced by a peak-calling software tool. We provide a concise description of the successive steps to discover motifs, cluster the motifs returned by different motif discovery algorithms, and compare them with reference motif databases. The protocol is documented with detailed notes explaining the rationale underlying the choice of options. The interpretation of the results is illustrated with an example from the model plant Arabidopsis thaliana.
Subjects
Chromatin Immunoprecipitation/methods, Computational Biology/methods, Genomics/methods, Software, Algorithms, Arabidopsis/genetics, Binding Sites/genetics, Plant Genome/genetics, High-Throughput Nucleotide Sequencing, Nucleotide Motifs/genetics, Transcriptional Regulatory Elements
ABSTRACT
The plant-dedicated mirror of the Regulatory Sequence Analysis Tools (RSAT, http://plants.rsat.eu) offers specialized options for researchers dealing with plant transcriptional regulation. The website contains whole-sequenced genomes from species regularly updated from Ensembl Plants and other sources (currently 40), and supports an array of tasks frequently required for the analysis of regulatory sequences, such as retrieving upstream sequences, motif discovery, motif comparison, and pattern matching. RSAT::Plants also integrates the footprintDB collection of DNA motifs. This protocol explains step-by-step how to discover DNA motifs in regulatory regions of clusters of co-expressed genes in plants. It also explains how to empirically control the significance of the result, and how to associate the discovered motifs with putative binding factors.