ABSTRACT
The chloroplast proteome is a dynamic mosaic of plastid- and nuclear-encoded proteins. Plastid protein homeostasis is maintained through the balance between de novo synthesis and proteolysis. Intracellular communication pathways, including plastid-to-nucleus signalling and the protein homeostasis machinery, made of stromal chaperones and proteases, shape the chloroplast proteome based on developmental and physiological needs. However, the maintenance of fully functional chloroplasts is costly, and under specific stress conditions the degradation of damaged chloroplasts is essential to maintaining a healthy population of photosynthesising organelles while promoting nutrient redistribution to sink tissues. In this work, we have addressed this complex regulatory chloroplast-quality-control pathway by modulating the expression of two nuclear genes encoding the plastid ribosomal proteins PRPS1 and PRPL4. By transcriptomics, proteomics and transmission electron microscopy analyses, we show that the increased expression of the PRPS1 gene leads to chloroplast degradation and early flowering, as an escape strategy from stress. On the contrary, the overaccumulation of the PRPL4 protein is kept under control by increasing the amount of plastid chaperones and components of the chloroplast unfolded protein response (cpUPR) regulatory mechanism. This study advances our understanding of molecular mechanisms underlying chloroplast retrograde communication and provides new insights into cellular responses to impaired plastid protein homeostasis.
Subjects
Proteome , Proteostasis , Proteostasis/genetics , Proteome/genetics , Proteome/metabolism , Plastids/genetics , Plastids/metabolism , Chloroplasts/genetics , Chloroplasts/metabolism , Signal Transduction/physiology , Chloroplast Proteins/metabolism , Gene Expression Regulation, Plant
ABSTRACT
The 5' and 3' untranslated regions of eukaryotic mRNAs (UTRs) play crucial roles in the post-transcriptional regulation of gene expression through the modulation of nucleo-cytoplasmic mRNA transport, translation efficiency, subcellular localization, and message stability. Since 1996, we have developed and maintained UTRdb, a specialized database of UTR sequences. Here we present UTRdb 2.0, a major update of UTRdb featuring an extensive collection of eukaryotic 5' and 3' UTR sequences, including over 26 million entries from over 6 million genes and 573 species, enriched with a curated set of functional annotations. Annotations include CAGE tags and polyA signals to label the completeness of 5' and 3'UTRs, respectively. In addition, uORFs and IRES are annotated in 5'UTRs as well as experimentally validated miRNA targets in 3'UTRs. Further annotations include evolutionarily conserved blocks, Rfam motifs, ADAR-mediated RNA editing events, and m6A modifications. A web interface allowing a flexible selection and retrieval of specific subsets of UTRs, selected according to a combination of criteria, has been implemented which also provides comprehensive download facilities. UTRdb 2.0 is accessible at http://utrdb.cloud.ba.infn.it/utrdb/.
Subjects
Databases, Nucleic Acid , Eukaryota , RNA, Messenger , Untranslated Regions , 3' Untranslated Regions/genetics , 5' Untranslated Regions , Eukaryota/genetics , Eukaryotic Cells/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism
ABSTRACT
Various next generation sequencing (NGS) based strategies have been successfully used in the recent past for tracing origins and understanding the evolution of infectious agents, investigating the spread and transmission chains of outbreaks, as well as facilitating the development of effective and rapid molecular diagnostic tests and contributing to the hunt for treatments and vaccines. The ongoing COVID-19 pandemic poses one of the greatest global threats in modern history and has already caused severe social and economic costs. The development of efficient and rapid sequencing methods to reconstruct the genomic sequence of SARS-CoV-2, the etiological agent of COVID-19, has been fundamental for the design of diagnostic molecular tests and for devising effective measures and strategies to mitigate the spread of the pandemic. Diverse approaches and sequencing methods can, as testified by the number of available sequences, be applied to SARS-CoV-2 genomes. However, each technology and sequencing approach has its own advantages and limitations. In the current review, we provide a brief, but hopefully comprehensive, account of currently available platforms and methodological approaches for the sequencing of SARS-CoV-2 genomes. We also present an outline of current repositories and databases that provide access to SARS-CoV-2 genomic data and associated metadata. Finally, we offer general advice and guidelines for the appropriate sharing and deposition of SARS-CoV-2 data and metadata, and suggest that more efficient and standardized integration of current and future SARS-CoV-2-related data would greatly facilitate the struggle against this new pathogen. We hope that our 'vademecum' for the production and handling of SARS-CoV-2-related sequencing data will contribute to this objective.
Subjects
COVID-19/virology , Genome, Viral , High-Throughput Nucleotide Sequencing/methods , SARS-CoV-2/genetics , COVID-19/epidemiology , Humans , Pandemics
ABSTRACT
BACKGROUND: The high-mobility group Hmga family of proteins are non-histone chromatin-interacting proteins which have been associated with a number of nuclear functions, including heterochromatin formation, replication, recombination, DNA repair, transcription, and formation of enhanceosomes. Due to its role based on dynamic interaction with chromatin, Hmga2 has a pathogenic role in diverse tumors and has been mainly studied in a cancer context; however, whether Hmga2 has similar physiological functions in normal cells remains less explored. Hmga2 was additionally shown to be required during the exit of embryonic stem cells (ESCs) from the ground state of pluripotency, to allow their transition into epiblast-like cells (EpiLCs), and here, we use that system to gain further understanding of normal Hmga2 function. RESULTS: We demonstrated that Hmga2 KO pluripotent stem cells fail to develop into EpiLCs. By using this experimental system, we studied the chromatin changes that take place upon the induction of EpiLCs and we observed that the loss of Hmga2 affects the histone mark H3K27me3, whose levels are higher in Hmga2 KO cells. Accordingly, a sustained expression of polycomb repressive complex 2 (PRC2), responsible for H3K27me3 deposition, was observed in KO cells. However, gene expression differences between differentiating wt vs Hmga2 KO cells did not show any significant enrichments of PRC2 targets. Similarly, endogenous Hmga2 association to chromatin in epiblast stem cells did not show any clear relationships with gene expression modification observed in Hmga2 KO. Hmga2 ChIP-seq confirmed that this protein preferentially binds to the chromatin regions associated with nuclear lamina. Starting from this observation, we demonstrated that nuclear lamina underwent severe alterations when Hmga2 KO or KD cells were induced to exit from the naïve state and this phenomenon is accompanied by a mislocalization of the heterochromatin mark H3K9me3 within the nucleus. 
As nuclear lamina (NL) is involved in the organization of 3D chromatin structure, we explored the possible effects of Hmga2 loss on this phenomenon. The analysis of Hi-C data in wt and Hmga2 KO cells allowed us to observe that inter-TAD (topologically associated domains) interactions in Hmga2 KO cells are different from those observed in wt cells. These differences clearly show a peculiar compartmentalization of inter-TAD interactions in chromatin regions associated or not to nuclear lamina. CONCLUSIONS: Overall, our results indicate that Hmga2 interacts with heterochromatic lamin-associated domains, and highlight a role for Hmga2 in the crosstalk between chromatin and nuclear lamina, affecting the establishment of inter-TAD interactions.
Subjects
Nuclear Envelope , Pluripotent Stem Cells , Chromatin/genetics , Chromatin/metabolism , HMGA2 Protein/genetics , HMGA2 Protein/metabolism , Heterochromatin/metabolism , Histones/genetics , Nuclear Envelope/metabolism , Pluripotent Stem Cells/metabolism , Polycomb Repressive Complex 2/genetics
ABSTRACT
The Yes-associated protein (YAP), one of the major effectors of the Hippo pathway together with its related protein WW-domain-containing transcription regulator 1 (WWTR1; also known as TAZ), mediates a range of cellular processes from proliferation and death to morphogenesis. YAP and TAZ regulate a large number of target genes, acting as coactivators of DNA-binding transcription factors or as negative regulators of transcription by interacting with the nucleosome remodeling and histone deacetylase complexes. YAP is expressed in self-renewing embryonic stem cells (ESCs), although it is still debated whether it plays any crucial role in the control of either stemness or differentiation. Here we show that the transient downregulation of YAP in mouse ESCs perturbs cellular homeostasis, leading to the inability to differentiate properly. Bisulfite genomic sequencing revealed that this transient knockdown caused a genome-wide alteration of the DNA methylation remodeling that takes place during the early steps of differentiation, suggesting that the phenotype we observed might be due to the dysregulation of some of the mechanisms involved in the regulation of ESC exit from pluripotency. By gene expression analysis, we identified two molecules that could have a role in the altered genome-wide methylation profile: the long noncoding RNA ephemeron, whose rapid upregulation is crucial for the transition of ESCs into epiblast, and the methyltransferase-like protein Dnmt3l, which, during embryo development, cooperates with Dnmt3a and Dnmt3b to contribute to the de novo DNA methylation that governs early steps of ESC differentiation. These data suggest a new role for YAP in the governance of the epigenetic dynamics of exit from pluripotency.
Subjects
Adaptor Proteins, Signal Transducing/metabolism , Cell Differentiation , DNA (Cytosine-5-)-Methyltransferases/metabolism , DNA Methylation , Mouse Embryonic Stem Cells/cytology , Adaptor Proteins, Signal Transducing/genetics , Animals , DNA (Cytosine-5-)-Methyltransferases/genetics , Mice , Mouse Embryonic Stem Cells/metabolism , Signal Transduction , YAP-Signaling Proteins , DNA Methyltransferase 3B
ABSTRACT
Over the last 5 years, a number of studies have reported the successful application of single-molecule sequencing technologies to the determination of the size and sequence of pathological expanded microsatellite repeats. However, different custom bioinformatics pipelines were employed in each study, preventing meaningful comparisons and somewhat limiting the reproducibility of the results. In this review, we provide a brief summary of state-of-the-art methods for the characterization of expanded repeat alleles, along with a detailed comparison of bioinformatics tools for the determination of repeat length and sequence, using both real and simulated data. Our reanalysis of publicly available human genome sequencing data suggests a modest, but statistically significant, increase in the error rate of single-molecule sequencing technologies at genomic regions containing short tandem repeats. However, we observe that all the methods tested here, irrespective of the strategy used for the analysis of the data (either based on the alignment or assembly of the reads), show high levels of sensitivity in both the detection of expanded tandem repeats and the estimation of the expansion size, suggesting that approaches based on single-molecule sequencing technologies are highly effective for the detection and quantification of tandem repeat expansions and contractions.
Subjects
Computational Biology , High-Throughput Nucleotide Sequencing , Microsatellite Repeats , Molecular Sequence Data , Sequence Analysis, DNA , Alleles , Chromosome Mapping , Genome, Human , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Reproducibility of Results , Sequence Analysis, DNA/methods
ABSTRACT
SUMMARY: While over 200 000 genomic sequences are currently available through dedicated repositories, ad hoc methods for the functional annotation of SARS-CoV-2 genomes do not harness all currently available resources for the annotation of functionally relevant genomic sites. Here, we present CorGAT, a novel tool for the functional annotation of SARS-CoV-2 genomic variants. By comparison with other state-of-the-art methods, we demonstrate that, by providing a more comprehensive and rich annotation, our method can facilitate the identification of evolutionary patterns in the genome of SARS-CoV-2. AVAILABILITY AND IMPLEMENTATION: Galaxy: http://corgat.cloud.ba.infn.it/galaxy; software: https://github.com/matteo14c/CorGAT/tree/Revision_V1; docker: https://hub.docker.com/r/laniakeacloud/galaxy_corgat. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
ABSTRACT
MOTIVATION: Clinical applications of genome re-sequencing technologies typically generate large amounts of data that need to be carefully annotated and interpreted to identify genetic variants potentially associated with pathological conditions. In this context, accurate and reproducible methods for the functional annotation and prioritization of genetic variants are of fundamental importance. RESULTS: In this article, we present VINYL, a flexible and fully automated system for the functional annotation and prioritization of genetic variants. Extensive analyses of both real and simulated datasets suggest that VINYL can identify clinically relevant genetic variants in a more accurate manner compared to equivalent state of the art methods, allowing a more rapid and effective prioritization of genetic variants in different experimental settings. As such we believe that VINYL can establish itself as a valuable tool to assist healthcare operators and researchers in clinical genomics investigations. AVAILABILITY AND IMPLEMENTATION: VINYL is available at http://beaconlab.it/VINYL and https://github.com/matteo14c/VINYL. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
ABSTRACT
BACKGROUND: Improving the availability and usability of data and analytical tools is a critical precondition for further advancing modern biological and biomedical research. For instance, one of the many ramifications of the COVID-19 global pandemic has been to make even more evident the importance of having bioinformatics tools and data readily actionable by researchers through convenient access points and supported by adequate IT infrastructures. One of the most successful efforts in improving the availability and usability of bioinformatics tools and data is represented by the Galaxy workflow manager and its thriving community. In 2020 we introduced Laniakea, a software platform conceived to streamline the configuration and deployment of "on-demand" Galaxy instances over the cloud. By facilitating the set-up and configuration of Galaxy web servers, Laniakea provides researchers with a powerful and highly customisable platform for executing complex bioinformatics analyses. The system can be accessed through a dedicated and user-friendly web interface that allows the Galaxy web server's initial configuration and deployment. RESULTS: "Laniakea@ReCaS", the first instance of a Laniakea-based service, is managed by ELIXIR-IT and was officially launched in February 2020, after about one year of development and testing that involved several users. Researchers can request access to Laniakea@ReCaS through an open-ended call for use-cases. Ten project proposals have been accepted since then, totalling 18 Galaxy on-demand virtual servers that employ ~ 100 CPUs, ~ 250 GB of RAM and ~ 5 TB of storage and serve several different communities and purposes. Herein, we present eight use cases demonstrating the versatility of the platform. 
CONCLUSIONS: During this first year of activity, the Laniakea-based service emerged as a flexible platform that facilitated the rapid development of bioinformatics tools, the efficient delivery of training activities, and the provision of public bioinformatics services in different settings, including food safety and clinical research. Laniakea@ReCaS provides a proof of concept of how enabling access to appropriate, reliable IT resources and ready-to-use bioinformatics tools can considerably streamline researchers' work.
Subjects
COVID-19 , Cloud Computing , Computational Biology , Humans , SARS-CoV-2 , Software
ABSTRACT
BACKGROUND: The advent of Next Generation Sequencing (NGS) technologies and the concomitant reduction in sequencing costs allow unprecedented high-throughput profiling of biological systems in a cost-efficient manner. Modern biological experiments are becoming increasingly data- and computationally intensive, and the wealth of publicly available biological data is bringing bioinformatics into the "Big Data" era. For these reasons, the effective application of High Performance Computing (HPC) architectures is increasingly recognized by bioinformaticians as well. Here we describe pilot programs for provisioning HPC resources to bioinformaticians, run by the Italian Node of ELIXIR (ELIXIR-IT) in collaboration with CINECA, the main Italian supercomputing center. RESULTS: Starting in April 2016, CINECA and ELIXIR-IT launched the pilot Call "ELIXIR-IT HPC@CINECA", offering streamlined access to HPC resources for bioinformatics. Resources are made available either through web front-ends to dedicated workflows developed at CINECA or by providing direct access to the High Performance Computing systems through a standard command-line interface tailored for bioinformatics data analysis. This offers the biomedical research community a production-scale environment, continuously updated with the latest available versions of publicly available reference datasets and bioinformatic tools. Currently, 63 research projects have gained access to the HPC@CINECA program, for a total allocation of ~ 8 million CPU hours and, for data storage, ~ 100 TB of permanent and ~ 300 TB of temporary space. CONCLUSIONS: Three years after the beginning of the ELIXIR-IT HPC@CINECA program, we can appreciate its impact on the Italian bioinformatics community and draw some considerations. Several Italian researchers who applied to the program have gained access to one of the top-ranking public scientific supercomputing facilities in Europe.
Those investigators had the opportunity to substantially reduce computational turnaround times in their research projects and to process massive amounts of data, pursuing research approaches that would otherwise have been difficult or impossible to undertake. Moreover, by taking advantage of the wealth of documentation and training material provided by CINECA, participants had the opportunity to improve their skills in the usage of HPC systems and be better positioned to apply to similar EU programs of greater scale, such as PRACE. To illustrate the effective usage and impact of the resources awarded by the program in different research applications, we report five successful use cases whose findings have already been published in peer-reviewed journals.
Subjects
Computational Biology , Computing Methodologies , Software , Algorithms , Animals , Cell Line , Databases, Genetic , Gene Fusion , Genome , Humans , Prunus persica/genetics , RNA Editing , Swallows/genetics
ABSTRACT
During their lifespan, plants respond to a multitude of stressful factors. Dynamic changes in chromatin and concomitant transcriptional variations control stress response and adaptation, with epigenetic memory mechanisms integrating environmental conditions and appropriate developmental programs over time. Here we analyzed the transcriptome and genome-wide histone modifications of maize plants subjected to a mild and prolonged drought stress just before the flowering transition. Stress was followed by a complete recovery period to evaluate drought memory mechanisms. Three categories of stress-memory genes were identified: i) "transcriptional memory" genes, with stable transcriptional changes persisting after the recovery; ii) "epigenetic memory candidate" genes, in which stress-induced chromatin changes persist longer than the stimulus, in the absence of transcriptional changes; iii) "delayed memory" genes, not immediately affected by the stress, but perceiving and storing the stress signal for a delayed response. This last memory mechanism is described for the first time in drought response. In addition, the applied drought stress altered floral patterning, possibly by affecting the expression and chromatin of flowering regulatory genes. Altogether, we provide a genome-wide map of the coordination between genes and chromatin marks utilized by plants to adapt to a stressful environment, describing how this serves as a backbone for setting stress memory.
Subjects
Acclimatization , Adaptation, Physiological/genetics , Epigenesis, Genetic , Flowers/physiology , Stress, Physiological/genetics , Zea mays/physiology , Chromatin/metabolism , Chromosome Mapping , Chromosomes, Plant/physiology , Droughts , Epigenomics , Gene Expression Profiling , Gene Expression Regulation, Plant , Gene Ontology , Histone Code , Histones/genetics , Histones/metabolism , Immunoprecipitation , Plant Development/genetics , Plant Proteins/genetics , Plant Proteins/metabolism , Principal Component Analysis , Sequence Analysis, RNA , Transcriptome
ABSTRACT
RNA sequencing (RNA-Seq) has become the experimental standard in transcriptome studies. While most of the bioinformatic pipelines for the analysis of RNA-Seq data and the identification of significant changes in transcript abundance are based on the comparison of two conditions, it is common practice to perform several experiments in parallel (e.g. from different individuals, developmental stages, tissues), for the identification of genes showing a significant variation of expression across all the conditions studied. In this work we present RNentropy, a methodology based on information theory devised for this task, which given expression estimates from any number of RNA-Seq samples and conditions identifies genes or transcripts with a significant variation of expression across all the conditions studied, together with the samples in which they are over- or under-expressed. To show the capabilities offered by our methodology, we applied it to different RNA-Seq datasets: 48 biological replicates of two different yeast conditions; samples extracted from six human tissues of three individuals; seven different mouse brain cell types; human liver samples from six individuals. Results, and their comparison to different state of the art bioinformatic methods, show that RNentropy can provide a quick and in depth analysis of significant changes in gene expression profiles over any number of conditions.
Subjects
Gene Expression Profiling/statistics & numerical data , Sequence Analysis, RNA/statistics & numerical data , Software , Animals , Brain/metabolism , Computational Biology/methods , Databases, Nucleic Acid/statistics & numerical data , Genes, Fungal , Genetic Markers , Humans , Liver/metabolism , Male , Mice , Mutation , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Spatio-Temporal Analysis
ABSTRACT
BACKGROUND: The advent and ongoing development of next generation sequencing (NGS) technologies has led to a rapid increase in the rate at which human genome re-sequencing data are produced, paving the way for personalized genomics and precision medicine. The body of genome resequencing data is progressively increasing, underlining the need for accurate and time-effective bioinformatics systems for genotyping, a crucial prerequisite for the identification of candidate causal mutations in diagnostic screens. RESULTS: Here we present CoVaCS, a fully automated, highly accurate system with a web-based graphical interface for genotyping and variant annotation. Extensive tests on a gold standard benchmark data-set (the NA12878 Illumina platinum genome) confirm that call-sets based on our consensus strategy are completely in line with those attained by similar command-line-based approaches, and far more accurate than call-sets from any individual tool. Importantly, our system exhibits better sensitivity and higher specificity than equivalent commercial software. CONCLUSIONS: CoVaCS offers optimized pipelines integrating state-of-the-art tools for variant calling and annotation for whole genome sequencing (WGS), whole-exome sequencing (WES) and target-gene sequencing (TGS) data. The system is currently hosted at Cineca, and offers the speed of an HPC computing facility, a crucial consideration when large numbers of samples must be analysed. Importantly, all the analyses are performed automatically, allowing high reproducibility of the results. As such, we believe that CoVaCS can be a valuable tool for the analysis of human genome resequencing studies. CoVaCS is available at: https://bioinformatics.cineca.it/covacs .
Subjects
Computational Biology/methods , Consensus Sequence , Sequence Analysis, DNA/methods , Software , Algorithms , Databases, Genetic , INDEL Mutation , Molecular Sequence Annotation , Polymorphism, Single Nucleotide , Sensitivity and Specificity , User-Computer Interface , Web Browser , Workflow
ABSTRACT
MYC deregulation is common in human cancer and has a role in sustaining aggressive cancer stem cell populations. MYC mediates a broad transcriptional response controlling normal biological programmes, but its activity is not clearly understood. We address MYC function in cancer stem cells through the inducible expression of Omomyc, a MYC-derived polypeptide interfering with MYC activity, taking as a model the most lethal brain tumour, glioblastoma. Omomyc bridles the key cancer stemlike cell features and affects the tumour microenvironment, inhibiting angiogenesis. This occurs because Omomyc interferes with proper MYC localization and itself associates with the genome, with a preference for sites occupied by MYC. This is accompanied by selective repression of master transcription factors for glioblastoma stemlike cell identity such as OLIG2, POU3F2 and SOX2, upregulation of effectors of tumour suppression and differentiation such as ID4, MIAT and PTEN, and modulation of the expression of microRNAs that target molecules implicated in glioblastoma growth and invasion such as EGFR and ZEB1. The data support a novel view of MYC as a network stabilizer that strengthens the regulatory nodes of gene expression networks controlling cell phenotype and highlight Omomyc as a model molecule for targeting cancer stem cells.
Subjects
Gene Expression Regulation, Neoplastic , Genes, myc , Glioblastoma/genetics , Neoplastic Stem Cells/physiology , Peptide Fragments/genetics , Proto-Oncogene Proteins c-myc/genetics , Transcription Factors/genetics , Angiogenesis Inhibitors , Apoptosis , Basic Helix-Loop-Helix Transcription Factors/genetics , Cell Differentiation , Cell Proliferation , ErbB Receptors/genetics , Glioblastoma/physiopathology , Humans , Inhibitor of Differentiation Proteins/genetics , MicroRNAs/genetics , Nerve Tissue Proteins/genetics , Oligodendrocyte Transcription Factor 2 , Protein Binding , Transcriptional Activation , Tumor Microenvironment/genetics , Zinc Finger E-box-Binding Homeobox 1/genetics
ABSTRACT
NF-Y is a trimeric transcription factor (TF), binding the CCAAT box element, for which several results suggest a pioneering role in activation of transcription. In this work, we integrated 380 ENCODE ChIP-Seq experiments for 154 TFs and cofactors with sequence analysis, protein-protein interactions and RNA profiling data, in order to identify genome-wide regulatory modules resulting from the co-association of NF-Y with other TFs. We identified three main degrees of co-association with NF-Y for sequence-specific TFs. In the most relevant one, we found TFs having a significant overlap with NF-Y in their DNA binding loci, some with a precise spacing of binding sites with respect to the CCAAT box, others (FOS, Sp1/2, RFX5, IRF3, PBX3) mostly lacking their canonical binding site and bound to arrays of well spaced CCAAT boxes. As expected, NF-Y binding also correlates with RNA Pol II General TFs and with subunits of complexes involved in the control of H3K4 methylations. Co-association patterns are confirmed by protein-protein interactions, and correspond to specific functional categorizations and expression level changes of target genes following NF-Y inactivation. These data define genome-wide rules for the organization of NF-Y-centered regulatory modules, supporting a model of distinct categorization and synergy with well defined sets of TFs.
Subjects
CCAAT-Binding Factor/metabolism , Promoter Regions, Genetic , Transcription Factors/metabolism , Binding Sites , Cell Line , Chromatin Immunoprecipitation , DNA/chemistry , DNA/metabolism , Gene Expression Profiling , Genome , Humans , Protein Interaction Mapping , Sequence Analysis, DNA
ABSTRACT
Motif discovery has been one of the most widely studied problems in bioinformatics ever since genomic and protein sequences have been available. In particular, its application to the de novo prediction of putative over-represented transcription factor binding sites in nucleotide sequences has been, and still is, one of the most challenging flavors of the problem. Recently, novel experimental techniques like chromatin immunoprecipitation (ChIP) have been introduced, permitting the genome-wide identification of protein-DNA interactions. ChIP, applied to transcription factors and coupled with genome tiling arrays (ChIP on Chip) or next-generation sequencing technologies (ChIP-Seq) has opened new avenues in research, as well as posed new challenges to bioinformaticians developing algorithms and methods for motif discovery.
Subjects
High-Throughput Nucleotide Sequencing/statistics & numerical data , Regulatory Elements, Transcriptional , Transcription Factors/metabolism , Algorithms , Animals , Binding Sites/genetics , Chromatin Immunoprecipitation/statistics & numerical data , Computational Biology , Consensus Sequence , DNA/genetics , DNA/metabolism , Gene Expression Profiling/statistics & numerical data , Humans
ABSTRACT
Chromatin immunoprecipitation followed by sequencing with next-generation technologies (ChIP-Seq) has become the de facto standard for building genome-wide maps of regions bound by a given transcription factor (TF). The regions identified, however, have to be further analyzed to determine the actual DNA-binding sites for the TF, as well as sites for other TFs belonging to the same TF complex or in general co-operating or interacting with it in transcription regulation. PscanChIP is a web server that, starting from a collection of genomic regions derived from a ChIP-Seq experiment, scans them using motif descriptors like JASPAR or TRANSFAC position-specific frequency matrices, or descriptors uploaded by users, and it evaluates both motif enrichment and positional bias within the regions according to different measures and criteria. PscanChIP can successfully identify not only the actual binding sites for the TF investigated by a ChIP-Seq experiment but also secondary motifs corresponding to other TFs that tend to bind the same regions, and, if present, precise positional correlations among their respective sites. The web interface is free for use, and there is no login requirement. It is available at http://www.beaconlab.it/pscan_chip_dev.
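As a rough illustration of the matrix-based scanning step described above, the sketch below converts a position-specific frequency matrix into log-odds scores and reports the best-scoring window in a sequence. It is a minimal sketch, not PscanChIP's own implementation: the function names, the pseudocount, the uniform background and the toy CCAAT-like matrix are all assumptions made for illustration, and input sequences are assumed to contain only uppercase A, C, G and T.

```python
import math

BASES = "ACGT"

# Hypothetical toy count matrix for a CCAAT-like motif (not taken from JASPAR or TRANSFAC).
TOY_PFM = {
    "A": [0, 0, 10, 10, 0],
    "C": [10, 10, 0, 0, 0],
    "G": [0, 0, 0, 0, 0],
    "T": [0, 0, 0, 0, 10],
}


def pfm_to_log_odds(pfm, pseudocount=0.5, background=0.25):
    """Turn per-position base counts into log2 odds against a uniform background."""
    width = len(pfm["A"])
    matrix = []
    for i in range(width):
        # Pseudocounts avoid log(0) for bases never observed at a position.
        total = sum(pfm[b][i] for b in BASES) + len(BASES) * pseudocount
        column = {
            b: math.log2(((pfm[b][i] + pseudocount) / total) / background)
            for b in BASES
        }
        matrix.append(column)
    return matrix


def best_hit(sequence, matrix):
    """Return (start, score) of the best-scoring window, or None if the sequence is too short."""
    width = len(matrix)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


if __name__ == "__main__":
    log_odds = pfm_to_log_odds(TOY_PFM)
    print(best_hit("GGGCCAATGG", log_odds))
```

Collecting the best-hit position for each region in a set would be one simple way to look for the positional bias mentioned in the abstract; how PscanChIP itself computes enrichment and positional statistics is not specified here.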
Subjects
Chromatin Immunoprecipitation , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Software , Transcription Factors/metabolism , Animals , Binding Sites , CCAAT-Binding Factor/metabolism , Cell Line , DNA/metabolism , DNA-Binding Proteins/metabolism , Embryonic Stem Cells/metabolism , Humans , Internet , K562 Cells , Mice , Nucleotide Motifs , STAT3 Transcription Factor/metabolism
Abstract
A comprehensive knowledge of all the factors involved in splicing, both proteins and RNAs, and of their interaction network is crucial for reaching a better understanding of this process and its functions. A large part of the relevant information is buried in the literature or scattered across various databases. By manually curating the literature and databases, we retrieved experimentally validated data on 71 human RNA-binding splicing regulatory proteins and organized them into a database called 'SpliceAid-F' (http://www.caspur.it/SpliceAidF/). For each splicing factor (SF), the database reports its functional domains, its protein and chemical interactors and its expression data. Furthermore, we collected experimentally validated RNA-SF interactions, including relevant information on the RNA-binding sites, such as the genes where these sites lie, their genomic coordinates, the splicing effects, the experimental procedures used, as well as the corresponding bibliographic references. We also collected information from experiments showing no RNA-SF binding, at least in the assayed conditions. In total, SpliceAid-F contains 4227 interactions, 2590 RNA-binding sites and 1141 'no-binding' sites, including information on cellular contexts and conditions where binding was tested. The data collected in SpliceAid-F can provide significant information to explain an observed splicing pattern as well as the effect of mutations in functional regulatory elements.
Subjects
Databases, Protein , RNA Splicing , RNA, Messenger/metabolism , RNA-Binding Proteins/metabolism , Binding Sites , Humans , Internet , Molecular Sequence Annotation , Protein Structure, Tertiary , RNA Precursors/metabolism , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/genetics , User-Computer Interface
Abstract
Upstream of N-ras (UNR) is a conserved RNA-binding protein that regulates mRNA translation and stability by binding to sites generally located in untranslated regions (UTRs). In Drosophila, sex-specific binding of UNR to msl2 mRNA and the noncoding RNA roX is believed to play key roles in the control of X-chromosome dosage compensation in both sexes. To investigate broader sex-specific functions of UNR, we have identified its RNA targets in adult male and female flies by high-throughput RNA binding and transcriptome analysis. Here we show that UNR binds to a large set of protein-coding transcripts and to a smaller set of noncoding RNAs in a sex-specific fashion. The analyses also reveal a strong correlation between sex-specific binding of UNR and sex-specific differential expression of UTRs in target genes. Validation experiments indicate that UNR indeed recognizes sex-specifically processed transcripts. These results suggest that UNR exploits the transcript diversity generated by alternative processing and alternative promoter usage to bind and regulate target genes in a sex-specific manner.
Subjects
DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster/metabolism , Nuclear Proteins/genetics , RNA, Messenger/metabolism , Transcription Factors/genetics , Untranslated Regions , Animals , Drosophila melanogaster/genetics , Female , Male , Promoter Regions, Genetic , RNA, Messenger/genetics , Sex Factors , Transcription, Genetic
Abstract
The regulation of transcription of eukaryotic genes is a very complex process, which involves interactions between transcription factors (TFs) and DNA, as well as epigenetic factors such as histone modifications and DNA methylation, which nowadays can be studied and characterized with techniques like ChIP-Seq. Cscan is a web resource that includes a large collection of genome-wide ChIP-Seq experiments performed on TFs, histone modifications, RNA polymerases and others. Enriched peak regions from the ChIP-Seq experiments are intersected with the genomic coordinates of a set of input genes, to identify which of the experiments show a statistically significant number of peaks within the input genes' loci. The input can be a cluster of co-expressed genes, or any other set of genes sharing a common regulatory profile. Users can thus single out which TFs are likely to be common regulators of the genes, and their respective correlations. Also, by examining results on promoter activation, transcription, histone modifications and polymerase binding, users can investigate the effect of the TFs (activation or repression of transcription) as well as the cell or tissue specificity of the genes' regulation and expression. The web interface is free to use, and there is no login requirement. Available at: http://www.beaconlab.it/cscan.
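The intersection and significance step described in this abstract can be sketched in Python as follows. The abstract does not say which statistical test Cscan applies, so the hypergeometric upper tail used here is an assumption, as are the toy coordinates, gene names and function names. The sketch only shows one plausible way to ask whether an input gene set contains more peak-bearing genes than expected by chance.

```python
from math import comb

def overlaps(peak, locus):
    """True when two (chrom, start, end) half-open intervals share a base."""
    return peak[0] == locus[0] and peak[1] < locus[2] and locus[1] < peak[2]

def genes_with_peak(peaks, loci):
    """Names of genes whose locus overlaps at least one peak."""
    return {
        gene for gene, locus in loci.items()
        if any(overlaps(peak, locus) for peak in peaks)
    }

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry a peak."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

# Hypothetical annotation: six gene loci on two chromosomes.
loci = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 300, 400),
    "geneC": ("chr1", 500, 600),
    "geneD": ("chr2", 100, 200),
    "geneE": ("chr2", 300, 400),
    "geneF": ("chr2", 500, 600),
}
# Hypothetical peaks from a single ChIP-Seq experiment.
peaks = [("chr1", 150, 160), ("chr1", 390, 410), ("chr2", 250, 290)]

bound = genes_with_peak(peaks, loci)           # genes with a peak, genome-wide
input_genes = ["geneA", "geneB", "geneC"]      # e.g. a co-expressed cluster
hits = sum(1 for gene in input_genes if gene in bound)

p_value = hypergeom_upper_tail(hits, len(input_genes), len(bound), len(loci))
print(sorted(bound), hits, p_value)
```

With many experiments, the same calculation would be repeated per experiment and the resulting p-values corrected for multiple testing. The nested loop over peaks and loci is fine for a toy example, but a genome-scale version would need an interval index.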