ABSTRACT
Transcription factor (TF) DNA sequence preferences direct their regulatory activity, but are currently known for only ~1% of eukaryotic TFs. Broadly sampling DNA-binding domain (DBD) types from multiple eukaryotic clades, we determined DNA sequence preferences for >1,000 TFs encompassing 54 different DBD classes from 131 diverse eukaryotes. We find that closely related DBDs almost always have very similar DNA sequence preferences, enabling inference of motifs for ~34% of the ~170,000 known or predicted eukaryotic TFs. Sequences matching both measured and inferred motifs are enriched in chromatin immunoprecipitation sequencing (ChIP-seq) peaks and upstream of transcription start sites in diverse eukaryotic lineages. SNPs defining expression quantitative trait loci in Arabidopsis promoters are also enriched for predicted TF binding sites. Importantly, our motif "library" can be used to identify specific TFs whose binding may be altered by human disease risk alleles. These data present a powerful resource for mapping transcriptional networks across eukaryotes.
Subject(s)
Arabidopsis/genetics , Nucleotide Motifs , Sequence Analysis, DNA , Transcription Factors/metabolism , Arabidopsis/metabolism , Chromatin Immunoprecipitation , Humans , Polymorphism, Single Nucleotide , Promoter Regions, Genetic , Protein Binding , Quantitative Trait Loci
ABSTRACT
The translational control of oncoprotein expression is implicated in many cancers. Here we report an eIF4A RNA helicase-dependent mechanism of translational control that contributes to oncogenesis and underlies the anticancer effects of silvestrol and related compounds. For example, eIF4A promotes T-cell acute lymphoblastic leukaemia development in vivo and is required for leukaemia maintenance. Accordingly, inhibition of eIF4A with silvestrol has powerful therapeutic effects against murine and human leukaemic cells in vitro and in vivo. We use transcriptome-scale ribosome footprinting to identify the hallmarks of eIF4A-dependent transcripts. These include 5' untranslated region (UTR) sequences such as the 12-nucleotide guanine quartet (CGG)4 motif that can form RNA G-quadruplex structures. Notably, among the most eIF4A-dependent and silvestrol-sensitive transcripts are a number of oncogenes, superenhancer-associated transcription factors, and epigenetic regulators. Hence, the 5' UTRs of select cancer genes harbour a targetable requirement for the eIF4A RNA helicase.
Subject(s)
5' Untranslated Regions/genetics , Eukaryotic Initiation Factor-4A/metabolism , G-Quadruplexes , Oncogene Proteins/biosynthesis , Oncogene Proteins/genetics , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/metabolism , Protein Biosynthesis , Animals , Antineoplastic Agents, Phytogenic/pharmacology , Antineoplastic Agents, Phytogenic/therapeutic use , Base Sequence , Cell Line, Tumor , Epigenesis, Genetic , Female , Humans , Mice , Mice, Inbred C57BL , Nucleotide Motifs , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/drug therapy , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/genetics , Protein Biosynthesis/drug effects , Ribosomes/metabolism , Transcription Factors/metabolism , Transcription, Genetic/drug effects , Transcription, Genetic/genetics , Triterpenes/pharmacology
ABSTRACT
Plants use light as a source of energy and information to detect diurnal rhythms and seasonal changes. Sensing changing light conditions is critical to adjust plant metabolism and to initiate developmental transitions. Here, we analyzed transcriptome-wide alterations in gene expression and alternative splicing (AS) of etiolated seedlings undergoing photomorphogenesis upon exposure to blue, red, or white light. Our analysis revealed massive transcriptome reprogramming as reflected by differential expression of ~20% of all genes and changes in several hundred AS events. For more than 60% of all regulated AS events, light promoted the production of a presumably protein-coding variant at the expense of an mRNA with nonsense-mediated decay-triggering features. Accordingly, AS of the putative splicing factor REDUCED RED-LIGHT RESPONSES IN CRY1CRY2 BACKGROUND1, previously identified as a red light signaling component, was shifted to the functional variant under light. Downstream analyses of candidate AS events pointed at a role of photoreceptor signaling only in monochromatic but not in white light. Furthermore, we demonstrated similar AS changes upon light exposure and exogenous sugar supply, with a critical involvement of kinase signaling. We propose that AS is an integration point of signaling pathways that sense and transmit information regarding the energy availability in plants.
Subject(s)
Alternative Splicing/physiology , Arabidopsis Proteins/metabolism , Arabidopsis/genetics , Transcriptome/genetics , Alternative Splicing/genetics , Arabidopsis/physiology , Arabidopsis Proteins/genetics , Gene Expression Regulation, Plant/genetics , Gene Expression Regulation, Plant/physiology , Signal Transduction/genetics , Signal Transduction/physiology
ABSTRACT
MOTIVATION: Deep sequencing-based ribosome footprint profiling can provide novel insights into the regulatory mechanisms of protein translation. However, the observed ribosome profile is fundamentally confounded by transcriptional activity. In order to decipher principles of translation regulation, tools that can reliably detect changes in translation efficiency in case-control studies are needed. RESULTS: We present a statistical framework and an analysis tool, RiboDiff, to detect genes with changes in translation efficiency across experimental treatments. RiboDiff uses generalized linear models to estimate the over-dispersion of RNA-Seq and ribosome profiling measurements separately, and performs a statistical test for differential translation efficiency using both mRNA abundance and ribosome occupancy. AVAILABILITY AND IMPLEMENTATION: RiboDiff webpage: http://bioweb.me/ribodiff. Source code, including scripts for preprocessing the FASTQ data, is available at http://github.com/ratschlab/ribodiff. CONTACTS: zhongy@cbio.mskcc.org or raetsch@inf.ethz.ch SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Protein Biosynthesis , RNA, Messenger/metabolism , Ribosomes/metabolism , Sequence Analysis, RNA/methods , Software , Gene Expression Regulation , High-Throughput Nucleotide Sequencing/methods
ABSTRACT
Genetic differences between Arabidopsis thaliana accessions underlie the plant's extensive phenotypic variation, and until now these have been interpreted largely in the context of the annotated reference accession Col-0. Here we report the sequencing, assembly and annotation of the genomes of 18 natural A. thaliana accessions, and their transcriptomes. When assessed on the basis of the reference annotation, one-third of protein-coding genes are predicted to be disrupted in at least one accession. However, re-annotation of each genome revealed that alternative gene models often restore coding potential. Gene expression in seedlings differed for nearly half of expressed genes and was frequently associated with cis variants within 5 kilobases, as were intron retention alternative splicing events. Sequence and expression variation is most pronounced in genes that respond to the biotic environment. Our data further promote evolutionary and functional studies in A. thaliana, especially the MAGIC genetic reference population descended from these accessions.
Subject(s)
Arabidopsis/genetics , Gene Expression Profiling , Gene Expression Regulation, Plant/genetics , Genome, Plant/genetics , Transcription, Genetic/genetics , Arabidopsis/classification , Arabidopsis Proteins/genetics , Base Sequence , Genes, Plant/genetics , Genomics , Haplotypes/genetics , INDEL Mutation/genetics , Molecular Sequence Annotation , Phylogeny , Polymorphism, Single Nucleotide/genetics , Proteome/genetics , Seedlings/genetics , Sequence Analysis, DNA
ABSTRACT
The nonsense-mediated decay (NMD) surveillance pathway can recognize erroneous transcripts and physiological mRNAs, such as precursor mRNA alternative splicing (AS) variants. Currently, information on the global extent of coupled AS and NMD remains scarce and even absent for any plant species. To address this, we conducted transcriptome-wide splicing studies using Arabidopsis thaliana mutants in the NMD factor homologs UP FRAMESHIFT1 (UPF1) and UPF3 as well as wild-type samples treated with the translation inhibitor cycloheximide. Our analyses revealed that at least 17.4% of all multi-exon, protein-coding genes produce splicing variants that are targeted by NMD. Moreover, we provide evidence that UPF1 and UPF3 act in a translation-independent mRNA decay pathway. Importantly, 92.3% of the NMD-responsive mRNAs exhibit classical NMD-eliciting features, supporting their authenticity as direct targets. Genes generating NMD-sensitive AS variants function in diverse biological processes, including signaling and protein modification, for which NaCl stress-modulated AS-NMD was found. Besides mRNAs, numerous noncoding RNAs and transcripts derived from intergenic regions were shown to be NMD responsive. In summary, we provide evidence for a major function of AS-coupled NMD in shaping the Arabidopsis transcriptome, having fundamental implications in gene regulation and quality control of transcript processing.
Subject(s)
Alternative Splicing , Arabidopsis/genetics , Nonsense Mediated mRNA Decay , Transcriptome , Arabidopsis Proteins/genetics , Gene Expression Regulation, Plant , Genotype , Mutation , RNA Helicases/genetics , RNA, Plant/genetics , Sequence Analysis, RNA
ABSTRACT
We present Oqtans, an open-source workbench for quantitative transcriptome analysis, that is integrated in Galaxy. Its distinguishing features include customizable computational workflows and a modular pipeline architecture that facilitates comparative assessment of tool and data quality. Oqtans integrates an assortment of machine learning-powered tools into Galaxy, which show superior or equal performance to state-of-the-art tools. Implemented tools comprise a complete transcriptome analysis workflow: short-read alignment, transcript identification/quantification and differential expression analysis. Oqtans and Galaxy facilitate persistent storage, data exchange and documentation of intermediate results and analysis workflows. We illustrate how Oqtans aids the interpretation of data from different experiments in easy to understand use cases. Users can easily create their own workflows and extend Oqtans by integrating specific tools. Oqtans is available as (i) a cloud machine image with a demo instance at cloud.oqtans.org, (ii) a public Galaxy instance at galaxy.cbio.mskcc.org, (iii) a git repository containing all installed software (oqtans.org/git); most of which is also available from (iv) the Galaxy Toolshed and (v) a share string to use along with Galaxy CloudMan.
Subject(s)
RNA/genetics , Sequence Analysis, RNA/methods , Transcriptome , Base Sequence , Internet , Software
ABSTRACT
Deep transcriptome sequencing (RNA-Seq) has become a vital tool for studying the state of cells in the context of varying environments, genotypes and other factors. RNA-Seq profiling data enable identification of novel isoforms, quantification of known isoforms and detection of changes in transcriptional or RNA-processing activity. Existing approaches to detect differential isoform abundance between samples either require a complete isoform annotation or fall short in providing statistically robust and calibrated significance estimates. Here, we propose a suite of statistical tests to address these open needs: a parametric test that uses known isoform annotations to detect changes in relative isoform abundance and a non-parametric test that detects differential read coverages and can be applied when isoform annotations are not available. Both methods account for the discrete nature of read counts and the inherent biological variability. We demonstrate that these tests compare favorably to previous methods, both in terms of accuracy and statistical calibrations. We use these techniques to analyze RNA-Seq libraries from Arabidopsis thaliana and Drosophila melanogaster. The identified differential RNA processing events were consistent with RT-qPCR measurements and previous studies. The proposed toolkit is available from http://bioweb.me/rdiff and enables in-depth analyses of transcriptomes, with or without available isoform annotation.
Subject(s)
RNA Processing, Post-Transcriptional , Algorithms , Animals , Arabidopsis/genetics , Arabidopsis/metabolism , Data Interpretation, Statistical , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Gene Expression Profiling , Molecular Sequence Annotation , RNA Isoforms/metabolism , Reverse Transcriptase Polymerase Chain Reaction
ABSTRACT
MOTIVATION: High-throughput sequencing of mRNA (RNA-Seq) has led to tremendous improvements in the detection of expressed genes and reconstruction of RNA transcripts. However, the extensive dynamic range of gene expression, technical limitations and biases, as well as the observed complexity of the transcriptional landscape, pose profound computational challenges for transcriptome reconstruction. RESULTS: We present the novel framework MITIE (Mixed Integer Transcript IdEntification) for simultaneous transcript reconstruction and quantification. We define a likelihood function based on the negative binomial distribution, use a regularization approach to select a few transcripts collectively explaining the observed read data and show how to find the optimal solution using Mixed Integer Programming. MITIE can (i) take advantage of known transcripts, (ii) reconstruct and quantify transcripts simultaneously in multiple samples, and (iii) resolve the location of multi-mapping reads. It is designed for genome- and assembly-based transcriptome reconstruction. We present an extensive study based on realistic simulated RNA-Seq data. When compared with state-of-the-art approaches, MITIE proves to be significantly more sensitive and overall more accurate. Moreover, MITIE yields substantial performance gains when used with multiple samples. We applied our system to 38 Drosophila melanogaster modENCODE RNA-Seq libraries and estimated the sensitivity of reconstructing omitted transcript annotations and the specificity with respect to annotated transcripts. Our results corroborate that a well-motivated objective paired with appropriate optimization techniques leads to significant improvements over the state-of-the-art in transcriptome reconstruction. AVAILABILITY: MITIE is implemented in C++ and is available from http://bioweb.me/mitie under the GPL license.
Subject(s)
High-Throughput Nucleotide Sequencing/methods , RNA/analysis , Sequence Analysis, RNA/methods , Software , Transcription, Genetic , Animals , Drosophila melanogaster , Humans , Internet , RNA/genetics
ABSTRACT
Deep sequencing of transcriptomes allows quantitative and qualitative analysis of many RNA species in a sample, with parallel comparison of expression levels, splicing variants, natural antisense transcripts, RNA editing and transcriptional start and stop sites the ideal goal. By computational modeling, we show how libraries of multiple insert sizes combined with strand-specific, paired-end (SS-PE) sequencing can increase the information gained on alternative splicing, especially in higher eukaryotes. Despite the benefits of gaining SS-PE data with paired ends of varying distance, the standard Illumina protocol allows only non-strand-specific, paired-end sequencing with a single insert size. Here, we modify the Illumina RNA ligation protocol to allow SS-PE sequencing by using a custom pre-adenylated 3' adaptor. We generate parallel libraries with differing insert sizes to aid deconvolution of alternative splicing events and to characterize the extent and distribution of natural antisense transcription in C. elegans. Despite stringent requirements for detection of alternative splicing, our data increases the number of intron retention and exon skipping events annotated in the Wormbase genome annotations by 127% and 121%, respectively. We show that parallel libraries with a range of insert sizes increase transcriptomic information gained by sequencing and that by current established benchmarks our protocol gives competitive results with respect to library quality.
Subject(s)
Caenorhabditis elegans/genetics , Gene Expression Profiling/methods , Transcriptome , Alternative Splicing , Animals , Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans Proteins/metabolism , Databases, Genetic , Gene Library , Genes, Helminth , High-Throughput Nucleotide Sequencing , Humans , Oligonucleotide Array Sequence Analysis , Protein Isoforms/genetics , Protein Isoforms/metabolism , Sequence Analysis, RNA , Transcription, Genetic
ABSTRACT
Early stages of embryogenesis depend on subcellular localization and transport of maternal mRNA. However, systematic analysis of these processes is hindered by a lack of spatio-temporal information in single-cell RNA sequencing. Here, we combine spatially-resolved transcriptomics and single-cell RNA labeling to perform a spatio-temporal analysis of the transcriptome during early zebrafish development. We measure spatial localization of mRNA molecules within the one-cell stage embryo, which allows us to identify a class of mRNAs that are specifically localized at an extraembryonic position, the vegetal pole. Furthermore, we establish a method for high-throughput single-cell RNA labeling in early zebrafish embryos, which enables us to follow the fate of individual maternal transcripts until gastrulation. This approach reveals that many localized transcripts are specifically transported to the primordial germ cells. Finally, we acquire spatial transcriptomes of two Xenopus species and compare evolutionary conservation of localized genes as well as enriched sequence motifs.
Subject(s)
Cell Tracking/methods , Embryo, Nonmammalian/metabolism , RNA, Messenger/genetics , Transcriptome/genetics , Zebrafish/genetics , Animals , Embryo, Nonmammalian/cytology , Embryo, Nonmammalian/embryology , Female , Gene Expression Regulation, Developmental , Oocytes/cytology , Oocytes/metabolism , RNA, Messenger/metabolism , Single-Cell Analysis/methods , Spatio-Temporal Analysis , Species Specificity , Xenopus/embryology , Xenopus/genetics , Xenopus laevis/embryology , Xenopus laevis/genetics , Zebrafish/embryology
ABSTRACT
CLIP-Seq protocols such as PAR-CLIP, HITS-CLIP or iCLIP allow a genome-wide analysis of protein-RNA interactions. For the processing of the resulting short read data, various tools are utilized. Some of these tools were specifically developed for CLIP-Seq data, whereas others were designed for the analysis of RNA-Seq data. To date, however, it has not been assessed which of the available tools are most appropriate for the analysis of CLIP-Seq data. This is because an experimental gold-standard dataset on which methods can be assessed and compared is still not available. To address this lack of a gold-standard dataset, we here present Cseq-Simulator, a simulator for PAR-CLIP, HITS-CLIP and iCLIP data. This simulator can be applied to generate realistic datasets that can serve as surrogates for an experimental gold-standard dataset. In this work, we also show how Cseq-Simulator can be used to perform a comparison of steps of typical CLIP-Seq analysis pipelines, such as the read alignment or the peak calling. These comparisons show which tools are useful in different settings and also allow identifying pitfalls in the data analysis.
Subject(s)
High-Throughput Nucleotide Sequencing/statistics & numerical data , Sequence Analysis, RNA/statistics & numerical data , Software , Algorithms , Computational Biology/methods , Computational Biology/statistics & numerical data , Computer Simulation , Cross-Linking Reagents , Genome, Human , Humans , RNA/genetics , RNA/metabolism , RNA Processing, Post-Transcriptional , RNA-Binding Proteins/metabolism , Sequence Alignment/statistics & numerical data
ABSTRACT
Epigenome modulation potentially provides a mechanism for organisms to adapt, within and between generations. However, neither the extent to which this occurs, nor the mechanisms involved are known. Here we investigate DNA methylation variation in Swedish Arabidopsis thaliana accessions grown at two different temperatures. Environmental effects were limited to transposons, where CHH methylation was found to increase with temperature. Genome-wide association studies (GWAS) revealed that the extensive CHH methylation variation was strongly associated with genetic variants in both cis and trans, including a major trans-association close to the DNA methyltransferase CMT2. Unlike CHH methylation, CpG gene body methylation (GBM) was not affected by growth temperature, but was instead correlated with the latitude of origin. Accessions from colder regions had higher levels of GBM for a significant fraction of the genome, and this was associated with increased transcription for the genes affected. GWAS revealed that this effect was largely due to trans-acting loci, many of which showed evidence of local adaptation.
Subject(s)
Adaptation, Physiological/genetics , Arabidopsis Proteins/genetics , Arabidopsis/genetics , DNA (Cytosine-5-)-Methyltransferases/genetics , Gene Expression Regulation, Plant , Genome, Plant , Arabidopsis/metabolism , Arabidopsis Proteins/metabolism , CpG Islands , DNA (Cytosine-5-)-Methyltransferases/metabolism , DNA Methylation , DNA Transposable Elements , Epigenesis, Genetic , Gene Expression Profiling , Genetic Variation , Genome-Wide Association Study , Temperature , Transcription, Genetic
ABSTRACT
Analysis of microscopy images can provide insight into many biological processes. One particularly challenging problem is cellular nuclear segmentation in highly anisotropic and noisy 3D image data. Manually localizing and segmenting each and every cellular nucleus is very time-consuming, which remains a bottleneck in large-scale biological experiments. In this work, we present a tool for automated segmentation of cellular nuclei from 3D fluorescent microscopic data. Our tool is based on state-of-the-art image processing and machine learning techniques and provides a user-friendly graphical user interface. We show that our tool is as accurate as manual annotation and greatly reduces the time for the registration.
ABSTRACT
The spindle assembly checkpoint is a conserved signalling pathway that protects genome integrity. Given its central importance, this checkpoint should withstand stochastic fluctuations and environmental perturbations, but the extent of and mechanisms underlying its robustness remain unknown. We probed spindle assembly checkpoint signalling by modulating checkpoint protein abundance and nutrient conditions in fission yeast. For core checkpoint proteins, a mere 20% reduction can suffice to impair signalling, revealing a surprising fragility. Quantification of protein abundance in single cells showed little variability (noise) of critical proteins, explaining why the checkpoint normally functions reliably. Checkpoint-mediated stoichiometric inhibition of the anaphase activator Cdc20 (Slp1 in Schizosaccharomyces pombe) can account for the tolerance towards small fluctuations in protein abundance and explains our observation that some perturbations lead to non-genetic variation in the checkpoint response. Our work highlights low gene expression noise as an important determinant of reliable checkpoint signalling.