ABSTRACT
RNA structural transitions are important in the function and regulation of RNAs. Here, we reveal a layer of transcriptome organization in the form of RNA folding energies. By probing yeast RNA structures at different temperatures, we obtained relative melting temperatures (Tm) for RNA structures in over 4000 transcripts. Specific signatures of RNA Tm demarcated the polarity of mRNA open reading frames and highlighted numerous candidate regulatory RNA motifs in 3' untranslated regions. RNA Tm distinguished noncoding versus coding RNAs and identified mRNAs with distinct cellular functions. We identified thousands of putative RNA thermometers, and their presence is predictive of the pattern of RNA decay in vivo during heat shock. The exosome complex recognizes unpaired bases during heat shock to degrade these RNAs, coupling intrinsic structural stabilities to gene regulation. Thus, genome-wide structural dynamics of RNA can parse functional elements of the transcriptome and reveal diverse biological insights.
Subjects
Energy Metabolism, Exosome Multienzyme Ribonuclease Complex/chemistry, RNA, Saccharomyces cerevisiae, 3' Untranslated Regions/genetics, Computational Biology, Exosome Multienzyme Ribonuclease Complex/genetics, Gene Expression Profiling, Genome, Molecular Sequence Data, Nucleic Acid Conformation, Nucleotide Motifs/genetics, RNA/chemistry, RNA/genetics, RNA Folding, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Temperature
ABSTRACT
The structures of RNA molecules are often important for their function and regulation, yet there are no experimental techniques for genome-scale measurement of RNA structure. Here we describe a novel strategy termed parallel analysis of RNA structure (PARS), which is based on deep sequencing fragments of RNAs that were treated with structure-specific enzymes, thus providing simultaneous in vitro profiling of the secondary structure of thousands of RNA species at single nucleotide resolution. We apply PARS to profile the secondary structure of the messenger RNAs (mRNAs) of the budding yeast Saccharomyces cerevisiae and obtain structural profiles for over 3,000 distinct transcripts. Analysis of these profiles reveals several RNA structural properties of yeast transcripts, including the existence of more secondary structure over coding regions compared with untranslated regions, a three-nucleotide periodicity of secondary structure across coding regions and an anti-correlation between the efficiency with which an mRNA is translated and the structure over its translation start site. PARS is readily applicable to other organisms and to profiling RNA structure in diverse conditions, thus enabling studies of the dynamics of secondary structure at a genomic scale.
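As an illustration only: a three-nucleotide periodicity like the one reported above can be checked by averaging a per-nucleotide structure score separately for each codon position. The sketch below is not the authors' analysis. The function name `codon_position_means` and the toy scores are hypothetical, and it assumes the score array starts at the first nucleotide of the coding region.

```python
def codon_position_means(scores):
    """Average a per-nucleotide score by codon position (0, 1, 2).

    Assumes scores[0] corresponds to the first nucleotide of the
    coding region (an assumption of this sketch, not of the source).
    """
    sums = [0.0, 0.0, 0.0]
    counts = [0, 0, 0]
    for i, score in enumerate(scores):
        sums[i % 3] += score
        counts[i % 3] += 1
    # Positions with no data yield NaN instead of raising an error.
    return [sums[k] / counts[k] if counts[k] else float("nan") for k in range(3)]


# Made-up scores with an exaggerated period of three, for demonstration only.
toy_scores = [1.0, 0.2, 0.4] * 10
means = codon_position_means(toy_scores)
print(means)
```

A clear difference among the three means would be consistent with a period-three signal; on real data one would also want to compare against a shuffled or out-of-frame control.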
Subjects
Genetic Techniques, Nucleic Acid Conformation, Fungal RNA/chemistry, Messenger RNA/chemistry, Saccharomyces cerevisiae/chemistry, Saccharomyces cerevisiae/genetics, Base Sequence, Genome-Wide Association Study, Molecular Sequence Data, Genetic Transcription
ABSTRACT
Abnormalities of genomic methylation patterns are lethal or cause disease, but the cues that normally designate CpG dinucleotides for methylation are poorly understood. We have developed a new method of methylation profiling that has single-CpG resolution and can address the methylation status of repeated sequences. We have used this method to determine the methylation status of >275 million CpG sites in human and mouse DNA from breast and brain tissues. Methylation density at most sequences was found to increase linearly with CpG density and to fall sharply at very high CpG densities, but transposons remained densely methylated even at higher CpG densities. The presence of histone H2A.Z and histone H3 di- or trimethylated at lysine 4 correlated strongly with unmethylated DNA and occurred primarily at promoter regions. We conclude that methylation is the default state of most CpG dinucleotides in the mammalian genome and that a combination of local dinucleotide frequencies, the interaction of repeated sequences, and the presence or absence of histone variants or modifications shields a population of CpG sites (most of which are in and around promoters) from DNA methyltransferases that lack intrinsic sequence specificity.
Subjects
Base Sequence/physiology, Chromatin/chemistry, Chromatin/physiology, DNA Methylation, Animals, Brain/metabolism, Breast/metabolism, Chromatin/genetics, Chromosome Mapping, CpG Islands/genetics, Female, Genome, Histones/metabolism, Humans, Mice, DNA Sequence Analysis, Validation Studies as Topic
ABSTRACT
High-throughput RNA sequencing enables quantification of transcripts (both known and novel), exon/exon junctions and fusions of exons from different genes. Discovery of gene fusions, particularly those expressed at low abundance, is a challenge with short- and medium-length sequencing reads. To address this challenge, we implemented an RNA-Seq mapping pipeline within the LifeScope software. We introduced new features including filter and junction mapping, annotation-aided pairing rescue and accurate mapping quality values. We combined this pipeline with a Suffix Array Spliced Read (SASR) aligner to detect chimeric transcripts. Performing paired-end RNA-Seq of the breast cancer cell line MCF-7 using the SOLiD system, we called 40 gene fusions among over 120,000 splicing junctions. We validated 36 of these 40 fusions with TaqMan assays, of which 25 were expressed in MCF-7 but not the Human Brain Reference. An intra-chromosomal gene fusion involving the estrogen receptor alpha gene ESR1, and another involving the RPS6KB1 (ribosomal protein S6 kinase beta-1) gene, were recurrently expressed in a number of breast tumor cell lines and a clinical tumor sample.
Subjects
Algorithms, Gene Fusion/genetics, Oligonucleotide Array Sequence Analysis/methods, RNA Sequence Analysis/methods, Software, Base Sequence, Molecular Sequence Data
ABSTRACT
Massively parallel, tag-based sequencing systems, such as the SOLiD system, hold the promise of revolutionizing the study of whole genome gene expression due to the number of data points that can be generated in a simple and cost-effective manner. We describe the development of a 5'-end transcriptome workflow for the SOLiD system and demonstrate the advantages in sensitivity and dynamic range offered by this tag-based application over traditional approaches for the study of whole genome gene expression. 5'-end transcriptome analysis was used to study whole genome gene expression within a colon cancer cell line, HT-29, treated with the DNA methyltransferase inhibitor 5-aza-2'-deoxycytidine (5Aza). More than 20 million 25-base 5'-end tags were obtained from untreated and 5Aza-treated cells and matched to sequences within the human genome. Seventy-three percent of the mapped unique tags were associated with RefSeq cDNA sequences, corresponding to approximately 14,000 different protein-coding genes in this single cell type. The level of expression of these genes ranged from 0.02 to 4,704 transcripts per cell. The sensitivity of a single sequence run of the SOLiD platform was 100- to 1,000-fold greater than that observed from 5'-end SAGE data generated from the analysis of 70,000 tags obtained by Sanger sequencing. The high-resolution 5'-end gene expression profiling presented in this study will not only provide novel insight into the transcriptional machinery but should also serve as a basis for a better understanding of cell biology.