ABSTRACT
MOTIVATION: In living organisms, many RNA molecules are modified post-transcriptionally. This turns the widely known four-letter RNA alphabet ACGU into a much larger one with currently more than 300 known distinct modified bases. The roles for the majority of modified bases remain uncertain, but many are already well-known for their ability to influence the preferred structures that an RNA may adopt. In fact, tRNAs sometimes require certain modifications to fold into their cloverleaf-shaped structure. However, predicting the structure of RNAs with base modifications is still difficult due to the lack of efficient algorithms that can deal with the extended sequence alphabet, as well as missing parameter sets that account for the changes in stability induced by the modified bases. RESULTS: We present an approach to include sparse energy parameter data for modified bases into the ViennaRNA Package. Our method does not require any changes to the underlying efficient algorithms but instead uses a set of plug-in constraints that adapt the predictions in terms of loop evaluation at runtime. These adaptations are efficient in the sense that they are only performed for loops for which additional parameters are actually available. In addition, our approach also facilitates the inclusion of more modified bases as soon as further parameters become available. AVAILABILITY AND IMPLEMENTATION: Source code and documentation are available at https://www.tbi.univie.ac.at/RNA.
Subjects
RNA; Software; Nucleic Acid Conformation; RNA/chemistry; Algorithms; RNA Folding
ABSTRACT
MOTIVATION: Folding during transcription can have an important influence on the structure and function of RNA molecules, as regions closer to the 5' end can fold into metastable structures before potentially stronger interactions with the 3' end become available. Thermodynamic RNA folding models are not suitable to predict structures that result from cotranscriptional folding, as they can only calculate properties of the equilibrium distribution. Other software packages that simulate the kinetic process of RNA folding during transcription exist, but they are mostly applicable for short sequences. RESULTS: We present a new algorithm that tracks changes to the RNA secondary structure ensemble during transcription. At every transcription step, new representative local minima are identified, a neighborhood relation is defined and transition rates are estimated for kinetic simulations. After every simulation, a part of the ensemble is removed and the remainder is used to search for new representative structures. The presented algorithm is deterministic (up to numeric instabilities of simulations), fast (in comparison with existing methods), and it is capable of folding RNAs much longer than 200 nucleotides. AVAILABILITY AND IMPLEMENTATION: This software is open-source and available at https://github.com/ViennaRNA/drtransformer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Heuristics; RNA Folding; Nucleic Acid Conformation; RNA/chemistry; Software; Algorithms
ABSTRACT
MOTIVATION: Predicting the folding dynamics of RNAs is a computationally difficult problem, first and foremost due to the combinatorial explosion of alternative structures in the folding space. Abstractions are therefore needed to simplify downstream analyses, and thus make them computationally tractable. This can be achieved by various structure sampling algorithms. However, current sampling methods are still time-consuming and frequently fail to represent key elements of the folding space. METHOD: We introduce RNAxplorer, a novel adaptive sampling method to efficiently explore the structure space of RNAs. RNAxplorer uses dynamic programming to perform an efficient Boltzmann sampling in the presence of guiding potentials, which are accumulated into pseudo-energy terms and reflect similarity to already well-sampled structures. This way, we effectively steer sampling toward underrepresented or unexplored regions of the structure space. RESULTS: We developed and applied different measures to benchmark our sampling method against its competitors. Most of the measures show that RNAxplorer produces more diverse structure samples, yields rare conformations that may be inaccessible to other sampling methods and is better at finding the most relevant kinetic traps in the landscape. Thus, it produces a more representative coarse graining of the landscape, which is well suited to subsequently compute better approximations of RNA folding kinetics. AVAILABILITY AND IMPLEMENTATION: https://github.com/ViennaRNA/RNAxplorer/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
ABSTRACT
Chemical modifications of RNA nucleotides change their identity and characteristics and thus alter genetic and structural information encoded in the genomic DNA. tRNA and rRNA are probably the most heavily modified genes, and often depend on derivatization or isomerization of their nucleobases in order to correctly fold into their functional structures. Recent RNomics studies, however, report transcriptome-wide RNA modification and suggest a more general regulation of structuredness of RNAs by this so-called epitranscriptome. Modification seems to require specific substrate structures, which in turn are stabilized or destabilized and thus promote or inhibit refolding events of regulatory RNA structures. In this review, we revisit RNA modifications and the related structures from a computational point of view. We discuss known substrate structures, their properties such as sub-motifs as well as consequences of modifications on base pairing patterns and possible refolding events. Given that efficient RNA structure prediction methods for canonical base pairs were established several decades ago, we review to what extent these methods allow the inclusion of modified nucleotides to model and study epitranscriptomic effects on RNA structures.
Subjects
Adenosine/metabolism; Inosine/metabolism; RNA Processing, Post-Transcriptional; Sequence Analysis, RNA/methods; Transcriptome; Animals; Base Pairing; Base Sequence; Humans; Methylation; MicroRNAs/genetics; MicroRNAs/metabolism; Nucleic Acid Conformation; RNA, Messenger/genetics; RNA, Messenger/metabolism; RNA, Ribosomal/genetics; RNA, Ribosomal/metabolism; RNA, Small Nuclear/genetics; RNA, Small Nuclear/metabolism; RNA, Transfer/genetics; RNA, Transfer/metabolism
ABSTRACT
SUMMARY: Chemical mapping experiments allow for nucleotide resolution assessment of RNA structure. We demonstrate that different strategies of integrating probing data with thermodynamics-based RNA secondary structure prediction algorithms can be implemented by means of soft constraints. This amounts to incorporating suitable pseudo-energies into the standard energy model for RNA secondary structures. As a showcase application for this new feature of the ViennaRNA Package we compare three distinct, previously published strategies to utilize SHAPE reactivities for structure prediction. The new tool is benchmarked on a set of RNAs with known reference structure. AVAILABILITY AND IMPLEMENTATION: The capability for SHAPE directed RNA folding is part of the upcoming release of the ViennaRNA Package 2.2, for which a preliminary release is already freely available at http://www.tbi.univie.ac.at/RNA. CONTACT: michael.wolfinger@univie.ac.at SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
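As a rough illustration of how a probing reactivity can be turned into a pseudo-energy, the sketch below uses the linear-log conversion published by Deigan et al. (2009). The abstract does not name its three strategies, so treating this conversion as one of them is an assumption, as are the function name, the handling of missing values, and the default slope and intercept (2.6 and -0.8 kcal/mol, the values reported in that paper).

```python
import math

def shape_pseudo_energy(reactivity, m=2.6, b=-0.8):
    """Pseudo-energy (kcal/mol) for one nucleotide from its SHAPE reactivity.

    Linear-log form: m * ln(reactivity + 1) + b.
    Assumption: a missing (None) or negative reactivity carries no
    information, so it contributes no pseudo-energy at all.
    """
    if reactivity is None or reactivity < 0:
        return 0.0
    return m * math.log(reactivity + 1.0) + b

# Hypothetical reactivities for four positions; None marks "no data".
reactivities = [0.0, 0.3, 1.5, None]
pseudo_energies = [shape_pseudo_energy(r) for r in reactivities]
```

Low reactivities give a small bonus (the intercept is negative) and high reactivities give a penalty, which is what steers a folding algorithm away from pairing strongly reactive nucleotides.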
Subjects
Algorithms; Nucleic Acid Conformation; RNA Folding; Base Sequence; Escherichia coli/genetics; Molecular Sequence Data; RNA, Ribosomal/genetics; Thermodynamics
ABSTRACT
RNA secondary structures have proven essential for understanding the regulatory functions performed by RNA such as microRNAs, bacterial small RNAs, or riboswitches. This success is in part due to the availability of efficient computational methods for predicting RNA secondary structures. Recent advances focus on dealing with the inherent uncertainty of prediction by considering the ensemble of possible structures rather than the single most stable one. Moreover, the advent of high-throughput structural probing has spurred the development of computational methods that incorporate such experimental data as auxiliary information.
Subjects
RNA/chemistry; Algorithms; Base Sequence; Computational Biology; Computer Simulation; Humans; Models, Molecular; RNA Folding; Sequence Analysis, RNA
ABSTRACT
Discovery and characterization of functional RNA structures remains challenging due to deficiencies in de novo secondary structure modeling. Here we describe a dynamic programming approach for model-free sequence comparison that incorporates high-throughput chemical probing data. Based on SHAPE probing data alone, ribosomal RNAs (rRNAs) from three diverse organisms (the eubacteria E. coli and C. difficile and the archaeon H. volcanii) could be aligned with accuracies comparable to alignments based on actual sequence identity. When both base sequence identity and chemical probing reactivities were considered together, accuracies improved further. Derived sequence alignments and chemical probing data from protein-free RNAs were then used as pseudo-free energy constraints to model consensus secondary structures for the 16S and 23S rRNAs. There are critical differences between these experimentally informed models and currently accepted models, including in the functionally important neck and decoding regions of the 16S rRNA. We infer that the 16S rRNA has evolved to undergo large-scale changes in base pairing as part of ribosome function. As high-quality RNA probing data become widely available, structurally informed sequence alignment will become broadly useful for de novo motif and function discovery.
Subjects
Nucleic Acid Conformation; RNA, Ribosomal, 16S/chemistry; RNA, Ribosomal, 16S/genetics; Sequence Alignment/statistics & numerical data; Base Sequence; Clostridioides difficile/genetics; Computational Biology; Escherichia coli/genetics; Haloferax volcanii/genetics; High-Throughput Nucleotide Sequencing/statistics & numerical data; Models, Molecular; Molecular Sequence Data; RNA, Archaeal/chemistry; RNA, Archaeal/genetics; RNA, Bacterial/chemistry; RNA, Bacterial/genetics; Sequence Analysis, RNA/statistics & numerical data
ABSTRACT
Riboswitches are RNA-based regulators of gene expression composed of a ligand-sensing aptamer domain followed by an overlapping expression platform. The regulation occurs at either the level of transcription (by formation of terminator or antiterminator structures) or translation (by presentation or sequestering of the ribosomal binding site). Due to a modular composition, these elements can be manipulated by combining different aptamers and expression platforms and therefore represent useful tools to regulate gene expression in synthetic biology. Using computationally designed theophylline-dependent riboswitches we show that two parameters, terminator hairpin stability and folding traps, have a major impact on the functionality of the designed constructs. These have to be considered very carefully during the design phase. Furthermore, a combination of several copies of individual riboswitches leads to a much improved activation ratio between induced and uninduced gene activity and to a linear dose-dependent increase in reporter gene expression. Such serial arrangements of synthetic riboswitches closely resemble their natural counterparts and may form the basis for simple quantitative read-out systems for the detection of specific target molecules in the cell.
Subjects
Drug Design; Riboswitch; Transcription, Genetic; Escherichia coli/genetics; Escherichia coli/metabolism; Genes, Reporter; Green Fluorescent Proteins/genetics; Green Fluorescent Proteins/metabolism; Molecular Sequence Data; Nucleic Acid Conformation; Structure-Activity Relationship; Synthetic Biology; Theophylline/chemistry; Thermodynamics; beta-Galactosidase/genetics; beta-Galactosidase/metabolism
ABSTRACT
BACKGROUND: Differential RNA sequencing (dRNA-seq) is a high-throughput screening technique designed to examine the architecture of bacterial operons in general and the precise position of transcription start sites (TSS) in particular. Hitherto, dRNA-seq data were analyzed by visualizing the sequencing reads mapped to the reference genome and manually annotating reliable positions. This is very labor intensive and, due to the subjectivity, biased. RESULTS: Here, we present TSSAR, a tool for automated de novo TSS annotation from dRNA-seq data that respects the statistics of dRNA-seq libraries. TSSAR uses the premise that the number of sequencing reads starting at a certain genomic position within a transcriptionally active region follows a Poisson distribution with a parameter that depends on the local strength of expression. The differences of two dRNA-seq library counts thus follow a Skellam distribution. This provides a statistical basis to identify significantly enriched primary transcripts. We assessed the performance by analyzing a publicly available dRNA-seq data set using TSSAR and two simple approaches that utilize user-defined score cutoffs. We evaluated the power of reproducing the manual TSS annotation. Furthermore, the same data set was used to reproduce 74 experimentally validated TSS in H. pylori from reliable techniques such as RACE or primer extension. Both analyses showed that TSSAR outperforms the static cutoff-dependent approaches. CONCLUSIONS: Having an automated and efficient tool for analyzing dRNA-seq data facilitates the use of the dRNA-seq technique and promotes its application to more sophisticated analysis. For instance, monitoring the plasticity and dynamics of the transcriptomal architecture triggered by different stimuli and growth conditions becomes possible. The main asset of a novel tool for dRNA-seq analysis that reaches out to a broad user community is usability. As such, we provide TSSAR both as an intuitive RESTful Web service (http://rna.tbi.univie.ac.at/TSSAR) together with a set of post-processing and analysis tools, and as a stand-alone version for use in high-throughput dRNA-seq data analysis pipelines.
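The statistical premise stated in this abstract (Poisson read-start counts, hence a Skellam-distributed difference) can be sketched with the standard library alone. This is a minimal illustration and not TSSAR's implementation: the function names, the series truncation and the way a one-sided tail probability is formed are all assumptions, and how TSSAR estimates the Poisson parameters is not covered by the abstract.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu), mu > 0, evaluated in log space."""
    if k < 0:
        return 0.0
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def skellam_pmf(d, mu1, mu2, terms=200):
    """P(X1 - X2 = d) for independent X1 ~ Poisson(mu1), X2 ~ Poisson(mu2).

    Sums P(X1 = n + d) * P(X2 = n) over n; `terms` truncates the series,
    which is adequate for small-to-moderate means.
    """
    start = max(0, -d)
    return sum(poisson_pmf(n + d, mu1) * poisson_pmf(n, mu2)
               for n in range(start, start + terms))

def skellam_upper_tail(d, mu1, mu2, max_diff=200):
    """P(X1 - X2 >= d): one-sided tail probability for an observed difference d."""
    return sum(skellam_pmf(k, mu1, mu2) for k in range(d, max_diff + 1))

# Hypothetical example: both libraries expect 3 read starts at a position,
# but the enriched library shows 12 more than the control.
p_value = skellam_upper_tail(12, 3.0, 3.0)
```

A small tail probability under equal expected counts indicates that the excess of read starts in one library is unlikely to be chance, which is the kind of evidence the abstract describes for calling an enriched primary transcript.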
Subjects
High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, RNA/methods; Base Sequence; Databases, Nucleic Acid; Genome; Genomics/methods; Helicobacter pylori/genetics; Humans; Software; Stenotrophomonas maltophilia/genetics
ABSTRACT
Several different ways to predict RNA secondary structures have been suggested in the literature. Statistical methods, such as those that utilize stochastic context-free grammars (SCFGs), or approaches based on machine learning aim to predict the best representative structure for the underlying ensemble of possible conformations. Their parameters have therefore been trained on larger subsets of well-curated, known secondary structures. Physics-based methods, on the other hand, usually refrain from using optimized parameters. They model secondary structures from loops as individual building blocks which have been assigned a physical property instead: the free energy of the respective loop. Such free energies are either derived from experiments or from mathematical modeling. This rigorous use of physical properties then allows for the application of statistical mechanics to describe the entire state space of RNA secondary structures in terms of equilibrium probabilities. On that basis, and by using efficient algorithms, many more descriptors of the conformational state space of RNA molecules can be derived to investigate and explain the many functions of RNA molecules. Moreover, compared to other methods, physics-based models allow for a much easier extension with other properties that can be measured experimentally. For instance, small molecules or proteins can bind to an RNA and their binding affinity can be assessed experimentally. Under certain conditions, existing RNA secondary structure prediction tools can be used to model this RNA-ligand binding and to eventually shed light on its impact on structure formation and function.
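The step from free energies to equilibrium probabilities mentioned in this abstract can be sketched for a small, explicitly listed set of structures. The function name, the example energies and the default temperature of 37 °C are assumptions for illustration; real tools sum over the full structure ensemble by dynamic programming instead of enumerating structures.

```python
import math

GAS_CONSTANT = 1.98717e-3  # kcal / (mol K)

def boltzmann_probabilities(energies, temperature_c=37.0):
    """Equilibrium probabilities for structures with free energies in kcal/mol.

    p_i = exp(-E_i / RT) / Z, where Z is the sum of all Boltzmann weights.
    """
    rt = GAS_CONSTANT * (temperature_c + 273.15)
    e_min = min(energies)  # shift energies so the largest weight is 1 (numerical stability)
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical free energies of three alternative structures and the open chain.
probabilities = boltzmann_probabilities([-5.2, -4.9, -3.0, 0.0])
```

Shifting by the minimum energy leaves the probabilities unchanged, because the common factor cancels between the weights and the partition function.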
Subjects
Nucleic Acid Conformation; RNA; Thermodynamics; RNA/chemistry; Algorithms; Computational Biology/methods; Machine Learning; Models, Molecular
ABSTRACT
Nucleotide modifications occur in all types of RNA and play an important role in RNA structure formation and stability. Modified bases can not only shift the RNA structure ensemble towards desired functional conformations; by changing the base pairing partner preference, they may even enlarge or reduce the conformational space, i.e., the number and types of structures the RNA molecule can adopt. However, most methods to predict RNA secondary structure do not provide the means to include the effect of modifications on the result. With the help of a heavily modified transfer RNA (tRNA) molecule, this chapter demonstrates how to include the effect of different base modifications into secondary structure prediction using the ViennaRNA Package. The constructive approach demonstrated here allows for the calculation of minimum free energy structure and suboptimal structures at different levels of modified base support. In particular, we show how to incorporate the isomerization of uridine to pseudouridine (Ψ) and the reduction of uridine to dihydrouridine (D).
Subjects
Nucleic Acid Conformation; RNA; RNA/chemistry; RNA, Transfer/chemistry; RNA, Transfer/metabolism; Nucleotides/chemistry; Base Pairing; Computational Biology/methods; Thermodynamics; Software; Uridine/chemistry; Models, Molecular; Pseudouridine/chemistry
ABSTRACT
Although RNA molecules are synthesized via transcription, little is known about the general impact of cotranscriptional folding in vivo. We present different computational approaches for the simulation of changing structure ensembles during transcription, including interpretations with respect to experimental data from literature. Specifically, we analyze different mutations of the E. coli SRP RNA, which has been studied comparatively well in previous literature, yet the details of which specific metastable structures form as well as when they form are still under debate. Here, we combine thermodynamic and kinetic, deterministic, and stochastic models with automated and visual inspection of those systems to derive the most likely scenario of which substructures form at which point during transcription. The simulations do not only provide explanations for present experimental observations but also suggest previously unnoticed conformations that may be verified through future experimental studies.
Subjects
Escherichia coli; Nucleic Acid Conformation; RNA Folding; RNA, Bacterial; Thermodynamics; Transcription, Genetic; RNA, Bacterial/chemistry; RNA, Bacterial/genetics; Escherichia coli/genetics; Escherichia coli/metabolism; Signal Recognition Particle/chemistry; Signal Recognition Particle/metabolism; Signal Recognition Particle/genetics; Kinetics; Computational Biology/methods; Mutation; Models, Molecular
ABSTRACT
Extrinsic, experimental information can be incorporated into thermodynamics-based RNA folding algorithms in the form of pseudo-energies. Evolutionary conservation of RNA secondary structure elements is detectable in alignments of phylogenetically related sequences and provides evidence for the presence of certain base pairs that can also be converted into pseudo-energy contributions. We show that the centroid base pairs computed from a consensus folding model such as RNAalifold result in a substantial improvement of the prediction accuracy for single sequences. Evidence for specific base pairs turns out to be more informative than a position-wise profile for the conservation of the pairing status. A comparison with chemical probing data, furthermore, strongly suggests that phylogenetic base pairing data are more informative than position-specific data on (un)pairedness as obtained from chemical probing experiments. In this context we demonstrate, in addition, that the conversion of signal from probing data into pseudo-energies is possible using thermodynamic structure predictions as a reference instead of known RNA structures.
Subjects
Algorithms; Nucleic Acid Conformation; Phylogeny; RNA; Thermodynamics; RNA/chemistry; RNA/genetics; Base Pairing; RNA Folding; Base Sequence; Computational Biology/methods
ABSTRACT
MOTIVATION: While there are numerous programs that can predict RNA or DNA secondary structures, a program that predicts RNA/DNA hetero-dimers is still missing. The lack of easy to use tools for predicting their structure may be in part responsible for the small number of reports of biologically relevant RNA/DNA hetero-dimers. RESULTS: We present here an extension to the widely used ViennaRNA Package (Lorenz et al., 2011) for the prediction of the structure of RNA/DNA hetero-dimers. AVAILABILITY: http://www.tbi.univie.ac.at/~ronny/RNA/vrna2.html CONTACT: ronny@tbi.univie.ac.at, berni@bioinf.uni-leipzig.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
DNA/chemistry; RNA Folding; RNA/chemistry; Software; Algorithms; Computational Biology/methods; Dimerization
ABSTRACT
BACKGROUND: RNA features a highly negatively charged phosphate backbone that attracts a cloud of counter-ions that reduce the electrostatic repulsion in a concentration-dependent manner. Ion concentrations thus have a large influence on folding and stability of RNA structures. Despite their well-documented effects, salt effects are not handled consistently by currently available secondary structure prediction algorithms. Combining Debye-Hückel potentials for line charges and Manning's counter-ion condensation theory, Einert et al. (Biophys J 100: 2745-2753, 2011) modeled the energetic contributions of monovalent cations on loops and helices. RESULTS: The model of Einert et al. is adapted to match the structure of the dynamic programming recursion of RNA secondary structure prediction algorithms. An empirical term describing the salt dependence of the duplex initiation energy is added to improve co-folding predictions for two or more RNA strands. The slightly modified model is implemented in the ViennaRNA package in such a way that only the energy parameters but not the algorithmic structure is affected. A comparison with data from the literature shows that predicted free energies and melting temperatures are in reasonable agreement with experiments. CONCLUSION: The new feature in the ViennaRNA package makes it possible to study effects of salt concentrations on RNA folding in a systematic manner. Strictly speaking, the model pertains only to monovalent cations, and thus covers the most important parameter, i.e., the NaCl concentration. It remains a question for future research to what extent unspecific effects of bi- and trivalent cations can be approximated in a similar manner. AVAILABILITY: Corrections for the concentration of monovalent cations are available in the ViennaRNA package starting from version 2.6.0.
ABSTRACT
Most of the functional RNA elements located within large transcripts are local. Local folding therefore serves as a practically useful approximation to global structure prediction. Due to the sensitivity of RNA secondary structure prediction to the exact definition of sequence ends, accuracy can be increased by averaging local structure predictions over multiple, overlapping sequence windows. These averages can be computed efficiently by dynamic programming. Here we revisit the local folding problem, present a concise mathematical formalization that generalizes previous approaches and show that correct Boltzmann samples can be obtained by local stochastic backtracing in McCaskill's algorithms but not from local folding recursions. Corresponding new features are implemented in the ViennaRNA package to improve the support of local folding. Applications include the computation of maximum expected accuracy structures from RNAplfold data and a mutual information measure to quantify the sensitivity of individual sequence positions.
Subjects
RNA Folding; RNA; Nucleic Acid Conformation; RNA/chemistry; Algorithms; RNA, Untranslated
ABSTRACT
Machine learning (ML) and in particular deep learning techniques have gained popularity for predicting structures from biopolymer sequences. An interesting case is the prediction of RNA secondary structures, where well-established biophysics-based methods exist. The accuracy of these classical methods is limited due to lack of experimental parameters and certain simplifying assumptions and has seen little improvement over the last decade. This makes RNA folding an attractive target for machine learning and consequently several deep learning models have been proposed in recent years. However, for ML approaches to be competitive for de novo structure prediction, the models must not just demonstrate good phenomenological fits, but be able to learn a (complex) biophysical model. In this contribution we discuss limitations of current approaches, in particular due to biases in the training data. Furthermore, we propose to study capabilities and limitations of ML models by first applying them on synthetic data (obtained from a simplified biophysical model) that can be generated in arbitrary amounts and where all biases can be controlled. We assume that a deep learning model that performs well on these synthetic data would also perform well on real data, and vice versa. We apply this idea by testing several ML models of varying complexity. Finally, we show that the best models are capable of capturing many, but not all, properties of RNA secondary structures. Most severely, the number of predicted base pairs scales quadratically with sequence length, even though a secondary structure can only accommodate a linear number of pairs.
ABSTRACT
The overwhelming majority of small nucleolar RNAs (snoRNAs) fall into two clearly defined classes characterized by distinctive secondary structures and sequence motifs. A small group of diverse ncRNAs, however, shares the hallmarks of one or both classes of snoRNAs but differs substantially from the norm in some respects. Here, we compile the available information on these exceptional cases, conduct a thorough homology search throughout the available metazoan genomes, provide improved and expanded alignments, and investigate the evolutionary histories of these ncRNA families as well as their mutual relationships.
Subjects
Coiled Bodies/metabolism; Nucleic Acid Conformation; RNA, Small Nucleolar/chemistry; RNA, Small Nucleolar/genetics; Animals; Base Sequence; Genome/genetics; Humans; Molecular Sequence Data; Phylogeny; RNA, Small Nucleolar/classification; Sequence Alignment/methods; Sequence Homology, Nucleic Acid
ABSTRACT
Stem-bulge RNAs (sbRNAs) are a group of small, functionally as yet uncharacterized noncoding RNAs first described in C. elegans, with a few homologous sequences postulated in C. briggsae. In this study, we report on a comprehensive survey of this ncRNA family in the phylum Nematoda. Employing homology search strategies based on both sequence and secondary structure models and a computational promoter screen, we identified a total of 240 new sbRNA homologs. For the majority of these loci we identified both promoter regions and transcription termination signals characteristic of pol-III transcripts. Sequence and structure comparison with known RNA families revealed that sbRNAs are homologs of vertebrate Y RNAs. Most of the sbRNAs show the characteristic Ro protein binding motif, and contain a region highly similar to a functionally required motif for DNA replication previously thought to be unique to vertebrate Y RNAs. The single Y RNA that was previously described in C. elegans, however, does not show this motif, and in general bears the hallmarks of a highly derived family member.
Subjects
Nematoda/genetics; RNA, Untranslated/genetics; Sequence Homology, Nucleic Acid; Animals; Base Sequence; Chromosomes; Genes, Helminth; Humans; Molecular Sequence Data; Nucleic Acid Conformation; Phylogeny; Promoter Regions, Genetic; Ribonucleoproteins; Synteny; Vertebrates
ABSTRACT
The Vienna RNA Websuite is a comprehensive collection of tools for folding, design and analysis of RNA sequences. It provides a web interface to the most commonly used programs of the Vienna RNA package. Among them, we find folding of single and aligned sequences, prediction of RNA-RNA interactions, and design of sequences with a given structure. Additionally, we provide analysis of folding landscapes using the barriers program and structural RNA alignments using LocARNA. The web server together with software packages for download is freely accessible at http://rna.tbi.univie.ac.at/.