ABSTRACT
Messenger RNA (mRNA) vaccines are being used to combat the spread of COVID-19 (refs. 1-3), but they still exhibit critical limitations caused by mRNA instability and degradation, which are major obstacles for the storage, distribution and efficacy of the vaccine products (ref. 4). Increasing secondary structure lengthens mRNA half-life, which, together with optimal codons, improves protein expression (ref. 5). Therefore, a principled mRNA design algorithm must optimize both structural stability and codon usage. However, owing to synonymous codons, the mRNA design space is prohibitively large; for example, there are around 2.4 × 10^632 candidate mRNA sequences for the SARS-CoV-2 spike protein. This poses insurmountable computational challenges. Here we provide a simple and unexpected solution using the classical concept of lattice parsing in computational linguistics, where finding the optimal mRNA sequence is analogous to identifying the most likely sentence among similar-sounding alternatives (ref. 6). Our algorithm LinearDesign finds an optimal mRNA design for the spike protein in just 11 minutes, and can concurrently optimize stability and codon usage. LinearDesign substantially improves mRNA half-life and protein expression, and profoundly increases antibody titre by up to 128 times in mice compared to the codon-optimization benchmark on mRNA vaccines for COVID-19 and varicella-zoster virus. This result reveals the great potential of principled mRNA design and enables the exploration of previously unreachable but highly stable and efficient designs. Our work is a timely tool for vaccines and other mRNA-based medicines encoding therapeutic proteins such as monoclonal antibodies and anti-cancer drugs (refs. 7,8).
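The size of the design space quoted in this abstract follows from multiplying, over every residue, the number of synonymous codons available for that amino acid. The sketch below illustrates only that counting argument and is not part of LinearDesign; the function names `design_space_size` and `log10_design_space` are hypothetical, and the table assumes the standard genetic code.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code
# (one-letter codes; "*" denotes a stop codon). The counts sum to 64.
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4, "*": 3,
}


def design_space_size(protein: str) -> int:
    """Exact number of mRNA coding sequences that encode `protein`."""
    return math.prod(CODON_COUNTS[aa] for aa in protein)


def log10_design_space(protein: str) -> float:
    """Base-10 logarithm of the design space size, convenient for long proteins."""
    return sum(math.log10(CODON_COUNTS[aa]) for aa in protein)


if __name__ == "__main__":
    # Hypothetical three-residue peptide: Met (1 codon) x Lys (2) x Leu (6) = 12.
    print(design_space_size("MKL"))
```

Because the count is a product over residues, it grows exponentially with protein length, which is why exhaustive enumeration is ruled out for a protein as long as the spike.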
Subjects
Algorithms , COVID-19 Vaccines , COVID-19 , RNA Stability , Messenger RNA , SARS-CoV-2 , mRNA Vaccines , Animals , Humans , Mice , Codon/genetics , COVID-19/genetics , COVID-19/immunology , COVID-19/prevention & control , COVID-19 Vaccines/chemistry , COVID-19 Vaccines/genetics , COVID-19 Vaccines/immunology , Half-Life , Human Herpesvirus 3/genetics , Human Herpesvirus 3/immunology , mRNA Vaccines/chemistry , mRNA Vaccines/genetics , mRNA Vaccines/immunology , RNA Stability/genetics , RNA Stability/immunology , Messenger RNA/chemistry , Messenger RNA/genetics , Messenger RNA/immunology , Messenger RNA/metabolism , SARS-CoV-2/genetics , SARS-CoV-2/immunology
ABSTRACT
Riboswitches are structured RNAs that sense small molecules to control expression. Prequeuosine1 (preQ1)-sensing riboswitches comprise three classes (I, II and III) that adopt distinct folds. Despite this difference, class II and III riboswitches each use 10 identical nucleotides to bind the preQ1 metabolite. Previous class II studies showed high sensitivity to binding-pocket mutations, which reduced preQ1 affinity and impaired function. Here, we introduced four equivalent mutations into a class III riboswitch, which maintained remarkably tight preQ1 binding. Co-crystal structures of each class III mutant showed compensatory interactions that preserve the fold. Chemical modification analysis revealed localized RNA flexibility changes for each mutant, but molecular dynamics (MD) simulations suggested that each mutation was not overtly destabilizing. Although impaired, class III mutants retained tangible gene-regulatory activity in bacteria compared to equivalent preQ1-II variants; mutations in the preQ1-pocket floor were tolerated better than wall mutations. Principal component analysis of MD trajectories suggested that the most functionally deleterious wall mutation samples different motions compared to wildtype. Overall, the results reveal that formation of compensatory interactions depends on the context of mutations within the overall fold and that functionally deleterious mutations can alter long-range correlated motions that link the riboswitch binding pocket with distal gene-regulatory sequences.
ABSTRACT
Many RNAs fold into multiple structures at equilibrium, and there is a need to sample these structures according to their probabilities in the ensemble. The conventional sampling algorithm suffers from two limitations: (i) the sampling phase is slow due to many repeated calculations; and (ii) the end-to-end runtime scales cubically with the sequence length. These issues make it difficult to be applied to long RNAs, such as the full genomes of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). To address these problems, we devise a new sampling algorithm, LazySampling, which eliminates redundant work via on-demand caching. Based on LazySampling, we further derive LinearSampling, an end-to-end linear time sampling algorithm. Benchmarking on nine diverse RNA families, the sampled structures from LinearSampling correlate better with the well-established secondary structures than Vienna RNAsubopt and RNAplfold. More importantly, LinearSampling is orders of magnitude faster than standard tools, being 428× faster (72 s versus 8.6 h) than RNAsubopt on the full genome of SARS-CoV-2 (29 903 nt). The resulting sample landscape correlates well with the experimentally guided secondary structure models, and is closer to the alternative conformations revealed by experimentally driven analysis. Finally, LinearSampling finds 23 regions of 15 nt with high accessibilities in the SARS-CoV-2 genome, which are potential targets for COVID-19 diagnostics and therapeutics.
Subjects
Algorithms , COVID-19 , SARS-CoV-2 , Humans , Base Sequence , COVID-19/diagnosis , COVID-19/genetics , Viral RNA/genetics , Viral RNA/chemistry , SARS-CoV-2/genetics , Nucleic Acid Conformation
ABSTRACT
Many RNAs function through RNA-RNA interactions. Fast and reliable RNA structure prediction that accounts for RNA-RNA interaction is useful; however, existing tools are either too simplistic or too slow. To address this issue, we present LinearCoFold, which approximates the complete minimum free energy structure of two strands in linear time, and LinearCoPartition, which approximates the cofolding partition function and base pairing probabilities in linear time. LinearCoFold and LinearCoPartition are orders of magnitude faster than RNAcofold. For example, on a sequence pair with combined length of 26,190 nt, LinearCoFold is 86.8× faster than RNAcofold MFE mode, and LinearCoPartition is 642.3× faster than RNAcofold partition function mode. Surprisingly, LinearCoFold and LinearCoPartition's predictions have higher PPV and sensitivity of intermolecular base pairs. Furthermore, we apply LinearCoFold to predict the RNA-RNA interaction between SARS-CoV-2 genomic RNA (gRNA) and human U4 small nuclear RNA (snRNA), which has been experimentally studied, and observe that LinearCoFold's prediction correlates better with the wet lab results than RNAcofold's.
Subjects
Algorithms , RNA , Humans , Base Pairing , Genomics , Nucleic Acid Conformation , RNA/chemistry , RNA/metabolism , Viral RNA/chemistry , SARS-CoV-2/chemistry
ABSTRACT
Riboswitches regulate downstream gene expression by binding cellular metabolites. Regulation of translation initiation by riboswitches is posited to occur by metabolite-mediated sequestration of the Shine-Dalgarno sequence (SDS), causing bypass by the ribosome. Recently, we solved a co-crystal structure of a prequeuosine1-sensing riboswitch from Carnobacterium antarcticum that binds two metabolites in a single pocket. The structure revealed that the second nucleotide within the gene-regulatory SDS, G34, engages in a crystal contact, obscuring the molecular basis of gene regulation. Here, we report a co-crystal structure wherein C10 pairs with G34. However, molecular dynamics simulations reveal quick dissolution of the pair, which fails to reform. Functional and chemical probing assays inside live bacterial cells corroborate the dispensability of the C10-G34 pair in gene regulation, leading to the hypothesis that the compact pseudoknot fold is sufficient for translation attenuation. Remarkably, the C. antarcticum aptamer retained significant gene-regulatory activity when uncoupled from the SDS using unstructured spacers up to 10 nucleotides away from the riboswitch, akin to the steric blocking employed by sRNAs. Accordingly, our work reveals that the RNA fold regulates translation without SDS sequestration, expanding known riboswitch-mediated gene-regulatory mechanisms. The results imply that riboswitches exist wherein the SDS is not embedded inside a stable fold.
Subjects
Protein Biosynthesis , Riboswitch , Binding Sites , Gene Expression Regulation , Molecular Dynamics Simulation , Nucleic Acid Conformation , Ribosomes/genetics , Ribosomes/metabolism
ABSTRACT
MOTIVATION: RNA design is the search for a sequence or set of sequences that will fold into a desired structure, also known as the inverse problem of RNA folding. However, the sequences designed by existing algorithms often suffer from low ensemble stability, which worsens for long sequence design. Additionally, for many methods only a small number of sequences satisfying the MFE criterion can be found by each run of design. These drawbacks limit their use cases. RESULTS: We propose an innovative optimization paradigm, SAMFEO, which optimizes ensemble objectives (equilibrium probability or ensemble defect) by iterative search and yields a very large number of successfully designed RNA sequences as byproducts. We develop a search method which leverages structure level and ensemble level information at different stages of the optimization: initialization, sampling, mutation, and updating. Our work, while being less complicated than others, is the first algorithm that is able to design thousands of RNA sequences for the puzzles from the Eterna100 benchmark. In addition, our algorithm solves the most Eterna100 puzzles among all the general optimization based methods in our study. The only baseline solving more puzzles than our work is dependent on handcrafted heuristics designed for a specific folding model. Surprisingly, our approach shows superiority in designing long sequences for structures adapted from the database of 16S Ribosomal RNAs. AVAILABILITY AND IMPLEMENTATION: Our source code and the data used in this article are available at https://github.com/shanry/SAMFEO.
Subjects
Algorithms , Benchmarking , Factual Databases , Mutation , 16S Ribosomal RNA
ABSTRACT
Influenza A virus (IAV) is a respiratory virus that causes epidemics and pandemics. Knowledge of IAV RNA secondary structure in vivo is crucial for a better understanding of virus biology. Moreover, it is a foundation for the development of new RNA-targeting antivirals. Chemical RNA mapping using selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) coupled with Mutational Profiling (MaP) allows for the thorough examination of secondary structures in low-abundance RNAs in their biological context. So far, the method has been used for analyzing the RNA secondary structures of several viruses including SARS-CoV-2 in virio and in cellulo. Here, we used SHAPE-MaP and dimethyl sulfate mutational profiling with sequencing (DMS-MaPseq) for genome-wide secondary structure analysis of viral RNA (vRNA) of the pandemic influenza A/California/04/2009 (H1N1) strain in both in virio and in cellulo environments. Experimental data allowed the prediction of the secondary structures of all eight vRNA segments in virio and, for the first time, the structures of vRNA5, 7, and 8 in cellulo. We conducted a comprehensive structural analysis of the proposed vRNA structures to reveal the motifs predicted with the highest accuracy. We also performed a base-pair conservation analysis of the predicted vRNA structures and revealed many highly conserved vRNA motifs among the IAVs. The structural motifs presented herein are potential candidates for new IAV antiviral strategies.
Subjects
COVID-19 , Influenza A Virus H1N1 Subtype , Influenza A Virus , Humans , Influenza A Virus H1N1 Subtype/genetics , SARS-CoV-2/genetics , Influenza A Virus/genetics , Viral RNA/genetics , Genomics
ABSTRACT
Nearest neighbor parameters for estimating the folding stability of RNA secondary structures are in widespread use. For helices, current parameters penalize terminal AU base pairs relative to terminal GC base pairs. We curated an expanded database of helix stabilities determined by optical melting experiments. Analysis of the updated database shows that terminal penalties depend on the sequence identity of the adjacent penultimate base pair. New nearest neighbor parameters that include this additional sequence dependence accurately predict the measured values of 271 helices in an updated database with a correlation coefficient of 0.982. This refined understanding of helix ends facilitates fitting terms for base pair stacks with GU pairs. Prior parameter sets treated 5'GGUC3' paired to 3'CUGG5' separately from other 5'GU3'/3'UG5' stacks. The improved understanding of helix end stability, however, makes the separate treatment unnecessary. Introduction of the additional terms was tested with three optical melting experiments. The average absolute difference between measured and predicted free energy changes at 37°C for these three duplexes containing terminal adjacent AU and GU pairs improved from 1.38 to 0.27 kcal/mol. This confirms the need for the additional sequence dependence in the model.
Subjects
RNA Folding , RNA , Base Sequence , Nucleic Acid Conformation , RNA/chemistry , Thermodynamics
ABSTRACT
The essential pre-mRNA splicing factor U2AF2 (also called U2AF65) identifies polypyrimidine (Py) tract signals of nascent transcripts, despite length and sequence variations. Previous studies have shown that the U2AF2 RNA recognition motifs (RRM1 and RRM2) preferentially bind uridine-rich RNAs. Nonetheless, the specificity of the RRM1/RRM2 interface for the central Py tract nucleotide has yet to be investigated. We addressed this question by determining crystal structures of U2AF2 bound to a cytidine, guanosine, or adenosine at the central position of the Py tract, and compared U2AF2-bound uridine structures. Local movements of the RNA site accommodated the different nucleotides, whereas the polypeptide backbone remained similar among the structures. Accordingly, molecular dynamics simulations revealed flexible conformations of the central, U2AF2-bound nucleotide. The RNA binding affinities and splicing efficiencies of structure-guided mutants demonstrated that U2AF2 tolerates nucleotide substitutions at the central position of the Py tract. Moreover, enhanced UV-crosslinking and immunoprecipitation of endogenous U2AF2 in human erythroleukemia cells showed uridine-sensitive binding sites, with lower sequence conservation at the central nucleotide positions of otherwise uridine-rich, U2AF2-bound splice sites. Altogether, these results highlight the importance of RNA flexibility for protein recognition and take a step towards relating splice site motifs to pre-mRNA splicing efficiencies.
Subjects
Nucleotides , RNA Precursors , Splicing Factor U2AF , Humans , Nucleotides/metabolism , RNA/metabolism , RNA Precursors/metabolism , RNA Splicing , Splicing Factor U2AF/metabolism , Uridine/metabolism
ABSTRACT
The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length and are thus infeasible for coronaviruses, which possess the longest genomes (~30,000 nt) among RNA viruses. As a result, existing efforts on modeling SARS-CoV-2 structures resort to single-sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold's purely in silico prediction not only is close to experimentally guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between 5' and 3' untranslated regions (UTRs) (~29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies undiscovered conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, small interfering RNAs (siRNAs), CRISPR-Cas13 guide RNAs, and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies and will be a useful tool in fighting the current and future pandemics.
Subjects
Algorithms , Viral RNA/chemistry , SARS-CoV-2/chemistry , Betacoronavirus/chemistry , Betacoronavirus/genetics , Conserved Sequence , Viral Genome , Mutation , Nucleic Acid Conformation , RNA Folding , Viral RNA/genetics , SARS-CoV-2/genetics , Sequence Alignment
ABSTRACT
The cellular levels of mRNAs are controlled post-transcriptionally by cis-regulatory elements located in the 3'-untranslated region. These linear or structured elements are recognized by RNA-binding proteins (RBPs) to modulate mRNA stability. The Roquin-1 and -2 proteins specifically recognize RNA stem-loop motifs, the trinucleotide loop-containing constitutive decay elements (CDEs) and the hexanucleotide loop-containing alternative decay elements (ADEs), with their unique ROQ domain to initiate mRNA degradation. However, the RNA-binding capacity of Roquin towards different classes of stem-loops has not been rigorously characterized, leaving its exact binding preferences unclear. Here, we map the RNA-binding preference of the ROQ domain at nucleotide resolution introducing sRBNS (structured RNA Bind-n-Seq), a customized RBNS workflow with pre-structured RNA libraries. We found a clear preference of Roquin towards specific loop sizes and extended the consensus motifs for CDEs and ADEs. The newly identified motifs are recognized with nanomolar affinity through the canonical RNA-ROQ interface. Using these new stem-loop variants as blueprints, we predicted novel Roquin target mRNAs and verified the expanded target space in cells. The study demonstrates the power of high-throughput assays including RNA structure formation for the systematic investigation of (structural) RNA-binding preferences to comprehensively identify mRNA targets and elucidate the biological function of RBPs.
ABSTRACT
MOTIVATION: The secondary structure of RNA is of importance to its function. Over the last few years, several papers attempted to use machine learning to improve de novo RNA secondary structure prediction. Many of these papers report impressive results for intra-family predictions but seldom address the much more difficult (and practical) inter-family problem. RESULTS: We demonstrate that it is nearly trivial with convolutional neural networks to generate pseudo-free energy changes, modelled after structure mapping data that improve the accuracy of structure prediction for intra-family cases. We propose a more rigorous method for inter-family cross-validation that can be used to assess the performance of learning-based models. Using this method, we further demonstrate that intra-family performance is insufficient proof of generalization despite the widespread assumption in the literature and provide strong evidence that many existing learning-based models have not generalized inter-family. AVAILABILITY AND IMPLEMENTATION: Source code and data are available at https://github.com/marcellszi/dl-rna. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Deep Learning , RNA , Humans , Computer Neural Networks , Protein Secondary Structure , Machine Learning
ABSTRACT
Riboswitches are structured RNA motifs that recognize metabolites to alter the conformations of downstream sequences, leading to gene regulation. To investigate this molecular framework, we determined crystal structures of a preQ1-I riboswitch in effector-free and bound states at 2.00 Å and 2.65 Å resolution. Both pseudoknots exhibited the elusive L2 loop, which displayed distinct conformations. Conversely, the Shine-Dalgarno sequence (SDS) in the S2 helix of each structure remained unbroken. The expectation that the effector-free state should expose the SDS prompted us to conduct solution experiments to delineate environmental changes to specific nucleobases in response to preQ1. We then used nudged elastic band computational methods to derive conformational-change pathways linking the crystallographically determined effector-free and bound-state structures. Pathways featured: (i) unstacking and unpairing of L2 and S2 nucleobases without preQ1, exposing the SDS for translation, and (ii) stacking and pairing of L2 and S2 nucleobases with preQ1, sequestering the SDS. Our results reveal how preQ1 binding reorganizes L2 into a nucleobase-stacking spine that sequesters the SDS, linking effector recognition to biological function. The generality of stacking spines as conduits for effector-dependent, interdomain communication is discussed in light of their existence in adenine riboswitches, as well as the turnip yellow mosaic virus ribosome sensor.
Subjects
Molecular Dynamics Simulation , Riboswitch , Base Pairing , Bacterial Gene Expression Regulation , Guanine/analogs & derivatives , Sodium Dodecyl Sulfate/chemistry , Thermoanaerobacter/genetics
ABSTRACT
Sequence variation in tRNA genes influences the structure, modification, and stability of tRNA; affects translation fidelity; impacts the activity of numerous isodecoders in metazoans; and leads to human diseases. To comprehensively define the effects of sequence variation on tRNA function, we developed a high-throughput in vivo screen to quantify the activity of a model tRNA, the nonsense suppressor SUP4oc of Saccharomyces cerevisiae. Using a highly sensitive fluorescent reporter gene with an ochre mutation, fluorescence-activated cell sorting of a library of SUP4oc mutant yeast strains, and deep sequencing, we scored 25,491 variants. Unexpectedly, SUP4oc tolerates numerous sequence variations, accommodates slippage in tertiary and secondary interactions, and exhibits genetic interactions that suggest an alternative functional tRNA conformation. Furthermore, we used this methodology to define tRNA variants subject to rapid tRNA decay (RTD). Even though RTD normally degrades tRNAs with exposed 5' ends, mutations that sensitize SUP4oc to RTD were found to be located throughout the sequence, including the anti-codon stem. Thus, the integrity of the entire tRNA molecule is under surveillance by cellular quality control machinery. This approach to assess activity at high throughput is widely applicable to many problems in tRNA biology.
Subjects
RNA Stability/genetics , Transfer RNA/genetics , Transfer RNA/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Flow Cytometry , Genetic Variation , High-Throughput Screening Assays , Mutation/genetics , Nucleic Acid Conformation , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
ABSTRACT
Influenza A virus (IAV) is a member of the single-stranded RNA (ssRNA) family of viruses. The most recent global pandemic caused by the SARS-CoV-2 virus has shown the major threat that RNA viruses can pose to humanity. In comparison, influenza has an even higher pandemic potential as a result of its high rate of mutations within its relatively short (<13 kbp) genome, as well as its capability to undergo genetic reassortment. In light of this threat, and the fact that RNA structure is connected to a broad range of known biological functions, deeper investigation of viral RNA (vRNA) structures is of high interest. Here, for the first time, we propose a secondary structure for segment 8 vRNA (vRNA8) of A/California/04/2009 (H1N1) formed in the presence of cellular and viral components. This structure shows similarities with prior in vitro experiments. Additionally, we determined the location of several well-defined, conserved structural motifs of vRNA8 within IAV strains with possible functionality. These RNA motifs appear to fold independently of regional nucleoprotein (NP)-binding affinity, but a low or uneven distribution of NP in each motif region is noted. This research also highlights several accessible sites for oligonucleotide tools and small molecules in vRNA8 in a cellular environment that might be a target for influenza A virus inhibition on the RNA level.
Subjects
Viral Gene Expression Regulation , Viral Genome/genetics , Influenza A Virus H1N1 Subtype/genetics , Nucleic Acid Conformation , Viral RNA/chemistry , Animals , Base Sequence , Dogs , Humans , Influenza A Virus H1N1 Subtype/metabolism , Human Influenza/virology , Madin Darby Canine Kidney Cells , Molecular Models , Nucleotide Motifs/genetics , RNA Folding , Viral RNA/genetics , Viral Proteins/genetics , Viral Proteins/metabolism
ABSTRACT
The COVID-19 pandemic has had far-reaching effects for individuals and healthcare systems in the United States. Increasing and sustaining behavioral changes to reduce transmission of disease among medical providers is essential for the protection of the community at large. Using a social norms perspective, this study aimed to (a) examine the accuracy of perceptions of engagement in protective health behaviors among a sample of rural health providers, and (b) determine whether greater self-other discrepancies were associated with engagement in these behaviors. Electronic surveys were completed by 214 rural medical providers. Findings suggested that rural healthcare providers had exaggerated perceptions of peer engagement in several COVID-19-related protective health behaviors. As expected, positive self-other differences were positively associated with providers' own behaviors, and perceived descriptive norms were associated with providers' engagement in these behaviors. Future studies using normative interventions might examine how positive self-other differences increase the use of protective health behaviors over time.
Subjects
COVID-19 , Social Norms , COVID-19/prevention & control , Health Behavior , Humans , Pandemics/prevention & control , Rural Health , United States
ABSTRACT
Nearest neighbor parameters for estimating the folding stability of RNA are commonly used in secondary structure prediction, for generating folding ensembles of structures, and for analyzing RNA function. Previously, we demonstrated that we could quantify the uncertainties in each nearest neighbor parameter by perturbing the underlying optical melting data within experimental error and rederiving the parameters, which accounts for the substantial correlations that exist between the parameters. In this contribution, we describe a method to estimate uncertainty in the estimated folding stabilities of RNA structures, accounting for correlations in the nearest neighbor parameters. This method is incorporated in the RNAstructure software package.
Subjects
Algorithms , RNA Folding , RNA/chemistry , Software , Base Pairing , Base Sequence , Humans , Thermodynamics , Uncertainty
ABSTRACT
MOTIVATION: RNA secondary structure prediction is widely used to understand RNA function. Recently, there has been a shift away from the classical minimum free energy methods to partition function-based methods that account for folding ensembles and can therefore estimate structure and base pair probabilities. However, the classical partition function algorithm scales cubically with sequence length, and is therefore prohibitively slow for long sequences. This slowness is even more severe than cubic-time free energy minimization due to a substantially larger constant factor in runtime. RESULTS: Inspired by the success of our recent LinearFold algorithm that predicts the approximate minimum free energy structure in linear time, we design a similar linear-time heuristic algorithm, LinearPartition, to approximate the partition function and base-pairing probabilities, which is shown to be orders of magnitude faster than Vienna RNAfold and CONTRAfold (e.g. 2.5 days versus 1.3 min on a sequence with length 32 753 nt). More interestingly, the resulting base-pairing probabilities are even better correlated with the ground-truth structures. LinearPartition also leads to a small accuracy improvement when used for downstream structure prediction on families with the longest length sequences (16S and 23S rRNAs), as well as a substantial improvement on long-distance base pairs (500+ nt apart). AVAILABILITY AND IMPLEMENTATION: Code: http://github.com/LinearFold/LinearPartition; Server: http://linearfold.org/partition. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
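To make the cubic scaling mentioned in this abstract concrete, the following is a minimal sketch of a partition-function recursion over nested secondary structures. It is not LinearPartition and does not use the nearest neighbor energy model: it assumes a toy model in which every base pair contributes the same energy (`pair_energy`, a hypothetical parameter), hairpin loops need at least `min_loop` unpaired bases, and RT is taken as roughly 0.6163 kcal/mol at 37 °C. The three nested loops over span length, start position, and pairing partner are what give the classical O(n^3) runtime.

```python
import math

# Watson-Crick and GU wobble pairs allowed in this toy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_function(seq: str, pair_energy: float = -1.0,
                       min_loop: int = 3, rt: float = 0.6163) -> float:
    """Sum of Boltzmann weights over all nested structures of `seq`.

    Toy energy model (an assumption for illustration): each base pair
    contributes `pair_energy` kcal/mol and nothing else is scored.
    """
    n = len(seq)
    weight = math.exp(-pair_energy / rt)  # Boltzmann factor of one pair
    # Z[i][j] covers the half-open interval seq[i:j]; empty intervals equal 1.
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            last = j - 1
            total = Z[i][last]  # case 1: the last base is unpaired
            # case 2: the last base pairs with k, enclosing at least min_loop bases
            for k in range(i, last - min_loop):
                if (seq[k], seq[last]) in PAIRS:
                    total += Z[i][k] * weight * Z[k + 1][last]
            Z[i][j] = total
    return Z[0][n]


if __name__ == "__main__":
    # With pair_energy=0 every structure has weight 1, so Z counts structures.
    print(partition_function("GGAAACC", pair_energy=0.0))
```

Setting `pair_energy=0.0` turns the partition function into a plain count of admissible structures, which is a convenient sanity check on the recursion before a realistic energy model is substituted.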
Subjects
RNA Folding , RNA , Algorithms , Base Pairing , Humans , Nucleic Acid Conformation , Probability , RNA/genetics , RNA Sequence Analysis
ABSTRACT
Design of RNA sequences that adopt functional folds establishes principles of RNA folding and applications in biotechnology. Inverse folding for RNAs, which allows computational design of sequences that adopt specific structures, can be utilized for unveiling RNA functions and developing genetic tools in synthetic biology. Although many algorithms for inverse RNA folding have been developed, the pseudoknot, which plays a key role in folding of ribozymes and riboswitches, is not addressed in most algorithms. For the few algorithms that attempt to predict pseudoknot-containing ribozymes, self-cleavage activity has not been tested. Herein, we design double-pseudoknot HDV ribozymes using an inverse RNA folding algorithm and test their kinetic mechanisms experimentally. More than 90% of the positively designed ribozymes possess self-cleaving activity, whereas more than 70% of negative control ribozymes, which are predicted to fold to the necessary structure but with low fidelity, do not possess it. Kinetic and mutation analyses reveal that these RNAs cleave site-specifically and with the same mechanism as the WT ribozyme. Most ribozymes react just 50- to 80-fold slower than the WT ribozyme, and this rate can be improved to near WT by modification of a junction. Thus, fast-cleaving functional ribozymes with multiple pseudoknots can be designed computationally.
Subjects
Computational Biology/methods , RNA Folding , Catalytic RNA/chemistry , Riboswitch/genetics , Algorithms , Biotechnology/trends , Kinetics , Nucleic Acid Conformation , Catalytic RNA/genetics , Synthetic Biology/trends
ABSTRACT
Synonymous codons provide redundancy in the genetic code that influences translation rates in many organisms, in which overall codon use is driven by selection for optimal codons. It is unresolved if or to what extent translational selection drives use of suboptimal codons or codon pairs. In Saccharomyces cerevisiae, 17 specific inhibitory codon pairs, each composed of adjacent suboptimal codons, inhibit translation efficiency in a manner distinct from their constituent codons, and many are translated slowly in native genes. We show here that selection operates within Saccharomyces sensu stricto yeasts to conserve nine of these codon pairs at defined positions in genes. Conservation of these inhibitory codon pairs is significantly greater than expected, relative to conservation of their constituent codons, with seven pairs more highly conserved than any other synonymous pair. Conservation is strongly correlated with slow translation of the pairs. Conservation of suboptimal codon pairs extends to two related Candida species, fungi that diverged from Saccharomyces ~270 million years ago, with an enrichment for codons decoded by I•A and U•G wobble in both Candida and Saccharomyces. Thus, conservation of inhibitory codon pairs strongly implies selection for slow translation at particular gene locations, executed by suboptimal codon pairs.