Results 1 - 20 of 46
1.
PLoS One ; 15(1): e0227177, 2020.
Article in English | MEDLINE | ID: mdl-31978147

ABSTRACT

Alignment of structural RNAs is an important problem with a wide range of applications. Since function is often determined by molecular structure, RNA alignment programs should take into account both sequence and base-pairing information for structural homology identification. This paper describes C++ software, RNAmountAlign, for RNA sequence/structure alignment that runs in O(n^3) time and O(n^2) space for two sequences of length n; moreover, our software returns a p-value (transformable to expect value E) based on Karlin-Altschul statistics for local alignment, as well as parameter fitting for local and global alignment. Using incremental mountain height, a representation of structural information computable in cubic time, RNAmountAlign implements quadratic time pairwise local, global and global/semiglobal (query search) alignment using a weighted combination of sequence and structural similarity. RNAmountAlign is capable of performing progressive multiple alignment as well. Benchmarking of RNAmountAlign against LocARNA, LARA, FOLDALIGN, DYNALIGN, STRAL, MXSCARNA, and MUSCLE shows that RNAmountAlign has reasonably good accuracy and faster run time supporting all alignment types. Additionally, our extension of RNAmountAlign, called RNAmountAlignScan, which scans a target genome sequence to find hits having high sequence and structural similarity to a given query sequence, outperforms RSEARCH and sequence-only query scans and runs faster than FOLDALIGN query scan.
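
The notion of mountain height mentioned in this abstract can be illustrated with a minimal sketch. It assumes a single fixed structure in dot-bracket notation, whereas the abstract's cubic-time remark suggests the tool derives its values from the whole structural ensemble; the function name `mountain_heights` is hypothetical and is not part of RNAmountAlign.

```python
def mountain_heights(dot_bracket):
    """Return (increments, heights) for one dot-bracket secondary structure.

    Assumption for this sketch: the increment is +1 at '(', -1 at ')',
    and 0 at '.'; the height at position i is the running sum of the
    increments up to and including position i.
    """
    step = {"(": 1, ")": -1, ".": 0}
    increments = [step[c] for c in dot_bracket]
    heights = []
    total = 0
    for inc in increments:
        total += inc
        heights.append(total)
    return increments, heights


increments, heights = mountain_heights("((..))")
print(increments)  # [1, 1, 0, 0, -1, -1]
print(heights)     # [1, 2, 2, 2, 1, 0]
```

Because each position is reduced to one number, two RNAs can then be compared position by position with an ordinary quadratic-time alignment recurrence, which is consistent with the quadratic-time alignment step described above.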


Subjects
Base Sequence/genetics; Sequence Alignment; Sequence Homology, Nucleic Acid; Software; Algorithms; Data Accuracy; Humans; Models, Statistical; Nucleic Acid Conformation; Probability; Sequence Analysis, RNA; Time Factors
2.
J Comput Biol ; 26(1): 16-26, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30383444

ABSTRACT

Let $\mathcal{S}_n$ denote the network of all RNA secondary structures of length n, in which undirected edges exist between structures s, t such that t is obtained from s by the addition, removal, or shift of a single base pair. Using context-free grammars, generating functions, and complex analysis, we show that the asymptotic average degree is $O(n)$, and that the asymptotic clustering coefficient is $O(1/n)$, from which it follows that the family $\mathcal{S}_n$, $n = 1, 2, 3, \ldots$ of secondary structure networks is not small world.


Subjects
RNA/chemistry; Algorithms; Models, Molecular; Nucleic Acid Conformation
3.
J Math Biol ; 76(5): 1195-1227, 2018 Apr.
Article in English | MEDLINE | ID: mdl-28780735

ABSTRACT

RNA secondary structure folding kinetics is known to be important for the biological function of certain processes, such as the hok/sok system in E. coli. Although linear algebra provides an exact computational solution of secondary structure folding kinetics with respect to the Turner energy model for tiny ([Formula: see text]20 nt) RNA sequences, the folding kinetics for larger sequences can only be approximated by binning structures into macrostates in a coarse-grained model, or by repeatedly simulating secondary structure folding with either the Monte Carlo algorithm or the Gillespie algorithm. Here we investigate the relation between the Monte Carlo algorithm and the Gillespie algorithm. We prove that asymptotically, the expected time for a K-step trajectory of the Monte Carlo algorithm is equal to [Formula: see text] times that of the Gillespie algorithm, where [Formula: see text] denotes the Boltzmann expected network degree. If the network is regular (i.e. every node has the same degree), then the mean first passage time (MFPT) computed by the Monte Carlo algorithm is equal to MFPT computed by the Gillespie algorithm multiplied by [Formula: see text]; however, this is not true for non-regular networks. In particular, RNA secondary structure folding kinetics, as computed by the Monte Carlo algorithm, is not equal to the folding kinetics, as computed by the Gillespie algorithm, although the mean first passage times are roughly correlated. Simulation software for RNA secondary structure folding according to the Monte Carlo and Gillespie algorithms is publicly available, as is our software to compute the expected degree of the network of secondary structures of a given RNA sequence; see http://bioinformatics.bc.edu/clote/RNAexpNumNbors .
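
To make the two simulation schemes compared in this abstract concrete, here is a minimal sketch of one step of each on a toy network. It assumes a common textbook formulation (Metropolis rates min(1, exp(-dE/RT)), a uniformly proposed neighbor for Monte Carlo, and an exponentially distributed waiting time for Gillespie); the three-state network, its energies, and all function names are invented for illustration and are not taken from the authors' software.

```python
import math
import random


def move_rate(e_from, e_to, rt):
    """Metropolis rate min(1, exp(-dE/RT)) for a move between two structures."""
    if e_to <= e_from:
        return 1.0
    return math.exp(-(e_to - e_from) / rt)


def monte_carlo_step(state, neighbors, energy, rt, rng):
    """Propose a uniformly chosen neighbor; accept with the Metropolis
    probability. Every step, accepted or not, costs one time unit."""
    proposal = rng.choice(neighbors[state])
    if rng.random() < move_rate(energy[state], energy[proposal], rt):
        state = proposal
    return state, 1.0


def gillespie_step(state, neighbors, energy, rt, rng):
    """Always move: draw the waiting time from an exponential distribution
    whose rate is the total flux out of the state, then pick the next
    state with probability proportional to its individual rate."""
    rates = [move_rate(energy[state], energy[t], rt) for t in neighbors[state]]
    waiting_time = rng.expovariate(sum(rates))
    next_state = rng.choices(neighbors[state], weights=rates)[0]
    return next_state, waiting_time


# Hypothetical toy network: three structures, all mutually adjacent.
neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
energy = {"A": 0.0, "B": -1.0, "C": 1.5}  # kcal/mol, invented values
RT = 0.616  # approximately R*T in kcal/mol at 37 degrees C

rng = random.Random(0)
print(monte_carlo_step("A", neighbors, energy, RT, rng))
print(gillespie_step("A", neighbors, energy, RT, rng))
```

The two schemes differ in how time is charged: the Monte Carlo step costs one unit whether or not the move is accepted, while the Gillespie waiting time shrinks as the total outgoing rate, and hence the number of neighbors, grows.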


Subjects
Algorithms; Models, Molecular; RNA Folding; Base Sequence; Kinetics; Markov Chains; Mathematical Concepts; Monte Carlo Method; RNA/chemistry
4.
BMC Bioinformatics ; 17(1): 530, 2016 Dec 13.
Article in English | MEDLINE | ID: mdl-27964762

ABSTRACT

BACKGROUND: Retroviruses transcribe messenger RNA for the overlapping Gag and Gag-Pol polyproteins, by using a programmed -1 ribosomal frameshift which requires a slippery sequence and an immediate downstream stem-loop secondary structure, together called frameshift stimulating signal (FSS). It follows that the molecular evolution of this genomic region of HIV-1 is highly constrained, since the retroviral genome must contain a slippery sequence (sequence constraint), code appropriate peptides in reading frames 0 and 1 (coding requirements), and form a thermodynamically stable stem-loop secondary structure (structure requirement). RESULTS: We describe a unique computational tool, RNAsampleCDS, designed to compute the number of RNA sequences that code two (or more) peptides p,q in overlapping reading frames, that are identical (or have BLOSUM/PAM similarity that exceeds a user-specified value) to the input peptides p,q. RNAsampleCDS then samples a user-specified number of messenger RNAs that code such peptides; alternatively, RNAsampleCDS can exactly compute the position-specific scoring matrix and codon usage bias for all such RNA sequences. Our software allows the user to stipulate overlapping coding requirements for all 6 possible reading frames simultaneously, even allowing IUPAC constraints on RNA sequences and fixing GC-content. We generalize the notion of codon preference index (CPI) to overlapping reading frames, and use RNAsampleCDS to generate control sequences required in the computation of CPI. Moreover, by applying RNAsampleCDS, we are able to quantify the extent to which the overlapping coding requirement in HIV-1 [resp. HCV] contribute to the formation of the stem-loop [resp. double stem-loop] secondary structure known as the frameshift stimulating signal. 
Using our software, we confirm that certain experimentally determined deleterious HCV mutations occur in positions for which our software RNAsampleCDS and RNAiFold both indicate a single possible nucleotide. We generalize the notion of codon preference index (CPI) to overlapping coding regions, and use RNAsampleCDS to generate control sequences required in the computation of CPI for the Gag-Pol overlapping coding region of HIV-1. These applications show that RNAsampleCDS constitutes a unique tool in the software arsenal now available to evolutionary biologists. CONCLUSION: Source code for the programs and additional data are available at http://bioinformatics.bc.edu/clotelab/RNAsampleCDS/ .


Subjects
Codon/genetics; Computational Biology/methods; HIV-1/genetics; Open Reading Frames; RNA, Viral/genetics; Base Sequence; Codon/metabolism; Computational Biology/instrumentation; HIV Infections/virology; HIV-1/chemistry; Humans; Molecular Sequence Data; Nucleic Acid Conformation; Position-Specific Scoring Matrices; RNA, Viral/chemistry; Reading Frames; Software
5.
BMC Bioinformatics ; 17(1): 424, 2016 Oct 19.
Article in English | MEDLINE | ID: mdl-27756204

ABSTRACT

BACKGROUND: RNA inverse folding is the problem of finding one or more sequences that fold into a user-specified target structure s_0, i.e. whose minimum free energy secondary structure is identical to the target s_0. Here we consider the ensemble of all RNA sequences that have low free energy with respect to a given target s_0. RESULTS: We introduce the program RNAdualPF, which computes the dual partition function Z*, defined as the sum of Boltzmann factors exp(-E(a,s_0)/RT) of all RNA nucleotide sequences a compatible with target structure s_0. Using RNAdualPF, we efficiently sample RNA sequences that approximately fold into s_0, where additionally the user can specify IUPAC sequence constraints at certain positions, and whether to include dangles (energy terms for stacked, single-stranded nucleotides). Moreover, since we also compute the dual partition function Z*(k) over all sequences having GC-content k, the user can require that all sampled sequences have a precise, specified GC-content. Using Z*, we compute the dual expected energy ⟨E*⟩, and use it to show that natural RNAs from the Rfam 12.0 database have higher minimum free energy than expected, thus suggesting that functional RNAs are under evolutionary pressure to be only marginally thermodynamically stable. We show that C. elegans precursor microRNA (pre-miRNA) is significantly non-robust with respect to mutations, by comparing the robustness of each wild type pre-miRNA sequence with 2000 [resp. 500] sequences of the same GC-content generated by RNAdualPF, which approximately [resp. exactly] fold into the wild type target structure. We confirm and strengthen earlier findings that precursor microRNAs and bacterial small noncoding RNAs display plasticity, a measure of structural diversity. 
CONCLUSION: We describe RNAdualPF, which rapidly computes the dual partition function Z* and samples sequences having low energy with respect to a target structure, allowing sequence constraints and specified GC-content. Using different inverse folding software, another group had earlier shown that pre-miRNA is mutationally robust, even controlling for compositional bias. Our opposite conclusion suggests a cautionary note that computationally based insights into molecular evolution may heavily depend on the software used. C/C++-software for RNAdualPF is available at http://bioinformatics.bc.edu/clotelab/RNAdualPF .


Subjects
Caenorhabditis elegans/genetics; Computational Biology/methods; Escherichia coli/genetics; Evolution, Molecular; MicroRNAs/genetics; RNA, Small Nuclear/genetics; Software; Algorithms; Animals; Databases, Factual; RNA/chemistry; RNA Folding; Sequence Analysis, RNA/methods
6.
Bioinformatics ; 32(12): i360-i368, 2016 Jun 15.
Article in English | MEDLINE | ID: mdl-27307638

ABSTRACT

MOTIVATION: RNA thermometers (RNATs) are cis-regulatory elements that change secondary structure upon temperature shift. Often involved in the regulation of heat shock, cold shock and virulence genes, RNATs constitute an interesting potential resource in synthetic biology, where engineered RNATs could prove to be useful tools in biosensors and conditional gene regulation. RESULTS: Solving the 2-temperature inverse folding problem is critical for RNAT engineering. Here we introduce RNAiFold2T, the first Constraint Programming (CP) and Large Neighborhood Search (LNS) algorithms to solve this problem. Benchmarking tests of RNAiFold2T against existing inverse folding programs (adaptive walk and genetic algorithm) show that our software generates two orders of magnitude more solutions, thus allowing ample exploration of the space of solutions. Subsequently, solutions can be prioritized by computing various measures, including probability of target structure in the ensemble, melting temperature, etc. Using this strategy, we rationally designed two thermosensor internal ribosome entry site (thermo-IRES) elements, whose normalized cap-independent translation efficiency is approximately 50% greater at 42 °C than 30 °C, when tested in reticulocyte lysates. Translation efficiency is lower than that of the wild-type IRES element, which on the other hand is fully resistant to temperature shift-up. This appears to be the first purely computational design of functional RNA thermoswitches, and certainly the first purely computational design of functional thermo-IRES elements. AVAILABILITY: RNAiFold2T is publicly available as part of the new release RNAiFold3.0 at https://github.com/clotelab/RNAiFold and http://bioinformatics.bc.edu/clotelab/RNAiFold, the latter of which also provides a web server. The software is written in C++ and uses the OR-Tools CP search engine. CONTACT: clote@bc.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
RNA Folding; Algorithms; Base Sequence; Internal Ribosome Entry Sites; Nucleic Acid Conformation; RNA; Software
7.
Sci Rep ; 6: 24243, 2016 Apr 07.
Article in English | MEDLINE | ID: mdl-27053355

ABSTRACT

The function of Internal Ribosome Entry Site (IRES) elements is intimately linked to their RNA structure. Viral IRES elements are organized in modular domains consisting of one or more stem-loops that harbor conserved RNA motifs critical for internal initiation of translation. A conserved motif is the pyrimidine-tract located upstream of the functional initiation codon in type I and II picornavirus IRES. By computationally designing synthetic RNAs to fold into a structure that sequesters the polypyrimidine tract in a hairpin, we establish a correlation between predicted inaccessibility of the pyrimidine tract and IRES activity, as determined in both in vitro and in vivo systems. Our data supports the hypothesis that structural sequestration of the pyrimidine-tract within a stable hairpin inactivates IRES activity, since the stronger the stability of the hairpin the higher the inhibition of protein synthesis. Destabilization of the stem-loop immediately upstream of the pyrimidine-tract also decreases IRES activity. Our work introduces a hybrid computational/experimental method to determine the importance of structural motifs for biological function. Specifically, we show the feasibility of using the software RNAiFold to design synthetic RNAs with particular sequence and structural motifs that permit subsequent experimental determination of the importance of such motifs for biological function.


Subjects
Internal Ribosome Entry Sites/genetics; Nucleotide Motifs/genetics; Picornaviridae/genetics; RNA, Viral/genetics; Base Sequence; Models, Molecular; Nucleic Acid Conformation; Phylogeny; Protein Biosynthesis/genetics; Pyrimidines/chemistry; Pyrimidines/metabolism; RNA, Viral/chemical synthesis; RNA, Viral/classification; Sequence Homology, Nucleic Acid
8.
PLoS One ; 10(11): e0137859, 2015.
Article in English | MEDLINE | ID: mdl-26555444

ABSTRACT

Conformational entropy for atomic-level, three dimensional biomolecules is known experimentally to play an important role in protein-ligand discrimination, yet reliable computation of entropy remains a difficult problem. Here we describe the first two accurate and efficient algorithms to compute the conformational entropy for RNA secondary structures, with respect to the Turner energy model, where free energy parameters are determined from UV absorption experiments. An algorithm to compute the derivational entropy for RNA secondary structures had previously been introduced, using stochastic context free grammars (SCFGs). However, the numerical value of derivational entropy depends heavily on the chosen context free grammar and on the training set used to estimate rule probabilities. Using data from the Rfam database, we determine that both of our thermodynamic methods, which agree in numerical value, are substantially faster than the SCFG method. Thermodynamic structural entropy is much smaller than derivational entropy, and the correlation between length-normalized thermodynamic entropy and derivational entropy is moderately weak to poor. In applications, we plot the structural entropy as a function of temperature for known thermoswitches, such as the repression of heat shock gene expression (ROSE) element, we determine that the correlation between hammerhead ribozyme cleavage activity and total free energy is improved by including an additional free energy term arising from conformational entropy, and we plot the structural entropy of windows of the HIV-1 genome. Our software RNAentropy can compute structural entropy for any user-specified temperature, and supports both the Turner'99 and Turner'04 energy parameters. It follows that RNAentropy is state-of-the-art software to compute RNA secondary structure conformational entropy. 
Source code is available at https://github.com/clotelab/RNAentropy/; a full web server is available at http://bioinformatics.bc.edu/clotelab/RNAentropy, including source code and ancillary programs.


Subjects
Algorithms; Entropy; Nucleic Acid Conformation; RNA/chemistry; Base Sequence; Sequence Alignment; Software; Temperature
9.
PLoS One ; 10(10): e0139476, 2015.
Article in English | MEDLINE | ID: mdl-26488894

ABSTRACT

We describe the first dynamic programming algorithm that computes the expected degree for the network, or graph G = (V, E) of all secondary structures of a given RNA sequence a = a_1, …, a_n. Here, the nodes V correspond to all secondary structures of a, while an edge exists between nodes s, t if the secondary structure t can be obtained from s by adding, removing or shifting a base pair. Since secondary structure kinetics programs implement the Gillespie algorithm, which simulates a random walk on the network of secondary structures, the expected network degree may provide a better understanding of kinetics of RNA folding when allowing defect diffusion, helix zippering, and related conformation transformations. We determine the correlation between expected network degree, contact order, conformational entropy, and expected number of native contacts for a benchmarking dataset of RNAs. Source code is available at http://bioinformatics.bc.edu/clotelab/RNAexpNumNbors.


Subjects
Algorithms; Computational Biology/methods; Nucleic Acid Conformation; RNA/chemistry; Base Pairing; Base Sequence; Kinetics; Models, Molecular; Molecular Sequence Data; RNA/genetics; RNA Folding
10.
Nucleic Acids Res ; 43(W1): W513-21, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-26019176

ABSTRACT

Several algorithms for RNA inverse folding have been used to design synthetic riboswitches, ribozymes and thermoswitches, whose activity has been experimentally validated. The RNAiFold software is unique among approaches for inverse folding in that (exhaustive) constraint programming is used instead of heuristic methods. For that reason, RNAiFold can generate all sequences that fold into the target structure or determine that there is no solution. RNAiFold 2.0 is a complete overhaul of RNAiFold 1.0, rewritten from the now defunct COMET language to C++. The new code properly extends the capabilities of its predecessor by providing a user-friendly pipeline to design synthetic constructs having the functionality of given Rfam families. In addition, the new software supports amino acid constraints, even for proteins translated in different reading frames from overlapping coding sequences; moreover, structure compatibility/incompatibility constraints have been expanded. With these features, RNAiFold 2.0 allows the user to design single RNA molecules as well as hybridization complexes of two RNA molecules. AVAILABILITY: The web server, source code and Linux binaries are publicly accessible at http://bioinformatics.bc.edu/clotelab/RNAiFold2.0.


Subjects
RNA Folding; RNA/chemistry; Software; Algorithms; Internet; Nucleic Acid Conformation; Sequence Analysis, Protein; Sequence Analysis, RNA
11.
J Comput Biol ; 22(2): 124-44, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25684201

ABSTRACT

In this article, we introduce the software suite Hermes, which provides fast, novel algorithms for RNA secondary structure kinetics. Using the fast Fourier transform to efficiently compute the Boltzmann probability that a secondary structure S of a given RNA sequence has base pair distance x (resp. y) from reference structure A (resp. B), Hermes computes the exact kinetics of folding from A to B in this coarse-grained model. In particular, Hermes computes the mean first passage time from the transition probability matrix by using matrix inversion, and also computes the equilibrium time from the rate matrix by using spectral decomposition. Due to the model granularity and the speed of Hermes, it is capable of determining secondary structure refolding kinetics for large RNA sequences, beyond the range of other methods. Comparative benchmarking of Hermes with other methods indicates that Hermes provides refolding kinetics of accuracy suitable for use in the computational design of RNA, an important area of synthetic biology. Source code and documentation for Hermes are available.
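
The step of obtaining the mean first passage time from a transition probability matrix can be sketched as follows. This is a minimal illustration, not Hermes itself: it solves the linear system (I - Q) t = 1 by Gauss-Jordan elimination (equivalent to applying the inverse matrix), where Q is the transition matrix restricted to non-target states. The function name and the three-state matrix are hypothetical.

```python
def mean_first_passage_times(P, target):
    """Expected number of steps to first reach `target` from every other state.

    P is a row-stochastic transition matrix given as a list of lists.
    Solves (I - Q) t = 1, with Q equal to P restricted to non-target states.
    """
    states = [i for i in range(len(P)) if i != target]
    n = len(states)
    # Augmented matrix [I - Q | 1].
    A = [
        [(1.0 if r == c else 0.0) - P[states[r]][states[c]] for c in range(n)]
        + [1.0]
        for r in range(n)
    ]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return {states[r]: A[r][n] / A[r][r] for r in range(n)}


# Hypothetical chain: states 0 and 1 are transient, state 2 is the target.
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.25, 0.50],
    [0.00, 0.00, 1.00],
]
print(mean_first_passage_times(P, target=2))  # approximately {0: 5.0, 1: 3.0}
```

The result can be checked by hand: t1 = 1 + 0.25*t0 + 0.25*t1 and t0 = 1 + 0.5*t0 + 0.5*t1 give t0 = 5 and t1 = 3.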


Subjects
RNA Folding; Software; Kinetics
12.
J Comput Chem ; 36(2): 103-17, 2015 Jan 15.
Article in English | MEDLINE | ID: mdl-25382310

ABSTRACT

Consider the network of all secondary structures of a given RNA sequence, where nodes are connected when the corresponding structures have base pair distance one. The expected degree of the network is the average number of neighbors, where average may be computed with respect to either the uniform or Boltzmann probability. Here, we describe the first algorithm, RNAexpNumNbors, that can compute the expected number of neighbors, or expected network degree, of an input sequence. For RNA sequences from the Rfam database, the expected degree is significantly less than the constrained minimum free energy structure, defined to have minimum free energy (MFE) over all structures consistent with the Rfam consensus structure. The expected degree of structural RNAs, such as purine riboswitches, paradoxically appears to be smaller than that of random RNA, yet the difference between the degree of the MFE structure and the expected degree is larger than that of random RNA. Expected degree does not seem to correlate with standard structural diversity measures of RNA, such as positional entropy and ensemble defect. The program RNAexpNumNbors is written in C, runs in cubic time and quadratic space, and is publicly available at http://bioinformatics.bc.edu/clotelab/RNAexpNumNbors.
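
The degree of a single node in this network can be computed directly, which may help clarify what is being averaged. The sketch below counts the structures at base pair distance one from a given structure, i.e. those reachable by removing one existing pair or adding one admissible pair. It assumes canonical pairs (Watson-Crick and GU wobble), at least three unpaired bases in every hairpin loop, and no pseudoknots; the function names are hypothetical and this is not the RNAexpNumNbors dynamic program, which averages this quantity over all structures without enumerating them.

```python
CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases in a hairpin loop


def base_pairs(dot_bracket):
    """Parse a dot-bracket string into a list of (i, j) pairs, 0-indexed."""
    stack, pairs = [], []
    for i, c in enumerate(dot_bracket):
        if c == "(":
            stack.append(i)
        elif c == ")":
            pairs.append((stack.pop(), i))
    return pairs


def num_neighbors(seq, dot_bracket):
    """Count structures at base pair distance one from the given structure."""
    pairs = base_pairs(dot_bracket)
    paired = {i for pair in pairs for i in pair}
    count = len(pairs)  # every existing pair can be removed
    n = len(seq)
    for i in range(n):
        for j in range(i + MIN_LOOP + 1, n):
            if i in paired or j in paired:
                continue
            if (seq[i], seq[j]) not in CANONICAL:
                continue
            # (i, j) crosses (k, l) when exactly one of k, l lies between i and j.
            crosses = any((i < k < j) != (i < l < j) for k, l in pairs)
            if not crosses:
                count += 1  # this pair can be added
    return count


print(num_neighbors("GGGAAACCC", "(((...)))"))    # 3: only the three removals
print(num_neighbors("GAAACGAAAC", "(...)....."))  # 2: one removal, one addition
```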


Subjects
Algorithms; RNA/chemistry; Software; Base Sequence; Databases, Factual; Nucleic Acid Conformation; Thermodynamics
13.
J Math Biol ; 70(1-2): 173-96, 2015 Jan.
Article in English | MEDLINE | ID: mdl-24515409

ABSTRACT

RNA folding pathways play an important role in various biological processes, such as (i) the hok/sok (host-killing/suppression of killing) system in E. coli to check for sufficient plasmid copy number, (ii) the conformational switch in spliced leader (SL) RNA from Leptomonas collosoma, which controls trans splicing of a portion of the 5' exon, and (iii) riboswitches (portions of the 5' untranslated region of messenger RNA that regulate genes by allostery). Since RNA folding pathways are determined by the energy landscape, we describe a novel algorithm, FFTbor2D, which computes the 2D projection of the energy landscape for a given RNA sequence. Given two metastable secondary structures A, B for a given RNA sequence, FFTbor2D computes the Boltzmann probability p(x, y) = Z(x, y)/Z that a secondary structure has base pair distance x from A and distance y from B. Using polynomial interpolation with the fast Fourier transform, we compute p(x, y) in O(n^5) time and O(n^2) space, which is an improvement over an earlier method, which runs in O(n^7) time and O(n^4) space. FFTbor2D has potential applications in synthetic biology, where one might wish to design bistable switches having target metastable structures A, B with favorable pathway kinetics. By inverting the transition probability matrix determined from FFTbor2D output, we show that L. collosoma spliced leader RNA has larger mean first passage time from A to B on the 2D energy landscape, than 97.145% of 20,000 sequences, each having metastable structures A, B. Source code and binaries are freely available for download at http://bioinformatics.bc.edu/clotelab/FFTbor2D. The program FFTbor2D is implemented in C++, with optional OpenMP parallelization primitives.
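
The idea of polynomial interpolation via the Fourier transform can be shown in one dimension. If a polynomial Z(x) = z_0 + z_1 x + ... + z_{n-1} x^{n-1} can be evaluated at arbitrary complex points, then evaluating it at the n-th roots of unity and applying an inverse discrete Fourier transform recovers every coefficient z_k at once. The sketch below is a toy under stated assumptions: the polynomial is a made-up example rather than a partition function, the function names are hypothetical, and a naive O(n^2) inverse DFT stands in for a true FFT to keep the code dependency-free.

```python
import cmath


def recover_coefficients(evaluate, n):
    """Recover the n coefficients of a polynomial of degree < n.

    `evaluate` is a black box returning the polynomial's value at a complex
    point. The values at the n-th roots of unity are passed through an
    inverse discrete Fourier transform (naive version, not a fast one).
    """
    roots = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]
    values = [evaluate(w) for w in roots]
    coefficients = []
    for m in range(n):
        total = sum(
            values[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n)
        )
        coefficients.append((total / n).real)
    return coefficients


# Toy stand-in for "partition function contribution at distance k".
hidden = [1.0, 3.0, 0.0, 2.0]


def toy_polynomial(x):
    return sum(c * x**k for k, c in enumerate(hidden))


recovered = recover_coefficients(toy_polynomial, len(hidden))
print([round(c, 6) for c in recovered])
```

Dividing each recovered coefficient by the sum of all coefficients would give the analogue of the probabilities p(x, y) = Z(x, y)/Z in this one-dimensional toy.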


Subjects
Models, Molecular; Nucleic Acid Conformation; RNA, Protozoan/chemistry; 5' Untranslated Regions; Algorithms; Animals; Fourier Analysis; Kinetics; Mathematical Concepts; Molecular Dynamics Simulation; RNA Splicing; RNA, Protozoan/genetics; RNA, Protozoan/metabolism; RNA, Spliced Leader/chemistry; RNA, Spliced Leader/genetics; RNA, Spliced Leader/metabolism; Trypanosomatina/chemistry; Trypanosomatina/genetics; Trypanosomatina/metabolism
14.
Nucleic Acids Res ; 42(18): 11752-62, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25209235

ABSTRACT

Nanotechnology and synthetic biology currently constitute one of the most innovative, interdisciplinary fields of research, poised to radically transform society in the 21st century. This paper concerns the synthetic design of ribonucleic acid molecules, using our recent algorithm, RNAiFold, which can determine all RNA sequences whose minimum free energy secondary structure is a user-specified target structure. Using RNAiFold, we design ten cis-cleaving hammerhead ribozymes, all of which are shown to be functional by a cleavage assay. We additionally use RNAiFold to design a functional cis-cleaving hammerhead as a modular unit of a synthetic larger RNA. Analysis of kinetics on this small set of hammerheads suggests that cleavage rate of computationally designed ribozymes may be correlated with positional entropy, ensemble defect, structural flexibility/rigidity and related measures. Artificial ribozymes have been designed in the past either manually or by SELEX (Systematic Evolution of Ligands by Exponential Enrichment); however, this appears to be the first purely computational design and experimental validation of novel functional ribozymes. RNAiFold is available at http://bioinformatics.bc.edu/clotelab/RNAiFold/.


Subjects
RNA, Catalytic/chemistry; Algorithms; Base Sequence; Computational Biology/methods; Consensus Sequence; RNA Cleavage; RNA Folding; RNA, Catalytic/metabolism; Synthetic Biology/methods
15.
PLoS One ; 9(2): e85412, 2014.
Article in English | MEDLINE | ID: mdl-24586240

ABSTRACT

We describe the first algorithm and software, RNAenn, to compute the partition function and minimum free energy secondary structure for RNA with respect to an extended nearest neighbor energy model. Our next-nearest-neighbor triplet energy model appears to lead to somewhat more cooperative folding than does the nearest neighbor energy model, as judged by melting curves computed with RNAenn and with two popular software implementations for the nearest-neighbor energy model. A web server is available at http://bioinformatics.bc.edu/clotelab/RNAenn/.


Subjects
Algorithms; Energy Metabolism/physiology; Models, Chemical; Nucleic Acid Conformation; RNA/chemistry; Molecular Structure
16.
J Comput Biol ; 21(3): 201-18, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24559086

ABSTRACT

We describe four novel algorithms, RNAhairpin, RNAmloopNum, RNAmloopOrder, and RNAmloopHP, which compute the Boltzmann partition function for global structural constraints, respectively for the number of hairpins, the number of multiloops, maximum order (or depth) of multiloops, and the simultaneous number of hairpins and multiloops. Given an RNA sequence of length n and a user-specified integer 0 ≤ K ≤ n, RNAhairpin (resp. RNAmloopNum and RNAmloopOrder) computes the partition functions Z(k) for each 0 ≤ k ≤ K in time O(K^2 n^3) and space O(K n^2), while RNAmloopHP computes the partition functions Z(m, h) for 0 ≤ m ≤ M multiloops and 0 ≤ h ≤ H hairpins, with run time O(M^2 H^2 n^3) and space O(M H n^2). In addition, programs such as RNAhairpin (resp. RNAmloopHP) sample from the low-energy ensemble of structures having h hairpins (resp. m multiloops and h hairpins), for given h, m. Moreover, by using the fast Fourier transform (FFT), RNAhairpin and RNAmloopNum have been improved to run in time O(n^4) and space O(n^2), although this improvement is not possible for RNAmloopOrder. We present two applications of the novel algorithms. First, we show that for many Rfam families of RNA, structures sampled from RNAmloopHP are more accurate than the minimum free-energy structure; for instance, sensitivity improves by almost 24% for transfer RNA, while for certain ribozyme families, there is an improvement of around 5%. Second, we show that the probabilities p(k) = Z(k)/Z of forming k hairpins (resp. multiloops) provide discriminating novel features for a support vector machine or relevance vector machine binary classifier for Rfam families of RNA. Our data suggests that multiloop order does not provide any significant discriminatory power over that of hairpin and multiloop number, and since these probabilities can be efficiently computed using the FFT, hairpin and multiloop formation probabilities could be added to other features in existent noncoding RNA gene finders. 
Our programs, written in C/C++, are publicly available online at: http://bioinformatics.bc.edu/clotelab/RNAparametric .


Subjects
Computational Biology; Inverted Repeat Sequences/genetics; Nucleic Acid Conformation; RNA/chemistry; Algorithms; Fourier Analysis; Probability; RNA/genetics; Sequence Analysis, RNA; Thermodynamics
17.
J Math Biol ; 68(1-2): 341-75, 2014 Jan.
Article in English | MEDLINE | ID: mdl-23263300

ABSTRACT

It is a classical result of Stein and Waterman that the asymptotic number of RNA secondary structures is 1.104366 · n^(-3/2) · 2.618034^n. Motivated by the kinetics of RNA secondary structure formation, we are interested in determining the asymptotic number of secondary structures that are locally optimal, with respect to a particular energy model. In the Nussinov energy model, where each base pair contributes -1 towards the energy of the structure, locally optimal structures are exactly the saturated structures, for which we have previously shown that asymptotically, there are 1.07427 · n^(-3/2) · 2.35467^n many saturated structures for a sequence of length n. In this paper, we consider the base stacking energy model, a mild variant of the Nussinov model, where each stacked base pair contributes -1 toward the energy of the structure. Locally optimal structures with respect to the base stacking energy model are exactly those secondary structures, whose stems cannot be extended. Such structures were first considered by Evers and Giegerich, who described a dynamic programming algorithm to enumerate all locally optimal structures. In this paper, we apply methods from enumerative combinatorics to compute the asymptotic number of such structures. Additionally, we consider analogous combinatorial problems for secondary structures with annotated single-stranded, stacking nucleotides (dangles).
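
The Stein-Waterman count quoted at the start of this abstract can be reproduced numerically. The sketch below assumes the classical setting in which any two positions may pair and every hairpin loop contains at least one unpaired base: the last position is either unpaired, or paired with some position k, which splits the structure into an independent left part and an enclosed part. The function name is hypothetical.

```python
def count_secondary_structures(n_max):
    """Return [S(0), ..., S(n_max)], the number of secondary structures on
    n positions when any two positions may pair and each hairpin loop has
    at least one unpaired base (the classical Stein-Waterman setting)."""
    s = [1] * (n_max + 1)  # S(0) = S(1) = 1: only the empty structure
    for n in range(2, n_max + 1):
        # Position n unpaired: S(n-1) structures.
        # Position n paired with k (1 <= k <= n-2): S(k-1) choices to the
        # left of k times S(n-1-k) choices strictly between k and n.
        s[n] = s[n - 1] + sum(s[k - 1] * s[n - 1 - k] for k in range(1, n - 1))
    return s


counts = count_secondary_structures(200)
print(counts[:11])  # [1, 1, 1, 2, 4, 8, 17, 37, 82, 185, 423]

# Ratio of the exact count to the asymptotic formula from the abstract.
n = 200
asymptotic = 1.104366 * n ** -1.5 * 2.618034 ** n
print(counts[n] / asymptotic)
```

The printed ratio is below 1 for moderate n and approaches 1 as n grows, since the formula captures only the leading term of the asymptotic expansion.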


Subjects
Theoretical Models, Nucleic Acid Conformation, RNA/chemistry, Thermodynamics
18.
RNA Biol ; 10(12): 1842-52, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24253111

ABSTRACT

Internal ribosome entry site (IRES) elements govern protein synthesis of mRNAs that bypass cap-dependent translation inhibition under stress conditions. Picornavirus IRES are cis-acting elements, organized in modular domains that recruit the ribosome to internal mRNA sites. The aim of this study was to retrieve short RNA sequences with the capacity to adopt RNA folding patterns conserved with IRES structural subdomains, likely corresponding to RNA modules. We have applied a new program, RNAiFold, an inverse folding algorithm that determines all sequences whose minimum free energy structure is identical to that of the structural domains of interest. Sequences differing by more than 1 nt were clustered. Then, BLASTing one randomly chosen sequence from each cluster of the RNAiFold output, we retrieved viral and cellular sequences among output hits. As a proof of principle, we present the data corresponding to a coding region of Drosophila melanogaster TAF6, a transcription factor-associated protein that contains a structural motif within its coding region potentially folding into an IRES-like subdomain. This RNA region shows a biased codon usage, as predicted from structural constraints at the RNA level; it harbors conserved IRES structural motifs in loops and, interestingly, it has the capacity to confer internal initiation of translation in tissue culture cells.


Subjects
Algorithms, Messenger RNA/metabolism, Ribonucleic Acid Regulatory Sequences, Ribosomes/metabolism, Animals, Drosophila Proteins/genetics, Drosophila melanogaster/genetics, Drosophila melanogaster/metabolism, Gene Expression Regulation, Molecular Models, Nucleic Acid Conformation, Picornaviridae/genetics, RNA Folding, Reproducibility of Results, TATA-Binding Protein Associated Factors/genetics, Transcription Factor TFIID/genetics
19.
Algorithms Mol Biol ; 8(1): 24, 2013 Oct 25.
Article in English | MEDLINE | ID: mdl-24156624

ABSTRACT

BACKGROUND: RNA folding depends on the distribution of kinetic traps in the landscape of all secondary structures. Kinetic traps in the Nussinov energy model are precisely those secondary structures that are saturated, meaning that no base pair can be added without introducing either a pseudoknot or base triple. In previous work, we investigated asymptotic combinatorics of both random saturated structures and of quasi-random saturated structures, where the latter are constructed by a natural stochastic process. RESULTS: We prove that for quasi-random saturated structures with the uniform distribution, the asymptotic expected number of external loops is O(log n) and the asymptotic expected maximum stem length is O(log n), while under the Zipf distribution, the asymptotic expected number of external loops is O(log^2 n) and the asymptotic expected maximum stem length is O(log n / log log n). CONCLUSIONS: Quasi-random saturated structures are generated by a stochastic greedy method, which is simple to implement. Structural features of random saturated structures appear to resemble those of quasi-random saturated structures, and the latter appear to constitute a class for which both the generation of sampled structures as well as a combinatorial investigation of structural features may be simpler to undertake.
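
The abstract does not spell out the stochastic process or how its uniform and Zipf variants choose base pairs, so the sketch below shows only one plausible greedy scheme, under stated assumptions: a homopolymer in which any two positions may pair, at least theta = 1 unpaired positions in each hairpin, and a uniform choice among all base pairs that can still be added. The names `can_add`, `greedy_saturated`, and `is_saturated` are hypothetical.

```python
import random


def can_add(pairs, paired, i, j, theta):
    """True if pair (i, j) can be added without a base triple, a pseudoknot,
    or a hairpin loop with fewer than theta unpaired positions."""
    if j - i <= theta or i in paired or j in paired:
        return False
    # A crossing (pseudoknot) occurs when exactly one end lies inside (a, b).
    return not any(a < i < b < j or i < a < j < b for a, b in pairs)


def greedy_saturated(n, theta=1, rng=None):
    """Add uniformly chosen admissible pairs until none remains."""
    rng = rng or random.Random()
    pairs, paired = [], set()
    while True:
        candidates = [
            (i, j)
            for i in range(n)
            for j in range(i + theta + 1, n)
            if can_add(pairs, paired, i, j, theta)
        ]
        if not candidates:
            return sorted(pairs)  # saturated: nothing more can be added
        i, j = rng.choice(candidates)
        pairs.append((i, j))
        paired.update((i, j))


def is_saturated(n, pairs, theta=1):
    """Check the definition directly: no admissible pair is left."""
    paired = {x for pair in pairs for x in pair}
    return not any(
        can_add(pairs, paired, i, j, theta)
        for i in range(n)
        for j in range(i + 1, n)
    )


structure = greedy_saturated(30, rng=random.Random(0))
print(structure, is_saturated(30, structure))
```

Recomputing the candidate list on every step is wasteful but keeps the sketch short; it is enough for small n.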

20.
Bull Math Biol ; 75(12): 2410-30, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24142625

ABSTRACT

In the absence of chaperone molecules, RNA folding is believed to depend on the distribution of kinetic traps in the energy landscape of all secondary structures. Kinetic traps in the Nussinov energy model are precisely those secondary structures that are saturated, meaning that no base pair can be added without introducing either a pseudoknot or base triple. In this paper, we compute the asymptotic expected number of hairpins in saturated structures. For instance, if every hairpin is required to contain at least θ = 3 unpaired bases and the probability that any two positions can base-pair is p = 3/8, then the asymptotic number of saturated structures is 1.34685 · n^(-3/2) · 1.62178^n, and the asymptotic expected number of hairpins follows a normal distribution with mean [Formula: see text]. Similar results are given for values θ = 1, 3 and p = 1, 1/2, 3/8; for instance, when θ = 1 and p = 1, the asymptotic expected number of hairpins in saturated secondary structures is 0.123194 · n, a value greater than the asymptotic expected number 0.105573 · n of hairpins over all secondary structures. Since RNA binding targets are often found in hairpin regions, it follows that saturated structures present potentially more binding targets than nonsaturated structures, on average. Next, we describe a novel algorithm to compute the hairpin profile of a given RNA sequence: given an RNA sequence a_1, …, a_n, for each integer k, we compute the secondary structure S_k having minimum energy in the Nussinov energy model, taken over all secondary structures having k hairpins. We expect that an extension of our algorithm to the Turner energy model may provide more accurate structure prediction for particular RNAs, such as tRNAs and purine riboswitches, known to have a particular number of hairpins.
Mathematica™ computations, C and Python source code, and additional supplementary information are available at the website http://bioinformatics.bc.edu/clotelab/RNAhairpinProfile/.


Subjects
Nucleic Acid Conformation, RNA/chemistry, RNA/genetics, Algorithms, Computational Biology, Inverted Repeat Sequences, Mathematical Concepts, Molecular Models