Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 46
Filtrar
Mais filtros

Base de dados
Tipo de documento
Intervalo de ano de publicação
1.
J Math Biol ; 76(5): 1195-1227, 2018 04.
Artigo em Inglês | MEDLINE | ID: mdl-28780735

RESUMO

RNA secondary structure folding kinetics is known to be important for the biological function of certain processes, such as the hok/sok system in E. coli. Although linear algebra provides an exact computational solution of secondary structure folding kinetics with respect to the Turner energy model for tiny ([Formula: see text]20 nt) RNA sequences, the folding kinetics for larger sequences can only be approximated by binning structures into macrostates in a coarse-grained model, or by repeatedly simulating secondary structure folding with either the Monte Carlo algorithm or the Gillespie algorithm. Here we investigate the relation between the Monte Carlo algorithm and the Gillespie algorithm. We prove that asymptotically, the expected time for a K-step trajectory of the Monte Carlo algorithm is equal to [Formula: see text] times that of the Gillespie algorithm, where [Formula: see text] denotes the Boltzmann expected network degree. If the network is regular (i.e. every node has the same degree), then the mean first passage time (MFPT) computed by the Monte Carlo algorithm is equal to MFPT computed by the Gillespie algorithm multiplied by [Formula: see text]; however, this is not true for non-regular networks. In particular, RNA secondary structure folding kinetics, as computed by the Monte Carlo algorithm, is not equal to the folding kinetics, as computed by the Gillespie algorithm, although the mean first passage times are roughly correlated. Simulation software for RNA secondary structure folding according to the Monte Carlo and Gillespie algorithms is publicly available, as is our software to compute the expected degree of the network of secondary structures of a given RNA sequence-see http://bioinformatics.bc.edu/clote/RNAexpNumNbors .


Assuntos
Algoritmos , Modelos Moleculares , Dobramento de RNA , Sequência de Bases , Cinética , Cadeias de Markov , Conceitos Matemáticos , Método de Monte Carlo , RNA/química
2.
Bioinformatics ; 32(12): i360-i368, 2016 06 15.
Artigo em Inglês | MEDLINE | ID: mdl-27307638

RESUMO

MOTIVATION: RNA thermometers (RNATs) are cis-regulatory elements that change secondary structure upon temperature shift. Often involved in the regulation of heat shock, cold shock and virulence genes, RNATs constitute an interesting potential resource in synthetic biology, where engineered RNATs could prove to be useful tools in biosensors and conditional gene regulation. RESULTS: Solving the 2-temperature inverse folding problem is critical for RNAT engineering. Here we introduce RNAiFold2T, the first Constraint Programming (CP) and Large Neighborhood Search (LNS) algorithms to solve this problem. Benchmarking tests of RNAiFold2T against existent programs (adaptive walk and genetic algorithm) inverse folding show that our software generates two orders of magnitude more solutions, thus allowing ample exploration of the space of solutions. Subsequently, solutions can be prioritized by computing various measures, including probability of target structure in the ensemble, melting temperature, etc. Using this strategy, we rationally designed two thermosensor internal ribosome entry site (thermo-IRES) elements, whose normalized cap-independent translation efficiency is approximately 50% greater at 42 °C than 30 °C, when tested in reticulocyte lysates. Translation efficiency is lower than that of the wild-type IRES element, which on the other hand is fully resistant to temperature shift-up. This appears to be the first purely computational design of functional RNA thermoswitches, and certainly the first purely computational design of functional thermo-IRES elements. AVAILABILITY: RNAiFold2T is publicly available as part of the new release RNAiFold3.0 at https://github.com/clotelab/RNAiFold and http://bioinformatics.bc.edu/clotelab/RNAiFold, which latter has a web server as well. The software is written in C ++ and uses OR-Tools CP search engine. CONTACT: clote@bc.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Dobramento de RNA , Algoritmos , Sequência de Bases , Sítios Internos de Entrada Ribossomal , Conformação de Ácido Nucleico , RNA , Software
3.
Nucleic Acids Res ; 43(W1): W513-21, 2015 Jul 01.
Artigo em Inglês | MEDLINE | ID: mdl-26019176

RESUMO

UNLABELLED: Several algorithms for RNA inverse folding have been used to design synthetic riboswitches, ribozymes and thermoswitches, whose activity has been experimentally validated. The RNAiFold software is unique among approaches for inverse folding in that (exhaustive) constraint programming is used instead of heuristic methods. For that reason, RNAiFold can generate all sequences that fold into the target structure or determine that there is no solution. RNAiFold 2.0 is a complete overhaul of RNAiFold 1.0, rewritten from the now defunct COMET language to C++. The new code properly extends the capabilities of its predecessor by providing a user-friendly pipeline to design synthetic constructs having the functionality of given Rfam families. In addition, the new software supports amino acid constraints, even for proteins translated in different reading frames from overlapping coding sequences; moreover, structure compatibility/incompatibility constraints have been expanded. With these features, RNAiFold 2.0 allows the user to design single RNA molecules as well as hybridization complexes of two RNA molecules. AVAILABILITY: the web server, source code and linux binaries are publicly accessible at http://bioinformatics.bc.edu/clotelab/RNAiFold2.0.


Assuntos
Dobramento de RNA , RNA/química , Software , Algoritmos , Internet , Conformação de Ácido Nucleico , Análise de Sequência de Proteína , Análise de Sequência de RNA
4.
BMC Bioinformatics ; 17(1): 530, 2016 Dec 13.
Artigo em Inglês | MEDLINE | ID: mdl-27964762

RESUMO

BACKGROUND: Retroviruses transcribe messenger RNA for the overlapping Gag and Gag-Pol polyproteins, by using a programmed -1 ribosomal frameshift which requires a slippery sequence and an immediate downstream stem-loop secondary structure, together called frameshift stimulating signal (FSS). It follows that the molecular evolution of this genomic region of HIV-1 is highly constrained, since the retroviral genome must contain a slippery sequence (sequence constraint), code appropriate peptides in reading frames 0 and 1 (coding requirements), and form a thermodynamically stable stem-loop secondary structure (structure requirement). RESULTS: We describe a unique computational tool, RNAsampleCDS, designed to compute the number of RNA sequences that code two (or more) peptides p,q in overlapping reading frames, that are identical (or have BLOSUM/PAM similarity that exceeds a user-specified value) to the input peptides p,q. RNAsampleCDS then samples a user-specified number of messenger RNAs that code such peptides; alternatively, RNAsampleCDS can exactly compute the position-specific scoring matrix and codon usage bias for all such RNA sequences. Our software allows the user to stipulate overlapping coding requirements for all 6 possible reading frames simultaneously, even allowing IUPAC constraints on RNA sequences and fixing GC-content. We generalize the notion of codon preference index (CPI) to overlapping reading frames, and use RNAsampleCDS to generate control sequences required in the computation of CPI. Moreover, by applying RNAsampleCDS, we are able to quantify the extent to which the overlapping coding requirement in HIV-1 [resp. HCV] contribute to the formation of the stem-loop [resp. double stem-loop] secondary structure known as the frameshift stimulating signal. Using our software, we confirm that certain experimentally determined deleterious HCV mutations occur in positions for which our software RNAsampleCDS and RNAiFold both indicate a single possible nucleotide. We generalize the notion of codon preference index (CPI) to overlapping coding regions, and use RNAsampleCDS to generate control sequences required in the computation of CPI for the Gag-Pol overlapping coding region of HIV-1. These applications show that RNAsampleCDS constitutes a unique tool in the software arsenal now available to evolutionary biologists. CONCLUSION: Source code for the programs and additional data are available at http://bioinformatics.bc.edu/clotelab/RNAsampleCDS/ .


Assuntos
Códon/genética , Biologia Computacional/métodos , HIV-1/genética , Fases de Leitura Aberta , RNA Viral/genética , Sequência de Bases , Códon/metabolismo , Biologia Computacional/instrumentação , Infecções por HIV/virologia , HIV-1/química , Humanos , Dados de Sequência Molecular , Conformação de Ácido Nucleico , Matrizes de Pontuação de Posição Específica , RNA Viral/química , Fases de Leitura , Software
5.
BMC Bioinformatics ; 17(1): 424, 2016 Oct 19.
Artigo em Inglês | MEDLINE | ID: mdl-27756204

RESUMO

BACKGROUND: RNA inverse folding is the problem of finding one or more sequences that fold into a user-specified target structure s 0, i.e. whose minimum free energy secondary structure is identical to the target s 0. Here we consider the ensemble of all RNA sequences that have low free energy with respect to a given target s 0. RESULTS: We introduce the program RNAdualPF, which computes the dual partition function Z ∗, defined as the sum of Boltzmann factors exp(-E(a,s 0)/RT) of all RNA nucleotide sequences a compatible with target structure s 0. Using RNAdualPF, we efficiently sample RNA sequences that approximately fold into s 0, where additionally the user can specify IUPAC sequence constraints at certain positions, and whether to include dangles (energy terms for stacked, single-stranded nucleotides). Moreover, since we also compute the dual partition function Z ∗(k) over all sequences having GC-content k, the user can require that all sampled sequences have a precise, specified GC-content. Using Z ∗, we compute the dual expected energy 〈E ∗〉, and use it to show that natural RNAs from the Rfam 12.0 database have higher minimum free energy than expected, thus suggesting that functional RNAs are under evolutionary pressure to be only marginally thermodynamically stable. We show that C. elegans precursor microRNA (pre-miRNA) is significantly non-robust with respect to mutations, by comparing the robustness of each wild type pre-miRNA sequence with 2000 [resp. 500] sequences of the same GC-content generated by RNAdualPF, which approximately [resp. exactly] fold into the wild type target structure. We confirm and strengthen earlier findings that precursor microRNAs and bacterial small noncoding RNAs display plasticity, a measure of structural diversity. CONCLUSION: We describe RNAdualPF, which rapidly computes the dual partition function Z ∗ and samples sequences having low energy with respect to a target structure, allowing sequence constraints and specified GC-content. Using different inverse folding software, another group had earlier shown that pre-miRNA is mutationally robust, even controlling for compositional bias. Our opposite conclusion suggests a cautionary note that computationally based insights into molecular evolution may heavily depend on the software used. C/C++-software for RNAdualPF is available at http://bioinformatics.bc.edu/clotelab/RNAdualPF .


Assuntos
Caenorhabditis elegans/genética , Biologia Computacional/métodos , Escherichia coli/genética , Evolução Molecular , MicroRNAs/genética , RNA Nuclear Pequeno/genética , Software , Algoritmos , Animais , Bases de Dados Factuais , RNA/química , Dobramento de RNA , Análise de Sequência de RNA/métodos
6.
Nucleic Acids Res ; 42(18): 11752-62, 2014 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-25209235

RESUMO

Nanotechnology and synthetic biology currently constitute one of the most innovative, interdisciplinary fields of research, poised to radically transform society in the 21st century. This paper concerns the synthetic design of ribonucleic acid molecules, using our recent algorithm, RNAiFold, which can determine all RNA sequences whose minimum free energy secondary structure is a user-specified target structure. Using RNAiFold, we design ten cis-cleaving hammerhead ribozymes, all of which are shown to be functional by a cleavage assay. We additionally use RNAiFold to design a functional cis-cleaving hammerhead as a modular unit of a synthetic larger RNA. Analysis of kinetics on this small set of hammerheads suggests that cleavage rate of computationally designed ribozymes may be correlated with positional entropy, ensemble defect, structural flexibility/rigidity and related measures. Artificial ribozymes have been designed in the past either manually or by SELEX (Systematic Evolution of Ligands by Exponential Enrichment); however, this appears to be the first purely computational design and experimental validation of novel functional ribozymes. RNAiFold is available at http://bioinformatics.bc.edu/clotelab/RNAiFold/.


Assuntos
RNA Catalítico/química , Algoritmos , Sequência de Bases , Biologia Computacional/métodos , Sequência Consenso , Clivagem do RNA , Dobramento de RNA , RNA Catalítico/metabolismo , Biologia Sintética/métodos
7.
J Comput Chem ; 36(2): 103-17, 2015 Jan 15.
Artigo em Inglês | MEDLINE | ID: mdl-25382310

RESUMO

Consider the network of all secondary structures of a given RNA sequence, where nodes are connected when the corresponding structures have base pair distance one. The expected degree of the network is the average number of neighbors, where average may be computed with respect to the either the uniform or Boltzmann probability. Here, we describe the first algorithm, RNAexpNumNbors, that can compute the expected number of neighbors, or expected network degree, of an input sequence. For RNA sequences from the Rfam database, the expected degree is significantly less than the constrained minimum free energy structure, defined to have minimum free energy (MFE) over all structures consistent with the Rfam consensus structure. The expected degree of structural RNAs, such as purine riboswitches, paradoxically appears to be smaller than that of random RNA, yet the difference between the degree of the MFE structure and the expected degree is larger than that of random RNA. Expected degree does not seem to correlate with standard structural diversity measures of RNA, such as positional entropy and ensemble defect. The program RNAexpNumNbors is written in C, runs in cubic time and quadratic space, and is publicly available at http://bioinformatics.bc.edu/clotelab/RNAexpNumNbors.


Assuntos
Algoritmos , RNA/química , Software , Sequência de Bases , Bases de Dados Factuais , Conformação de Ácido Nucleico , Termodinâmica
8.
J Math Biol ; 70(1-2): 173-96, 2015 Jan.
Artigo em Inglês | MEDLINE | ID: mdl-24515409

RESUMO

RNA folding pathways play an important role in various biological processes, such as (i) the hok/sok (host-killing/suppression of killing) system in E. coli to check for sufficient plasmid copy number, (ii) the conformational switch in spliced leader (SL) RNA from Leptomonas collosoma, which controls trans splicing of a portion of the '5 exon, and (iii) riboswitches--portions of the 5' untranslated region of messenger RNA that regulate genes by allostery. Since RNA folding pathways are determined by the energy landscape, we describe a novel algorithm, FFTbor2D, which computes the 2D projection of the energy landscape for a given RNA sequence. Given two metastable secondary structures A, B for a given RNA sequence, FFTbor2D computes the Boltzmann probability p(x, y) = Z(x,y)/Z that a secondary structure has base pair distance x from A and distance y from B. Using polynomial interpolationwith the fast Fourier transform,we compute p(x, y) in O(n(5)) time and O(n(2)) space, which is an improvement over an earlier method, which runs in O(n(7)) time and O(n(4)) space. FFTbor2D has potential applications in synthetic biology, where one might wish to design bistable switches having target metastable structures A, B with favorable pathway kinetics. By inverting the transition probability matrix determined from FFTbor2D output, we show that L. collosoma spliced leader RNA has larger mean first passage time from A to B on the 2D energy landscape, than 97.145% of 20,000 sequences, each having metastable structures A, B. Source code and binaries are freely available for download at http://bioinformatics.bc.edu/clotelab/FFTbor2D. The program FFTbor2D is implemented in C++, with optional OpenMP parallelization primitives.


Assuntos
Modelos Moleculares , Conformação de Ácido Nucleico , RNA de Protozoário/química , Regiões 5' não Traduzidas , Algoritmos , Animais , Análise de Fourier , Cinética , Conceitos Matemáticos , Simulação de Dinâmica Molecular , Splicing de RNA , RNA de Protozoário/genética , RNA de Protozoário/metabolismo , RNA Líder para Processamento/química , RNA Líder para Processamento/genética , RNA Líder para Processamento/metabolismo , Trypanosomatina/química , Trypanosomatina/genética , Trypanosomatina/metabolismo
9.
Nucleic Acids Res ; 41(Web Server issue): W465-70, 2013 Jul.
Artigo em Inglês | MEDLINE | ID: mdl-23700314

RESUMO

Synthetic biology and nanotechnology are poised to make revolutionary contributions to the 21st century. In this article, we describe a new web server to support in silico RNA molecular design. Given an input target RNA secondary structure, together with optional constraints, such as requiring GC-content to lie within a certain range, requiring the number of strong (GC), weak (AU) and wobble (GU) base pairs to lie in a certain range, the RNAiFold web server determines one or more RNA sequences, whose minimum free-energy secondary structure is the target structure. RNAiFold provides access to two servers: RNA-CPdesign, which applies constraint programming, and RNA-LNSdesign, which applies the large neighborhood search heuristic; hence, it is suitable for larger input structures. Both servers can also solve the RNA inverse hybridization problem, i.e. given a representation of the desired hybridization structure, RNAiFold returns two sequences, whose minimum free-energy hybridization is the input target structure. The web server is publicly accessible at http://bioinformatics.bc.edu/clotelab/RNAiFold, which provides access to two specialized servers: RNA-CPdesign and RNA-LNSdesign. Source code for the underlying algorithms, implemented in COMET and supported on linux, can be downloaded at the server website.


Assuntos
Dobramento de RNA , Software , Algoritmos , Composição de Bases , Pareamento de Bases , Sequência de Bases , Simulação por Computador , Internet , RNA/química
10.
J Math Biol ; 68(1-2): 341-75, 2014 Jan.
Artigo em Inglês | MEDLINE | ID: mdl-23263300

RESUMO

It is a classical result of Stein and Waterman that the asymptotic number of RNA secondary structures is 1.104366∙n-3/2∙2.618034n. Motivated by the kinetics of RNA secondary structure formation, we are interested in determining the asymptotic number of secondary structures that are locally optimal, with respect to a particular energy model. In the Nussinov energy model, where each base pair contributes -1 towards the energy of the structure, locally optimal structures are exactly the saturated structures, for which we have previously shown that asymptotically, there are 1.07427∙n-3/2∙2.35467n many saturated structures for a sequence of length n. In this paper, we consider the base stacking energy model, a mild variant of the Nussinov model, where each stacked base pair contributes -1 toward the energy of the structure. Locally optimal structures with respect to the base stacking energy model are exactly those secondary structures, whose stems cannot be extended. Such structures were first considered by Evers and Giegerich, who described a dynamic programming algorithm to enumerate all locally optimal structures. In this paper, we apply methods from enumerative combinatorics to compute the asymptotic number of such structures. Additionally, we consider analogous combinatorial problems for secondary structures with annotated single-stranded, stacking nucleotides (dangles).


Assuntos
Modelos Teóricos , Conformação de Ácido Nucleico , RNA/química , Termodinâmica
11.
RNA Biol ; 10(12): 1842-52, 2013 Dec.
Artigo em Inglês | MEDLINE | ID: mdl-24253111

RESUMO

Internal ribosome entry site (IRES) elements govern protein synthesis of mRNAs that bypass cap-dependent translation inhibition under stress conditions. Picornavirus IRES are cis-acting elements, organized in modular domains that recruit the ribosome to internal mRNA sites. The aim of this study was to retrieve short RNA sequences with the capacity to adopt RNA folding patterns conserved with IRES structural subdomains, likely corresponding to RNA modules. We have applied a new program, RNAiFold, an inverse folding algorithm that determines all sequences whose minimum free energy structure is identical to that of the structural domains of interest. Sequences differing by more than 1 nt were clustered. Then, BLASTing one randomly chosen sequence from each cluster of the RNAiFold output, we retrieved viral and cellular sequences among output hits. As a proof of principle, we present the data corresponding to a coding region of Drosophila melanogaster TAF6, a transcription factor-associated protein that contains a structural motif within its coding region potentially folding into an IRES-like subdomain. This RNA region shows a biased codon usage, as predicted from structural constraints at the RNA level, it harbors conserved IRES structural motifs in loops, and interestingly, it has the capacity to confer internal initiation of translation in tissue culture cells.


Assuntos
Algoritmos , RNA Mensageiro/metabolismo , Sequências Reguladoras de Ácido Ribonucleico , Ribossomos/metabolismo , Animais , Proteínas de Drosophila/genética , Drosophila melanogaster/genética , Drosophila melanogaster/metabolismo , Regulação da Expressão Gênica , Modelos Moleculares , Conformação de Ácido Nucleico , Picornaviridae/genética , Dobramento de RNA , Reprodutibilidade dos Testes , Fatores Associados à Proteína de Ligação a TATA/genética , Fator de Transcrição TFIID/genética
12.
Bull Math Biol ; 75(12): 2410-30, 2013 Dec.
Artigo em Inglês | MEDLINE | ID: mdl-24142625

RESUMO

In the absence of chaperone molecules, RNA folding is believed to depend on the distribution of kinetic traps in the energy landscape of all secondary structures. Kinetic traps in the Nussinov energy model are precisely those secondary structures that are saturated, meaning that no base pair can be added without introducing either a pseudoknot or base triple. In this paper, we compute the asymptotic expected number of hairpins in saturated structures. For instance, if every hairpin is required to contain at least θ=3 unpaired bases and the probability that any two positions can base-pair is p=3/8, then the asymptotic number of saturated structures is 1.34685[Symbol: see text]n (-3/2)[Symbol: see text]1.62178 (n) , and the asymptotic expected number of hairpins follows a normal distribution with mean [Formula: see text]. Similar results are given for values θ=1,3, and p=1,1/2,3/8; for instance, when θ=1 and p=1, the asymptotic expected number of hairpins in saturated secondary structures is 0.123194[Symbol: see text]n, a value greater than the asymptotic expected number 0.105573[Symbol: see text]n of hairpins over all secondary structures. Since RNA binding targets are often found in hairpin regions, it follows that saturated structures present potentially more binding targets than nonsaturated structures, on average. Next, we describe a novel algorithm to compute the hairpin profile of a given RNA sequence: given RNA sequence a 1,…,a n , for each integer k, we compute that secondary structure S k having minimum energy in the Nussinov energy model, taken over all secondary structures having k hairpins. We expect that an extension of our algorithm to the Turner energy model may provide more accurate structure prediction for particular RNAs, such as tRNAs and purine riboswitches, known to have a particular number of hairpins. Mathematica(™) computations, C and Python source code, and additional supplementary information are available at the website http://bioinformatics.bc.edu/clotelab/RNAhairpinProfile/ .


Assuntos
Conformação de Ácido Nucleico , RNA/química , RNA/genética , Algoritmos , Biologia Computacional , Sequências Repetidas Invertidas , Conceitos Matemáticos , Modelos Moleculares
13.
BMC Bioinformatics ; 13 Suppl 5: S6, 2012 Apr 12.
Artigo em Inglês | MEDLINE | ID: mdl-22537010

RESUMO

BACKGROUND: Since RNA molecules regulate genes and control alternative splicing by allostery, it is important to develop algorithms to predict RNA conformational switches. Some tools, such as paRNAss, RNAshapes and RNAbor, can be used to predict potential conformational switches; nevertheless, no existent tool can detect general (i.e., not family specific) entire riboswitches (both aptamer and expression platform) with accuracy. Thus, the development of additional algorithms to detect conformational switches seems important, especially since the difference in free energy between the two metastable secondary structures may be as large as 15-20 kcal/mol. It has recently emerged that RNA secondary structure can be more accurately predicted by computing the maximum expected accuracy (MEA) structure, rather than the minimum free energy (MFE) structure. RESULTS: Given an arbitrary RNA secondary structure S0 for an RNA nucleotide sequence a = a1,..., a(n), we say that another secondary structure S of a is a k-neighbor of S0, if the base pair distance between S0 and S is k. In this paper, we prove that the Boltzmann probability of all k-neighbors of the minimum free energy structure S0 can be approximated with accuracy ε and confidence 1 - p, simultaneously for all 0 ≤ k < K, by a relative frequency count over N sampled structures, provided that N>N(ε,p,K)=Φ⁻¹(p/2K)²/4ε², where Φ(z) is the cumulative distribution function (CDF) for the standard normal distribution. We go on to describe the algorithm RNAborMEA, which for an arbitrary initial structure S0 and for all values 0 ≤ k < K, computes the secondary structure MEA(k), having maximum expected accuracy over all k-neighbors of S0. Computation time is O(n³ · K²), and memory requirements are O(n² · K). We analyze a sample TPP riboswitch, and apply our algorithm to the class of purine riboswitches. CONCLUSIONS: The approximation of RNAbor by sampling, with rigorous bound on accuracy, together with the computation of maximum expected accuracy k-neighbors by RNAborMEA, provide additional tools toward conformational switch detection. Results from RNAborMEA are quite distinct from other tools, such as RNAbor, RNAshapes and paRNAss, hence may provide orthogonal information when looking for suboptimal structures or conformational switches. Source code for RNAborMEA can be downloaded from http://sourceforge.net/projects/rnabormea/ or http://bioinformatics.bc.edu/clotelab/RNAborMEA/.


Assuntos
Algoritmos , Conformação de Ácido Nucleico , RNA/química , Sequência de Bases , Dados de Sequência Molecular , Análise de Sequência de RNA/métodos , Termodinâmica
14.
J Math Biol ; 65(3): 581-99, 2012 Sep.
Artigo em Inglês | MEDLINE | ID: mdl-21984358

RESUMO

In "The ends of a large RNA molecule are necessarily close", Yoffe et al. (Nucleic Acids Res 39(1):292-299, 2011) used the programs RNAfold [resp. RNAsubopt] from Vienna RNA Package to calculate the distance between 5' and 3' ends of the minimum free energy secondary structure [resp. thermal equilibrium structures] of viral and random RNA sequences. Here, the 5'-3' distance is defined to be the length of the shortest path from 5' node to 3' node in the undirected graph, whose edge set consists of edges {i, i + 1} corresponding to covalent backbone bonds and of edges {i, j} corresponding to canonical base pairs. From repeated simulations and using a heuristic theoretical argument, Yoffe et al. conclude that the 5'-3' distance is less than a fixed constant, independent of RNA sequence length. In this paper, we provide a rigorous, mathematical framework to study the expected distance from 5' to 3' ends of an RNA sequence. We present recurrence relations that precisely define the expected distance from 5' to 3' ends of an RNA sequence, both for the Turner nearest neighbor energy model, as well as for a simple homopolymer model first defined by Stein and Waterman. We implement dynamic programming algorithms to compute (rather than approximate by repeated application of Vienna RNA Package) the expected distance between 5' and 3' ends of a given RNA sequence, with respect to the Turner energy model. Using methods of analytical combinatorics, that depend on complex analysis, we prove that the asymptotic expected 5'-3' distance of length n homopolymers is approximately equal to the constant 5.47211, while the asymptotic distance is 6.771096 if hairpins have a minimum of 3 unpaired bases and the probability that any two positions can form a base pair is 1/4. Finally, we analyze the 5'-3' distance for secondary structures from the STRAND database, and conclude that the 5'-3' distance is correlated with RNA sequence length.


Assuntos
Modelos Químicos , Conformação de Ácido Nucleico , Nucleotídeos/química , RNA/química , Algoritmos , Modelos Moleculares
15.
J Math Biol ; 65(6-7): 1337-57, 2012 Dec.
Artigo em Inglês | MEDLINE | ID: mdl-22159642

RESUMO

Let S denote the set of (possibly noncanonical) base pairs {i, j } of an RNA tertiary structure; i.e. {i, j} ∈ S if there is a hydrogen bond between the ith and jth nucleotide. The page number of S, denoted π(S), is the minimum number k such that Scan be decomposed into a disjoint union of k secondary structures. Here, we show that computing the page number is NP-complete; we describe an exact computation of page number, using constraint programming, and determine the page number of a collection of RNA tertiary structures, for which the topological genus is known. We describe an approximation algorithm from which it follows that ω(S) ≤ π(S) ≤ ω(S) ・log n,where the clique number of S, ω(S), denotes the maximum number of base pairs that pairwise cross each other.


Assuntos
Pareamento de Bases , Modelos Químicos , Conformação de Ácido Nucleico , RNA/química , Ligação de Hidrogênio , Modelos Genéticos , Modelos Moleculares , Termodinâmica
16.
Nucleic Acids Res ; 38(5): 1711-22, 2010 Mar.
Artigo em Inglês | MEDLINE | ID: mdl-20044352

RESUMO

Given an RNA sequence and two designated secondary structures A, B, we describe a new algorithm that computes a nearly optimal folding pathway from A to B. The algorithm, RNAtabupath, employs a tabu semi-greedy heuristic, known to be an effective search strategy in combinatorial optimization. Folding pathways, sometimes called routes or trajectories, are computed by RNAtabupath in a fraction of the time required by the barriers program of Vienna RNA Package. We benchmark RNAtabupath with other algorithms to compute low energy folding pathways between experimentally known structures of several conformational switches. The RNApathfinder web server, source code for algorithms to compute and analyze pathways and supplementary data are available at http://bioinformatics.bc.edu/clotelab/RNApathfinder.


Assuntos
Algoritmos , RNA/química , Software , Biologia Computacional/métodos , Conformação de Ácido Nucleico , Análise de Sequência de RNA
17.
Proteomics ; 11(1): 22-32, 2011 Jan.
Artigo em Inglês | MEDLINE | ID: mdl-21182191

RESUMO

One of the most common approaches for large-scale protein identification is LC, followed by MS. If more than a few proteins are to be identified, the additional fragmentation of individual peptides has so far been considered as indispensable, and thus, the associated costs, in terms of instrument time and infrastructure, as unavoidable. Here, we present evidence to the contrary. Using a combination of (i) highly accurate and precise mass measurements, (ii) modern retention time prediction, and (iii) a robust scoring algorithm, we were able to identify 257 proteins of Francisella tularensis from a single LC-MS experiment in a fragmentation-free approach (i.e. without experimental fragmentation spectra). This number amounts to 59% of the number of proteins identified in a standard fragmentation-based approach, when executed with the same false discovery rate. Independent evidence supports at least 27 of a set of 31 proteins that were identified only in the fragmentation-free approach. Our results suggest that additional developments in retention time prediction, measurement technology, and scoring algorithms may render fragmentation-free approaches an interesting complement or an alternative to fragmentation-based approaches.


Assuntos
Cromatografia Líquida de Alta Pressão , Biologia Computacional , Espectrometria de Massas em Tandem/métodos , Francisella tularensis/metabolismo , Peptídeos/metabolismo , Proteínas/metabolismo
18.
Bioinformatics ; 26(12): i278-86, 2010 Jun 15.
Artigo em Inglês | MEDLINE | ID: mdl-20529917

RESUMO

MOTIVATION: Thermodynamics-based dynamic programming RNA secondary structure algorithms have been of immense importance in molecular biology, where applications range from the detection of novel selenoproteins using expressed sequence tag (EST) data, to the determination of microRNA genes and their targets. Dynamic programming algorithms have been developed to compute the minimum free energy secondary structure and partition function of a given RNA sequence, the minimum free-energy and partition function for the hybridization of two RNA molecules, etc. However, the applicability of dynamic programming methods depends on disallowing certain types of interactions (pseudoknots, zig-zags, etc.), as their inclusion renders structure prediction an nondeterministic polynomial time (NP)-complete problem. Nevertheless, such interactions have been observed in X-ray structures. RESULTS: A non-Boltzmannian Monte Carlo algorithm was designed by Wang and Landau to estimate the density of states for complex systems, such as the Ising model, that exhibit a phase transition. In this article, we apply the Wang-Landau (WL) method to compute the density of states for secondary structures of a given RNA sequence, and for hybridizations of two RNA sequences. Our method is shown to be much faster than existent software, such as RNAsubopt. From density of states, we compute the partition function over all secondary structures and over all pseudoknot-free hybridizations. The advantage of the WL method is that by adding a function to evaluate the free energy of arbitrary pseudoknotted structures and of arbitrary hybridizations, we can estimate thermodynamic parameters for situations known to be NP-complete. This extension to pseudoknots will be made in the sequel to this article; in contrast, the current article describes the WL algorithm applied to pseudoknot-free secondary structures and hybridizations. AVAILABILITY: The WL RNA hybridization web server is under construction at http://bioinformatics.bc.edu/clotelab/.


Assuntos
Algoritmos , RNA/química , Biologia Computacional/métodos , Método de Monte Carlo , Conformação de Ácido Nucleico , Termodinâmica
19.
Nucleic Acids Res ; 37(Web Server issue): W281-6, 2009 Jul.
Artigo em Inglês | MEDLINE | ID: mdl-19531740

RESUMO

The history and mechanism of molecular evolution in DNA have been greatly elucidated by contributions from genetics, probability theory and bioinformatics--indeed, mathematical developments such as Kimura's neutral theory, Kingman's coalescent theory and efficient software such as BLAST, ClustalW, Phylip, etc., provide the foundation for modern population genetics. In contrast to DNA, the function of most noncoding RNA depends on tertiary structure, experimentally known to be largely determined by secondary structure, for which dynamic programming can efficiently compute the minimum free energy secondary structure. For this reason, understanding the effect of pointwise mutations in RNA secondary structure could reveal fundamental properties of structural RNA molecules and improve our understanding of molecular evolution of RNA. The web server RNAmutants provides several efficient tools to compute the ensemble of low-energy secondary structures for all k-mutants of a given RNA sequence, where k is bounded by a user-specified upper bound. As we have previously shown, these tools can be used to predict putative deleterious mutations and to analyze regulatory sequences from the hepatitis C and human immunodeficiency genomes. Web server is available at http://bioinformatics.bc.edu/clotelab/RNAmutants/, and downloadable binaries at http://rnamutants.csail.mit.edu/.


Assuntos
Mutação Puntual , RNA não Traduzido/química , RNA não Traduzido/genética , Software , Internet , Conformação de Ácido Nucleico , Análise de Sequência de RNA , Interface Usuário-Computador
20.
PLoS One ; 15(1): e0227177, 2020.
Artigo em Inglês | MEDLINE | ID: mdl-31978147

RESUMO

Alignment of structural RNAs is an important problem with a wide range of applications. Since function is often determined by molecular structure, RNA alignment programs should take into account both sequence and base-pairing information for structural homology identification. This paper describes C++ software, RNAmountAlign, for RNA sequence/structure alignment that runs in O(n3) time and O(n2) space for two sequences of length n; moreover, our software returns a p-value (transformable to expect value E) based on Karlin-Altschul statistics for local alignment, as well as parameter fitting for local and global alignment. Using incremental mountain height, a representation of structural information computable in cubic time, RNAmountAlign implements quadratic time pairwise local, global and global/semiglobal (query search) alignment using a weighted combination of sequence and structural similarity. RNAmountAlign is capable of performing progressive multiple alignment as well. Benchmarking of RNAmountAlign against LocARNA, LARA, FOLDALIGN, DYNALIGN, STRAL, MXSCARNA, and MUSCLE shows that RNAmountAlign has reasonably good accuracy and faster run time supporting all alignment types. Additionally, our extension of RNAmountAlign, called RNAmountAlignScan, which scans a target genome sequence to find hits having high sequence and structural similarity to a given query sequence, outperforms RSEARCH and sequence-only query scans and runs faster than FOLDALIGN query scan.


Assuntos
Sequência de Bases/genética , Alinhamento de Sequência , Homologia de Sequência do Ácido Nucleico , Software , Algoritmos , Confiabilidade dos Dados , Humanos , Modelos Estatísticos , Conformação de Ácido Nucleico , Probabilidade , Análise de Sequência de RNA , Fatores de Tempo
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA