Results 1 - 20 of 42
1.
Mol Cell ; 81(10): 2135-2147.e5, 2021 05 20.
Article in English | MEDLINE | ID: mdl-33713597

ABSTRACT

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is currently a global pandemic. CoVs are known to generate negative subgenomes (subgenomic RNAs [sgRNAs]) through transcription-regulating sequence (TRS)-dependent template switching, but the global dynamic landscapes of coronaviral subgenomes and regulatory rules remain unclear. Here, using next-generation sequencing (NGS) short-read and Nanopore long-read poly(A) RNA sequencing in two cell types at multiple time points after infection with SARS-CoV-2, we identified hundreds of template switches and constructed the dynamic landscapes of SARS-CoV-2 subgenomes. Interestingly, template switching could occur in a bidirectional manner, with diverse SARS-CoV-2 subgenomes generated from successive template-switching events. The majority of template switches result from RNA-RNA interactions, including seed and compensatory modes, with terminal pairing status as a key determinant. Two TRS-independent template switch modes are also responsible for subgenome biogenesis. Our findings reveal the subgenome landscape of SARS-CoV-2 and its regulatory features, providing a molecular basis for understanding subgenome biogenesis and developing novel anti-viral strategies.


Subjects
COVID-19 , Genome, Viral , High-Throughput Nucleotide Sequencing , RNA, Viral , SARS-CoV-2 , Animals , COVID-19/genetics , COVID-19/metabolism , Caco-2 Cells , Chlorocebus aethiops , Humans , RNA, Viral/genetics , RNA, Viral/metabolism , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , Vero Cells
2.
BMC Bioinformatics ; 23(Suppl 8): 424, 2022 Oct 14.
Article in English | MEDLINE | ID: mdl-36241988

ABSTRACT

BACKGROUND: RNA deleterious point mutation prediction was previously addressed with programs such as RNAmute and MultiRNAmute. The purpose of these programs is to predict a global conformational rearrangement of the secondary structure of a functional RNA molecule, thereby disrupting its function. RNAmute was designed to handle only single point mutations, in a brute-force manner, whereas MultiRNAmute introduced an efficient approach for multiple point mutations. The approach used in MultiRNAmute is based on the stabilization of the suboptimal RNA folding prediction solutions and/or destabilization of the optimal folding prediction solution of the wild-type RNA molecule. The MultiRNAmute algorithm is significantly more efficient than the brute-force approach in RNAmute, but for long sequences and large m-point mutation sets, MultiRNAmute becomes exponential in examining all possible stabilizing and destabilizing mutations. RESULTS: An inherent limitation of the RNAmute and MultiRNAmute programs is that they can predict only substitution mutations, as they were not designed to work with deletion or insertion mutations. To address this limitation, we herein develop a very fast algorithm, based on suboptimal folding solutions, to predict a user-specified number of multiple point deleterious mutations. Depending on the user's choice, each such set of mutations may contain combinations of deletions, insertions and substitution mutations. Additionally, we prove the hardness of predicting the most deleterious set of point mutations in structural RNAs. CONCLUSIONS: We developed a method that extends our previous MultiRNAmute method to predict insertion and deletion mutations in addition to substitutions. An additional advantage of the new method is its efficiency in finding a predefined number of deleterious mutations.
Our new method may be exploited by biologists and virologists prior to site-directed mutagenesis experiments, which involve indel mutations along with substitutions. For example, our method may help to investigate the change of function in an RNA virus via mutations that disrupt important motifs in its secondary structure.
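
The three mutation types named in this abstract (substitutions, insertions, deletions) can be illustrated with a brute-force enumerator of all single-point mutants. This is a minimal sketch for orientation only: it is not the suboptimal-folding method described above, and the helper name `single_point_mutants` is hypothetical.

```python
ALPHABET = "ACGU"


def single_point_mutants(seq: str) -> set:
    """Return every distinct sequence one point mutation away from `seq`.

    Hypothetical helper: brute-force enumeration of substitutions,
    single-nucleotide deletions and single-nucleotide insertions.
    """
    mutants = set()
    for i, current in enumerate(seq):
        # Substitution: replace position i with a different nucleotide.
        for nt in ALPHABET:
            if nt != current:
                mutants.add(seq[:i] + nt + seq[i + 1:])
        # Deletion: drop position i.
        mutants.add(seq[:i] + seq[i + 1:])
    # Insertion: place any nucleotide in any of the len(seq) + 1 gaps.
    # The set removes duplicates (e.g. inserting "A" before or after an "A").
    for i in range(len(seq) + 1):
        for nt in ALPHABET:
            mutants.add(seq[:i] + nt + seq[i:])
    return mutants


print(len(single_point_mutants("AC")))  # 18 distinct mutants
```

The single-mutant set grows only linearly with sequence length, but sets of m simultaneous mutations grow roughly like its m-th power, which is the combinatorial blow-up that motivates a non-exhaustive strategy.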


Subjects
INDEL Mutation , RNA , Mutation , Point Mutation , RNA/chemistry , RNA/genetics , Sequence Analysis, RNA
3.
Bioinformatics ; 37(15): 2126-2133, 2021 Aug 09.
Article in English | MEDLINE | ID: mdl-33538792

ABSTRACT

MOTIVATION: Predicting the folding dynamics of RNAs is a computationally difficult problem, first and foremost due to the combinatorial explosion of alternative structures in the folding space. Abstractions are therefore needed to simplify downstream analyses, and thus make them computationally tractable. This can be achieved by various structure sampling algorithms. However, current sampling methods are still time-consuming and frequently fail to represent key elements of the folding space. METHOD: We introduce RNAxplorer, a novel adaptive sampling method to efficiently explore the structure space of RNAs. RNAxplorer uses dynamic programming to perform an efficient Boltzmann sampling in the presence of guiding potentials, which are accumulated into pseudo-energy terms and reflect similarity to already well-sampled structures. In this way, we effectively steer sampling toward underrepresented or unexplored regions of the structure space. RESULTS: We developed and applied different measures to benchmark our sampling method against its competitors. Most of the measures show that RNAxplorer produces more diverse structure samples, yields rare conformations that may be inaccessible to other sampling methods and is better at finding the most relevant kinetic traps in the landscape. Thus, it produces a more representative coarse graining of the landscape, which is well suited to subsequently compute better approximations of RNA folding kinetics. AVAILABILITY AND IMPLEMENTATION: https://github.com/ViennaRNA/RNAxplorer/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
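
The idea of steering Boltzmann sampling with a similarity pseudo-energy can be sketched on a small, explicitly enumerated list of structures. This is a toy under stated assumptions: RNAxplorer works with dynamic programming over the full structure space, whereas this sketch enumerates candidates by hand, and the penalty form `penalty / (1 + distance)` and all function names are hypothetical choices for illustration, not the paper's guiding potential.

```python
import math
import random

RT = 0.6163  # approx. gas constant x 310.15 K, in kcal/mol


def base_pairs(db):
    """Set of (i, j) base pairs encoded by a balanced dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def bp_distance(a, b):
    """Base-pair distance: number of pairs present in exactly one structure."""
    return len(base_pairs(a) ^ base_pairs(b))


def guided_probabilities(structures, energies, visited, penalty):
    """Boltzmann probabilities after adding a similarity pseudo-energy.

    Each already-visited structure v adds penalty / (1 + d(s, v)) to the
    energy of s, so structures close to well-sampled ones become less likely.
    """
    weights = []
    for s, e in zip(structures, energies):
        bias = sum(penalty / (1 + bp_distance(s, v)) for v in visited)
        weights.append(math.exp(-(e + bias) / RT))
    z = sum(weights)
    return [w / z for w in weights]


def guided_sample(structures, energies, visited, penalty, k, seed=0):
    """Draw k structures (with replacement) from the guided distribution."""
    probs = guided_probabilities(structures, energies, visited, penalty)
    return random.Random(seed).choices(structures, weights=probs, k=k)
```

Adding the most frequently sampled structure to `visited` lowers its probability and shifts mass toward the remaining, less explored candidates.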

4.
Nucleic Acids Res ; 48(15): 8276-8289, 2020 09 04.
Article in English | MEDLINE | ID: mdl-32735675

ABSTRACT

The manual production of reliable RNA structure models from chemical probing experiments benefits from the integration of information derived from multiple protocols and reagents. However, the interpretation of multiple probing profiles remains a complex task, hindering the quality and reproducibility of modeling efforts. We introduce IPANEMAP, the first automated method for the modeling of RNA structure from multiple probing reactivity profiles. Input profiles can result from experiments based on diverse protocols, reagents, or collections of variants, and are jointly analyzed to predict the dominant conformations of an RNA. IPANEMAP combines sampling, clustering and multi-optimization to produce secondary structure models that are both stable and well supported by experimental evidence. The analysis of multiple reactivity profiles, both publicly available and produced in our study, demonstrates the good performance of IPANEMAP, even in a single-probing setting. It confirms the potential of integrating multiple sources of probing data, informing the design of informative probing assays.
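
The "multi-optimization" step, keeping models that are both stable and well supported, can be illustrated with a Pareto-front filter over two criteria. This is a minimal sketch assuming each candidate structure already has a free energy and a probing-misfit score (the `probing_misfit` field is hypothetical); IPANEMAP's actual scoring and clustering are not reproduced here.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    structure: str
    free_energy: float     # kcal/mol, lower = more stable
    probing_misfit: float  # hypothetical disagreement score, lower = better fit


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both criteria and strictly better on one."""
    no_worse = a.free_energy <= b.free_energy and a.probing_misfit <= b.probing_misfit
    better = a.free_energy < b.free_energy or a.probing_misfit < b.probing_misfit
    return no_worse and better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


pool = [
    Candidate("((...))", -3.1, 0.8),
    Candidate("(.....)", -1.2, 0.2),
    Candidate(".......", 0.0, 0.9),
]
print([c.structure for c in pareto_front(pool)])  # ['((...))', '(.....)']
```

The Pareto front keeps trade-offs visible instead of collapsing both criteria into one weighted score, which is one common way to express "both stable and well supported".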


Subjects
Nucleic Acid Conformation , RNA/chemistry , Software , Amoebozoa/genetics , Benchmarking , Datasets as Topic , Mutation , RNA/genetics
5.
Bioinformatics ; 36(9): 2920-2922, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31971575

ABSTRACT

SUMMARY: RNA design has conceptually evolved from the inverse RNA folding problem. In the classical inverse RNA folding problem, the user inputs an RNA secondary structure and receives as output an RNA sequence that folds into it. Although modern RNA design methods are based on the same principle, finer control over the resulting sequences is sought. As an important example, a substantial number of non-coding RNA families show high preservation in specific regions while being more flexible in others, and this information should be utilized in the design. By using this additional information, RNA design tools can help solve problems of practical interest in the growing fields of synthetic biology and nanotechnology. incaRNAfbinv 2.0 utilizes a fragment-based approach, enabling control of specific RNA secondary structure motifs. The new version allows significantly more control over the general RNA shape, and also allows users to express specific restrictions on each motif separately, in addition to other advanced features. AVAILABILITY AND IMPLEMENTATION: incaRNAfbinv 2.0 is available through a standalone package and a web server at https://www.cs.bgu.ac.il/incaRNAfbinv. Source code, command-line and GUI wrappers can be found at https://github.com/matandro/RNAsfbinv. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
RNA , Software , Nucleotide Motifs , RNA/genetics , RNA Folding , Sequence Analysis, RNA
6.
BMC Cancer ; 21(1): 394, 2021 Apr 12.
Article in English | MEDLINE | ID: mdl-33845808

ABSTRACT

BACKGROUND: RNA-seq data are increasingly used to derive prognostic signatures for cancer outcome prediction. A limitation of current predictors is their reliance on reference gene annotations, which amounts to ignoring large numbers of non-canonical RNAs produced in disease tissues. A recently introduced kind of transcriptome classifier operates entirely in a reference-free manner, relying on k-mers extracted from patient RNA-seq data. METHODS: In this paper, we set out to compare conventional and reference-free signatures in risk and relapse prediction of prostate cancer. To compare the two approaches as fairly as possible, we set up a common procedure that takes as input either a k-mer count matrix or a gene expression matrix, extracts a signature and evaluates this signature in an independent dataset. RESULTS: We find that both gene-based and k-mer based classifiers had similarly high performances for risk prediction and a markedly lower performance for relapse prediction. Interestingly, the reference-free signatures included a set of sequences mapping to novel lncRNAs or variable regions of cancer driver genes that were not part of gene-based signatures. CONCLUSIONS: Reference-free classifiers are thus a promising strategy for the identification of novel prognostic RNA biomarkers.
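
The k-mer count matrix that the common procedure takes as input can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the input layout (a dict of sample name to read list), the orientation (k-mers as rows, samples as columns) and the function names are assumptions, and reverse-complement k-mers are not merged as many real tools would do.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every overlapping k-mer across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_count_matrix(samples, k):
    """Build a k-mer x sample count matrix from {sample_name: [reads]}.

    Rows are k-mers (sorted), columns are samples (sorted by name);
    k-mers absent from a sample get a count of 0.
    """
    per_sample = {name: kmer_counts(reads, k) for name, reads in samples.items()}
    kmers = sorted(set().union(*per_sample.values()))
    names = sorted(per_sample)
    matrix = [[per_sample[name][km] for name in names] for km in kmers]
    return kmers, names, matrix
```

A gene-expression matrix has the same shape with genes in place of k-mers, which is what lets a single signature-extraction procedure accept either input.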


Subjects
Biomarkers, Tumor , Prostatic Neoplasms/genetics , Prostatic Neoplasms/mortality , Transcriptome , Algorithms , Computational Biology/methods , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Male , Prognosis , Prostatic Neoplasms/pathology , Recurrence , Reproducibility of Results , Supervised Machine Learning
7.
Brief Bioinform ; 19(2): 350-358, 2018 03 01.
Article in English | MEDLINE | ID: mdl-28049135

ABSTRACT

Computational programs for predicting RNA sequences with desired folding properties have been extensively developed and expanded in the past several years. Given a secondary structure, these programs aim to predict sequences that fold into a target minimum free energy secondary structure, while considering various constraints. This procedure is called inverse RNA folding. Inverse RNA folding has traditionally been used to design optimized RNAs with favorable properties, an application that is expected to grow considerably in light of advances in the expanding new fields of synthetic biology and RNA nanostructures. Moreover, it was recently demonstrated that inverse RNA folding can successfully be used as a valuable preprocessing step in the computational detection of novel noncoding RNAs. This review describes the most popular freeware programs that have been developed for such purposes, starting from RNAinverse, which was devised when the inverse RNA folding problem was formulated. The most recently published ones that consider RNA secondary structure as input are antaRNA, RNAiFold and incaRNAfbinv, each having different features that could be beneficial to specific biological problems in practice. The various programs also use distinct approaches, ranging from ant colony optimization to constraint programming, in addition to adaptive walk, simulated annealing and Boltzmann sampling. This review compares the various programs and provides a simple description of the various possibilities that would help practitioners select the most suitable program. It is geared toward specific tasks requiring RNA design based on an input secondary structure, with an outlook toward the future of RNA design programs.
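
Every inverse folding program starts from a target structure in dot-bracket notation and must at least produce sequences whose paired positions can form canonical pairs (A-U, G-C, G-U). The sketch below shows that parsing and compatibility check. Compatibility is only a necessary condition: inverse folding additionally requires the sequence's minimum free energy structure to equal the target, which needs a thermodynamic model that is not reproduced here. Function names are hypothetical.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure):
    """Return the sorted list of (i, j) base pairs, validating the brackets."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def is_compatible(sequence, structure):
    """True if every target base pair is canonical in `sequence`."""
    if len(sequence) != len(structure):
        return False
    return all(
        (sequence[i], sequence[j]) in CANONICAL
        for i, j in parse_dot_bracket(structure)
    )
```

Methods such as adaptive walk or simulated annealing typically mutate only within this compatible space and then score candidates with a folding algorithm.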


Subjects
Algorithms , Nucleic Acid Conformation , RNA Folding , RNA/chemistry , Software , Animals , Computational Biology/methods , Humans , Models, Molecular
8.
J Math Biol ; 80(5): 1353-1388, 2020 04.
Article in English | MEDLINE | ID: mdl-32060618

ABSTRACT

Given a set of species whose evolution is represented by a species tree, a gene family is a group of genes having evolved from a single ancestral gene. A gene family evolves along the branches of a species tree through various mechanisms, including, but not limited to, speciation (S), gene duplication (D), gene loss (L), and horizontal gene transfer (T). The reconstruction of a gene tree representing the evolution of a gene family constrained by a species tree is an important problem in phylogenomics. However, unlike in the multispecies coalescent evolutionary model, which considers only speciation and incomplete lineage sorting events, very little is known about the search space for gene family histories accounting for gene duplication, gene loss and horizontal gene transfer (the DLT-model). In this work, we introduce the notion of evolutionary histories, defined as binary ordered rooted trees describing the evolution of a gene family, constrained by a species tree in the DLT-model. We provide formal grammars describing the set of all evolutionary histories that are compatible with a given species tree, whether it is ranked or unranked. These grammars allow us, using either analytic combinatorics or dynamic programming, to efficiently compute the number of histories of a given size, and also to generate random histories of a given size under the uniform distribution. We apply these tools to obtain exact asymptotics for the number of gene family histories for two species trees, the rooted caterpillar and the complete binary tree, as well as estimates of the range of the exponential growth factor of the number of histories for random species trees of size up to 25. Our results show that including horizontal gene transfers induces a dramatic increase in the number of evolutionary histories.
We also show that, within ranked species trees, the number of evolutionary histories in the DLT-model is almost independent of the species tree topology. These results establish firm foundations for the development of ensemble methods for the prediction of reconciliations.


Subjects
Evolution, Molecular , Models, Genetic , Algorithms , Computational Biology , Computer Simulation , Gene Deletion , Gene Duplication , Gene Transfer, Horizontal , Genetic Speciation , Mathematical Concepts , Multigene Family , Phylogeny
9.
BMC Bioinformatics ; 20(1): 209, 2019 Apr 25.
Article in English | MEDLINE | ID: mdl-31023239

ABSTRACT

BACKGROUND: The design of multi-stable RNA molecules has important applications in biology, medicine, and biotechnology. Synthetic design approaches profit strongly from effective in-silico methods, which substantially reduce the need for costly wet-lab experiments. RESULTS: We devise a novel approach to a central ingredient of most in-silico design methods: the generation of sequences that fold well into multiple target structures. Based on constraint networks, our approach supports generic Boltzmann-weighted sampling, which enables the positive design of RNA sequences with specific free energies (for each of multiple, possibly pseudoknotted, target structures) and GC-content. Moreover, we study general properties of our approach empirically and generate biologically relevant multi-target Boltzmann-weighted designs for an established design benchmark. Our results demonstrate the efficacy and feasibility of the method in practice, as well as the benefits of Boltzmann sampling over the previously best multi-target sampling strategy, even for the case of negative design of multi-stable RNAs. Besides these empirical studies, we finally justify the algorithmic details by a fundamental theoretical result about multi-stable RNA design, namely the #P-hardness of counting designs. CONCLUSION: This work introduces a novel, flexible, and effective approach to multi-target RNA design, which promises broad applicability and extensibility. Our free software is available at https://github.com/yannponty/RNARedPrint. Supplementary data are available online.


Subjects
RNA/chemistry , User-Computer Interface , Algorithms , Base Composition , Models, Theoretical , Nucleic Acid Conformation
10.
Brief Bioinform ; 18(2): 306-311, 2017 03 01.
Article in English | MEDLINE | ID: mdl-26984616

ABSTRACT

BRaliBase is a widely used benchmark for assessing the accuracy of RNA secondary structure alignment methods. In most case studies based on the BRaliBase benchmark, one can observe a puzzling drop in accuracy in the 40-60% sequence identity range, the so-called 'BRaliBase Dent'. In this article, we show that this dent is due to a bias in the composition of the BRaliBase benchmark, namely the inclusion of a disproportionate number of transfer RNAs, which exhibit a conserved secondary structure. Our analysis, aside from its interest regarding the specific case of the BRaliBase benchmark, also raises important questions regarding the design and use of benchmarks in computational biology.
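
The "40-60% sequence identity range" refers to how similar the aligned sequences of a benchmark instance are. The sketch below computes pairwise and average pairwise identity for a gapped alignment and checks whether an instance falls in that range. Identity definitions differ between benchmarks; dividing matches by the number of columns that are not gaps in both sequences is an assumption here, and the function names are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a, b, gap="-"):
    """Fraction of identical residues among columns not gapped in both rows."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return matches / len(columns)


def average_pairwise_identity(alignment):
    """Mean identity over all pairs of rows in the alignment."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two aligned sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


def in_dent_range(identity, low=0.40, high=0.60):
    """True if an instance falls in the identity range where the dent appears."""
    return low <= identity <= high
```

Binning benchmark instances by this value and tabulating the RNA families in each bin is one simple way to check whether a single family is over-represented in a particular identity range.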


Subjects
Benchmarking , Algorithms , Computational Biology , Nucleic Acid Conformation , Sequence Alignment , Sequence Analysis, RNA , Software
11.
Bioinformatics ; 39(Supplement_1): i1-i2, 2023 Jun 30.
Article in English | MEDLINE | ID: mdl-37387153
12.
PLoS Comput Biol ; 14(3): e1005992, 2018 03.
Article in English | MEDLINE | ID: mdl-29543809

ABSTRACT

We present a new educational initiative called Meet-U that aims to train students for collaborative work in computational biology and to bridge the gap between education and research. Meet-U mimics the setup of collaborative research projects and takes advantage of the most popular tools for collaborative work and of cloud computing. Students are grouped in teams of 4-5 people and have to carry out a project from A to Z that answers a challenging question in biology. Meet-U promotes "coopetition," as the students collaborate within and across the teams and also compete with each other to develop the best final product. Meet-U fosters interactions between different actors of education and research through the organization of a meeting day, open to everyone, where the students present their work to a jury of researchers and jury members give research seminars. This unique combination of education and research is strongly motivating for the students and provides a formidable opportunity for a scientific community to unite and increase its visibility. We report on our experience with Meet-U in two French universities with master's students in bioinformatics and modeling, with protein-protein docking as the subject of the course. Meet-U is easy to implement and can be straightforwardly transferred to other fields and/or universities. All the information and data are available at www.meet-u.org.


Subjects
Computational Biology/education , Computational Biology/methods , Research/education , Humans , Research Design , Students , Universities
13.
Nucleic Acids Res ; 45(12): 7382-7400, 2017 Jul 07.
Article in English | MEDLINE | ID: mdl-28449096

ABSTRACT

In the late phase of the HIV viral cycle, the unspliced genomic RNA is exported to the cytoplasm, where it is required for the translation of the Gag and Gag-pol polyproteins. Three distinct translation initiation mechanisms ensuring Gag production have been described, with little rationale for their multiplicity. The Gag-IRES is unusual in that it is located within the Gag ORF and interacts directly with the ribosomal 40S subunit. Aiming to elucidate the specificity and relevance of this interaction, we probed the HIV-1 Gag-IRES structure and developed an innovative integrative modelling strategy to take all the gathered information into account. We propose a novel Gag-IRES secondary structure strongly supported by all experimental data. We further demonstrate the presence of two regions within the Gag-IRES that independently and directly interact with the ribosome. Importantly, these binding sites are functionally relevant to Gag translation both in vitro and ex vivo. This work provides insight into the Gag-IRES molecular mechanism and gives compelling evidence for its physiological importance. It allows us to propose original hypotheses about the physiological role of the IRES and its conservation among primate lentiviruses.


Subjects
HIV-1/genetics , Internal Ribosome Entry Sites , Peptide Chain Initiation, Translational , Ribosome Subunits, Small, Eukaryotic/metabolism , gag Gene Products, Human Immunodeficiency Virus/genetics , Genes, Reporter , HIV-1/metabolism , Humans , Jurkat Cells , Kinetics , Luciferases/genetics , Luciferases/metabolism , Models, Molecular , Nucleic Acid Conformation , Open Reading Frames , Ribosome Subunits, Small, Eukaryotic/ultrastructure , gag Gene Products, Human Immunodeficiency Virus/metabolism
14.
Bioinformatics ; 33(14): i283-i292, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28882001

ABSTRACT

MOTIVATION: Kinetics is key to understanding many phenomena involving RNAs, such as co-transcriptional folding and riboswitches. Exact out-of-equilibrium studies induce extreme computational demands, leading state-of-the-art methods to rely on approximate kinetics landscapes, obtained using sampling strategies that strive to generate the key landmarks of the landscape topology. However, such methods are impeded by a high level of redundancy within sampled sets. Such redundancy is uninformative and obfuscates important intermediate states, leading to an incomplete view of RNA dynamics. RESULTS: We introduce RNANR, a new set of algorithms for the exploration of RNA kinetics landscapes at the secondary structure level. RNANR considers locally optimal structures, a reduced set of RNA conformations, in order to focus its sampling on basins in the kinetic landscape. Along with an exhaustive enumeration, RNANR implements a novel non-redundant stochastic sampling, and offers a rich array of structural parameters. Our tests on both real and random RNAs reveal that RNANR generates more unique structures in a given time than its competitors, and allows a deeper exploration of kinetics landscapes. AVAILABILITY AND IMPLEMENTATION: RNANR is freely available at https://project.inria.fr/rnalands/rnanr. CONTACT: yann.ponty@lix.polytechnique.fr.
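
Non-redundant stochastic sampling means that once a structure has been drawn, its probability mass is removed so that it cannot be drawn again. The toy below shows this as weighted sampling without replacement over an explicit list (for example, locally optimal structures with Boltzmann weights). This is a simplified stand-in under stated assumptions: RNANR performs the equivalent operation inside a dynamic-programming sampler over the whole landscape, and `non_redundant_sample` is a hypothetical name.

```python
import random


def non_redundant_sample(items, weights, k, seed=None):
    """Draw up to k distinct items, each draw proportional to remaining weight.

    After each draw the chosen item and its weight are removed, so the
    remaining probability mass is renormalized over unseen items only.
    Weights must be positive.
    """
    rng = random.Random(seed)
    pool_items = list(items)
    pool_weights = list(weights)
    chosen = []
    while pool_items and len(chosen) < k:
        idx = rng.choices(range(len(pool_items)), weights=pool_weights, k=1)[0]
        chosen.append(pool_items.pop(idx))
        pool_weights.pop(idx)
    return chosen
```

Compared with ordinary sampling with replacement, every draw yields a new structure, so a fixed sampling budget covers more distinct basins of the landscape.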


Subjects
Computational Biology/methods , Nucleic Acid Conformation , RNA/metabolism , Riboswitch , Software , Thermodynamics , Algorithms , Kinetics , RNA/chemistry , Transcription, Genetic
15.
Nucleic Acids Res ; 44(11): e104, 2016 06 20.
Article in English | MEDLINE | ID: mdl-27095200

ABSTRACT

Systematic structure probing experiments (e.g. SHAPE) of RNA mutants, such as the mutate-and-map (MaM) protocol, give direct access to the genetic robustness of ncRNA structures. Comparative studies of homologous sequences provide a distinct, yet complementary, approach to analyzing the structural and functional properties of non-coding RNAs. In this paper, we introduce a formal framework to combine the biochemical signal collected from MaM experiments with the evolutionary information available in multiple sequence alignments. We apply neutral theory principles to detect complex long-range dependencies between nucleotides of a single-stranded RNA, and implement these ideas in a software tool called aRNhAck. We illustrate the biological significance of this signal and show that the nucleotide networks calculated with aRNhAck are correlated with nucleotides located in RNA-RNA, RNA-protein, RNA-DNA and RNA-ligand interfaces. aRNhAck is freely available at http://csb.cs.mcgill.ca/arnhack.


Subjects
Evolution, Molecular , Mutation , Nucleic Acid Conformation , RNA/genetics , Software , Algorithms , Binding Sites , Computational Biology/methods , DNA/chemistry , Models, Molecular , Protein Binding , Protein Conformation , Proteins/chemistry , RNA/chemistry , Web Browser
16.
Nucleic Acids Res ; 44(W1): W308-14, 2016 07 08.
Article in English | MEDLINE | ID: mdl-27185893

ABSTRACT

In recent years, new methods for computational RNA design have been developed and applied to various problems in synthetic biology and nanotechnology. There has lately been considerable interest in incorporating essential biological information when solving the inverse RNA folding problem. Correspondingly, RNAfbinv aims at including biologically meaningful constraints and is the only program to date that performs a fragment-based design of RNA sequences. In doing so, it allows the design of sequences that do not necessarily fold exactly into the target, as long as the overall coarse-grained tree graph shape is preserved. Augmented by the weighted sampling algorithm of incaRNAtion, our web server, called incaRNAfbinv, implements the method devised in RNAfbinv and offers an interactive environment for the inverse folding of RNA using a fragment-based design approach. It takes as input: a target RNA secondary structure; optional sequence and motif constraints; optional target minimum free energy, neutrality and GC content. In addition to the design of synthetic regulatory sequences, it can be used as a pre-processing step for the detection of novel naturally occurring RNAs. The two complementary methodologies RNAfbinv and incaRNAtion are merged and fully implemented in our web server incaRNAfbinv, available at http://www.cs.bgu.ac.il/incaRNAfbinv.


Subjects
Nucleic Acid Conformation , RNA Folding , RNA/chemistry , Software , Algorithms , Base Composition , Base Pairing , Base Sequence , Computer Graphics , Internet , Mutation , RNA/genetics , Sequence Analysis, RNA , Thermodynamics
17.
Bioinformatics ; 32(13): 2056-8, 2016 07 01.
Article in English | MEDLINE | ID: mdl-27153713

ABSTRACT

A gene tree-species tree reconciliation explains the evolution of a gene tree within the species tree given a model of gene-family evolution. We describe ecceTERA, a program that implements a generic parsimony reconciliation algorithm, which accounts for gene duplication, loss and transfer (DTL) as well as speciation, involving sampled and unsampled lineages, within undated, fully dated or partially dated species trees. The ecceTERA reconciliation model and algorithm generalize or improve upon most published DTL parsimony algorithms for binary species trees and binary gene trees. Moreover, ecceTERA can estimate accurate species-tree aware gene trees using amalgamation. AVAILABILITY AND IMPLEMENTATION: ecceTERA is freely available at http://mbb.univ-montp2.fr/MBB/download_sources/16__ecceTERA and can be run online at http://mbb.univ-montp2.fr/MBB/subsection/softExec.php?soft=eccetera. CONTACT: celine.scornavacca@umontpellier.fr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
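
The core idea of reconciliation, mapping each gene-tree node to a species-tree node and labelling events, can be illustrated with the classic LCA mapping, which infers duplications in a duplication-loss setting. This is a much simpler model than ecceTERA's, with no transfers, dating or unsampled lineages. Encoding trees as nested 2-tuples of leaf names is an assumption of this sketch, and the function names are hypothetical.

```python
def species_clusters(tree):
    """Map each species-tree node (leaf name or nested tuple) to its leaf set."""
    clusters = {}

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        clusters[node] = leaves
        return leaves

    walk(tree)
    return clusters


def lca(species, clusters):
    """Smallest species-tree node whose leaf set contains `species`."""
    return min(
        (node for node, leaves in clusters.items() if species <= leaves),
        key=lambda node: len(clusters[node]),
    )


def count_duplications(gene_tree, gene_to_species, clusters):
    """LCA-map a binary gene tree and count nodes inferred as duplications."""
    duplications = 0

    def walk(node):
        nonlocal duplications
        if isinstance(node, str):
            return lca(frozenset([gene_to_species[node]]), clusters)
        left, right = (walk(child) for child in node)
        here = lca(clusters[left] | clusters[right], clusters)
        # Duplication: the node maps to the same species as one of its children.
        if here == left or here == right:
            duplications += 1
        return here

    walk(gene_tree)
    return duplications
```

For example, with species tree `(("A", "B"), "C")`, the gene tree `(("a1", "b1"), ("a2", "c1"))` (genes mapped to their species by first letter) needs one duplication at its root. Full DTL models add transfer events and a cost per event, which turns this single mapping into an optimization problem.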


Subjects
Computational Biology/methods , Evolution, Molecular , Gene Duplication , Multigene Family , Phylogeny , Algorithms , Models, Theoretical
18.
BMC Bioinformatics ; 16 Suppl 19: S6, 2015.
Article in English | MEDLINE | ID: mdl-26696141

ABSTRACT

CONTEXT: The reconstruction of evolutionary scenarios for whole genomes in terms of genome rearrangements is a fundamental problem in evolutionary and comparative genomics. The DeCo algorithm, recently introduced by Bérard et al., computes parsimonious evolutionary scenarios for gene adjacencies, from pairs of reconciled gene trees. However, as for many combinatorial optimization algorithms, there can exist many co-optimal, or slightly sub-optimal, evolutionary scenarios that deserve to be considered. CONTRIBUTION: We extend the DeCo algorithm to sample evolutionary scenarios from the whole solution space under the Boltzmann distribution, and also to compute Boltzmann probabilities for specific ancestral adjacencies. RESULTS: We apply our algorithms to a dataset of mammalian gene trees and adjacencies, and observe a significant reduction of the number of syntenic conflicts observed in the resulting ancestral gene adjacencies.
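
The Boltzmann probability of an ancestral adjacency is the total Boltzmann weight of the scenarios containing it, divided by the weight of all scenarios. The toy below computes this over an explicitly enumerated list of scenarios, each given as a cost and a set of adjacencies. This is a sketch under stated assumptions: DeCo's extension obtains these quantities by dynamic programming without enumerating scenarios, and the scenario encoding and the parameter `kT` are hypothetical.

```python
import math


def boltzmann_probabilities(costs, kT=1.0):
    """Probability of each scenario, proportional to exp(-cost / kT)."""
    m = min(costs)  # shift for numerical stability; cancels after normalization
    weights = [math.exp(-(c - m) / kT) for c in costs]
    z = sum(weights)
    return [w / z for w in weights]


def adjacency_probability(scenarios, adjacency, kT=1.0):
    """Total probability of scenarios (cost, set_of_adjacencies) containing `adjacency`."""
    probs = boltzmann_probabilities([cost for cost, _ in scenarios], kT)
    return sum(p for p, (_, adjs) in zip(probs, scenarios) if adjacency in adjs)
```

Lowering `kT` concentrates the probability on the co-optimal (most parsimonious) scenarios, while raising it lets slightly sub-optimal scenarios contribute, which is the trade-off the abstract describes.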


Subjects
Algorithms , Computational Biology/methods , Evolution, Molecular , Phylogeny , Animals , Databases, Genetic , Genome , Mammals/genetics , Probability , Temperature
19.
Nucleic Acids Res ; 41(Web Server issue): W480-5, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23748952

ABSTRACT

More than a simple carrier of genetic information, messenger RNA (mRNA) coding regions can also harbor functional elements that evolved to control different post-transcriptional processes, such as mRNA splicing, localization and translation. Functional elements in RNA molecules are often encoded by secondary structure elements. In this article, we introduce Structural Profile Assignment of RNA Coding Sequences (SPARCS), an efficient method to analyze the (secondary) structure profile of protein-coding regions in mRNAs. First, we develop a novel algorithm that enables us to sample uniformly the sequence landscape preserving the dinucleotide frequency and the encoded amino acid sequence of the input mRNA. Then, we use this algorithm to generate a set of artificial sequences that is used to estimate the Z-score of classical structural metrics such as the sum of base pairing probabilities and the base pairing entropy. Finally, we use these metrics to predict structured and unstructured regions in the input mRNA sequence. We applied our methods to study the structural profile of the ASH1 genes and recovered key structural elements. A web server implementing this discovery pipeline is available at http://csb.cs.mcgill.ca/sparcs together with the source code of the sampling algorithm.
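
The Z-score step compares a structural metric computed on the real mRNA with the same metric computed on the artificial background sequences. A minimal sketch follows, assuming the metric values have already been computed; the constrained sampler that preserves dinucleotide frequency and amino acid sequence is not reproduced, and the use of the sample standard deviation is an assumption.

```python
import statistics


def z_score(observed, background):
    """Standardize `observed` against metric values from background sequences.

    Uses the sample standard deviation of the background values.
    """
    if len(background) < 2:
        raise ValueError("need at least two background values")
    mean = statistics.mean(background)
    sd = statistics.stdev(background)
    if sd == 0:
        raise ValueError("background values have zero spread")
    return (observed - mean) / sd
```

A large absolute Z-score indicates that the observed metric is unlikely under sequences sharing the same dinucleotide composition and protein product, which is what makes the comparison informative about selected structure rather than composition.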


Subjects
RNA, Messenger/chemistry , Sequence Analysis, RNA/methods , Software , Algorithms , Base Pairing , Internet , Nucleic Acid Conformation , Proteins/genetics , Repressor Proteins/genetics , Saccharomyces cerevisiae Proteins/genetics
20.
Bioinformatics ; 29(13): i308-15, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23812999

ABSTRACT

MOTIVATIONS: The design of RNA sequences folding into predefined secondary structures is a milestone for many synthetic biology and gene therapy studies. Most current software uses similar local search strategies (i.e. a random seed is progressively adapted to acquire the desired folding properties) and, more importantly, does not allow the user to explicitly control the nucleotide distribution, such as the GC-content, of their sequences. However, the latter is an important criterion for large-scale applications, as it could presumably be used to design sequences with better transcription rates and/or structural plasticity. RESULTS: In this article, we introduce IncaRNAtion, a novel algorithm to design RNA sequences folding into target secondary structures with a predefined nucleotide distribution. IncaRNAtion uses a global sampling approach and weighted sampling techniques. We show that our approach is fast (i.e. running time comparable to or better than local search methods), seedless (we remove the bias of the seed in local search heuristics) and successfully generates high-quality sequences (i.e. thermodynamically stable) for any GC-content. To complete this study, we develop a hybrid method combining our global sampling approach with local search strategies. Remarkably, our glocal methodology outperforms both local and global approaches for sampling sequences with a specific GC-content and target structure. AVAILABILITY: IncaRNAtion is available at csb.cs.mcgill.ca/incarnation/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
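
The idea of controlling GC-content through weighted sampling can be illustrated by drawing sequences compatible with a target structure, where each G or C contributes a multiplicative weight `w`. This is a deliberately simplified stand-in: IncaRNAtion's weights also involve an energy model of stacked base pairs, which is dropped entirely here, so the sketch shows only the GC-weighting principle. The function names are hypothetical and the structure is assumed to be a balanced dot-bracket string.

```python
import random

UNPAIRED = ["A", "C", "G", "U"]
PAIRS = [("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")]


def gc_count(symbols):
    """Number of G or C characters in a string or tuple of nucleotides."""
    return sum(ch in "GC" for ch in symbols)


def gc_content(seq):
    return gc_count(seq) / len(seq) if seq else 0.0


def sample_compatible(structure, w, rng=None):
    """Sample a sequence compatible with `structure`, weighting each G/C by w.

    Paired positions receive a canonical pair, unpaired positions any base;
    each choice has probability proportional to w ** (number of G/C in it).
    """
    rng = rng or random.Random()
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    seq = [""] * len(structure)
    for i in range(len(structure)):
        if seq[i]:
            continue  # closing position already filled with its partner
        if i in partner:
            j = partner[i]  # opening bracket, so j > i
            weights = [w ** gc_count(p) for p in PAIRS]
            seq[i], seq[j] = rng.choices(PAIRS, weights=weights, k=1)[0]
        else:
            weights = [w ** gc_count(n) for n in UNPAIRED]
            seq[i] = rng.choices(UNPAIRED, weights=weights, k=1)[0]
    return "".join(seq)
```

With `w = 1` every compatible choice is equally likely; increasing `w` pushes the expected GC-content up and decreasing it pushes the content down, so a target GC-content can be approached by tuning `w`.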


Subjects
Algorithms , RNA/chemistry , Base Composition , Base Sequence , Models, Statistical , Nucleic Acid Conformation , Nucleotides/analysis , Sequence Analysis, RNA , Software , Thermodynamics