1.
Brief Bioinform ; 18(2): 306-311, 2017 03 01.
Article in English | MEDLINE | ID: mdl-26984616

ABSTRACT

BRaliBase is a widely used benchmark for assessing the accuracy of RNA secondary structure alignment methods. In most case studies based on the BRaliBase benchmark, one can observe a puzzling drop in accuracy in the 40-60% sequence identity range, the so-called 'BRaliBase Dent'. In this article, we show this dent is owing to a bias in the composition of the BRaliBase benchmark, namely the inclusion of a disproportionate number of transfer RNAs, which exhibit a conserved secondary structure. Our analysis, aside from its interest regarding the specific case of the BRaliBase benchmark, also raises important questions regarding the design and use of benchmarks in computational biology.


Subjects
Benchmarking, Algorithms, Computational Biology, Nucleic Acid Conformation, Sequence Alignment, RNA Sequence Analysis, Software
2.
Algorithms Mol Biol ; 10: 22, 2015.
Article in English | MEDLINE | ID: mdl-26150892

ABSTRACT

Pareto optimization combines independent objectives by computing the Pareto front of its search space, defined as the set of all solutions for which no other candidate solution scores better under all objectives. This gives, in a precise sense, better information than an artificial amalgamation of different scores into a single objective, but is more costly to compute. Pareto optimization naturally occurs with genetic algorithms, albeit in a heuristic fashion. Non-heuristic Pareto optimization so far has been used only with a few applications in bioinformatics. We study exact Pareto optimization for two objectives in a dynamic programming framework. We define a binary Pareto product operator [Formula: see text] on arbitrary scoring schemes. Independent of a particular algorithm, we prove that for two scoring schemes A and B used in dynamic programming, the scoring scheme [Formula: see text] correctly performs Pareto optimization over the same search space. We study different implementations of the Pareto operator with respect to their asymptotic and empirical efficiency. Without artificial amalgamation of objectives, and with no heuristics involved, Pareto optimization is faster than computing the same number of answers separately for each objective. For RNA structure prediction under the minimum free energy versus the maximum expected accuracy model, we show that the empirical size of the Pareto front remains within reasonable bounds. Pareto optimization lends itself to the comparative investigation of the behavior of two alternative scoring schemes for the same purpose. For the above scoring schemes, we observe that the Pareto front can be seen as a composition of a few macrostates, each consisting of several microstates that differ in the same limited way. We also study the relationship between abstract shape analysis and the Pareto front, and find that they extract information of a different nature from the folding space and can be meaningfully combined.
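
The Pareto front described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `pareto_front` and `combine` are hypothetical, both objectives are assumed to be maximized (negate a score such as free energy to minimize it), and the usual dominance rule is assumed (a candidate is dropped if another is at least as good in both objectives and strictly better in one). `combine` assumes that scores of sub-solutions add componentwise, which is the kind of situation in which a Pareto product can be applied inside a dynamic programming recurrence.

```python
from typing import List, Tuple

Score = Tuple[float, float]  # (objective A, objective B), both maximized (assumption)


def pareto_front(candidates: List[Score]) -> List[Score]:
    """Return the non-dominated candidates, sorted by decreasing first objective."""
    # Sort by first objective (descending), breaking ties by second (descending).
    ordered = sorted(set(candidates), key=lambda s: (-s[0], -s[1]))
    front: List[Score] = []
    best_second = float("-inf")
    for a, b in ordered:
        # Every earlier candidate is at least as good in A, so this one
        # survives only if it is strictly better in B than all of them.
        if b > best_second:
            front.append((a, b))
            best_second = b
    return front


def combine(left: List[Score], right: List[Score]) -> List[Score]:
    """Pareto front of all pairwise sums of two sub-solution fronts (hypothetical helper)."""
    sums = [(a1 + a2, b1 + b2) for a1, b1 in left for a2, b2 in right]
    return pareto_front(sums)


if __name__ == "__main__":
    print(pareto_front([(1, 5), (2, 4), (3, 3), (2, 2), (3, 1), (1, 5)]))
    print(combine([(1, 0), (0, 1)], [(2, 0), (0, 2)]))
```

The sort-and-sweep approach takes O(n log n) time for two objectives; it is only one of several possible implementations of the operator and says nothing about the efficiency results reported in the article.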

3.
BMC Bioinformatics ; 16: 178, 2015 May 28.
Article in English | MEDLINE | ID: mdl-26017195

ABSTRACT

BACKGROUND: Evolutionary variations let us define a set of similar nucleic acid sequences as a family if these different molecules execute a common function. Capturing their sequence variation by using e.g. position-specific scoring matrices significantly improves sensitivity of detection tools. Members of a functional (non-coding) RNA family are affected by these variations not only on the sequence, but also on the structural level. For example, some transfer-RNAs exhibit a fifth helix in addition to the typical cloverleaf structure. Current covariance models - the unrivaled homology search approach for structured RNA - do not benefit from structural variation within a family, but rather penalize it. This leads to artificial subdivision of families and loss of information in the RFAM database. RESULTS: We propose an extension to the fundamental architecture of covariance models to allow for several, compatible consensus structures. The resulting models are called ambivalent covariance models. Evaluation on several RFAM families shows that coalescence of structural variation within a family by using ambivalent consensus models is superior to subdividing the family into multiple classical covariance models. CONCLUSION: A prototype and source code are available at http://bibiserv.cebitec.uni-bielefeld.de/acms.


Subjects
Statistical Models, Position-Specific Scoring Matrices, Transfer RNA/chemistry, Untranslated RNA/chemistry, RNA/chemistry, RNA Sequence Analysis/methods, Factual Databases, Humans, RNA/genetics, Transfer RNA/genetics, Untranslated RNA/genetics
4.
RNA Biol ; 12(2): 197-207, 2015.
Article in English | MEDLINE | ID: mdl-25779873

ABSTRACT

RNA family models describe classes of functionally related, non-coding RNAs based on sequence and structure conservation. The most important method for modeling RNA families is the use of covariance models, which are stochastic models that serve in the discovery of yet unknown, homologous RNAs. However, the performance of covariance models in finding remote homologs is poor for RNA families with high sequence conservation, while for families with high structure but low sequence conservation, these models are difficult to build in the first place. A complementary approach to RNA family modeling involves the use of thermodynamic matchers. Thermodynamic matchers are RNA folding programs, based on the established thermodynamic model, but tailored to a specific structural motif. As thermodynamic matchers focus on structure and folding energy, they unfold their potential in discovering homologs when high structure conservation is paired with low sequence conservation. In contrast to covariance models, construction of thermodynamic matchers does not require an input alignment, but requires human design decisions and experimentation, and hence, model construction is more laborious. Here we report a case study on an RNA family that was constructed by means of thermodynamic matchers. It starts from a set of known but structurally different members of the same RNA family. The consensus secondary structure of this family consists of 2 to 4 adjacent hairpins. Each hairpin loop carries the same motif, CCUCCUCCC, while the stems show high variability in their nucleotide content. The present study (1) describes a novel approach for the integration of the structurally varying family into a single RNA family model by means of the thermodynamic matcher methodology, and (2) provides the results of homology searches that were conducted with this model in a wide spectrum of bacterial species.


Subjects
Algorithms, Gram-Negative Bacteria/genetics, Gram-Positive Bacteria/genetics, Bacterial RNA/chemistry, Small Untranslated RNA/chemistry, Genetic Models, Molecular Sequence Data, Nucleic Acid Conformation, Nucleotide Motifs, Bacterial RNA/genetics, Small Untranslated RNA/genetics, RNA Sequence Analysis, Nucleic Acid Sequence Homology, Synteny, Thermodynamics
5.
Bioinformatics ; 31(3): 423-5, 2015 Feb 01.
Article in English | MEDLINE | ID: mdl-25273103

ABSTRACT

MOTIVATION: Abstract shape analysis, first proposed in 2004, allows one to extract several relevant structures from the folding space of an RNA sequence, preferable to focusing on a single structure of minimal free energy. We report recent extensions to this approach. RESULTS: We have rebuilt the original RNAshapes as a repository of components that allows us to integrate several established tools for RNA structure analysis: RNAshapes, RNAalishapes and pknotsRG, including its recent extension pKiss. As a spin-off, we obtain heretofore unavailable functionality: e.g. with pKiss, we can now perform abstract shape analysis for structures holding pseudoknots up to the complexity of kissing hairpin motifs. The new tool pAliKiss can predict kissing hairpin motifs from aligned sequences. Along with the integration, the functionality of the tools was also extended in manifold ways. AVAILABILITY AND IMPLEMENTATION: As before, the tool is available on the Bielefeld Bioinformatics server at http://bibiserv.cebitec.uni-bielefeld.de/rnashapesstudio. CONTACT: bibi-help@cebitec.uni-bielefeld.de.


Subjects
Computational Biology/methods, Coronavirus/genetics, Nucleic Acid Conformation, RNA/chemistry, RNA Sequence Analysis/methods
6.
RNA Biol ; 11(5): 550-62, 2014.
Article in English | MEDLINE | ID: mdl-25003187

ABSTRACT

The symbiotic α-rhizobia Sinorhizobium meliloti, Bradyrhizobium japonicum, Rhizobium etli and the related plant pathogen Agrobacterium tumefaciens are important model organisms for studying plant-microbe interactions. These metabolically versatile soil bacteria are characterized by complex lifestyles and large genomes. Here we summarize the recent knowledge on their small non-coding RNAs (sRNAs) including conservation, function, and interaction of the sRNAs with the RNA chaperone Hfq. In each of these organisms, an inventory of hundreds of cis- and trans-encoded sRNAs with regulatory potential was uncovered by high-throughput approaches and used for the construction of 39 sRNA family models. Genome-wide analyses of hfq mutants and co-immunoprecipitation with tagged Hfq revealed a major impact of the RNA chaperone on the physiology of plant-associated α-proteobacteria including symbiosis and virulence. Highly conserved members of the SmelC411 family are the AbcR sRNAs, which predominantly regulate ABC transport systems. AbcR1 of A. tumefaciens controls the uptake of the plant-generated signaling molecule GABA and is a central regulator of nutrient uptake systems. It has similar functions in S. meliloti and the human pathogen Brucella abortus. As RNA degradation is an important process in RNA-based gene regulation, a short overview on ribonucleases in plant-associated α-proteobacteria concludes this review.


Subjects
Alphaproteobacteria/genetics, Bacterial Gene Expression Regulation, Bacterial RNA/genetics, Alphaproteobacteria/metabolism, Base Pairing, Multigene Family, Plants/microbiology, RNA Stability, Antisense RNA/chemistry, Antisense RNA/genetics, Antisense RNA/metabolism, Bacterial RNA/chemistry, Bacterial RNA/metabolism, Messenger RNA, Small Untranslated RNA/chemistry, Small Untranslated RNA/genetics, Small Untranslated RNA/metabolism, RNA-Binding Proteins/metabolism, Transcriptome
7.
RNA Biol ; 11(5): 563-79, 2014.
Article in English | MEDLINE | ID: mdl-24786641

ABSTRACT

The RNA chaperone Hfq is a global post-transcriptional regulator in bacteria. Here, we used RNAseq to analyze RNA populations from the legume symbiont Sinorhizobium meliloti that were co-immunoprecipitated (CoIP-RNA) with a FLAG-tagged Hfq in five growth/stress conditions. Hfq-bound transcripts (1315) were largely identified in stressed bacteria and derived from small RNAs (sRNAs), both trans-encoded (6.4%) and antisense (asRNAs; 6.3%), and mRNAs (86%). Pull-down with Hfq recovered a small proportion of annotated S. meliloti sRNAs (14% of trans-sRNAs and 2% of asRNAs) suggesting a discrete impact of this protein in sRNA pathways. Nonetheless, Hfq selectively stabilized CoIP-enriched sRNAs, anticipating that these interactions are functionally significant. Transcription of 26 Hfq-bound sRNAs was predicted to occur from promoters recognized by the major stress σ factors σ(E2) or σ(H1/2). Recovery rates of sRNAs in each of the CoIP-RNA libraries suggest a large impact of Hfq-assisted riboregulation in S. meliloti osmoadaptation. Hfq directly targeted 18% of the predicted S. meliloti mRNAs, which encode functionally diverse proteins involved in transport and metabolism, σ(E2)-dependent stress responses, quorum sensing, flagella biosynthesis, ribosome, and membrane assembly or symbiotic nitrogen fixation. Canonical targeting of the 5' regions of two of the ABC transporter mRNAs by the homologous Hfq-binding AbcR1 and AbcR2 sRNAs leading to inhibition of protein synthesis was confirmed in vivo. We therefore provide a comprehensive resource for the systems-level deciphering of hitherto unexplored S. meliloti stress and symbiotic post-transcriptional regulons and the identification of Hfq-dependent sRNA-mRNA regulatory pairs.


Subjects
Host Factor 1 Protein/metabolism, Post-Transcriptional RNA Processing, Bacterial RNA/genetics, Bacterial RNA/metabolism, Sinorhizobium meliloti/genetics, Sinorhizobium meliloti/metabolism, Physiological Stress, Base Pairing, Binding Sites, Bacterial Gene Expression Regulation, Protein Binding, Messenger RNA/genetics, Messenger RNA/metabolism, Small Untranslated RNA/genetics, Small Untranslated RNA/metabolism, Reproducibility of Results
8.
Methods Mol Biol ; 1097: 85-106, 2014.
Article in English | MEDLINE | ID: mdl-24639156

ABSTRACT

Stochastic context free grammars are a formalism which plays a prominent role in RNA secondary structure analysis. This chapter provides the theoretical background on stochastic context free grammars. We recall the general definitions and study the basic properties, virtues, and shortcomings of stochastic context free grammars. We then introduce two ways in which they are used in RNA secondary structure analysis, secondary structure prediction and RNA family modeling. This prepares for the discussion of applications of stochastic context free grammars in the chapters on RFAM (6), Pfold (8), and INFERNAL (9).


Subjects
Computational Biology/methods, Nucleic Acid Conformation, RNA/chemistry, Algorithms
9.
Methods Mol Biol ; 1097: 215-45, 2014.
Article in English | MEDLINE | ID: mdl-24639162

ABSTRACT

Abstract shape analysis is a method to learn more about the complete Boltzmann ensemble of the secondary structures of a single RNA molecule. Abstract shapes classify competing secondary structures into classes that are defined by their arrangement of helices. It allows us to compute, in addition to the structure of minimal free energy, a set of structures that represents relevant and interesting structural alternatives. Furthermore, it allows us to compute probabilities of all structures within a shape class. This allows us to ensure that our representative subset covers the complete Boltzmann ensemble, except for a portion of negligible probability. This chapter explains the main functions of abstract shape analysis, as implemented in the tool RNAshapes. It reports on some other types of analysis that are based on the abstract shapes idea and shows how you can solve novel problems by creating your own shape abstractions.


Subjects
Computational Biology/methods, Nucleic Acid Conformation, RNA Folding, RNA/chemistry, Algorithms, Thermodynamics
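
To make the idea of classifying structures "by their arrangement of helices" concrete, here is a small Python sketch that maps a dot-bracket string to a coarse shape string. It is an illustration only and not the RNAshapes implementation: the function names are hypothetical, pseudoknots are not handled, and the abstraction assumed is the most abstract one, in which unpaired bases are ignored and helices interrupted only by bulges or internal loops are merged into one. The use of "_" for a fully unpaired structure is likewise an assumed convention.

```python
from typing import List


def parse_dot_bracket(structure: str) -> List[list]:
    """Parse a dot-bracket string into a forest; each base pair is a list of its child pairs."""
    root: List[list] = []
    stack = [root]
    for ch in structure:
        if ch == "(":
            node: list = []
            stack[-1].append(node)   # attach the new pair to its enclosing pair
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure: unexpected ')'")
            stack.pop()
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced structure: missing ')'")
    return root


def abstract_shape(structure: str) -> str:
    """Return a helix-nesting string such as '[[][]]' (coarse abstraction, see assumptions)."""
    def render(node: list) -> str:
        # A pair enclosing exactly one other pair continues the same helix
        # (stacking, bulge or internal loop), so skip down to where it ends.
        while len(node) == 1:
            node = node[0]
        # Zero children: hairpin loop. Two or more: multiloop with branches.
        return "[" + "".join(render(child) for child in node) + "]"

    shape = "".join(render(child) for child in parse_dot_bracket(structure))
    return shape or "_"  # assumed convention for the fully unpaired structure


if __name__ == "__main__":
    print(abstract_shape("((((...))))..((..))"))
    print(abstract_shape("((..((...))..((...))..))"))
```

Many different dot-bracket strings map to the same shape string, which is what lets a small set of shape representatives stand in for a very large folding space.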
10.
Methods Mol Biol ; 1097: 247-73, 2014.
Article in English | MEDLINE | ID: mdl-24639163

ABSTRACT

Many methods have been proposed for RNA secondary structure comparison, and new ones are still being developed. In this chapter, we first consider structure representations and discuss their suitability for structure comparison. Then, we take a look at the more commonly used methods, restricting ourselves to structures without pseudo-knots. For comparing structures of the same sequence, we study base pair distances. For structures of different sequences (and of different length), we study variants of the tree edit model. We name some of the available tools and give pointers to the literature. We end with a short review on comparing structures with pseudo-knots as an unsolved problem and topic of active research.


Subjects
Computational Biology/methods, Nucleic Acid Conformation, RNA/chemistry, Sequence Alignment/methods, RNA Sequence Analysis/methods, Algorithms
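
The base pair distance mentioned in this abstract for two structures of the same sequence can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the chapter: structures are given as pseudoknot-free dot-bracket strings, the distance is taken to be the number of base pairs present in exactly one of the two structures, and the function names are hypothetical.

```python
from typing import Set, Tuple


def base_pairs(structure: str) -> Set[Tuple[int, int]]:
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    stack = []
    pairs: Set[Tuple[int, int]] = set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unexpected ')'")
            pairs.add((stack.pop(), i))  # pair the most recent open position with i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: missing ')'")
    return pairs


def base_pair_distance(s1: str, s2: str) -> int:
    """Count the base pairs that occur in exactly one of the two structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must belong to the same sequence (equal length)")
    return len(base_pairs(s1) ^ base_pairs(s2))  # size of the symmetric difference


if __name__ == "__main__":
    print(base_pair_distance("((..))", "(....)"))
    print(base_pair_distance("(...).", ".(...)"))
```

Because positions are compared index by index, this measure only makes sense for structures of one sequence; structures of different sequences call for the tree edit style models the abstract mentions.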
11.
PLoS One ; 8(12): e81912, 2013.
Article in English | MEDLINE | ID: mdl-24312603

ABSTRACT

Defining genetic variants that predispose for diseases is an important initiative that can improve biological understanding and focus therapeutic development. Genetic mapping in humans and animal models has defined genomic regions controlling a variety of phenotypes known as quantitative trait loci (QTL). Causative disease determinants, including single nucleotide polymorphisms (SNPs), lie within these regions and can often be identified through effects on gene expression. We previously identified a QTL on rat chromosome 4 regulating macrophage phenotypes and immune-mediated diseases including experimental autoimmune encephalomyelitis (EAE). Gene analysis and a literature search identified lysine-specific demethylase 3A (Kdm3a) as a potential regulator of these phenotypes. Genomic sequencing determined only two synonymous SNPs in Kdm3a. The silent synonymous SNP in exon 15 of Kdm3a caused problems with quantitative PCR detection in the susceptible strain through reduced amplification efficiency due to altered secondary cDNA structure. Shape Probability Shift analysis predicted that the SNP often affects RNA folding; thus, it may impact protein translation. Despite these differences in rats, genetic knockout of Kdm3a in mice resulted in no dramatic effect on immune system development and activation or EAE susceptibility and severity. These results provide support for tools that analyze causative SNPs that impact nucleic acid structures.


Subjects
DNA/chemistry, Exons/genetics, Gene Silencing, Jumonji Domain-Containing Histone Demethylases/deficiency, Jumonji Domain-Containing Histone Demethylases/genetics, Single Nucleotide Polymorphism/genetics, RNA/chemistry, Animals, Base Sequence, Complementary DNA/genetics, Experimental Autoimmune Encephalomyelitis/genetics, Experimental Autoimmune Encephalomyelitis/immunology, Female, Gene Knockdown Techniques, Mice, Phenotype, Rats
12.
BMC Genomics ; 14: 156, 2013 Mar 07.
Article in English | MEDLINE | ID: mdl-23497287

ABSTRACT

BACKGROUND: Sinorhizobium meliloti is a soil-dwelling α-proteobacterium that possesses a large, tripartite genome and engages in a nitrogen fixing symbiosis with its plant hosts. Although much is known about this important model organism, global characterization of genetic regulatory circuits has been hampered by a lack of information about transcription and promoters. RESULTS: Using an RNAseq approach and RNA populations representing 16 different growth and stress conditions, we comprehensively mapped S. meliloti transcription start sites (TSS). Our work identified 17,001 TSS that we grouped into six categories based on the genomic context of their transcripts: mRNA (4,430 TSS assigned to 2,657 protein-coding genes), leaderless mRNAs (171), putative mRNAs (425), internal sense transcripts (7,650), antisense RNA (3,720), and trans-encoded sRNAs (605). We used this TSS information to identify transcription factor binding sites and putative promoter sequences recognized by seven of the 15 known S. meliloti σ factors (σ70, σ54, σH1, σH2, σE1, σE2, and σE9). Altogether, we predicted 2,770 new promoter sequences, including 1,302 located upstream of protein coding genes and 722 located upstream of antisense RNA or trans-encoded sRNA genes. To validate promoter predictions for targets of the general stress response σ factor, RpoE2 (σE2), we identified rpoE2-dependent genes using microarrays and confirmed TSS for a subset of these by 5' RACE mapping. CONCLUSIONS: By identifying TSS and promoters on a global scale, our work provides a firm foundation for the continued study of S. meliloti gene expression with relation to gene organization, σ factors and other transcription factors, and regulatory RNAs.


Subjects
Bacterial Genes, Sinorhizobium meliloti/genetics, Bacterial Proteins/genetics, Bacterial Proteins/metabolism, Base Sequence, Binding Sites, Chromosome Mapping, Genetic Promoter Regions, RNA/metabolism, RNA Sequence Analysis, Sigma Factor/genetics, Sigma Factor/metabolism, Sinorhizobium meliloti/metabolism, Symbiosis, Transcription Factors/genetics, Transcription Factors/metabolism, Transcription Initiation Site
13.
Bioinformatics ; 29(5): 551-60, 2013 Mar 01.
Article in English | MEDLINE | ID: mdl-23355290

ABSTRACT

MOTIVATION: Dynamic programming is ubiquitous in bioinformatics. Developing and implementing non-trivial dynamic programming algorithms is often error prone and tedious. Bellman's GAP is a new programming system, designed to ease the development of bioinformatics tools based on the dynamic programming technique. RESULTS: In Bellman's GAP, dynamic programming algorithms are described in a declarative style by tree grammars, evaluation algebras and products formed thereof. This bypasses the design of explicit dynamic programming recurrences and yields programs that are free of subscript errors, modular and easy to modify. The declarative modules are compiled into C++ code that is competitive with carefully hand-crafted implementations. This article introduces the Bellman's GAP system and its language, GAP-L. It then demonstrates the ease of development and the degree of re-use by creating variants of two common bioinformatics algorithms. Finally, it evaluates Bellman's GAP as an implementation platform of 'real-world' bioinformatics tools. AVAILABILITY: Bellman's GAP is available under the GPL license from http://bibiserv.cebitec.uni-bielefeld.de/bellmansgap. This Web site includes a repository of re-usable modules for RNA folding based on thermodynamics.


Subjects
Algorithms, Programming Languages, Sequence Analysis/methods, Computational Biology/methods, RNA Folding
14.
BMC Bioinformatics ; 13 Suppl 17: S22, 2012.
Article in English | MEDLINE | ID: mdl-23281941

ABSTRACT

BACKGROUND: Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. RESULTS: In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. CONCLUSIONS: Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org.


Subjects
Computational Biology, Education/methods, Information Services, Internet, Software, Research
15.
BMC Bioinformatics ; 12: 429, 2011 Nov 03.
Article in English | MEDLINE | ID: mdl-22051375

ABSTRACT

BACKGROUND: Many bioinformatics tools for RNA secondary structure analysis are based on a thermodynamic model of RNA folding. They predict a single, "optimal" structure by free energy minimization, they enumerate near-optimal structures, they compute base pair probabilities and dot plots, representative structures of different abstract shapes, or Boltzmann probabilities of structures and shapes. Although all programs refer to the same physical model, they implement it with considerable variation for different tasks, and little is known about the effects of heuristic assumptions and model simplifications used by the programs on the outcome of the analysis. RESULTS: We extract four different models of the thermodynamic folding space which underlie the programs RNAFOLD, RNASHAPES, and RNASUBOPT. Their differences lie within the details of the energy model and the granularity of the folding space. We implement probabilistic shape analysis for all models, and introduce the shape probability shift as a robust measure of model similarity. Using four data sets derived from experimentally solved structures, we provide a quantitative evaluation of the model differences. CONCLUSIONS: We find that search space granularity affects the computed shape probabilities less than the over- or underapproximation of free energy by a simplified energy model. Still, the approximations perform similarly enough to implementations of the full model to justify their continued use in settings where computational constraints call for simpler algorithms. On the side, we observe that the rarely used level 2 shapes, which predict the complete arrangement of helices, multiloops, internal loops and bulges, include the "true" shape in a rather small number of predicted high probability shapes. This calls for an investigation of new strategies to extract high probability members from the (very large) level 2 shape space of an RNA sequence.
We provide implementations of all four models, written in a declarative style that makes them easy to modify. Based on our study, future work on thermodynamic RNA folding may make a choice of model based on our empirical data. It can take our implementations as a starting point for further program development.


Subjects
Algorithms, RNA Folding, RNA/chemistry, Base Sequence, Computational Biology, Probability, RNA Sequence Analysis, Thermodynamics
16.
Bioinformatics ; 27(7): 903-11, 2011 Apr 01.
Article in English | MEDLINE | ID: mdl-21278189

ABSTRACT

MOTIVATION: The rapidly increasing amounts of data available from new high-throughput methods have made data processing without automated pipelines infeasible. As was pointed out in several publications, integration of data and analytic resources into workflow systems provides a solution to this problem, simplifying the task of data analysis. Various applications for defining and running workflows in the field of bioinformatics have been proposed and published, e.g. Galaxy, Mobyle, Taverna, Pegasus or Kepler. One of the main aims of such workflow systems is to enable scientists to focus on analysing their datasets instead of taking care of data management, job management or monitoring the execution of computational tasks. The currently available workflow systems achieve this goal, but fundamentally differ in their way of executing workflows. RESULTS: We have developed the Conveyor software library, a multitiered generic workflow engine for composition, execution and monitoring of complex workflows. It features an open, extensible system architecture and concurrent program execution to exploit resources available on modern multicore CPU hardware. It offers the ability to build complex workflows with branches, loops and other control structures. Two example use cases illustrate the application of the versatile Conveyor engine to common bioinformatics problems. AVAILABILITY: The Conveyor application, including client and server, is available at http://conveyor.cebitec.uni-bielefeld.de.


Subjects
Computational Biology, Software, Escherichia coli/genetics, Bacterial Genome, Genomics, Molecular Sequence Annotation, Workflow
17.
Article in English | MEDLINE | ID: mdl-21233528

ABSTRACT

Stochastic models, such as hidden Markov models or stochastic context-free grammars (SCFGs), can fail to return the correct, maximum likelihood solution in the case of semantic ambiguity. This problem arises when the algorithm implementing the model inspects the same solution in different guises. It is a difficult problem in the sense that proving semantic nonambiguity has been shown to be algorithmically undecidable, while compensating for it (by coalescing scores of equivalent solutions) has been shown to be NP-hard. For stochastic context-free grammars modeling RNA secondary structure, it has been shown that the distortion of results can be quite severe. Much less is known about the case when stochastic context-free grammars model the matching of a query sequence to an implicit consensus structure for an RNA family. We find that three different, meaningful semantics can be associated with the matching of a query against the model: a structural, an alignment, and a trace semantics. Rfam models correctly implement the alignment semantics, and are ambiguous with respect to the other two semantics, which are more abstract. We show how provably correct models can be generated for the trace semantics. For approaches where such a proof is not possible, we present an automated pipeline to check post factum for ambiguity of the generated models. We propose that both the structure and the trace semantics are worthwhile concepts for further study, possibly better suited to capture remotely related family members.


Subjects
RNA/chemistry, Algorithms, Nucleic Acid Conformation, Semantics, Sequence Alignment/methods, RNA Sequence Analysis/methods
18.
Genes (Basel) ; 2(4): 925-56, 2011 Nov 18.
Article in English | MEDLINE | ID: mdl-24710299

ABSTRACT

Post-transcriptional regulation by trans-encoded sRNAs, for example via base-pairing with target mRNAs, is a common feature in bacteria and influences various cell processes, e.g., response to stress factors. Several studies based on computational and RNA-seq approaches identified approximately 180 trans-encoded sRNAs in Sinorhizobium meliloti. The initial point of this report is a set of 52 trans-encoded sRNAs derived from the former studies. Sequence homology combined with structural conservation analyses were applied to elucidate the occurrence and distribution of conserved trans-encoded sRNAs in the order of Rhizobiales. This approach resulted in 39 RNA family models (RFMs) which showed various taxonomic distribution patterns. Whereas the majority of RFMs was restricted to Sinorhizobium species or the Rhizobiaceae, members of a few RFMs were more widely distributed in the Rhizobiales. Access to this data is provided via the RhizoGATE portal [1,2].

19.
BMC Evol Biol ; 10: 167, 2010 Jun 06.
Article in English | MEDLINE | ID: mdl-20525398

ABSTRACT

BACKGROUND: Minisatellites are genomic loci composed of tandem arrays of short repetitive DNA segments. A minisatellite map is a sequence of symbols that represents the tandem repeat array such that the set of symbols is in one-to-one correspondence with the set of distinct repeats. Due to variations in repeat type and organization as well as copy number, the minisatellite maps have been widely used in forensic and population studies. In either domain, researchers need to compare the set of maps to each other, to build phylogenetic trees, to spot structural variations, and to study duplication dynamics. Efficient algorithms for these tasks are required to carry them out reliably and in reasonable time. RESULTS: In this paper we present WAMI, a web-server for the analysis of minisatellite maps. It performs the above mentioned computational tasks using efficient algorithms that take the model of map evolution into account. The WAMI interface is easy to use and the results of each analysis task are visualized. CONCLUSIONS: To the best of our knowledge, WAMI is the first server providing all these computational facilities to the minisatellite community. The WAMI web-interface and the source code of the underlying programs are available at http://www.nubios.nileu.edu.eg/tools/wami.


Subjects
Computational Biology/methods, Minisatellite Repeats, DNA Sequence Analysis/methods, Software, Algorithms, Internet, Phylogeny, Sequence Alignment, User-Computer Interface
20.
BMC Bioinformatics ; 11: 222, 2010 Apr 30.
Article in English | MEDLINE | ID: mdl-20433706

ABSTRACT

BACKGROUND: A widely used method to find conserved secondary structure in RNA is to first construct a multiple sequence alignment, and then fold the alignment, optimizing a score based on thermodynamics and covariance. This method works best around 75% sequence similarity. However, in a "twilight zone" below 55% similarity, the sequence alignment tends to obscure the covariance signal used in the second phase. Therefore, while the overall shape of the consensus structure may still be found, the degree of conservation cannot be estimated reliably. RESULTS: Based on a combination of available methods, we present a method named planACstar for improving structure conservation in structural alignments in the twilight zone. After constructing a consensus structure by alignment folding, planACstar abandons the original sequence alignment, refolds the sequences individually, but consistent with the consensus, aligns the structures, irrespective of sequence, by a pure structure alignment method, and derives an improved sequence alignment from the alignment of structures, to be re-submitted to alignment folding, etc. This cycle may be iterated as long as structural conservation improves, but normally, one step suffices. CONCLUSIONS: Employing the tools ClustalW, RNAalifold, and RNAforester, we find that for sequences with 30-55% sequence identity, structural conservation can be improved by 10% on average, with a large variation, measured in terms of RNAalifold's own criterion, the structure conservation index.


Subjects
RNA/chemistry, Sequence Alignment, RNA Sequence Analysis, Algorithms, Base Sequence, Factual Databases, Molecular Sequence Data, Nucleic Acid Conformation