Search | BVS IEC

1.

The BRaliBase dent--a tale of benchmark design and interpretation.

Löwes, Benedikt; Chauve, Cedric; Ponty, Yann; Giegerich, Robert.

Brief Bioinform ; 18(2): 306-311, 2017 03 01.

Article in English | MEDLINE | ID: mdl-26984616

ABSTRACT

BRaliBase is a widely used benchmark for assessing the accuracy of RNA secondary structure alignment methods. In most case studies based on the BRaliBase benchmark, one can observe a puzzling drop in accuracy in the 40-60% sequence identity range, the so-called 'BRaliBase Dent'. In this article, we show this dent is owing to a bias in the composition of the BRaliBase benchmark, namely the inclusion of a disproportionate number of transfer RNAs, which exhibit a conserved secondary structure. Our analysis, aside from its interest regarding the specific case of the BRaliBase benchmark, also raises important questions regarding the design and use of benchmarks in computational biology.

Subjects

Benchmarking , Algorithms , Computational Biology , Nucleic Acid Conformation , Sequence Alignment , Sequence Analysis, RNA , Software

2.

The RNA shapes studio.

Janssen, Stefan; Giegerich, Robert.

Bioinformatics ; 31(3): 423-5, 2015 Feb 01.

Article in English | MEDLINE | ID: mdl-25273103

ABSTRACT

MOTIVATION: Abstract shape analysis, first proposed in 2004, allows one to extract several relevant structures from the folding space of an RNA sequence, which is preferable to focusing on a single structure of minimal free energy. We report recent extensions to this approach. RESULTS: We have rebuilt the original RNAshapes as a repository of components that allows us to integrate several established tools for RNA structure analysis: RNAshapes, RNAalishapes and pknotsRG, including its recent extension pKiss. As a spin-off, we obtain heretofore unavailable functionality: e.g. with pKiss, we can now perform abstract shape analysis for structures holding pseudoknots up to the complexity of kissing hairpin motifs. The new tool pAliKiss can predict kissing hairpin motifs from aligned sequences. Along with the integration, the functionality of the tools was also extended in manifold ways. AVAILABILITY AND IMPLEMENTATION: As before, the tool is available on the Bielefeld Bioinformatics server at http://bibiserv.cebitec.uni-bielefeld.de/rnashapesstudio. CONTACT: bibi-help@cebitec.uni-bielefeld.de.

Subjects

Computational Biology/methods , Coronavirus/genetics , Nucleic Acid Conformation , RNA/chemistry , Sequence Analysis, RNA/methods

3.

Ambivalent covariance models.

Janssen, Stefan; Giegerich, Robert.

BMC Bioinformatics ; 16: 178, 2015 May 28.

Article in English | MEDLINE | ID: mdl-26017195

ABSTRACT

BACKGROUND: Evolutionary variations let us define a set of similar nucleic acid sequences as a family if these different molecules execute a common function. Capturing their sequence variation by using e.g. position specific scoring matrices significantly improves sensitivity of detection tools. Members of a functional (non-coding) RNA family are affected by these variations not only on the sequence, but also on the structural level. For example, some transfer-RNAs exhibit a fifth helix in addition to the typical cloverleaf structure. Current covariance models - the unrivaled homology search approach for structured RNA - do not benefit from structural variation within a family, but rather penalize it. This leads to artificial subdivision of families and loss of information in the RFAM database. RESULTS: We propose an extension to the fundamental architecture of covariance models to allow for several, compatible consensus structures. The resulting models are called ambivalent covariance models. Evaluation on several RFAM families shows that coalescence of structural variation within a family by using ambivalent consensus models is superior to subdividing the family into multiple classical covariance models. CONCLUSION: A prototype and source code are available at http://bibiserv.cebitec.uni-bielefeld.de/acms.

Subjects

Models, Statistical , Position-Specific Scoring Matrices , RNA, Transfer/chemistry , RNA, Untranslated/chemistry , RNA/chemistry , Sequence Analysis, RNA/methods , Databases, Factual , Humans , RNA/genetics , RNA, Transfer/genetics , RNA, Untranslated/genetics

4.

Thermodynamic matchers for the construction of the cuckoo RNA family.

Reinkensmeier, Jan; Giegerich, Robert.

RNA Biol ; 12(2): 197-207, 2015.

Article in English | MEDLINE | ID: mdl-25779873

ABSTRACT

RNA family models describe classes of functionally related, non-coding RNAs based on sequence and structure conservation. The most important method for modeling RNA families is the use of covariance models, which are stochastic models that serve in the discovery of yet unknown, homologous RNAs. However, the performance of covariance models in finding remote homologs is poor for RNA families with high sequence conservation, while for families with high structure but low sequence conservation, these models are difficult to build in the first place. A complementary approach to RNA family modeling involves the use of thermodynamic matchers. Thermodynamic matchers are RNA folding programs, based on the established thermodynamic model, but tailored to a specific structural motif. As thermodynamic matchers focus on structure and folding energy, they unfold their potential in discovering homologs when high structure conservation is paired with low sequence conservation. In contrast to covariance models, construction of thermodynamic matchers does not require an input alignment, but requires human design decisions and experimentation, and hence, model construction is more laborious. Here we report a case study on an RNA family that was constructed by means of thermodynamic matchers. It starts from a set of known but structurally different members of the same RNA family. The consensus secondary structure of this family consists of 2 to 4 adjacent hairpins. Each hairpin loop carries the same motif, CCUCCUCCC, while the stems show high variability in their nucleotide content. The present study (1) describes a novel approach for the integration of the structurally varying family into a single RNA family model by means of the thermodynamic matcher methodology, and (2) provides the results of homology searches that were conducted with this model in a wide spectrum of bacterial species.

Subjects

Algorithms , Gram-Negative Bacteria/genetics , Gram-Positive Bacteria/genetics , RNA, Bacterial/chemistry , RNA, Small Untranslated/chemistry , Models, Genetic , Molecular Sequence Data , Nucleic Acid Conformation , Nucleotide Motifs , RNA, Bacterial/genetics , RNA, Small Untranslated/genetics , Sequence Analysis, RNA , Sequence Homology, Nucleic Acid , Synteny , Thermodynamics

5.

Bellman's GAP--a language and compiler for dynamic programming in sequence analysis.

Sauthoff, Georg; Möhl, Mathias; Janssen, Stefan; Giegerich, Robert.

Bioinformatics ; 29(5): 551-60, 2013 Mar 01.

Article in English | MEDLINE | ID: mdl-23355290

ABSTRACT

MOTIVATION: Dynamic programming is ubiquitous in bioinformatics. Developing and implementing non-trivial dynamic programming algorithms is often error prone and tedious. Bellman's GAP is a new programming system, designed to ease the development of bioinformatics tools based on the dynamic programming technique. RESULTS: In Bellman's GAP, dynamic programming algorithms are described in a declarative style by tree grammars, evaluation algebras and products formed thereof. This bypasses the design of explicit dynamic programming recurrences and yields programs that are free of subscript errors, modular and easy to modify. The declarative modules are compiled into C++ code that is competitive with carefully hand-crafted implementations. This article introduces the Bellman's GAP system and its language, GAP-L. It then demonstrates the ease of development and the degree of re-use by creating variants of two common bioinformatics algorithms. Finally, it evaluates Bellman's GAP as an implementation platform of 'real-world' bioinformatics tools. AVAILABILITY: Bellman's GAP is available under the GPL license from http://bibiserv.cebitec.uni-bielefeld.de/bellmansgap. This Web site includes a repository of re-usable modules for RNA folding based on thermodynamics.

Subjects

Algorithms , Programming Languages , Sequence Analysis/methods , Computational Biology/methods , RNA Folding
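
The separation of search space and evaluation described in the Bellman's GAP abstract can be illustrated outside GAP-L. The following Python sketch is not Bellman's GAP code and does not use its syntax; `align`, `SCORE` and `COUNT` are hypothetical names chosen for illustration, and the unit scoring scheme is an assumption. A single recurrence enumerates the alignment search space, and swapping the algebra changes what is computed.

```python
from functools import lru_cache

def align(x, y, algebra):
    """Evaluate all global alignments of x and y under an evaluation algebra.

    algebra = (nil, delete, insert, match, choice); illustrative names only.
    """
    nil, delete, insert, match, choice = algebra

    @lru_cache(maxsize=None)
    def a(i, j):
        # Both sequences consumed: the empty alignment.
        if i == len(x) and j == len(y):
            return choice([nil()])
        candidates = []
        if i < len(x):                      # x[i] aligned to a gap
            candidates += [delete(x[i], v) for v in a(i + 1, j)]
        if j < len(y):                      # y[j] aligned to a gap
            candidates += [insert(y[j], v) for v in a(i, j + 1)]
        if i < len(x) and j < len(y):       # x[i] aligned to y[j]
            candidates += [match(x[i], y[j], v) for v in a(i + 1, j + 1)]
        return choice(candidates)

    return a(0, 0)

# Scoring algebra (assumed scheme): +1 match, -1 mismatch, -1 gap; keep the maximum.
SCORE = (
    lambda: 0,
    lambda c, v: v - 1,
    lambda c, v: v - 1,
    lambda a, b, v: v + (1 if a == b else -1),
    lambda vs: (max(vs),),
)

# Counting algebra: every alignment counts as 1; sum over alternatives.
COUNT = (
    lambda: 1,
    lambda c, v: v,
    lambda c, v: v,
    lambda a, b, v: v,
    lambda vs: (sum(vs),),
)

print(align("ACGT", "AGT", SCORE))  # best alignment score
print(align("AC", "AC", COUNT))     # size of the search space
```

The recurrence in `align` is written once; only the five functions of the algebra differ between the two analyses, which is the kind of re-use the abstract refers to.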

6.

Riboregulation in plant-associated α-proteobacteria.

Becker, Anke; Overlöper, Aaron; Schlüter, Jan-Philip; Reinkensmeier, Jan; Robledo, Marta; Giegerich, Robert; Narberhaus, Franz; Evguenieva-Hackenberg, Elena.

RNA Biol ; 11(5): 550-62, 2014.

Article in English | MEDLINE | ID: mdl-25003187

ABSTRACT

The symbiotic α-rhizobia Sinorhizobium meliloti, Bradyrhizobium japonicum, Rhizobium etli and the related plant pathogen Agrobacterium tumefaciens are important model organisms for studying plant-microbe interactions. These metabolically versatile soil bacteria are characterized by complex lifestyles and large genomes. Here we summarize the recent knowledge on their small non-coding RNAs (sRNAs) including conservation, function, and interaction of the sRNAs with the RNA chaperone Hfq. In each of these organisms, an inventory of hundreds of cis- and trans-encoded sRNAs with regulatory potential was uncovered by high-throughput approaches and used for the construction of 39 sRNA family models. Genome-wide analyses of hfq mutants and co-immunoprecipitation with tagged Hfq revealed a major impact of the RNA chaperone on the physiology of plant-associated α-proteobacteria including symbiosis and virulence. Highly conserved members of the SmelC411 family are the AbcR sRNAs, which predominantly regulate ABC transport systems. AbcR1 of A. tumefaciens controls the uptake of the plant-generated signaling molecule GABA and is a central regulator of nutrient uptake systems. It has similar functions in S. meliloti and the human pathogen Brucella abortus. As RNA degradation is an important process in RNA-based gene regulation, a short overview on ribonucleases in plant-associated α-proteobacteria concludes this review.

Subjects

Alphaproteobacteria/genetics , Gene Expression Regulation, Bacterial , RNA, Bacterial/genetics , Alphaproteobacteria/metabolism , Base Pairing , Multigene Family , Plants/microbiology , RNA Stability , RNA, Antisense/chemistry , RNA, Antisense/genetics , RNA, Antisense/metabolism , RNA, Bacterial/chemistry , RNA, Bacterial/metabolism , RNA, Messenger , RNA, Small Untranslated/chemistry , RNA, Small Untranslated/genetics , RNA, Small Untranslated/metabolism , RNA-Binding Proteins/metabolism , Transcriptome

7.

Genome-wide profiling of Hfq-binding RNAs uncovers extensive post-transcriptional rewiring of major stress response and symbiotic regulons in Sinorhizobium meliloti.

Torres-Quesada, Omar; Reinkensmeier, Jan; Schlüter, Jan-Philip; Robledo, Marta; Peregrina, Alexandra; Giegerich, Robert; Toro, Nicolás; Becker, Anke; Jiménez-Zurdo, Jose I.

RNA Biol ; 11(5): 563-79, 2014.

Article in English | MEDLINE | ID: mdl-24786641

ABSTRACT

The RNA chaperone Hfq is a global post-transcriptional regulator in bacteria. Here, we used RNAseq to analyze RNA populations from the legume symbiont Sinorhizobium meliloti that were co-immunoprecipitated (CoIP-RNA) with a FLAG-tagged Hfq in five growth/stress conditions. Hfq-bound transcripts (1315) were largely identified in stressed bacteria and derived from small RNAs (sRNAs), both trans-encoded (6.4%) and antisense (asRNAs; 6.3%), and mRNAs (86%). Pull-down with Hfq recovered a small proportion of annotated S. meliloti sRNAs (14% of trans-sRNAs and 2% of asRNAs) suggesting a discrete impact of this protein in sRNA pathways. Nonetheless, Hfq selectively stabilized CoIP-enriched sRNAs, anticipating that these interactions are functionally significant. Transcription of 26 Hfq-bound sRNAs was predicted to occur from promoters recognized by the major stress σ factors σ(E2) or σ(H1/2). Recovery rates of sRNAs in each of the CoIP-RNA libraries suggest a large impact of Hfq-assisted riboregulation in S. meliloti osmoadaptation. Hfq directly targeted 18% of the predicted S. meliloti mRNAs, which encode functionally diverse proteins involved in transport and metabolism, σ(E2)-dependent stress responses, quorum sensing, flagella biosynthesis, ribosome, and membrane assembly or symbiotic nitrogen fixation. Canonical targeting of the 5' regions of two of the ABC transporter mRNAs by the homologous Hfq-binding AbcR1 and AbcR2 sRNAs leading to inhibition of protein synthesis was confirmed in vivo. We therefore provide a comprehensive resource for the systems-level deciphering of hitherto unexplored S. meliloti stress and symbiotic post-transcriptional regulons and the identification of Hfq-dependent sRNA-mRNA regulatory pairs.

Subjects

Host Factor 1 Protein/metabolism , RNA Processing, Post-Transcriptional , RNA, Bacterial/genetics , RNA, Bacterial/metabolism , Sinorhizobium meliloti/genetics , Sinorhizobium meliloti/metabolism , Stress, Physiological , Base Pairing , Binding Sites , Gene Expression Regulation, Bacterial , Protein Binding , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA, Small Untranslated/genetics , RNA, Small Untranslated/metabolism , Reproducibility of Results

8.

Global mapping of transcription start sites and promoter motifs in the symbiotic α-proteobacterium Sinorhizobium meliloti 1021.

Schlüter, Jan-Philip; Reinkensmeier, Jan; Barnett, Melanie J; Lang, Claus; Krol, Elizaveta; Giegerich, Robert; Long, Sharon R; Becker, Anke.

BMC Genomics ; 14: 156, 2013 Mar 07.

Article in English | MEDLINE | ID: mdl-23497287

ABSTRACT

BACKGROUND: Sinorhizobium meliloti is a soil-dwelling α-proteobacterium that possesses a large, tripartite genome and engages in a nitrogen fixing symbiosis with its plant hosts. Although much is known about this important model organism, global characterization of genetic regulatory circuits has been hampered by a lack of information about transcription and promoters. RESULTS: Using an RNAseq approach and RNA populations representing 16 different growth and stress conditions, we comprehensively mapped S. meliloti transcription start sites (TSS). Our work identified 17,001 TSS that we grouped into six categories based on the genomic context of their transcripts: mRNA (4,430 TSS assigned to 2,657 protein-coding genes), leaderless mRNAs (171), putative mRNAs (425), internal sense transcripts (7,650), antisense RNA (3,720), and trans-encoded sRNAs (605). We used this TSS information to identify transcription factor binding sites and putative promoter sequences recognized by seven of the 15 known S. meliloti σ factors (σ70, σ54, σH1, σH2, σE1, σE2, and σE9). Altogether, we predicted 2,770 new promoter sequences, including 1,302 located upstream of protein coding genes and 722 located upstream of antisense RNA or trans-encoded sRNA genes. To validate promoter predictions for targets of the general stress response σ factor, RpoE2 (σE2), we identified rpoE2-dependent genes using microarrays and confirmed TSS for a subset of these by 5' RACE mapping. CONCLUSIONS: By identifying TSS and promoters on a global scale, our work provides a firm foundation for the continued study of S. meliloti gene expression with relation to gene organization, σ factors and other transcription factors, and regulatory RNAs.

Subjects

Genes, Bacterial , Sinorhizobium meliloti/genetics , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Base Sequence , Binding Sites , Chromosome Mapping , Promoter Regions, Genetic , RNA/metabolism , Sequence Analysis, RNA , Sigma Factor/genetics , Sigma Factor/metabolism , Sinorhizobium meliloti/metabolism , Symbiosis , Transcription Factors/genetics , Transcription Factors/metabolism , Transcription Initiation Site

9.

Personalized cloud-based bioinformatics services for research and education: use cases and the elasticHPC package.

El-Kalioby, Mohamed; Abouelhoda, Mohamed; Krüger, Jan; Giegerich, Robert; Sczyrba, Alexander; Wall, Dennis P; Tonellato, Peter.

BMC Bioinformatics ; 13 Suppl 17: S22, 2012.

Article in English | MEDLINE | ID: mdl-23281941

ABSTRACT

BACKGROUND: Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. RESULTS: In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. CONCLUSIONS: Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org.

Subjects

Computational Biology , Education/methods , Information Services , Internet , Software , Research

10.

Conveyor: a workflow engine for bioinformatic analyses.

Linke, Burkhard; Giegerich, Robert; Goesmann, Alexander.

Bioinformatics ; 27(7): 903-11, 2011 Apr 01.

Article in English | MEDLINE | ID: mdl-21278189

ABSTRACT

MOTIVATION: The rapidly increasing amounts of data available from new high-throughput methods have made data processing without automated pipelines infeasible. As was pointed out in several publications, integration of data and analytic resources into workflow systems provides a solution to this problem, simplifying the task of data analysis. Various applications for defining and running workflows in the field of bioinformatics have been proposed and published, e.g. Galaxy, Mobyle, Taverna, Pegasus or Kepler. One of the main aims of such workflow systems is to enable scientists to focus on analysing their datasets instead of taking care of data management, job management or monitoring the execution of computational tasks. The currently available workflow systems achieve this goal, but fundamentally differ in their way of executing workflows. RESULTS: We have developed the Conveyor software library, a multitiered generic workflow engine for composition, execution and monitoring of complex workflows. It features an open, extensible system architecture and concurrent program execution to exploit resources available on modern multicore CPU hardware. It offers the ability to build complex workflows with branches, loops and other control structures. Two example use cases illustrate the application of the versatile Conveyor engine to common bioinformatics problems. AVAILABILITY: The Conveyor application including client and server are available at http://conveyor.cebitec.uni-bielefeld.de.

Subjects

Computational Biology , Software , Escherichia coli/genetics , Genome, Bacterial , Genomics , Molecular Sequence Annotation , Workflow

11.

Lost in folding space? Comparing four variants of the thermodynamic model for RNA secondary structure prediction.

Janssen, Stefan; Schudoma, Christian; Steger, Gerhard; Giegerich, Robert.

BMC Bioinformatics ; 12: 429, 2011 Nov 03.

Article in English | MEDLINE | ID: mdl-22051375

ABSTRACT

BACKGROUND: Many bioinformatics tools for RNA secondary structure analysis are based on a thermodynamic model of RNA folding. They predict a single, "optimal" structure by free energy minimization, they enumerate near-optimal structures, they compute base pair probabilities and dot plots, representative structures of different abstract shapes, or Boltzmann probabilities of structures and shapes. Although all programs refer to the same physical model, they implement it with considerable variation for different tasks, and little is known about the effects of heuristic assumptions and model simplifications used by the programs on the outcome of the analysis. RESULTS: We extract four different models of the thermodynamic folding space which underlie the programs RNAFOLD, RNASHAPES, and RNASUBOPT. Their differences lie within the details of the energy model and the granularity of the folding space. We implement probabilistic shape analysis for all models, and introduce the shape probability shift as a robust measure of model similarity. Using four data sets derived from experimentally solved structures, we provide a quantitative evaluation of the model differences. CONCLUSIONS: We find that search space granularity affects the computed shape probabilities less than the over- or underapproximation of free energy by a simplified energy model. Still, the approximations perform similar enough to implementations of the full model to justify their continued use in settings where computational constraints call for simpler algorithms. On the side, we observe that the rarely used level 2 shapes, which predict the complete arrangement of helices, multiloops, internal loops and bulges, include the "true" shape in a rather small number of predicted high probability shapes. This calls for an investigation of new strategies to extract high probability members from the (very large) level 2 shape space of an RNA sequence. 
We provide implementations of all four models, written in a declarative style that makes them easy to modify. Based on our study, future work on thermodynamic RNA folding may make a choice of model based on our empirical data. It can take our implementations as a starting point for further program development.

Subjects

Algorithms , RNA Folding , RNA/chemistry , Base Sequence , Computational Biology , Probability , Sequence Analysis, RNA , Thermodynamics

12.

Faster computation of exact RNA shape probabilities.

Janssen, Stefan; Giegerich, Robert.

Bioinformatics ; 26(5): 632-9, 2010 Mar 01.

Article in English | MEDLINE | ID: mdl-20080511

ABSTRACT

MOTIVATION: Abstract shape analysis allows efficient computation of a representative sample of low-energy foldings of an RNA molecule. More comprehensive information is obtained by computing shape probabilities, accumulating the Boltzmann probabilities of all structures within each abstract shape. Such information is superior to free energies because it is independent of sequence length and base composition. However, up to this point, computation of shape probabilities evaluates all shapes simultaneously and comes with a computation cost which is exponential in the length of the sequence. RESULTS: We devise an approach called RapidShapes that computes the shapes above a specified probability threshold T by generating a list of promising shapes and constructing specialized folding programs for each shape to compute its share of Boltzmann probability. This aims at a heuristic improvement of runtime, while still computing exact probability values. CONCLUSION: Evaluating this approach and several substrategies, we find that only a small proportion of shapes have to be actually computed. For an RNA sequence of length 400, this leads, depending on the threshold, to a 10-138 fold speed-up compared with the previous complete method. Thus, probabilistic shape analysis has become feasible in medium-scale applications, such as the screening of RNA transcripts in a bacterial genome. AVAILABILITY: RapidShapes is available via http://bibiserv.cebitec.uni-bielefeld.de/rnashapes

Subjects

Computational Biology/methods , Nucleic Acid Conformation , RNA/chemistry , Base Sequence , Databases, Genetic , Molecular Sequence Data , Sequence Analysis, RNA
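
The notion of a shape probability used in the abstract above (the accumulated Boltzmann probability of all structures within an abstract shape) can be sketched on a toy example. This is not RapidShapes and not the RNAshapes implementation: the function names are hypothetical, the structures and energies are invented for illustration, the partition function is taken over the listed structures only rather than the full folding space, and `shape5` is a simplified abstraction that keeps only helix nesting and adjacency.

```python
import math
from collections import defaultdict

RT = 0.61632  # kcal/mol at 37 degrees C (gas constant times temperature)

def shape5(dot_bracket):
    """Simplified most-abstract shape: helix nesting and adjacency only."""
    root = []
    stack = [root]
    for ch in dot_bracket:
        if ch == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif ch == ")":
            stack.pop()

    def render(node):
        # A pair enclosing exactly one other pair continues the same helix
        # (stacking, bulge or internal loop), so it is collapsed.
        while len(node) == 1:
            node = node[0]
        return "[" + "".join(render(child) for child in node) + "]"

    return "".join(render(child) for child in root) or "_"

def shape_probabilities(structures):
    """structures: iterable of (dot_bracket, free_energy_kcal_per_mol)."""
    weights = defaultdict(float)
    for dot_bracket, energy in structures:
        weights[shape5(dot_bracket)] += math.exp(-energy / RT)
    z = sum(weights.values())  # partition function over the listed structures
    return {shape: w / z for shape, w in weights.items()}

# Invented toy folding space for a 16-nt sequence (energies are illustrative).
toy = [
    ("((((...)))).....", -3.0),
    ("..((((...))))...", -2.5),
    ("((...))..((...))", -2.0),
    ("................", 0.0),
]
for shape, p in sorted(shape_probabilities(toy).items(), key=lambda kv: -kv[1]):
    print(shape, round(p, 3))
```

Two different hairpin structures fall into the same shape `[]`, so their Boltzmann weights add up; this is why a shape can be probable even when no single structure in it dominates.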

13.

Fine-tuning structural RNA alignments in the twilight zone.

Bremges, Andreas; Schirmer, Stefanie; Giegerich, Robert.

BMC Bioinformatics ; 11: 222, 2010 Apr 30.

Article in English | MEDLINE | ID: mdl-20433706

ABSTRACT

BACKGROUND: A widely used method to find conserved secondary structure in RNA is to first construct a multiple sequence alignment, and then fold the alignment, optimizing a score based on thermodynamics and covariance. This method works best around 75% sequence similarity. However, in a "twilight zone" below 55% similarity, the sequence alignment tends to obscure the covariance signal used in the second phase. Therefore, while the overall shape of the consensus structure may still be found, the degree of conservation cannot be estimated reliably. RESULTS: Based on a combination of available methods, we present a method named planACstar for improving structure conservation in structural alignments in the twilight zone. After constructing a consensus structure by alignment folding, planACstar abandons the original sequence alignment, refolds the sequences individually, but consistent with the consensus, aligns the structures, irrespective of sequence, by a pure structure alignment method, and derives an improved sequence alignment from the alignment of structures, to be re-submitted to alignment folding, and so on. This cycle may be iterated as long as structural conservation improves, but normally, one step suffices. CONCLUSIONS: Employing the tools ClustalW, RNAalifold, and RNAforester, we find that for sequences with 30-55% sequence identity, structural conservation can be improved by 10% on average, with a large variation, measured in terms of RNAalifold's own criterion, the structure conservation index.

Subjects

RNA/chemistry , Sequence Alignment , Sequence Analysis, RNA , Algorithms , Base Sequence , Databases, Factual , Molecular Sequence Data , Nucleic Acid Conformation

14.

WAMI: a web server for the analysis of minisatellite maps.

Abouelhoda, Mohamed; El-Kalioby, Mohamed; Giegerich, Robert.

BMC Evol Biol ; 10: 167, 2010 Jun 06.

Article in English | MEDLINE | ID: mdl-20525398

ABSTRACT

BACKGROUND: Minisatellites are genomic loci composed of tandem arrays of short repetitive DNA segments. A minisatellite map is a sequence of symbols that represents the tandem repeat array such that the set of symbols is in one-to-one correspondence with the set of distinct repeats. Due to variations in repeat type and organization as well as copy number, the minisatellite maps have been widely used in forensic and population studies. In either domain, researchers need to compare the set of maps to each other, to build phylogenetic trees, to spot structural variations, and to study duplication dynamics. Efficient algorithms for these tasks are required to carry them out reliably and in reasonable time. RESULTS: In this paper we present WAMI, a web-server for the analysis of minisatellite maps. It performs the above mentioned computational tasks using efficient algorithms that take the model of map evolution into account. The WAMI interface is easy to use and the results of each analysis task are visualized. CONCLUSIONS: To the best of our knowledge, WAMI is the first server providing all these computational facilities to the minisatellite community. The WAMI web-interface and the source code of the underlying programs are available at http://www.nubios.nileu.edu.eg/tools/wami.

Subjects

Computational Biology/methods , Minisatellite Repeats , Sequence Analysis, DNA/methods , Software , Algorithms , Internet , Phylogeny , Sequence Alignment , User-Computer Interface

15.

A genome-wide survey of sRNAs in the symbiotic nitrogen-fixing alpha-proteobacterium Sinorhizobium meliloti.

Schlüter, Jan-Philip; Reinkensmeier, Jan; Daschkey, Svenja; Evguenieva-Hackenberg, Elena; Janssen, Stefan; Jänicke, Sebastian; Becker, Jörg D; Giegerich, Robert; Becker, Anke.

BMC Genomics ; 11: 245, 2010 Apr 17.

Article in English | MEDLINE | ID: mdl-20398411

ABSTRACT

BACKGROUND: Small untranslated RNAs (sRNAs) are widespread regulators of gene expression in bacteria. This study reports on a comprehensive screen for sRNAs in the symbiotic nitrogen-fixing alpha-proteobacterium Sinorhizobium meliloti applying deep sequencing of cDNAs and microarray hybridizations. RESULTS: A total of 1,125 sRNA candidates that were classified as trans-encoded sRNAs (173), cis-encoded antisense sRNAs (117), mRNA leader transcripts (379), and sense sRNAs overlapping coding regions (456) were identified in a size range of 50 to 348 nucleotides. Among these were transcripts corresponding to 82 previously reported sRNA candidates. Enrichment for RNAs with primary 5'-ends prior to sequencing of cDNAs suggested transcriptional start sites corresponding to 466 predicted sRNA regions. The consensus sigma70 promoter motif CTTGAC-N17-CTATAT was found upstream of 101 sRNA candidates. Expression patterns derived from microarray hybridizations provided further information on conditions of expression of a number of sRNA candidates. Furthermore, GenBank, EMBL, DDBJ, PDB, and Rfam databases were searched for homologs of the sRNA candidates identified in this study. Searching Rfam family models with over 1,000 sRNA candidates, re-discovered only those sequences from S. meliloti already known and stored in Rfam, whereas BLAST searches suggested a number of homologs in related alpha-proteobacteria. CONCLUSIONS: The screening data suggests that in S. meliloti about 3% of the genes encode trans-encoded sRNAs and about 2% antisense transcripts. Thus, this first comprehensive screen for sRNAs applying deep sequencing in an alpha-proteobacterium shows that sRNAs also occur in high number in this group of bacteria.

Subjects

Genome, Bacterial , RNA, Bacterial/genetics , RNA, Untranslated/genetics , Sinorhizobium meliloti/genetics , DNA Transposable Elements , Gene Expression Profiling , Genome-Wide Association Study , Oligonucleotide Array Sequence Analysis , Transcription Initiation Site , Transcription, Genetic
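
The consensus sigma70 promoter motif quoted in the abstract above, CTTGAC-N17-CTATAT, can be read as a pattern: the hexamer CTTGAC, a spacer of 17 arbitrary bases, then CTATAT. The following Python sketch searches for exact matches of that pattern only. It is not the authors' search procedure, which may well have tolerated mismatches or spacer variation; `find_sigma70_sites` and the example sequence are hypothetical.

```python
import re

# Exact consensus from the abstract: CTTGAC, 17 arbitrary bases, CTATAT.
# The lookahead lets overlapping occurrences be reported as well.
SIGMA70 = re.compile(r"(?=CTTGAC[ACGT]{17}CTATAT)")

def find_sigma70_sites(seq):
    """Return 0-based start positions of exact consensus matches in seq."""
    return [m.start() for m in SIGMA70.finditer(seq.upper())]

# Artificial example sequence with one planted site at position 2.
example = "GG" + "CTTGAC" + "A" * 17 + "CTATAT" + "CC"
print(find_sigma70_sites(example))  # -> [2]
```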

16.

Two interactive Bioinformatics courses at the Bielefeld University Bioinformatics Server.

Sczyrba, Alexander; Konermann, Susanne; Giegerich, Robert.

Brief Bioinform ; 9(3): 243-9, 2008 May.

Article in English | MEDLINE | ID: mdl-18199576

ABSTRACT

Conferences in computational biology continue to provide tutorials on classical and new methods in the field. This can be taken as an indicator that education is still a bottleneck in our field's process of becoming an established scientific discipline. Bielefeld University has been one of the early providers of bioinformatics education, both locally and via the internet. The Bielefeld Bioinformatics Server (BiBiServ) offers a variety of older and new materials. Here, we report on two online courses made available recently, one introductory and one on the advanced level: (i) SADR: Sequence Analysis with Distributed Resources (http://bibiserv.techfak.uni-bielefeld.de/sadr/) and (ii) ADP: Algebraic Dynamic Programming in Bioinformatics (http://bibiserv.techfak.uni-bielefeld.de/dpcourse/).

Assuntos

Biologia Computacional/educação , Instrução por Computador/métodos , Currículo , Educação Profissionalizante/organização & administração , Genômica/educação , Ensino/métodos , Interface Usuário-Computador , Alemanha , Internet

17.

The BREW workshop series: a stimulating experience in PhD education.

Giegerich, Robert; Brazma, Alvis; Jonassen, Inge; Ukkonen, Esko; Vingron, Martin.

Brief Bioinform ; 9(3): 250-3, 2008 May.

Artigo em Inglês | MEDLINE | ID: mdl-18216087

RESUMO

Over recent years, five European PhD programmes have organized a series of 'Bioinformatics Research and Education Workshops'. These workshops address the needs of first-year PhD students and have been designed to combine a maximum of educational impact and scientific stimulation with a minimum of financial and administrative effort. We describe the BREW experience and argue that this type of event constitutes an attractive component of PhD education in computational biology and beyond.

Assuntos

Biologia Computacional/educação , Currículo , Educação de Pós-Graduação/organização & administração , Educação Profissionalizante/organização & administração , Genômica/educação , Ensino/métodos , Educação de Pós-Graduação/métodos , Europa (Continente)

18.

mkESA: enhanced suffix array construction tool.

Homann, Robert; Fleer, David; Giegerich, Robert; Rehmsmeier, Marc.

Bioinformatics ; 25(8): 1084-5, 2009 Apr 15.

Artigo em Inglês | MEDLINE | ID: mdl-19246510

RESUMO

We introduce the tool mkESA, an open source program for constructing enhanced suffix arrays (ESAs), striving for low memory consumption, yet high practical speed. mkESA is a user-friendly program written in portable C99, based on a parallelized version of the Deep-Shallow suffix array construction algorithm, which is known for its high speed and small memory usage. The tool handles large FASTA files with multiple sequences, and computes suffix arrays and various additional tables, such as the LCP table (longest common prefix) or the inverse suffix array, from given sequence data.
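
To make the tables mentioned in the abstract concrete, here is a minimal sketch of a suffix array, its inverse, and the LCP table. It is not mkESA's method: the suffix array is built by naive sorting rather than the Deep-Shallow algorithm, and the LCP table uses the linear-time approach of Kasai et al. All function names are illustrative.

```python
def suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order.

    Naive construction by sorting; fine for small inputs only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def inverse_suffix_array(sa):
    """inv[pos] = rank of the suffix starting at pos."""
    inv = [0] * len(sa)
    for rank, pos in enumerate(sa):
        inv[pos] = rank
    return inv


def lcp_table(text, sa):
    """lcp[r] = longest common prefix length of suffixes sa[r-1] and sa[r].

    lcp[0] is 0 by convention. Walks the suffixes in text order and reuses
    the previous match length minus one (Kasai et al.).
    """
    n = len(text)
    inv = inverse_suffix_array(sa)
    lcp = [0] * n
    h = 0
    for i in range(n):
        r = inv[i]
        if r == 0:
            h = 0
            continue
        j = sa[r - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[r] = h
        if h > 0:
            h -= 1
    return lcp


sa = suffix_array("banana")
print(sa, inverse_suffix_array(sa), lcp_table("banana", sa))
```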

Assuntos

Algoritmos , Biologia Computacional/métodos , Análise de Sequência/métodos , Software , Análise de Sequência de DNA/métodos , Análise de Sequência de Proteína/métodos , Análise de Sequência de RNA/métodos

19.

Significant speedup of database searches with HMMs by search space reduction with PSSM family models.

Beckstette, Michael; Homann, Robert; Giegerich, Robert; Kurtz, Stefan.

Bioinformatics ; 25(24): 3251-8, 2009 Dec 15.

Artigo em Inglês | MEDLINE | ID: mdl-19828575

RESUMO

MOTIVATION: Profile hidden Markov models (pHMMs) are currently the most popular modeling concept for protein families. They provide sensitive family descriptors, and sequence database searching with pHMMs has become a standard task in today's genome annotation pipelines. On the downside, searching with pHMMs is computationally expensive. RESULTS: We propose a new method for efficient protein family classification and for speeding up database searches with pHMMs as is necessary for large-scale analysis scenarios. We employ simpler models of protein families called position-specific scoring matrices family models (PSSM-FMs). For fast database search, we combine full-text indexing, efficient exact p-value computation of PSSM match scores and fast fragment chaining. The resulting method is well suited to prefilter the set of sequences to be searched for subsequent database searches with pHMMs. We achieved a classification performance only marginally inferior to hmmsearch, yet results could be obtained in a fraction of the runtime, with a speedup of >64-fold. In experiments addressing the method's ability to prefilter the sequence space for subsequent database searches with pHMMs, our method reduces the number of sequences to be searched with hmmsearch to only 0.80% of all sequences. The filter is very fast and leads to a total speedup of factor 43 over the unfiltered search, while retaining >99.5% of the original results. In a lossless filter setup for hmmsearch on UniProtKB/Swiss-Prot, we observed a speedup of factor 92. AVAILABILITY: The presented algorithms are implemented in the program PoSSuMsearch2, available for download at http://bibiserv.techfak.uni-bielefeld.de/possumsearch2/. CONTACT: beckstette@zbh.uni-hamburg.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Assuntos

Biologia Computacional/métodos , Bases de Dados de Proteínas , Cadeias de Markov , Matrizes de Pontuação de Posição Específica , Proteínas/classificação , Alinhamento de Sequência/métodos , Análise de Sequência de Proteína/métodos , Algoritmos , Reconhecimento Automatizado de Padrão/métodos , Software

20.

KnotInFrame: prediction of -1 ribosomal frameshift events.

Theis, Corinna; Reeder, Jens; Giegerich, Robert.

Nucleic Acids Res ; 36(18): 6013-20, 2008 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-18820303

RESUMO

Programmed -1 ribosomal frameshift (-1 PRF) allows for alternative reading frames within one mRNA. First found in several viruses, it is now believed to exist in all kingdoms of life. Strong stimulators for -1 PRF are a heptameric slippery site and an RNA pseudoknot. Here, we present a new algorithm, KnotInFrame, for the automatic detection of -1 PRF signals from genomic sequences. It finds the frameshifting stimulators by means of a specialized RNA-pseudoknot folding program, fast enough for genome-wide analyses. Evaluations on known -1 PRF signals demonstrate a high sensitivity.
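
The first of the two stimulators, the heptameric slippery site, can be illustrated with a simple scan. This sketch is not KnotInFrame: it assumes the commonly cited pattern X XXY YYZ (XXX three identical nucleotides, YYY either AAA or UUU, Z not G), which the abstract itself does not spell out, and it ignores both the reading frame and the downstream pseudoknot that the tool evaluates. The function name `slippery_sites` is hypothetical.

```python
def slippery_sites(seq):
    """Return (position, heptamer) pairs matching the assumed X XXY YYZ pattern.

    Positions are 0-based. DNA input is converted to RNA (T -> U).
    """
    rna = seq.upper().replace("T", "U")
    hits = []
    for i in range(len(rna) - 6):
        heptamer = rna[i:i + 7]
        x, y, z = heptamer[0:3], heptamer[3:6], heptamer[6]
        x_ok = x[0] in "ACGU" and len(set(x)) == 1  # three identical bases
        y_ok = y in ("AAA", "UUU")                  # assumed restriction on YYY
        z_ok = z in "ACU"                           # assumed: Z is not G
        if x_ok and y_ok and z_ok:
            hits.append((i, heptamer))
    return hits


# Hypothetical toy sequence containing the heptamer UUUAAAC.
print(slippery_sites("GGUUUAAACGG"))
```

A candidate found this way would still need a suitable pseudoknot a short distance downstream before it could be considered a -1 PRF signal.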

Assuntos

Algoritmos , Mudança da Fase de Leitura do Gene Ribossômico , Software , Sequência de Bases , Biologia Computacional , Sequência Consenso , Bases de Dados de Ácidos Nucleicos , Genômica , Saccharomyces cerevisiae/genética
