Pesquisa | Portal de Pesquisa da BVS Enfermagem

Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data.

Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B.

BMC Bioinformatics ; 9: 334, 2008 Aug 07.

Artigo em Inglês | MEDLINE | ID: mdl-18687127

RESUMO

BACKGROUND: There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. RESULTS: Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. CONCLUSION: Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data.

Assuntos

Interpretação Estatística de Dados , Perfilação da Expressão Gênica/estatística & dados numéricos , Análise de Sequência com Séries de Oligonucleotídeos/estatística & dados numéricos , Software , Bases de Dados Genéticas , Armazenamento e Recuperação da Informação , Linguagens de Programação

Microbase2.0: a generic framework for computationally intensive bioinformatics workflows in the cloud.

Flanagan, Keith; Nakjang, Sirintra; Hallinan, Jennifer; Harwood, Colin; Hirt, Robert P; Pocock, Matthew R; Wipat, Anil.

J Integr Bioinform ; 9(2): 212, 2012 Sep 24.

Artigo em Inglês | MEDLINE | ID: mdl-23001322

RESUMO

As bioinformatics datasets grow ever larger, and analyses become increasingly complex, there is a need for data handling infrastructures to keep pace with developing technology. One solution is to apply Grid and Cloud technologies to address the computational requirements of analysing high throughput datasets. We present an approach for writing new, or wrapping existing applications, and a reference implementation of a framework, Microbase2.0, for executing those applications using Grid and Cloud technologies. We used Microbase2.0 to develop an automated Cloud-based bioinformatics workflow executing simultaneously on two different Amazon EC2 data centres and the Newcastle University Condor Grid. Several CPU years' worth of computational work was performed by this system in less than two months. The workflow produced a detailed dataset characterising the cellular localisation of 3,021,490 proteins from 867 taxa, including bacteria, archaea and unicellular eukaryotes. Microbase2.0 is freely available from http://www.microbase.org.uk/.

Assuntos

Biologia Computacional/métodos , Software , Bases de Dados Factuais , Interface Usuário-Computador , Fluxo de Trabalho

Taverna: a tool for the composition and enactment of bioinformatics workflows.

Oinn, Tom; Addis, Matthew; Ferris, Justin; Marvin, Darren; Senger, Martin; Greenwood, Mark; Carver, Tim; Glover, Kevin; Pocock, Matthew R; Wipat, Anil; Li, Peter.

Bioinformatics ; 20(17): 3045-54, 2004 Nov 22.

Artigo em Inglês | MEDLINE | ID: mdl-15201187

RESUMO

MOTIVATION: In silico experiments in bioinformatics involve the co-ordinated use of computational tools and information repositories. A growing number of these resources are being made available with programmatic access in the form of Web services. Bioinformatics scientists will need to orchestrate these Web services in workflows as part of their analyses. RESULTS: The Taverna project has developed a tool for the composition and enactment of bioinformatics workflows for the life sciences community. The tool includes a workbench application which provides a graphical user interface for the composition of workflows. These workflows are written in a new language called the simple conceptual unified flow language (Scufl), where by each step within a workflow represents one atomic task. Two examples are used to illustrate the ease by which in silico experiments can be represented as Scufl workflows using the workbench application.

Assuntos

Biologia Computacional/métodos , Gráficos por Computador , Armazenamento e Recuperação da Informação/métodos , Internet , Sistemas On-Line , Software , Interface Usuário-Computador , Redes de Comunicação de Computadores , Sistemas de Gerenciamento de Base de Dados , Design de Software

The Bioperl toolkit: Perl modules for the life sciences.

Stajich, Jason E; Block, David; Boulez, Kris; Brenner, Steven E; Chervitz, Stephen A; Dagdigian, Chris; Fuellen, Georg; Gilbert, James G R; Korf, Ian; Lapp, Hilmar; Lehväslaiho, Heikki; Matsalla, Chad; Mungall, Chris J; Osborne, Brian I; Pocock, Matthew R; Schattner, Peter; Senger, Martin; Stein, Lincoln D; Stupka, Elia; Wilkinson, Mark D; Birney, Ewan.

Genome Res ; 12(10): 1611-8, 2002 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-12368254

RESUMO

The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 yr into the most comprehensive library of Perl modules available for managing and manipulating life-science information. Bioperl provides an easy-to-use, stable, and consistent programming interface for bioinformatics application programmers. The Bioperl modules have been successfully and repeatedly used to reduce otherwise complex tasks to only a few lines of code. The Bioperl object model has been proven to be flexible enough to support enterprise-level applications such as EnsEMBL, while maintaining an easy learning curve for novice Perl programmers. Bioperl is capable of executing analyses and processing results from programs such as BLAST, ClustalW, or the EMBOSS suite. Interoperation with modules written in Python and Java is supported through the evolving BioCORBA bridge. Bioperl provides access to data stores such as GenBank and SwissProt via a flexible series of sequence input/output modules, and to the emerging common sequence data storage format of the Open Bioinformatics Database Access project. This study describes the overall architecture of the toolkit, the problem domains that it addresses, and gives specific examples of how the toolkit can be used to solve common life-sciences problems. We conclude with a discussion of how the open-source nature of the project has contributed to the development effort.

Assuntos

Disciplinas das Ciências Biológicas/métodos , Biologia Computacional/métodos , Algoritmos , Animais , Disciplinas das Ciências Biológicas/tendências , Biologia Computacional/tendências , Gráficos por Computador , Sistemas de Gerenciamento de Base de Dados , Bases de Dados Genéticas , Humanos , Internet , Sistemas On-Line , Software , Design de Software , Integração de Sistemas

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA