ABSTRACT
BACKGROUND: Computational methods for problem solving need to interleave information access and algorithm execution in a problem-specific workflow. The structures of these workflows are defined by a scaffold of syntactic, semantic and algebraic objects capable of representing them. Despite the proliferation of GUIs (Graphic User Interfaces) in bioinformatics, only some of them provide workflow capabilities; surprisingly, no meta-analysis of workflow operators and components in bioinformatics has been reported. RESULTS: We present a set of syntactic components and algebraic operators capable of representing analytical workflows in bioinformatics. Iteration, recursion, the use of conditional statements, and management of suspend/resume tasks have traditionally been implemented on an ad hoc basis and hard-coded; by having these operators properly defined it is possible to use and parameterize them as generic re-usable components. To illustrate how these operations can be orchestrated, we present GPIPE, a prototype graphic pipeline generator for PISE that allows the definition of a pipeline, parameterization of its component methods, and storage of metadata in XML formats. This implementation goes beyond the macro capacities currently in PISE. As the entire analysis protocol is defined in XML, a complete bioinformatic experiment (linked sets of methods, parameters and results) can be reproduced or shared among users. AVAILABILITY: http://if-web1.imb.uq.edu.au/Pise/5.a/gpipe.html (interactive), ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/ (download). CONCLUSION: From our meta-analysis we have identified syntactic structures and algebraic operators common to many workflows in bioinformatics. The workflow components and algebraic operators can be assimilated into re-usable software components. GPIPE, a prototype implementation of this framework, provides a GUI builder to facilitate the generation of workflows and integration of heterogeneous analytical tools.
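As an illustration of what "operators as generic re-usable components" could look like, the following minimal Python sketch models a workflow step as a function and builds conditional and iteration operators as higher-order functions. It is not the GPIPE or PISE API: every name here (`sequence`, `conditional`, `iterate`, and the toy steps) is hypothetical, and plain strings stand in for sequence data.

```python
from typing import Any, Callable

# A workflow step is assumed to be any function from data to data.
Step = Callable[[Any], Any]


def sequence(*steps: Step) -> Step:
    """Chain steps so the output of one feeds the next."""
    def run(data: Any) -> Any:
        for step in steps:
            data = step(data)
        return data
    return run


def conditional(predicate: Callable[[Any], bool], if_true: Step, if_false: Step) -> Step:
    """Choose one of two branches based on the current data."""
    def run(data: Any) -> Any:
        return if_true(data) if predicate(data) else if_false(data)
    return run


def iterate(step: Step, until: Callable[[Any], bool], max_rounds: int = 100) -> Step:
    """Repeat a step until a stop condition holds (bounded by max_rounds)."""
    def run(data: Any) -> Any:
        for _ in range(max_rounds):
            if until(data):
                break
            data = step(data)
        return data
    return run


# Toy steps standing in for real analysis methods (hypothetical).
def trim(seq: str) -> str:
    return seq.strip("N")          # drop flanking ambiguous bases


def to_upper(seq: str) -> str:
    return seq.upper()


def identity(seq: str) -> str:
    return seq


def shorten(seq: str) -> str:
    return seq[:-1]                # remove the last residue


pipeline = sequence(
    trim,
    conditional(lambda s: s.islower(), to_upper, identity),
    iterate(shorten, until=lambda s: len(s) <= 4),
)

result = pipeline("NNacgtacgtNN")
```

Because each operator returns an ordinary step, operators nest freely; the same `iterate` or `conditional` can be parameterized and reused across pipelines instead of being hard-coded in each one.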
Subjects
Computational Biology/methods , Algorithms , Artificial Intelligence , Computer Communication Networks , Computer Graphics , Computer Simulation , Database Management Systems , Databases, Factual , Databases, Genetic , Information Storage and Retrieval , Models, Statistical , Models, Theoretical , Natural Language Processing , Online Systems , Programming Languages , Sequence Alignment , Sequence Analysis, DNA , Software , Software Design , Time Factors , User-Computer Interface
ABSTRACT
In the wild tomato Solanum habrochaites, the Sst2 locus on chromosome 8 is responsible for the biosynthesis of several class II sesquiterpene olefins by glandular trichomes. Analysis of a trichome-specific EST collection from S. habrochaites revealed two candidate genes for the synthesis of Sst2-associated sesquiterpenes. zFPS encodes a protein with homology to Z-isoprenyl pyrophosphate synthases, and SBS (for Santalene and Bergamotene Synthase) encodes a terpene synthase with homology to kaurene synthases. Both genes were found to cosegregate with the Sst2 locus. Recombinant zFPS protein catalyzed the synthesis of Z,Z-FPP from isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP), while coincubation of zFPS and SBS with the same substrates yielded a mixture of olefins identical to the Sst2-associated sesquiterpenes, including (+)-alpha-santalene, (+)-endo-beta-bergamotene, and (-)-endo-alpha-bergamotene. In addition, headspace analysis of tobacco (Nicotiana sylvestris) plants expressing zFPS and SBS in glandular trichomes afforded the same mix of sesquiterpenes. Each of these proteins contains a putative plastid targeting sequence that mediates transport of a fused green fluorescent protein to the chloroplasts, suggesting that the biosynthesis of these sesquiterpenes uses IPP and DMAPP from the plastidic DXP pathway. These results provide novel insights into sesquiterpene biosynthesis and have general implications concerning sesquiterpene engineering in plants.