ABSTRACT
BACKGROUND: Reproducibility is one of the tenets of the scientific method. Scientific experiments often comprise complex data flows, the selection of adequate parameters, and the analysis and visualization of intermediate and end results. Breaking down the complexity of such experiments into the joint collaboration of small, repeatable, well-defined tasks, each with well-defined inputs, parameters, and outputs, offers immediate benefits such as identifying bottlenecks and pinpointing sections that could benefit from parallelization. Workflows rest upon the notion of splitting complex work into the joint effort of several manageable tasks. Several engines give users the ability to design and execute workflows. Each engine was created to address certain problems of a specific community; therefore, each one has its advantages and shortcomings. Furthermore, not all features of all workflow engines are royalty-free, an aspect that could potentially drive away members of the scientific community. RESULTS: We have developed a set of tools that enables the scientific community to benefit from workflow interoperability. We developed a platform-free structured representation of the parameters, inputs, and outputs of command-line tools in so-called Common Tool Descriptor documents. We have also addressed the shortcomings of, and combined the features of, two royalty-free workflow engines, each with a substantial user community: the Konstanz Information Miner, an engine which we see as a formidable workflow editor, and the Grid and User Support Environment, a web-based framework able to interact with several high-performance computing resources. We have thus created a free and highly accessible way to design workflows on a desktop computer and execute them on high-performance computing resources.
CONCLUSIONS: Our work will not only reduce time spent on designing scientific workflows, but also make executing workflows on remote high-performance computing resources more accessible to technically inexperienced users. We strongly believe that our efforts not only decrease the turnaround time to obtain scientific results but also have a positive impact on reproducibility, thus elevating the quality of obtained scientific results.
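The idea of splitting work into tasks with well-defined inputs, parameters, and outputs, and of using those declarations to find sections that can run in parallel, can be illustrated with a small Python sketch. The `ToolDescriptor` class and the `parallel_stages` function are hypothetical: they are not the Common Tool Descriptor schema (actual CTD documents are XML files) and not an API of KNIME or gUSE. The sketch assumes output names are unique, infers dependencies by matching each tool's inputs to other tools' outputs, and groups tools whose dependencies are already satisfied into one stage.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToolDescriptor:
    """Hypothetical, simplified tool description (NOT the real CTD schema)."""
    name: str
    inputs: List[str]   # names of data items the tool consumes
    outputs: List[str]  # names of data items the tool produces (assumed unique)
    parameters: Dict[str, str] = field(default_factory=dict)


def parallel_stages(tools: List[ToolDescriptor]) -> List[List[str]]:
    """Group tools into stages; tools within one stage do not depend on each other."""
    producer = {out: t.name for t in tools for out in t.outputs}
    # A tool depends on every tool that produces one of its inputs;
    # inputs without a producer are treated as external data.
    deps: Dict[str, Set[str]] = {
        t.name: {producer[i] for i in t.inputs if i in producer} for t in tools
    }
    stages: List[List[str]] = []
    done: Set[str] = set()
    while len(done) < len(tools):
        # Every not-yet-scheduled tool whose dependencies are all finished.
        ready = sorted(n for n, d in deps.items() if n not in done and d <= done)
        if not ready:
            raise ValueError("cyclic dependency between tools")
        stages.append(ready)
        done.update(ready)
    return stages


# Hypothetical four-tool workflow; tool and file names are illustrative only.
tools = [
    ToolDescriptor("align", ["reads.fastq", "ref.fa"], ["aln.bam"], {"threads": "4"}),
    ToolDescriptor("index", ["ref.fa"], ["ref.idx"]),
    ToolDescriptor("call", ["aln.bam", "ref.idx"], ["calls.vcf"]),
    ToolDescriptor("qc", ["reads.fastq"], ["qc.html"]),
]
print(parallel_stages(tools))  # → [['align', 'index', 'qc'], ['call']]
```

Because every task declares what it reads and writes, the dependency structure falls out of the declarations themselves; a workflow engine can then dispatch each stage's tools concurrently, for example to separate high-performance computing nodes.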
Subjects
Computational Biology/methods, Computer Communication Networks, Microcomputers, Software, Workflow, Reproducibility of Results
ABSTRACT
MOTIVATION: Web-based workflow systems have gained considerable momentum in sequence-oriented bioinformatics. In structural bioinformatics, however, such systems are still relatively rare; while commercial stand-alone workflow applications are common in the pharmaceutical industry, academic researchers often still rely on command-line scripting to glue individual tools together. RESULTS: In this work, we address the problem of building a web-based system for workflows in structural bioinformatics. For the underlying molecular modelling engine, we opted for the BALL framework because of its extensive and well-tested functionality in the field of structural bioinformatics. The large number of molecular data structures and algorithms implemented in BALL allows for elegant and sophisticated development of new approaches in the field. We hence connected the versatile BALL library and its visualization and editing front end BALLView with the Galaxy workflow framework. The result, which we call ballaxy, enables the user to simply and intuitively create sophisticated pipelines for applications in structure-based computational biology, integrated into a standard tool for molecular modelling. AVAILABILITY AND IMPLEMENTATION: ballaxy consists of three parts: minor modifications to the Galaxy system, a collection of tools, and an integration into the BALL framework and the BALLView application for molecular modelling. Modifications to Galaxy will be submitted to the Galaxy project, and the BALL and BALLView integrations will be included in the next major BALL release. After acceptance of the modifications into the Galaxy project, we will publish all ballaxy tools via the Galaxy Tool Shed. In the meantime, all three components are available from http://www.ball-project.org/ballaxy. Docker images for ballaxy are also available at https://registry.hub.docker.com/u/anhi/ballaxy/dockerfile/. ballaxy is licensed under the terms of the GPL.
Subjects
Algorithms, Computational Biology/methods, Sequence Analysis, DNA/methods, Software, Humans, Models, Molecular, Systems Integration, User-Computer Interface, Workflow
ABSTRACT
The products of many bacterial non-ribosomal peptide synthetases (NRPS) are highly important secondary metabolites, including vancomycin and other antibiotics. The ability to predict the substrate specificity of NRPS adenylation (A-) domains newly detected by genome sequencing efforts is of great importance for identifying and annotating new gene clusters that produce secondary metabolites. Prediction of A-domain specificity from sequence alone can be achieved through sequence signatures or, more accurately, through machine learning methods. We present an improved predictor, based on previous work (NRPSpredictor), that predicts A-domain specificity using Support Vector Machines on four hierarchical levels, ranging from gross physicochemical properties of an A-domain's substrates down to single amino acid substrates. The three more general levels are predicted with an F-measure better than 0.89 and the most detailed level with an average F-measure of 0.80. We also modeled the applicability domain of our predictor to estimate whether new A-domains lie within it. Finally, since NRPS also play an important role in the natural products chemistry of fungi (e.g., peptaibols and cephalosporins), we added a predictor for fungal A-domains, which predicts gross physicochemical properties with an F-measure of 0.84. The service is available at http://nrps.informatik.uni-tuebingen.de/.
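The F-measure reported above is the harmonic mean of precision P and recall R, F = 2PR / (P + R), computed per predicted class. The following Python sketch shows that computation and one plausible reading of "average F-measure" as the unweighted (macro) mean over classes; the averaging scheme and the toy labels are assumptions, not details taken from the predictor itself.

```python
from collections import Counter
from typing import Dict, Sequence


def f_measure_per_class(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Per-class F-measure, F = 2PR / (P + R), for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # the true class t was missed
    scores: Dict[str, float] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        s = precision + recall
        scores[label] = 2 * precision * recall / s if s else 0.0
    return scores


def macro_f_measure(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F-measures (assumed averaging scheme)."""
    scores = f_measure_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical toy labels, not real A-domain substrate classes.
truth = ["hydrophobic", "hydrophobic", "polar", "polar"]
pred = ["hydrophobic", "polar", "polar", "polar"]
print(f_measure_per_class(truth, pred))  # hydrophobic: P=1, R=0.5; polar: P=2/3, R=1
print(round(macro_f_measure(truth, pred), 3))  # → 0.733
```

A macro average gives rare substrate classes the same weight as frequent ones, which is why it tends to be lower than accuracy on the most detailed level, where many classes have few training examples.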
Subjects
Peptide Synthases/chemistry, Software, Artificial Intelligence, Catalytic Domain, Internet, Substrate Specificity
ABSTRACT
In many organisms, aconitases have dual functions: they serve as enzymes in the tricarboxylic acid cycle and as regulators of iron metabolism. In this study, we defined the role of the aconitase AcnA in Streptomyces viridochromogenes Tü494, the producer of the herbicide phosphinothricyl-alanyl-alanine, also known as phosphinothricin tripeptide or bialaphos. A mutant in which the aconitase gene acnA was disrupted showed severe defects in morphology and physiology: it was unable to form aerial mycelium, spores, or phosphinothricin tripeptide. AcnA belongs to the iron regulatory proteins (IRPs). In addition to its catalytic function, AcnA plays a regulatory role by binding to iron responsive elements (IREs) located in the untranslated regions of certain mRNAs. A mutation preventing the formation of the [4Fe-4S] cluster of AcnA eliminated its catalytic activity but did not inhibit its RNA-binding ability. In silico analysis of the S. viridochromogenes genome revealed several IRE-like structures. One structure is located upstream of recA, which is involved in the bacterial SOS response, and another was identified upstream of ftsZ, which is required for the onset of sporulation in streptomycetes. The functionality of different IRE structures was demonstrated with gel shift assays, and specific IRE consensus sequences were defined. Furthermore, RecA was shown to be upregulated at the post-transcriptional level under oxidative stress conditions in the wild-type strain but not in the acnA mutant, suggesting a regulatory role of AcnA in the oxidative stress response.
Subjects
Aconitate Hydratase/genetics, Aconitate Hydratase/metabolism, Streptomyces/enzymology, Aconitate Hydratase/chemistry, Amino Acid Sequence, Bacterial Proteins/metabolism, Catalysis, Citric Acid Cycle, DNA-Binding Proteins/metabolism, Hydrogen-Ion Concentration, Iron Regulatory Proteins/metabolism, Mutation, Oxidative Stress/genetics, Phenotype, RNA-Binding Proteins/metabolism, Rec A Recombinases/metabolism, Up-Regulation
ABSTRACT
The functional annotation of enzymes concerns not only the type of reaction catalysed by an enzyme but also its substrate specificity, which can vary widely within the same family. In many cases, family membership and even substrate specificity can be predicted from the enzyme sequence alone, using a nearest neighbour classification rule. However, combining structural and sequence information can improve the interpretability and accuracy of predictive models. The method presented here, Active Site Classification (ASC), automatically extracts the residues lining the active site from one representative three-dimensional structure and the corresponding residues from the sequences of other members of the family. From a set of representatives with known substrate specificity, a Support Vector Machine (SVM) can then learn a model of substrate specificity. Applied to a sequence of unknown specificity, the SVM predicts the most likely substrate. The models can also be analysed to reveal the structural determinants of substrate specificity and thus yield valuable insights into mechanisms of enzyme specificity. We demonstrate the high prediction accuracy achieved on two benchmark data sets and illustrate the structural insights gained from ASC through a detailed analysis of the family of decarboxylating dehydrogenases. The ASC web service is available at http://asc.informatik.uni-tuebingen.de/.
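The feature-extraction step described above (keeping only the residues at active-site positions and encoding them for a classifier) can be sketched in Python. This is a minimal illustration under stated assumptions: the active-site alignment columns are taken as already known (in ASC they come from mapping a representative structure onto the family alignment), the helper names are hypothetical, and, to stay self-contained, the sketch swaps in the nearest neighbour rule mentioned in the text instead of training an SVM on the same one-hot vectors.

```python
from typing import List, Sequence, Tuple

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 standard amino acids plus a gap symbol


def active_site_residues(aligned_seq: str, site_columns: Sequence[int]) -> str:
    """Pick the residues at the (assumed, precomputed) active-site alignment columns."""
    return "".join(aligned_seq[c] for c in site_columns)


def one_hot(residues: str) -> List[int]:
    """Encode each residue as a 21-dimensional indicator block (uppercase letters assumed)."""
    vec: List[int] = []
    for aa in residues:
        block = [0] * len(ALPHABET)
        block[ALPHABET.index(aa)] = 1
        vec.extend(block)
    return vec


def nearest_neighbour(query: str,
                      training: List[Tuple[str, str]],
                      site_columns: Sequence[int]) -> str:
    """Return the label of the training sequence whose active site is closest to the query's.

    Distance is the number of differing one-hot entries, i.e. twice the number of
    active-site positions with a different residue. Ties go to the first entry.
    """
    q = one_hot(active_site_residues(query, site_columns))

    def distance(entry: Tuple[str, str]) -> int:
        v = one_hot(active_site_residues(entry[0], site_columns))
        return sum(a != b for a, b in zip(q, v))

    return min(training, key=distance)[1]


# Hypothetical aligned sequences and substrate labels, for illustration only.
training = [("ADKLE", "substrate_A"), ("ARKYE", "substrate_B")]
site = [1, 3]  # assumed active-site alignment columns (0-based)
print(nearest_neighbour("GDKLQ", training, site))  # → substrate_A
```

Restricting the comparison to active-site columns is what makes such a model interpretable: each feature corresponds to one residue type at one structurally defined position, so the positions that separate substrate classes can be read back from the model.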
Subjects
Computational Biology/methods, Enzymes/chemistry, Enzymes/metabolism, Algorithms, Amino Acid Sequence, Artificial Intelligence, Catalytic Domain, Databases, Protein, Enzymes/genetics, Internet, Models, Molecular, Protein Conformation, Structure-Activity Relationship, Substrate Specificity
ABSTRACT
In this study, we report on the specificity profiling of the MAP kinase inhibitors 1, 2, and 3 in a panel of 78 protein kinases, including the MAPK isoforms p38α/β/γ/δ, JNK1/2/3, and ERK1/2/8, showing 3-(4-fluorophenyl)-4-pyridin-4-ylquinolin-2(1H)-one (1) to be highly selective for p38α MAPK with an IC50 of 1.8 µM. In contrast, in addition to p38α, the isoxazoles 2 and 3 significantly inhibited JNK2/3 and further kinases beyond the MAPK family, such as PKA, PKD, Lck, and CK1. Using sequence alignments and homology models of different members of the MAPK family, we investigated the binding mode that determines the selectivity of 1 for the p38α isoform. For lead optimization of 1, a straightforward tandem Buchwald-aldol synthetic approach toward the flexible decoration of the quinolin-2(1H)-one scaffold was employed. Structure-activity relationships (SAR) for derivatives of 1 at isolated p38α MAPK are presented.