ABSTRACT
BACKGROUND: The development of large-scale kinetic models is one of the current key issues in computational systems biology and bioinformatics. Here we consider the problem of parameter estimation in nonlinear dynamic models. Global optimization methods can be used to solve this type of problem, but the associated computational cost is very large. Moreover, many of these methods require tuning a number of adjustable search parameters, which calls for several initial exploratory runs and further increases computation times. Here we present a novel parallel method, self-adaptive cooperative enhanced scatter search (saCeSS), to accelerate the solution of this class of problems. The method is based on the scatter search optimization metaheuristic and incorporates several key new mechanisms: (i) asynchronous cooperation between parallel processes, (ii) coarse and fine-grained parallelism, and (iii) self-tuning strategies. RESULTS: The performance and robustness of saCeSS are illustrated by solving a set of challenging parameter estimation problems, including medium and large-scale kinetic models of the bacterium E. coli, baker's yeast S. cerevisiae, the vinegar fly D. melanogaster, Chinese Hamster Ovary cells, and a generic signal transduction network. The results consistently show that saCeSS is a robust and efficient method, allowing a very significant reduction of computation times with respect to several previous state-of-the-art methods (from days to minutes, in several cases), even when only a small number of processors is used. CONCLUSIONS: The new parallel cooperative method presented here allows the solution of medium and large-scale parameter estimation problems in reasonable computation times and with small hardware requirements. Further, the method includes self-tuning mechanisms which facilitate its use by non-experts. We believe that this new method can play a key role in the development of large-scale and even whole-cell dynamic models.
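As an illustration of the parameter estimation problem described above, the following minimal sketch fits two parameters of a toy nonlinear dynamic model by minimizing a sum-of-squares cost. Everything in it is an assumption made for illustration: the logistic growth model, the synthetic noise-free data, the function names, and the plain random search, which stands in for a global optimizer and is not the saCeSS method.

```python
import random

def simulate(r, k, x0=0.1, dt=0.1, steps=50):
    """Integrate the toy model dx/dt = r*x*(1 - x/k) with classical RK4."""
    f = lambda x: r * x * (1.0 - x / k)
    x = x0
    xs = [x0]
    for _ in range(steps):
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        xs.append(x)
    return xs

def cost(params, data):
    """Sum of squared differences between model output and data."""
    r, k = params
    return sum((m - d) ** 2 for m, d in zip(simulate(r, k), data))

def random_search(data, bounds, n_samples=2000, seed=0):
    """Naive global search: sample uniformly within bounds, keep the best."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(n_samples):
        cand = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        c = cost(cand, data)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost

# Hypothetical "true" parameters used only to generate synthetic data.
true_params = (1.5, 2.0)
data = simulate(*true_params)
bounds = [(0.1, 3.0), (0.5, 5.0)]
best, best_cost = random_search(data, bounds)
```

Each cost evaluation requires a full model simulation, which is why the abstract stresses computational cost: realistic kinetic models have many more parameters and far more expensive simulations than this toy case.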
Subjects
Computational Biology/methods, Systems Biology/methods, Animals, CHO Cells, Cricetulus/genetics, Drosophila melanogaster/genetics, Escherichia coli/genetics, Gene Expression Regulation, Theoretical Models, Nonlinear Dynamics, Saccharomyces cerevisiae/genetics, Signal Transduction, Genetic Transcription
ABSTRACT
UNLABELLED: We have implemented a high-performance computing (HPC) version of ProtTest that can be executed in parallel on multicore desktops and clusters. This version, called ProtTest 3, includes new features and extended capabilities. AVAILABILITY: ProtTest 3 source code and binaries are freely available under the GNU license for download from http://darwin.uvigo.es/software/prottest3, linked to a Mercurial repository at Bitbucket (https://bitbucket.org/). CONTACT: dposada@uvigo.es SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Molecular Evolution, Sequence Alignment/methods, Protein Sequence Analysis, Software, Statistical Models, Phylogeny
ABSTRACT
Weed scientists are usually interested in the study of the distribution and density functions of the random variable that relates weed emergence with environmental indices like the hydrothermal time (HTT). However, in many situations, experimental data are presented in a grouped way and, therefore, the standard nonparametric kernel estimators cannot be computed. Kernel estimators for the density and distribution functions for interval-grouped data, as well as bootstrap confidence bands for these functions, have been proposed and implemented in the binnednp package. Analysis with different treatments can also be performed using a bootstrap approach and a Cramér-von Mises type distance. Several bandwidth selection procedures were also implemented. The package also allows the estimation of different emergence indices that measure the shape of the data distribution. The values of these indices are useful for selecting the soil depth at which HTT should be measured, which, in turn, would maximize the predictive power of the proposed methods. This paper presents the functions of the package and provides an example using an emergence data set of Avena sterilis (wild oat). The binnednp package provides investigators with a unique set of tools allowing the weed science research community to analyze interval-grouped data.
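To make the idea of a kernel estimator for interval-grouped data concrete, here is a minimal sketch of one common construction: each interval contributes a Gaussian kernel centered at its midpoint, weighted by the proportion of observations in that interval. This is an assumption chosen for illustration and may differ from the estimators actually implemented in binnednp (an R package); the function name, the interval edges, the counts, and the bandwidth are all hypothetical and are not taken from the Avena sterilis data set.

```python
import math

def binned_kde(x, edges, counts, h):
    """Gaussian kernel density estimate at x from interval-grouped counts.

    edges  : interval boundaries (len(counts) + 1 values, increasing)
    counts : number of observations falling in each interval
    h      : bandwidth (here fixed by hand, not selected from the data)
    """
    n = sum(counts)
    mids = [(a + b) / 2.0 for a, b in zip(edges[:-1], edges[1:])]
    total = 0.0
    for t, c in zip(mids, counts):
        u = (x - t) / h
        total += (c / n) * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return total / h

# Hypothetical grouped emergence counts over hydrothermal-time intervals.
edges = [0, 20, 40, 60, 80, 100]
counts = [5, 20, 40, 25, 10]
h = 10.0

density_at_50 = binned_kde(50.0, edges, counts, h)

# Riemann-sum check that the estimate integrates to approximately 1.
step = 0.5
area = sum(binned_kde(-100 + i * step, edges, counts, h) * step
           for i in range(int(300 / step) + 1))
```

Only the interval boundaries and counts are needed, which is the situation the abstract describes: the individual emergence values are never observed, so the standard kernel estimator based on raw observations cannot be applied directly.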
ABSTRACT
BACKGROUND: We consider a general class of global optimization problems dealing with nonlinear dynamic models. Although this class is relevant to many areas of science and engineering, here we are interested in applying this framework to the reverse engineering problem in computational systems biology, which yields very large mixed-integer dynamic optimization (MIDO) problems. In particular, we consider the framework of logic-based ordinary differential equations (ODEs). METHODS: We present saCeSS2, a parallel method for the solution of this class of problems. This method is based on a parallel cooperative scatter search metaheuristic, with new mechanisms of self-adaptation and specific extensions to handle large mixed-integer problems. We have paid special attention to the avoidance of convergence stagnation using adaptive cooperation strategies tailored to this class of problems. RESULTS: We illustrate its performance with a set of three very challenging case studies from the domain of dynamic modelling of cell signaling. The simplest case study considers a synthetic signaling pathway and has 84 continuous and 34 binary decision variables. A second case study considers the dynamic modeling of signaling in liver cancer using high-throughput data, and has 135 continuous and 109 binary decision variables. The third case study is an extremely difficult problem related to breast cancer, involving 690 continuous and 138 binary decision variables. We report computational results obtained in different infrastructures, including a local cluster, a large supercomputer and a public cloud platform. Interestingly, the results show how the cooperation of individual parallel searches modifies the systemic properties of the sequential algorithm, achieving superlinear speedups compared to an individual search (e.g. speedups of 15 with 10 cores), and significantly improving (by more than 60%) the performance with respect to a non-cooperative parallel scheme. The scalability of the method is also good (tests were performed using up to 300 cores). CONCLUSIONS: These results demonstrate that saCeSS2 can be used to successfully reverse engineer large dynamic models of complex biological pathways. Further, these results open up new possibilities for other MIDO-based large-scale applications in the life sciences such as metabolic engineering, synthetic biology, and drug scheduling.
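The term "superlinear speedup" used above can be checked with simple arithmetic: parallel efficiency is the speedup divided by the number of cores, and a value above 1 means the speedup is superlinear. The short sketch below applies this to the one figure quoted in the abstract (a speedup of 15 with 10 cores); the function names are hypothetical.

```python
def parallel_efficiency(speedup, cores):
    """Efficiency = speedup / cores; 1.0 corresponds to ideal linear scaling."""
    return speedup / cores

def is_superlinear(speedup, cores):
    """A speedup is superlinear when it exceeds the number of cores used."""
    return parallel_efficiency(speedup, cores) > 1.0

# Figure quoted in the abstract: speedup of 15 with 10 cores.
eff = parallel_efficiency(15, 10)
superlinear = is_superlinear(15, 10)
```

An efficiency of 1.5 cannot come from dividing the same work among more cores. It is consistent with the abstract's explanation that cooperation between searches changes the behavior of the algorithm itself.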