ABSTRACT
All commonly used stochastic optimisation algorithms must be parameterised to perform well. Adaptive parameter control (APC) is an effective method for this purpose: it repeatedly adjusts parameter values during the optimisation process, assigning the values for a given iteration based on previously measured performance. In recent research, time series prediction has been proposed as a method of projecting the probabilities to use for parameter value selection. In this work, we examine the suitability of a variety of prediction methods for projecting future parameter performance from past data. Each of the considered prediction methods makes assumptions that the time series data must satisfy for its projections to be accurate. Looking specifically at parameters of evolutionary algorithms (EAs), we find that all standard EA parameters, with the exception of population size, largely conform to the assumptions made by the considered prediction methods. Evaluating the performance of these prediction methods, we find that linear regression provides the best results, albeit by a very small and statistically insignificant margin. Regardless of the prediction method, predictive parameter control outperforms state-of-the-art parameter control methods when the performance data adheres to the assumptions made by the prediction method. When a parameter's performance data does not adhere to these assumptions, the use of prediction has no notable adverse impact on the algorithm's performance.
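To make the predictive control loop concrete, the following is a minimal, illustrative Python sketch, not the paper's actual method: it assumes each candidate parameter value (e.g., a mutation rate) carries a history of observed performance scores, fits a linear regression to each history, extrapolates one step ahead, and maps the forecasts to selection probabilities. The names `predict_next`, `selection_probabilities`, and the `p_min` floor are all hypothetical choices for this sketch.

```python
# Sketch of predictive parameter control via linear regression.
# Assumption: one performance score per past iteration, per parameter value.
import numpy as np


def predict_next(history):
    """Project the next performance value with an ordinary least-squares trend."""
    t = np.arange(len(history))
    slope, intercept = np.polyfit(t, history, deg=1)  # fit linear trend
    return slope * len(history) + intercept           # one-step-ahead forecast


def selection_probabilities(histories, p_min=0.05):
    """Turn per-value forecasts into a selection probability distribution."""
    forecasts = np.array([predict_next(h) for h in histories])
    shifted = forecasts - forecasts.min()             # make scores non-negative
    if shifted.sum() == 0:                            # all values tied
        return np.full(len(histories), 1.0 / len(histories))
    raw = shifted / shifted.sum()
    # Enforce a minimum selection probability so no parameter value is starved,
    # a common safeguard in adaptive parameter control (requires p_min * k < 1).
    k = len(histories)
    return p_min + (1.0 - k * p_min) * raw


# Example: three candidate mutation rates with observed fitness gains.
histories = [
    [0.10, 0.12, 0.15, 0.18],   # improving -> should gain probability
    [0.20, 0.18, 0.15, 0.13],   # degrading -> should lose probability
    [0.14, 0.15, 0.14, 0.15],   # roughly flat
]
probs = selection_probabilities(histories)
rng = np.random.default_rng(0)
chosen = rng.choice(len(histories), p=probs)  # value to use next iteration
print(probs, chosen)
```

The linear-trend fit is only one of the forecasting methods the abstract compares; any one-step-ahead predictor could be substituted for `predict_next`, and its accuracy would hinge on the same distributional assumptions the abstract discusses.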
Subjects
Algorithms, Computing Methodologies, Forecasting/methods, Models, Theoretical, Linear Models, Stochastic Processes, Time Factors

ABSTRACT
Today's scientific data analysis very often requires complex Data Analysis Workflows (DAWs) executed over distributed computational infrastructures, e.g., clusters. Much research effort is devoted to tuning and optimizing the performance of specific workflows on specific clusters. However, an arguably even more important problem for accelerating research is reducing the development, adaptation, and maintenance times of DAWs. We describe the design and setup of the Collaborative Research Center (CRC) 1404 "FONDA -- Foundations of Workflows for Large-Scale Scientific Data Analysis", in which roughly 50 researchers jointly investigate new technologies, algorithms, and models to increase the portability, adaptability, and dependability of DAWs executed over distributed infrastructures. We describe the motivation behind our project, explain its underlying core concepts, introduce FONDA's internal structure, and sketch our vision for the future of workflow-based scientific data analysis. We also describe some lessons learned during the "making of" a CRC in Computer Science with strong interdisciplinary components, with the aim of fostering similar endeavors.