RESUMO
In recent years, Kubernetes (K8s) has become a dominant resource management and scheduling system in the cloud. In practical scenarios, short-running cloud workloads are usually scheduled through different scheduling algorithms provided by Kubernetes. For example, artificial intelligence (AI) workloads are scheduled through different Volcano scheduling algorithms, such as GANG_MRP, GANG_LRP, and GANG_BRA. One key challenge is that the selection of scheduling algorithms has considerable impacts on job performance results. However, it takes a prohibitively long time to select the optimal algorithm because applying one algorithm in one single job may take a few minutes to complete. This poses the urgent requirement of a simulator that can quickly evaluate the performance impacts of different algorithms, while also considering scheduling-related factors, such as cluster resources, job structures and scheduler configurations. In this paper, we design and implement a Kubernetes simulator called K8sSim, which incorporates typical Kubernetes and Volcano scheduling algorithms for both generic and AI workloads, and provides an accurate simulation of their scheduling process in real clusters. We use real cluster traces from Alibaba to evaluate the effectiveness of K8sSim, and the evaluation results show that (i) compared to the real cluster, K8sSim can accurately evaluate the performance of different scheduling algorithms with similar CloseRate (a novel metric we define to intuitively show the simulation accuracy), and (ii) it can also quickly obtain the scheduling results of different scheduling algorithms by accelerating the scheduling time by an average of 38.56×.