Publication
ADAC 2022
Invited talk

KubeFlux: A scheduler plugin bridging the cloud-HPC gap in Kubernetes

Abstract

The cloud is an increasingly important market sector of computing and is driving innovation. Adoption of cloud technologies by high performance computing (HPC) is accelerating, and HPC users want their applications to perform well everywhere. While cloud orchestration frameworks like Kubernetes provide advantages such as resiliency, elasticity, and automation, they are not designed to deliver application performance to the same degree as HPC workload managers and schedulers. As HPC and cloud computing converge, techniques from HPC can be integrated into the cloud to improve application performance and provide universal scalability. We present KubeFlux, a Kubernetes plugin based on Fluxion, the open-source HPC scheduler component of the Flux framework developed at Lawrence Livermore National Laboratory. We introduce the Flux framework and the Fluxion scheduler and describe how their hierarchical, graph-based foundation is naturally suited to converged computing. We discuss uses for KubeFlux and compare the performance of an application scheduled by the default Kubernetes scheduler with its performance when scheduled by KubeFlux. KubeFlux is an example of the rich capability that can be added to Kubernetes and paves the way to democratization of the cloud for HPC workloads.