Resource As You Wish: Collaborative Reservation and Allocation by Scheduler Plugin and Device Plugin
Abstract
Kubernetes encapsulates the details of infrastructure so that users can describe their desired state as reusable manifests. However, this abstraction prevents us from maximizing the utilization of peripheral resources, because hardware topology heavily affects the performance of computation workloads. For example, inter-device and inter-node communication acceleration technologies such as Direct Memory Access (DMA) become unavailable if multiple GPUs under different PCI Express bridges are allocated to a single set of AI workloads. This presentation shows a concrete use case of the Kubernetes Scheduling Framework. Scheduler plugins, which leverage the framework, work jointly with a device plugin to let users allocate their preferred AI hardware devices to computing workloads. In addition to this approach built on existing stable technologies, the presentation also covers a mechanism based on Dynamic Resource Allocation (DRA), a promising technology for device management in Kubernetes.