Publication
NSDI 2023
Poster
DEFT: SLO-Driven Preemptive Scheduling for Containerized DNN Serving
Abstract
With GPU servers increasingly shared by containerized DNNs that have highly diverse inference-latency SLOs, we observe an emerging need for a scheduler that, without changing container applications, can dynamically estimate the remaining time of each DNN job in order to determine which kernel calls should preempt the incumbent DNN inference on a shared GPU. This project presents such a scheduler, called DEFT, built on top of Kubernetes. Our preliminary results show that, compared to existing solutions, DEFT reduces SLO violations because (1) it preempts a DNN inference at the kernel level rather than treating the inference as a whole, and (2) it makes preemption decisions based on the remaining time of each competing DNN job, rather than on a static per-job weight or the duration of individual kernel calls.
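The abstract does not include an implementation, but a minimal sketch of the remaining-time-based preemption decision it describes might look like the following. All names here (Job, remaining_time, slack, should_preempt) are hypothetical illustrations under assumed semantics, not DEFT's actual API.

```python
from dataclasses import dataclass
import time


@dataclass
class Job:
    """Hypothetical record for one containerized DNN inference job."""
    name: str
    slo_deadline: float    # absolute wall-clock deadline implied by the SLO
    remaining_time: float  # dynamically estimated time to finish remaining kernels

    def slack(self, now: float) -> float:
        # Slack = time left until the SLO deadline minus the estimated
        # remaining execution time; smaller slack means higher SLO risk.
        return (self.slo_deadline - now) - self.remaining_time


def should_preempt(incumbent: Job, candidate: Job, now: float) -> bool:
    """Decide, at a kernel-call boundary, whether the candidate job should
    preempt the incumbent on the shared GPU.

    This mirrors the idea in the abstract: compare jobs by estimated
    remaining time relative to their SLOs, rather than by static per-job
    weights or the durations of individual kernel calls.
    """
    return candidate.slack(now) < incumbent.slack(now)


if __name__ == "__main__":
    now = time.time()
    incumbent = Job("resnet50", slo_deadline=now + 0.200, remaining_time=0.050)
    candidate = Job("bert-base", slo_deadline=now + 0.080, remaining_time=0.060)
    # The candidate has less slack, so it would preempt the incumbent.
    print(should_preempt(incumbent, candidate, now))
```

In this sketch, a decision like this would be evaluated at kernel-call boundaries rather than once per inference, which is what allows preemption at the kernel level as described above.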