Caravel: Burst tolerant scheduling for containerized stateful applications
In a containerized environment, applications are generally categorized as either stateful or stateless, and each consists of multiple containers. Co-scheduling both types in one cluster presents unique challenges for container orchestration frameworks, because the two types differ in how they handle temporary load spikes. Stateless applications can scale out by instantly spawning new identical instances, whereas stateful applications require deliberate planning to scale out, as each application instance is unique. Stateful applications can instead acquire the additional resources they need during a spike more conveniently by scaling up on the same node. However, when an application's container uses more resources than it requested, it risks being evicted from the node. Evictions are particularly detrimental for stateful applications because of their longer startup time and the resulting degradation. Moreover, existing container orchestration frameworks schedule or evict containers without any knowledge of the impact on the applications that own them. For instance, evicting several of an application's containers within a short period could compromise its availability and severely degrade its performance. To address these challenges, we present Caravel, a scheduling approach that gives stateful applications a better experience when dealing with load spikes. It allows them to exceed their resource requests during a burst and use resources on the same node while minimizing their evictions. Moreover, the scheduler gives all stateful applications a fair opportunity to use the spare resources in the cluster. Our evaluation shows that Caravel reduces the eviction of stateful applications by up to 90% compared with the traditional approach.