Guaranteeing Performance SLAs of Cloud Applications Under Resource Storms
In modern data centers, enterprise cloud instances run not only foreground applications such as web servers and databases, but also various background services (e.g., backup, virus/compliance scanning, batch jobs) that keep the instances secure and improve overall resource utilization. These background services often cause resource storms: sudden bursts that consume a large share of the resources on a cloud instance. Resource storms significantly degrade the performance of foreground applications by preempting the shared resources they depend on, resulting in frequent SLA violations. Stock OS schedulers are not designed to handle these situations, and prior work is insufficient to address resource storms under highly dynamic cloud workloads. This article presents Orchestra, a cloud-specific framework that controls multiple applications from user space with the aim of meeting their respective SLAs. Orchestra takes an online approach: it uses lightweight monitoring to build performance models for both the foreground and background applications on the fly, and then optimizes the resource allocations to meet the corresponding SLAs. We evaluate Orchestra on a production cloud with a diverse range of SLAs. Orchestra guarantees the foreground application's performance SLAs at all times, while minimizing the performance penalty on the background service through proper allocation of the shared resources.
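As a rough illustration of the online "monitor, model, allocate" idea summarized above, the following Python sketch refits a simple latency model from recent samples and picks the smallest foreground CPU share predicted to meet a latency SLA, leaving the remainder to the background service. It is not Orchestra's implementation: the inverse-share latency model, the names `OnlineLatencyModel` and `allocate`, the window size, and the share bounds are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class OnlineLatencyModel:
    """Hypothetical model: latency_ms ~= a * (1 / cpu_share) + b, refit online."""
    window: int = 20                              # assumed sliding-window size
    samples: list = field(default_factory=list)   # (cpu_share, latency_ms) pairs

    def observe(self, cpu_share: float, latency_ms: float) -> None:
        """Record one monitoring sample, keeping only the most recent window."""
        self.samples.append((cpu_share, latency_ms))
        self.samples = self.samples[-self.window:]

    def fit(self):
        """Ordinary least squares on x = 1/share; returns (a, b) or None."""
        if len(self.samples) < 2:
            return None
        xs = [1.0 / share for share, _ in self.samples]
        ys = [latency for _, latency in self.samples]
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        if var_x == 0:
            return None  # all samples taken at the same share: cannot fit a slope
        a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
        b = mean_y - a * mean_x
        return a, b


def allocate(model, sla_ms, min_share=0.1, max_share=0.9):
    """Return (foreground_share, background_share) for a latency SLA in ms."""
    params = model.fit()
    if params is None:
        # No usable model yet: be conservative and favor the foreground.
        return max_share, 1.0 - max_share
    a, b = params
    if a <= 0 or sla_ms <= b:
        # Model is implausible or the SLA looks unreachable: favor the foreground.
        return max_share, 1.0 - max_share
    # Smallest share s with a / s + b <= sla_ms, clamped to the allowed range.
    fg = min(max(a / (sla_ms - b), min_share), max_share)
    return fg, 1.0 - fg
```

In a real controller this function would run periodically, with each new sample fed back through `observe` and the resulting shares applied through whatever user-space mechanism the platform offers; the choice of mechanism is outside the scope of this sketch.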