iCSI: A cloud garbage VM collector for addressing inactive VMs with machine learning
According to a recent study, 30% of VMs in private cloud data centers are 'comatose', in part because their human owners generally have no strong incentive to delete them at an appropriate time. These inactive VMs are still scheduled and executed on physical cloud resources, taking valuable resources away from productive VMs. In the extreme case, cloud infrastructure may deny legitimate requests for new VMs because capacity limits have been hit. Monitoring resource utilization (e.g., CPU utilization) is not sufficient for cloud infrastructure to identify such inactive VMs: management processes (e.g., virus scans, software updates) on inactive VMs often consume substantial CPU and memory resources, while active VMs running lightweight jobs (e.g., text editing) show almost zero resource utilization. To properly detect and address such inactive VMs, we present iCSI, a cloud garbage VM collector that improves the resource utilization and cost efficiency of enterprise data centers. iCSI includes three main components: a lightweight data collector, a VM identification model, and a recommendation engine. The data collector periodically gathers primitive information from VMs. The identification model infers the purpose of a VM from the collected data and extracts the most relevant features associated with that purpose. The recommendation engine offers appropriate actions to end users, i.e., suspending or resizing VMs. In its current prototype phase, iCSI is deployed in multiple IBM data centers and manages more than 750 production VMs. iCSI achieves 90% accuracy in identifying active/inactive VMs, 20% better than state-of-the-art methods. With recommendations to end users, our estimates show that iCSI can improve internal cost efficiency by 23% and resource utilization by more than 45%.
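The three-stage flow described above (collect, identify, recommend) can be sketched as follows. This is a minimal illustration under stated assumptions, not iCSI's implementation: the `VMSample` fields, the `MANAGEMENT_PROCESSES` set, the thresholds, and the rule inside `identify` are all hypothetical stand-ins, and the real identification model is a learned one rather than a hand-written rule. The sketch also includes a naive CPU-threshold check to show the two failure cases the abstract mentions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VMSample:
    """One snapshot of primitive data from a VM (all fields are hypothetical)."""
    vm_id: str
    cpu_util: float          # fraction of CPU in use, 0.0 to 1.0
    processes: List[str]     # names of running processes
    interactive_logins: int  # user logins seen in the sampling window


# Assumed set of management processes that can load an otherwise idle VM.
MANAGEMENT_PROCESSES = {"virus-scan", "software-update"}


def naive_is_active(sample: VMSample, cpu_threshold: float = 0.10) -> bool:
    """Utilization-only baseline: active if CPU use reaches the threshold."""
    return sample.cpu_util >= cpu_threshold


def identify(sample: VMSample) -> str:
    """Stand-in for the identification model; a trained classifier would
    replace this hand-written rule. Returns 'active' or 'inactive'."""
    user_processes = [p for p in sample.processes
                      if p not in MANAGEMENT_PROCESSES]
    if user_processes or sample.interactive_logins > 0:
        return "active"
    return "inactive"


def recommend(sample: VMSample, label: str) -> str:
    """Stand-in for the recommendation engine: suspend inactive VMs and
    suggest resizing active VMs that use very little CPU (assumed rule)."""
    if label == "inactive":
        return "suspend"
    if sample.cpu_util < 0.05:
        return "resize"
    return "keep"


# Example samples mirroring the two misleading cases plus a busy VM.
scanner = VMSample("vm-1", 0.85, ["virus-scan"], 0)
editor = VMSample("vm-2", 0.01, ["text-editor"], 1)
busy = VMSample("vm-3", 0.60, ["db-server"], 2)

for vm in (scanner, editor, busy):
    label = identify(vm)
    print(vm.vm_id, naive_is_active(vm), label, recommend(vm, label))
```

In this toy setup the utilization-only check mislabels both `vm-1` (idle but scanning) and `vm-2` (in use but nearly idle on CPU), which is the motivation for inferring a VM's purpose from richer data before recommending an action.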