Remote Restart for a High Performance Virtual Machine Recovery in a Cloud

Valentina Salapura; Richard Harper

doi:10.1109/CLOUD.2015.52

CLOUD 2015

Conference paper

19 Aug 2015

Abstract

In this paper, we present a scalable method for parallel virtual machine (VM) failover planning and execution that enables VM-level high availability in a data center. The solution is implemented and used in IBM's CMS enterprise private cloud as a high availability feature for efficient failover in large data centers with many servers, VMs, and disks. The restart system plans and executes failover dynamically at failover time, and keeps the recovery time within the time budget allowed by the service level agreement (SLA). Compared with the initial serial implementation, failover time is reduced by a factor of up to 11 with parallel failover, and by a factor of up to 44 when parallel failover is combined with parallel storage mapping. As future work, we plan to explore the applicability of this planning and failover solution to disaster recovery.
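The abstract does not describe the planning or scheduling algorithm itself. The sketch below is only a generic illustration, under stated assumptions, of why planning restarts at failover time and running them in parallel shortens recovery relative to a serial approach. It is not the CMS implementation. The `VM` fields, host names, memory sizes, and restart times are all hypothetical. First-fit-decreasing placement and longest-processing-time-first (LPT) scheduling are textbook heuristics swapped in for illustration, and the speedups this model produces are unrelated to the 11x and 44x factors measured in the paper.

```python
import heapq
from dataclasses import dataclass


@dataclass
class VM:
    # Hypothetical model of a failed VM awaiting restart on a surviving host.
    name: str
    mem_gb: int        # memory the VM needs on its new host (assumed unit: GB)
    restart_s: float   # assumed time to remap its storage and boot it (seconds)


def plan_placement(vms, host_free_mem):
    """Greedy first-fit-decreasing placement of failed VMs onto surviving hosts.

    host_free_mem maps host name -> free memory in GB. Returns a dict of
    VM name -> host name; VMs that fit on no host are omitted from the plan.
    """
    free = dict(host_free_mem)
    plan = {}
    for vm in sorted(vms, key=lambda v: v.mem_gb, reverse=True):
        for host, avail in free.items():
            if avail >= vm.mem_gb:
                plan[vm.name] = host
                free[host] = avail - vm.mem_gb
                break
    return plan


def serial_time(vms):
    # Restarting VMs one after another: recovery time is the sum of all restarts.
    return sum(vm.restart_s for vm in vms)


def parallel_time(vms, workers):
    # LPT list scheduling onto `workers` concurrent restart slots. Each VM goes
    # to the slot that frees up earliest; the result is the makespan, i.e. the
    # finish time of the last restart.
    if workers < 1:
        raise ValueError("workers must be >= 1")
    finish = [0.0] * workers
    for vm in sorted(vms, key=lambda v: v.restart_s, reverse=True):
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + vm.restart_s)
    return max(finish)


def within_sla(recovery_s, sla_budget_s):
    # True if the estimated recovery fits the (hypothetical) SLA time budget.
    return recovery_s <= sla_budget_s
```

For example, with four hypothetical VMs needing 60, 30, 30, and 20 seconds to restart, a serial restart takes 140 seconds, while two parallel restart slots finish in 80 seconds and four finish in 60 seconds. Under a 120-second budget, only the parallel plans meet the SLA. Sorting the longest restarts first is a deliberate choice: it keeps one long restart from being scheduled last and stretching the overall recovery time.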
