Publication
CLOUD 2015
Conference paper

Remote Restart for a High Performance Virtual Machine Recovery in a Cloud


Abstract

In this paper, we present a scalable, parallel virtual machine (VM) planning and failover method that enables high availability at the VM level in a data center. The solution is implemented and used in IBM's CMS enterprise private cloud as a high availability feature for efficient failover in large data centers with many servers, VMs, and disks. The restart system we introduce plans and executes recovery dynamically at failover time, and keeps recovery time within the time budget allowed by the service level agreement (SLA). Relative to the initial serial failover, failover time is reduced by a factor of up to 11 with the parallel implementation, and by a factor of up to 44 with the implementation that combines parallel failover with parallel storage mapping. As future work, we plan to explore the applicability of this planning and failover solution to disaster recovery.
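To make the serial versus parallel comparison concrete, the following is a minimal sketch and not the paper's implementation. It estimates recovery time for a set of VM restarts when they run one at a time and when they are spread across a fixed number of concurrent workers, then checks each estimate against an SLA time budget. The function names, the greedy scheduling heuristic, the workload (40 VMs at 30 seconds each), the worker count, and the budget are all assumptions chosen for illustration; none of them come from the paper.

```python
import heapq


def serial_recovery_time(restart_secs):
    """Estimated recovery time when VMs are restarted one after another."""
    return sum(restart_secs)


def parallel_recovery_time(restart_secs, workers):
    """Estimated recovery time with `workers` concurrent restarts.

    Greedy longest-first heuristic (an assumption, not the paper's planner):
    each VM is handed to the worker that becomes free earliest.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    finish = [0.0] * workers  # min-heap of per-worker finish times
    for secs in sorted(restart_secs, reverse=True):
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + secs)
    return max(finish)


def within_sla(recovery_secs, budget_secs):
    """True if the estimated recovery time fits the SLA time budget."""
    return recovery_secs <= budget_secs


# Hypothetical workload: 40 VMs, each taking 30 s to restart.
restart_secs = [30.0] * 40
sla_budget_secs = 300.0  # hypothetical SLA budget

serial = serial_recovery_time(restart_secs)
parallel = parallel_recovery_time(restart_secs, workers=8)

print(f"serial:   {serial:.0f} s, within SLA: {within_sla(serial, sla_budget_secs)}")
print(f"parallel: {parallel:.0f} s, within SLA: {within_sla(parallel, sla_budget_secs)}")
print(f"speedup:  {serial / parallel:.1f}x")
```

With identical restart times the speedup in this toy model equals the worker count. Real failover would also be limited by shared steps such as storage mapping, which is consistent with the abstract reporting a further gain once that step is parallelized as well.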
