Providing Resiliency to Orchestration and Automation Engines in Hybrid Cloud
Hybrid cloud environments have seen rapid growth in recent years. An essential part of a hybrid cloud is its ability to orchestrate the allocation, provisioning, and management of compute resources spanning multiple cloud systems, and to drive these operations in an automated way. The Orchestration and Automation Engines (OAEs) of a hybrid cloud must themselves be highly available to ensure the resiliency of the hybrid cloud as a whole. In this paper, we present our experience in providing resiliency to the OAEs of a real-world hybrid cloud. The presentation covers the resiliency architecture of the OAEs, solutions that deal with errors ranging from software component crashes to configuration/metadata errors and data corruption, experimental results, and the lessons we learned from this practical experience.