About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
ICPE 2024
Workshop paper
Baking Disaster-Proof Kubernetes Applications with Efficient Recipes
Abstract
Multicluster disaster recovery on cloud-native platforms such as Kubernetes usually replicates application data and Kubernetes resources to a safe recovery cluster. In the event of a disaster, Kubernetes resources are restored to the recovery cluster to recover the affected applications. We tested 10 popular Kubernetes applications using this naive approach, and 60% failed. Problems include data being restored in the wrong order, cluster-specific data being restored instead of generated by the cluster, etc. All these problems lead to our recipe design that enables disaster recovery of all Kubernetes applications. In this paper, we analyze the problems we encountered during the disaster recovery of Kubernetes applications and categorize applications based on their disaster recovery behaviors. We present a recipe that groups, orders, and filters Kubernetes resources to enable disaster recovery. Finally, we evaluate the reliability and efficiency of the recipe. Our evaluation shows that recipe achieves a 100% success rate of disaster recovery while adding mere seconds of overhead to the recovery time.