Publication
FAST 2023
Poster

AIOps Essential to Unified Resiliency Management in Data Lakehouses

Abstract

AIOps can provide essential value for data lakehouses, which pose complex operational challenges for Site Reliability Engineers (SREs). This paper proposes that the unified approach of data lakehouses creates a unique opportunity for unified data resiliency management. We focus on AIOps applied to disaster recovery and backup/restore, and in particular on managing lakehouse hardware resources so that the Recovery Point Objectives (RPOs) of lakehouse data are met with a high degree of accuracy. The goal is to warn an SRE about an impending RPO violation and to suggest how much hardware to add, and by when, to avoid that violation. We claim that AIOps can achieve this goal with an ensemble of machine learning and time series analysis.
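
As a rough illustration of the warning half of this goal, the sketch below fits a straight line to recent backup-lag measurements and estimates how long until the lag would exceed the RPO. It is a minimal stand-in and not the authors' method: the abstract describes an ensemble of machine learning and time series analysis, whereas this uses a single least-squares trend. The function names, the lag metric, and the units (hours, minutes) are all assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_rpo_violation(
    hours: Sequence[float],        # hypothetical: sample times, in hours
    lag_minutes: Sequence[float],  # hypothetical: backup lag at each sample
    rpo_minutes: float,            # hypothetical: the RPO, in minutes
) -> Optional[float]:
    """Extrapolate the lag trend; return hours from the last sample until
    the lag reaches the RPO, or None if the lag is not growing."""
    slope, intercept = fit_line(hours, lag_minutes)
    if slope <= 0:
        return None  # flat or shrinking lag: no violation predicted
    crossing_time = (rpo_minutes - intercept) / slope
    # Clamp at zero: the trend line may already be past the RPO.
    return max(0.0, crossing_time - hours[-1])


if __name__ == "__main__":
    remaining = hours_until_rpo_violation([0, 1, 2, 3], [10, 20, 30, 40], 60)
    if remaining is not None:
        print(f"Warning: RPO violation predicted in about {remaining:.1f} hours")
```

A fuller system would also translate the predicted shortfall into a hardware recommendation, which this sketch leaves out because the abstract does not say how that mapping is made.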

Date

22 Feb 2022
