Publication
CLOUD 2021
Conference paper

A system for proactive risk assessment of application changes in cloud operations


Abstract

Change is one of the biggest contributors to service outages. As more enterprises migrate their applications to the cloud and adopt automated build and deployment, the volume and rate of changes have increased significantly. Furthermore, microservice-based architectures have shortened the turnaround time for changes and increased the dependencies between services. Together, these trends make it impossible for Site Reliability Engineers (SREs) to rely on traditional, manual risk assessment of changes. To mitigate change-induced service failures and ensure continuous improvement of cloud-native services, an automated system for assessing the risk of change deployments is critical. In this paper, we present an AI-based system that proactively assesses the risk associated with deploying application changes in cloud operations. Each risk assessment is accompanied by actionable explanations of the risk. We discuss how the system is used in two primary scenarios: automated deployment and manual deployment.

Date

05 Sep 2021