Publication
ICSE 2022
Conference paper

Utilizing Persistence for Post Facto Suppression of Invalid Anomalies Using System Logs

Abstract

The robustness and availability of cloud services are becoming increasingly important as more applications migrate to the cloud. The operations landscape today is more complex than ever: Site Reliability Engineers (SREs) are expected to handle more incidents than before, under tighter service-level agreements (SLAs). By exploiting log, tracing, metrics, and network data, Artificial Intelligence for IT Operations (AIOps) enables the detection of faults and anomalous behavior in services. A wide variety of anomaly detection techniques (e.g., PCA and auto-encoders) have been incorporated into AIOps platforms, but they all suffer from false positives. In this paper, we propose an unsupervised approach for persistent anomaly detection that runs on top of traditional anomaly detection approaches, with the goal of reducing false positives and providing more trustworthy alerting signals. We test our method on both simulated and real-world datasets. Our technique reduces false-positive anomalies by at least 28%, resulting in more reliable and trustworthy notifications.
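
As a rough illustration of the general idea of persistence-based suppression, the sketch below keeps an anomaly flag only when it belongs to a run of consecutive flagged time windows. This is not the paper's algorithm, which the abstract does not describe. The function name `suppress_transient`, the parameter `min_run`, and the run-length criterion itself are assumptions made for illustration.

```python
from typing import List


def suppress_transient(flags: List[bool], min_run: int = 3) -> List[bool]:
    """Post-process per-window anomaly flags from any upstream detector.

    A flag is kept only if it is part of a run of at least `min_run`
    consecutive flagged windows; shorter runs are treated as transient
    and suppressed. (Hypothetical criterion, not the paper's method.)
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")

    kept = [False] * len(flags)
    run_start = None  # index where the current run of flags began

    # A trailing False sentinel closes any run that reaches the end.
    for i, flagged in enumerate(list(flags) + [False]):
        if flagged and run_start is None:
            run_start = i
        elif not flagged and run_start is not None:
            if i - run_start >= min_run:
                for j in range(run_start, i):
                    kept[j] = True
            run_start = None
    return kept


# Example: flags produced by some upstream detector, one per time window.
raw_flags = [True, False, True, True, True, False, True, True]
persistent = suppress_transient(raw_flags, min_run=3)
```

Because the filter inspects complete runs, it works after the fact on flags the detector has already emitted, which is why the isolated flag and the final run of two windows are dropped while the run of three is kept.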

Date

23 May 2022
