Real-time Statistical Log Anomaly Detection with Continuous AIOps Learning
Abstract
Anomaly detection from logs is a fundamental Information Technology Operations (ITOps) management task. It aims to detect anomalous system behaviours and to find signals that provide clues to the causes and anatomy of a system’s failure. Applying advanced, explainable Artificial Intelligence (AI) models throughout ITOps is critical for confidently assessing, diagnosing and resolving such failures. In this paper, we describe a new online log anomaly detection algorithm that helps significantly reduce the time-to-value of Log Anomaly Detection. The algorithm continuously updates the Log Anomaly Detection model at run-time and automatically avoids the model bias that contaminated log data could otherwise introduce. In experiments on multiple datasets, the methods described here show a 60% improvement in average F1-score compared to the existing method in the product pipeline, which demonstrates the efficacy of our proposed methods.