PerfInsight: A Robust Clustering-Based Abnormal Behavior Detection System for Large-Scale Cloud
Anomalous behavior in cloud services often leads to performance degradation or even unplanned outages, which severely harm Quality of Service. Performance monitoring and anomaly detection systems have been widely deployed to mitigate these risks. However, the huge volume of collected data, the prevalence of trends and noise in the data distribution, the lack of labelled anomalies, and the unpredictability of anomaly types pose great challenges to existing anomaly detection systems in the real world. Recently, unsupervised clustering-based anomaly detection approaches have become promising solutions because they depend less on labelled data and adapt to various types of anomalies. However, achieving good detection quality with clustering-based approaches requires a large amount of data normalization work. In this paper, we present PerfInsight, a practical and robust anomaly detection system for large-scale clouds. First, it detects potential trends in the collected data and automatically transforms the data to reduce the negative impact of those trends on clustering results. Then, an entropy-based feature selection over the transformed metrics improves detection efficiency. Finally, more robust clustering models are trained and applied on these transformed and selected features. Our evaluation results show that PerfInsight significantly reduces the cardinality of models.
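The three stages named in the abstract (trend removal, entropy-based feature selection, clustering) can be illustrated with a minimal sketch. This is not PerfInsight's implementation: the abstract does not specify the trend test, the entropy criterion, or the clustering algorithm. The sketch assumes a least-squares linear trend, histogram-based Shannon entropy with the highest-entropy metrics kept, and plain k-means with distance to the nearest centroid as the anomaly score. All function names, thresholds, and bin counts are hypothetical.

```python
import math
from collections import Counter


def linear_slope(series):
    """Least-squares slope of a metric series against its time index."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def detrend(series, slope_threshold=0.05):
    """Subtract the fitted line when the slope exceeds an assumed threshold."""
    slope = linear_slope(series)
    if abs(slope) < slope_threshold:
        return list(series)
    t_mean = (len(series) - 1) / 2
    return [y - slope * (t - t_mean) for t, y in enumerate(series)]


def shannon_entropy(series, bins=10):
    """Shannon entropy (in bits) of an equal-width histogram of the series."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return 0.0  # a constant metric carries no information
    width = (hi - lo) / bins
    counts = Counter(min(int((y - lo) / width), bins - 1) for y in series)
    n = len(series)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def select_features(metrics, top_k):
    """Assumed criterion: keep the top_k metric names with the highest entropy."""
    ranked = sorted(metrics, key=lambda name: shannon_entropy(metrics[name]),
                    reverse=True)
    return ranked[:top_k]


def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic, evenly spaced initialization."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[idx].append(p)
        # Recompute each centroid; keep the old one if its group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def anomaly_score(point, centroids):
    """Distance to the nearest centroid; larger means more unusual."""
    return min(math.dist(point, c) for c in centroids)
```

In this sketch, each metric would first pass through `detrend`, the surviving metric names would come from `select_features`, and observations built from those metrics would be scored with `anomaly_score` against centroids fitted on historical data. The cutoff above which a score counts as an anomaly is left to the caller.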