TaskInsight: A fine-grained performace anomaly detection and problem locating system
Abstract
Performance anomaly is one of the key issues to threaten the Quality of Service of the applications in cloud. However, the complexities and consolidation of application, the fluctuation of workload or other factors make the anomaly detection a burdensome task. Traditional solutions exploit the metrics at system level to detect the anomaly. However, system level metrics are too coarse to remedy the fluctuation and locate the real root cause. In this paper, we present a system called TaskInsight, which could detect performance anomaly and locate the problem at a fine-grained task level in a black-box fashion. It leverages unsupervised clustering approach cooperated with the multitasking type to induce the normal resource usage behavior patterns from historical data. The evaluation result shows that TaskInsight can detect the common performance anomaly precisely. Furthermore, it could easily locate the malicious or anomalous task. In multi-process condition, the detection precision could reach 90% on average.