Publication
IBM J. Res. Dev.
Review

IT troubleshooting with drift analysis in the DevOps era

Abstract

Over the past few years, DevOps practices have led to many changes in the software industry. The need for agility has resulted in the continuous development and deployment of frequent, small updates to IT production systems. However, ever-changing applications and their IT operations environments challenge existing IT troubleshooting approaches, which generally depend on prebuilt domain knowledge and ignore the frequent changes of the DevOps era. The complexity and diversity of application architectures exacerbate these challenges. In this paper, we propose CHASER, an unsupervised-learning-based drift analysis tool that uses learned change models and patterns to detect and analyze abnormal changes (referred to as "drifts," which include configuration errors, hanging processes, etc.), both in real time and during root cause analysis. First, we categorize changes into two distinct groups (static and dynamic state changes) and periodically collect finer-grained changes. Then, we extract time-series and structural features from these changes and apply statistical and machine learning algorithms to learn models and patterns from historical data. Finally, we apply these models and patterns to detect drifts in real time and to infer possible root causes of reported errors, using a multidimensional correlation approach to improve precision. Through experiments and case studies, we demonstrate the capability of CHASER.
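
To make the idea of learning a change model from historical data and then flagging drifts more concrete, the following is a minimal sketch, not CHASER's actual method. It assumes changes are summarized as a count per collection interval and uses a simple z-score test in place of the paper's models; the names `learn_baseline` and `is_drift`, the sample counts, and the threshold of 3.0 are all hypothetical.

```python
from statistics import mean, stdev


def learn_baseline(history):
    """Learn a simple model (mean, sample standard deviation) from
    historical per-interval change counts. Hypothetical stand-in for
    the learned change models described in the abstract."""
    return mean(history), stdev(history)


def is_drift(count, baseline, threshold=3.0):
    """Flag an interval as a drift when its change count deviates from
    the learned mean by more than `threshold` standard deviations."""
    mu, sigma = baseline
    if sigma == 0:
        # No variation in history: any different count is abnormal.
        return count != mu
    return abs(count - mu) / sigma > threshold


# Illustrative (made-up) change counts from past collection intervals.
history = [4, 5, 3, 6, 5, 4, 5, 4]
baseline = learn_baseline(history)

print(is_drift(5, baseline))   # a typical interval
print(is_drift(40, baseline))  # a burst of changes
```

A single count per interval ignores the structural features and the multidimensional correlation that the abstract mentions; the sketch only illustrates the learn-then-detect flow.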

Date

01 Jan 2017

Authors
