Evaluation of Causal Inference Techniques for AIOps
Inferring causality of events from log data is critical to IT operations teams who continuously strive to identify probable root causes of events in order to quickly resolve incident tickets so that downtimes and service interruptions are kept to a minimum. Although prior work has applied some specific causal inference techniques on proprietary log data, they fail to benchmark the performance of different techniques on a common system or dataset. In this work, we evaluate the performance of multiple state-of-the-art causal inference techniques using log data obtained from a publicly available benchmark microservice system. We model log data both as a timeseries of error counts and as a temporal event sequence and evaluate 3 families of Granger causal techniques: regression based, independence testing based, and event models. Our preliminary results indicate that event models yield causal graphs that have high precision and recall in comparison to regression and independence testing based Granger methods.