Detecting Causal Structure on Cloud Application Microservices Using Granger Causality Models
The loosely coupled microservices architecture has become increasingly popular in cloud applications because of its modularity and elasticity. However, it also seriously complicates cloud management and degrades the performance of IT operations. Today, AI is at the center of commerce and transactions and is transforming traditional IT operations for speed and growth. Inferring the dependencies among an application's microservices can greatly help site reliability engineers (SREs) diagnose possible root causes of performance issues, but this is a hard task because the complex topology of microservices is often unknown in practice. Prior work on detecting causal structure for cloud services requires significant application instrumentation, which is rarely available in practice. In this work, we apply Granger causality models to only the monitored log data of a microservice-based application to infer the impact of dependencies between microservices. We first describe how to model discrete log data as time series, and then formally define the Granger causality problem using both linear and nonlinear autoregressive models. Finally, we conduct an extensive comparative study of state-of-the-art linear and nonlinear (i.e., neural) Granger causality methods on both synthetic data and real-world log data from a publicly available benchmark microservice system. Our preliminary results indicate that neural Granger causality models outperform traditional Granger causality methods on both linear and nonlinear time series data, while for large linear time series, linear Granger causality models are more efficient and remain highly accurate. Using the real-world log data, we also report our findings on the dependency graphs of microservices inferred by the linear and neural Granger causality models.
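To make the two steps named above concrete, the following is a minimal sketch of (a) turning discrete log events into a count time series and (b) a pairwise linear Granger test based on autoregressive least-squares fits. It is an illustration under stated assumptions, not the method evaluated in this work: fixed-width binning of event timestamps, the function names `bin_log_events`, `lagged`, and `granger_f_stat`, and the choice of a plain F statistic are all hypothetical.

```python
import numpy as np

def bin_log_events(timestamps, bin_width, n_bins):
    """Count log events per fixed-width time bin (assumed preprocessing)."""
    idx = (np.asarray(timestamps) // bin_width).astype(int)
    idx = idx[(idx >= 0) & (idx < n_bins)]  # drop events outside the window
    return np.bincount(idx, minlength=n_bins).astype(float)

def lagged(series, lag):
    """Columns series[t-1], ..., series[t-lag] for t = lag, ..., T-1."""
    T = len(series)
    return np.column_stack([series[lag - k: T - k] for k in range(1, lag + 1)])

def residual_sum_of_squares(X, target):
    """RSS of the ordinary least-squares fit of target on X."""
    coef = np.linalg.lstsq(X, target, rcond=None)[0]
    return float(np.sum((target - X @ coef) ** 2))

def granger_f_stat(x, y, lag=2):
    """F statistic for 'x Granger-causes y' under a linear AR model.

    Restricted model: y[t] predicted from its own past only.
    Full model: y[t] predicted from the past of both y and x.
    A large value means the past of x helps predict y.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    target = y[lag:]
    intercept = np.ones((len(target), 1))
    restricted = np.hstack([intercept, lagged(y, lag)])
    full = np.hstack([restricted, lagged(x, lag)])
    rss_r = residual_sum_of_squares(restricted, target)
    rss_f = residual_sum_of_squares(full, target)
    dof = len(target) - full.shape[1]
    return ((rss_r - rss_f) / lag) / (rss_f / dof)

# Synthetic check: x drives y with a one-step delay, not the reverse.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=499)
print("F(x -> y):", granger_f_stat(x, y))
print("F(y -> x):", granger_f_stat(y, x))
```

In this sketch, one count series would be built per microservice, and the pairwise statistic would be computed for every ordered pair to obtain a candidate dependency graph. The neural variants mentioned above replace the linear autoregression with a nonlinear model, which this example does not cover.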