Dependency analysis of cloud applications for performance monitoring using recurrent neural networks

Syed Yousaf Shah; Zengwen Yuan; Songwu Lu; Petros Zerfos

doi:10.1109/BigData.2017.8258087

Big Data 2017

Conference paper

01 Jul 2017

Abstract

Performance monitoring of cloud-native applications that consist of several microservices involves the analysis of time series data collected from the infrastructure, platform, and application layers of the cloud software stack. The analysis of the runtime dependencies amongst the component microservices is an essential step towards performing cloud resource management, detecting anomalous behavior of cloud applications, and meeting customer Service Level Agreements (SLAs). Finding such dependencies is challenging due to the non-linear nature of interactions, aberrant data measurements, and lack of domain knowledge. In this paper, we propose a novel use of the modeling capability of Long Short-Term Memory (LSTM) recurrent neural networks, which excel at capturing temporal relationships in multivariate time series data and are resilient to noisy pattern representations. Our proposed technique inspects the structure of the trained LSTM model to uncover the dependencies amongst performance metrics that were learned during training. We further apply this technique in three monitoring use cases: finding the strongest performance predictors, discovering lagged/temporal dependencies, and improving the accuracy of forecasting for a given metric. We demonstrate the viability of our approach by comparing its results in the three use cases with those obtained from previously proposed methods, such as Granger causality, and from classical statistical time series analysis models, such as ARIMA and Holt-Winters. For our experiments and analysis, we use performance monitoring data collected from two sources: a controlled experiment involving a sample cloud application that we deployed in a public cloud infrastructure, and cloud monitoring data collected from the monitoring service of an operational public cloud service provider.
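
To make the notion of a lagged dependency between two performance metrics concrete, the following is a minimal sketch in plain Python. It is not the LSTM-based technique of the paper, nor any of the baselines it names: it simply scans candidate lags with a linear (Pearson) correlation. The metric names `cpu` and `latency`, the synthetic data, and the helper `best_lag` are all hypothetical and chosen only for illustration.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_lag(source, target, max_lag):
    """Return (lag, corr) maximizing |corr(source[t], target[t + lag])|."""
    scores = {}
    for lag in range(max_lag + 1):
        # Align source[t] with target[t + lag] by trimming both ends.
        s = source[: len(source) - lag]
        t = target[lag:]
        scores[lag] = pearson(s, t)
    lag = max(scores, key=lambda k: abs(scores[k]))
    return lag, scores[lag]


# Synthetic, made-up metrics (assumption): "latency" follows "cpu"
# with a delay of 3 samples plus a small amount of noise.
rng = random.Random(0)
cpu = [rng.random() for _ in range(200)]
latency = [0.0] * 3 + [2.0 * c + 0.05 * rng.random() for c in cpu[:-3]]

lag, corr = best_lag(cpu, latency, max_lag=10)
print(lag, corr)
```

A linear scan of this kind recovers the delay only when the relationship is close to linear; the abstract's point is that real interactions are often non-linear and noisy, which is the motivation for using a learned model instead.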