An approach to on-line predictive detection
Abstract
Predicting network performance problems enables network operators to take corrective actions in advance of service disruptions. Typically, service problems are detected by tests that compare a metric (e.g., response time) to a threshold. Herein, we present an on-line algorithm for predicting the probability of threshold violations over a time horizon. Our algorithm uses two cascaded submodels. The first removes non-stationarities by employing a discrete-time Kalman filter in combination with analysis of variance. We derive the parameters of the Kalman filter from differential equations that describe characteristics of the data. The second submodel estimates the probability of threshold violations by using a second-order autoregressive model in combination with change-point detection. Using data from a production web server, we evaluate our approach and show that it produces average accuracies that are comparable to those of an off-line algorithm. However, our on-line algorithm produces predictions with considerably smaller variances. Further advantages of our approach are: (a) it requires much less data than the off-line technique (one day versus multiple months); and (b) it adapts to changes in the system and workloads, since parameters are estimated on-line.
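
To make the second submodel concrete, the following is a minimal sketch of how a second-order autoregressive model can yield a threshold-violation probability. It is an illustration under stated assumptions, not the implementation evaluated here: it assumes the non-stationarities have already been removed, that the residual noise is Gaussian with a known standard deviation, and that the coefficients phi1 and phi2 are already estimated. It computes the probability at a single step h ahead rather than over the whole horizon, and it omits change-point detection. The function and parameter names are hypothetical.

```python
import math


def ar2_violation_probability(x_prev, x_curr, phi1, phi2, sigma, threshold, horizon):
    """Probability that an AR(2) process exceeds `threshold` exactly
    `horizon` steps ahead, assuming Gaussian noise (illustrative assumption).

    Model: x[t] = phi1 * x[t-1] + phi2 * x[t-2] + e[t],  e[t] ~ N(0, sigma^2)
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    if sigma <= 0:
        raise ValueError("sigma must be positive")

    mean_prev, mean_curr = x_prev, x_curr   # forecast means, seeded with observations
    psi_prev, psi_curr = 0.0, 1.0           # moving-average weights: psi[-1]=0, psi[0]=1
    variance = 0.0

    for _ in range(horizon):
        # Forecast-error variance accumulates sigma^2 * psi[j]^2 for j = 0..h-1.
        variance += (sigma * psi_curr) ** 2
        # Advance the forecast mean and the psi weights by one step.
        mean_prev, mean_curr = mean_curr, phi1 * mean_curr + phi2 * mean_prev
        psi_prev, psi_curr = psi_curr, phi1 * psi_curr + phi2 * psi_prev

    # P(X > threshold) for X ~ N(mean_curr, variance), via the complementary error function.
    z = (threshold - mean_curr) / math.sqrt(variance)
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

For example, ar2_violation_probability(0.0, 0.0, 0.5, 0.2, 1.0, 0.0, 3) returns 0.5, because the forecast mean equals the threshold. In an on-line setting, phi1, phi2, and sigma would be re-estimated as new observations arrive.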