Using residual resource consumption to resample top-k monitoring reports

Thomas Gschwind; Metin Feridun

doi:10.23919/INM.2017.7987290

IM 2017

Conference paper

20 Jul 2017

Abstract

Top-k reports are compound metrics that provide useful information when diagnosing problems in a system, e.g., to identify persistent CPU usage by a process. In large systems, these reports are collected at regular intervals and need to be resampled to a coarser granularity, either to answer user queries for different sampling periods or to save space and make it possible to keep historical data for long-term performance analysis. However, resampling top-k reports, i.e., aggregating several reports collected for short time intervals into a single top-k report, can introduce inaccuracies. For example, a process that consistently uses CPU over the aggregation interval but never made it into the short-term top-k reports will be missing from the aggregated report. In this paper, we present an algorithm that collects top-k reports at regular intervals and can aggregate them with little or no error. This is done by including in the top-k reports the residual resource consumption of unreported but potentially significant entities, and using these residual values during aggregation. We show different approaches to including residual resource consumption in individual top-k reports, analyze the error introduced, and demonstrate the effectiveness of the algorithm in real-world scenarios.
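The abstract does not specify the report format or the aggregation rule, so the following is only a minimal sketch of the idea under stated assumptions, not the authors' algorithm. It assumes each short-interval report carries its top-k entries plus a single residual equal to the largest unreported value (the (k+1)-th value). During resampling, a missing entry is treated as 0 for a lower bound and as that report's residual for an upper bound, and the sum of all residuals bounds what an entity absent from every report could have consumed. The names `TopKReport`, `make_report`, `naive_resample`, and `resample` are hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TopKReport:
    """One short-interval report (hypothetical format).

    Assumption: `residual` is the largest value among entities NOT in `top`,
    so it bounds what any unreported entity consumed in this interval.
    """
    top: Dict[str, float]
    residual: float


def make_report(usage: Dict[str, float], k: int) -> TopKReport:
    """Keep the k largest consumers and record the (k+1)-th value as residual."""
    ranked = heapq.nlargest(k + 1, usage.items(), key=lambda kv: kv[1])
    residual = ranked[k][1] if len(ranked) > k else 0.0
    return TopKReport(dict(ranked[:k]), residual)


def naive_resample(reports: List[TopKReport], k: int) -> List[str]:
    """Sum only reported values; consumption below each cutoff is lost."""
    totals: Dict[str, float] = {}
    for r in reports:
        for entity, value in r.top.items():
            totals[entity] = totals.get(entity, 0.0) + value
    return sorted(totals, key=totals.get, reverse=True)[:k]


def resample(reports: List[TopKReport], k: int
             ) -> Tuple[List[Tuple[str, float, float]], float, bool]:
    """Aggregate reports into (entity, lower, upper) bounds, ranked by lower.

    Returns the top-k entries, the bound for entities never reported, and
    whether top-k membership is guaranteed despite the missing values.
    """
    entities = set().union(*(r.top for r in reports))
    ranked = sorted(
        ((e,
          sum(r.top.get(e, 0.0) for r in reports),         # missing -> 0
          sum(r.top.get(e, r.residual) for r in reports))  # missing -> residual
         for e in entities),
        key=lambda t: t[1], reverse=True)
    top, rest = ranked[:k], ranked[k:]
    unseen_bound = sum(r.residual for r in reports)
    threshold = top[-1][1] if len(top) == k else 0.0
    # Membership is certain only if nothing outside the top-k can exceed
    # the k-th lower bound, neither a ranked-out entity nor an unseen one.
    guaranteed = unseen_bound <= threshold and all(
        upper <= threshold for _, _, upper in rest)
    return top, unseen_bound, guaranteed


if __name__ == "__main__":
    # "db" uses 35 units in every interval but never reaches a top-2 report.
    intervals = [
        {"a": 90, "b": 50, "db": 35, "c": 5},
        {"c": 80, "d": 40, "db": 35, "a": 5},
        {"b": 70, "e": 60, "db": 35},
    ]
    reports = [make_report(u, k=2) for u in intervals]
    print("naive:", naive_resample(reports, 2))
    print("bounded:", resample(reports, 2))
```

In this example, "db" has the second-largest true total but the naive aggregate omits it. The residual-based version cannot recover "db" either, but because the unseen bound (105) exceeds the second entry's lower bound (90), it reports that the resampled top-k may be incomplete instead of silently returning a wrong answer. Carrying more than one residual per report would tighten these bounds at the cost of larger reports.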