Selectively retrofitting monitoring in distributed systems
Abstract
Current distributed systems carry legacy subsystems that lack sufficient instrumentation for monitoring the end-to-end business transactions they support. In the absence of instrumentation, only probabilistic monitoring is possible, using time-stamped log-records. Retrofitting these systems with monitoring instrumentation provides high-granularity, precise tracking of transactions, but is expensive. Given a limited budget, local instrumentation strategies that maximize the effectiveness of monitoring transactions throughout the system are proposed. The operation of the end-to-end system is modeled by a queuing network; each queue represents a subsystem that produces time-stamped log-records as transactions pass through it. Two simple heuristics for instrumentation are proposed, each of which becomes optimal under certain conditions. The first heuristic selects states in the transition diagram for local instrumentation in decreasing order of the load factors of their queues. Sufficient conditions for this load-factor heuristic to be optimal are proven using the notion of stochastic order. The second heuristic selects states in the transition diagram based on an approximation of the tracking accuracy of probabilistic monitoring at each state; this approximation is shown to be tight at low arrival rates.
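
As an illustration of the load-factor heuristic only, the following minimal sketch ranks queues by load factor and instruments as many as the budget allows. It rests on assumptions that the abstract does not state: the load factor is taken to be the usual utilization (arrival rate divided by service rate), every state is assumed to cost the same to instrument so the budget is a simple count, and the names `Queue`, `select_by_load_factor`, and the example subsystems and rates are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Queue:
    """One subsystem of the queuing network (hypothetical representation)."""
    name: str
    arrival_rate: float  # transactions arriving per unit time (assumed known)
    service_rate: float  # transactions served per unit time (assumed known)

    @property
    def load_factor(self) -> float:
        # Assumption: load factor = utilization = arrival rate / service rate.
        return self.arrival_rate / self.service_rate


def select_by_load_factor(queues: List[Queue], budget: int) -> List[str]:
    """Return the names of up to `budget` queues, highest load factor first.

    Assumption: each state costs one unit of budget to instrument.
    """
    ranked = sorted(queues, key=lambda q: q.load_factor, reverse=True)
    return [q.name for q in ranked[:budget]]


# Hypothetical example network with made-up rates.
queues = [
    Queue("auth", arrival_rate=2.0, service_rate=10.0),      # load 0.20
    Queue("billing", arrival_rate=6.0, service_rate=8.0),    # load 0.75
    Queue("shipping", arrival_rate=3.0, service_rate=5.0),   # load 0.60
]
chosen = select_by_load_factor(queues, budget=2)
print(chosen)
```

With a budget of two states, the sketch picks the two most heavily loaded queues. It does not check the sufficient conditions under which this ordering is optimal.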