Network Capability in Localizing Node Failures via End-to-End Path Measurements
We investigate the capability of localizing node failures in communication networks from binary states (normal/failed) of end-to-end paths. Given a set of nodes of interest, uniquely localizing failures within this set requires that different observable path states associate with different node failure events. However, this condition is difficult to test on large networks due to the need to enumerate all possible node failures. Our first contribution is a set of sufficient/necessary conditions for identifying a bounded number of failures within an arbitrary node set that can be tested in polynomial time. In addition to network topology and locations of monitors, our conditions also incorporate constraints imposed by the probing mechanism used. We consider three probing mechanisms that differ according to whether measurement paths are: (i) arbitrarily controllable; (ii) controllable but cycle-free; or (iii) uncontrollable (determined by the default routing protocol). Our second contribution is to quantify the capability of failure localization through: 1) the maximum number of failures (anywhere in the network) such that failures within a given node set can be uniquely localized and 2) the largest node set within which failures can be uniquely localized under a given bound on the total number of failures. Both measures in 1) and 2) can be converted into the functions of a per-node property, which can be computed efficiently based on the above sufficient/necessary conditions. We demonstrate how measures 1) and 2) proposed for quantifying failure localization capability can be used to evaluate the impact of various parameters, including topology, number of monitors, and probing mechanisms.