Locating faults in MANET-hosted software systems
We present a method to locate faults in service-based software systems hosted on mobile ad hoc networks (MANETs). In such systems, computations are structured as interdependent services distributed across the network, collaborating to satisfy client requests. Faults, which may occur at the service layer, the network layer, or both, propagate by cascading through some subset of the services, from their root causes back to the clients that initiate requests. Fault localization in this environment is especially challenging because the systems are typically subject to a wider variety and higher incidence of faults than those deployed in fixed networks, the resources available to collect and store analysis data are severely limited, and many of the sources of faults are by their nature transient. Our method makes use of service-dependence and fault data that are harvested in the network through decentralized, run-time observations of service interactions and fault symptoms. We have designed timing- and Bayesian-based reasoning techniques to analyze the data in the context of a specific fault propagation model. The analysis provides a ranked list of candidate fault locations. Through extensive simulations, we evaluate the performance of our method in terms of its accuracy in correctly ranking root causes under a wide range of operational conditions.
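To make the idea of Bayesian ranking over service-dependence data concrete, the following is a minimal illustrative sketch, not the method evaluated in this work. It assumes a single root cause, a fault that cascades to every service transitively depending on the faulty one, and two hypothetical parameters: `p_hit` (probability that an affected service shows a symptom) and `p_noise` (probability that an unaffected service shows one). The service names, the function names, and the uniform prior are likewise assumptions made for illustration, and the timing-based analysis is not modeled.

```python
def dependents_of(root, depends_on):
    """Return root plus every service that transitively depends on it."""
    affected = {root}
    changed = True
    while changed:
        changed = False
        for svc, deps in depends_on.items():
            if svc not in affected and affected.intersection(deps):
                affected.add(svc)
                changed = True
    return affected


def rank_root_causes(depends_on, symptomatic, prior=None, p_hit=0.9, p_noise=0.05):
    """Rank candidate root causes by posterior probability (single-fault assumption).

    depends_on  -- dict mapping each service to the services it calls
    symptomatic -- set of services where fault symptoms were observed
    prior       -- optional dict of prior fault probabilities (uniform if omitted)
    """
    services = set(depends_on) | {d for deps in depends_on.values() for d in deps}
    uniform = 1.0 / len(services)
    scores = {}
    for candidate in services:
        affected = dependents_of(candidate, depends_on)
        likelihood = 1.0
        for svc in services:
            # Probability that svc shows a symptom if `candidate` is the root cause.
            p_symptom = p_hit if svc in affected else p_noise
            likelihood *= p_symptom if svc in symptomatic else 1.0 - p_symptom
        p_prior = uniform if prior is None else prior.get(candidate, uniform)
        scores[candidate] = p_prior * likelihood
    total = sum(scores.values())
    # Normalize to posteriors and sort from most to least likely.
    return sorted(((s, v / total) for s, v in scores.items()),
                  key=lambda item: item[1], reverse=True)


# Hypothetical dependence data: client_api calls routing, routing calls map_store.
depends_on = {
    "client_api": ["routing"],
    "routing": ["map_store"],
    "map_store": [],
    "logger": [],
}
ranking = rank_root_causes(depends_on, symptomatic={"client_api", "routing"})
for service, posterior in ranking:
    print(f"{service}: {posterior:.3f}")
```

In this toy example, symptoms at `client_api` and `routing` but not at `map_store` place `routing` at the top of the ranked list, since a fault there explains every observed symptom without predicting any symptom that was not seen.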