Learning, indexing, and diagnosing network faults
Abstract
Modern communication networks generate massive volumes of operational event data (e.g., alarms, alerts, and metrics) that a network management system (NMS) can use to diagnose potential faults. In this work, we introduce a new class of indexable fault signatures that encode the temporal evolution of the events generated by a network fault as well as the topological relationships among the nodes where these events occur. We present an efficient learning algorithm that extracts such fault signatures from noisy historical event data and, with the help of novel space-time indexing structures, show how to perform efficient online signature matching. We report results from extensive experimental studies of the efficacy of our approach and point out potential applications of such signatures to many different types of networks, including social and information networks. Copyright 2009 ACM.
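
To make the idea of a signature that combines timing and topology concrete, the following is a minimal illustrative sketch, not the learning or indexing method of this work. It assumes a signature is an ordered list of steps, each constraining the event type, the maximum delay since the previous matched event, and the maximum hop distance from the previous matched node. All names (`Event`, `Step`, `hop_distance`, `matches`), the event types, and the toy topology are hypothetical, and the matcher is a plain backtracking scan rather than an indexed lookup.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    node: str    # node where the event was observed
    kind: str    # event type, e.g. an alarm name (hypothetical)
    time: float  # timestamp in seconds


@dataclass(frozen=True)
class Step:
    kind: str         # event type expected at this step
    max_delay: float  # max seconds since the previous matched event
    max_hops: int     # max hop distance from the previous matched node


def hop_distance(adj, src, dst):
    """Breadth-first hop count from src to dst; None if unreachable."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def matches(signature, events, adj, prev=None, start=0):
    """Backtracking match of signature steps against time-sorted events."""
    if not signature:
        return True
    step, rest = signature[0], signature[1:]
    for i in range(start, len(events)):
        ev = events[i]
        if ev.kind != step.kind:
            continue
        if prev is not None:
            if ev.time - prev.time > step.max_delay:
                break  # events are time-sorted, so later ones are too late too
            dist = hop_distance(adj, prev.node, ev.node)
            if dist is None or dist > step.max_hops:
                continue
        if matches(rest, events, adj, ev, i + 1):
            return True
    return False


# Toy line topology a - b - c (assumed for illustration only).
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
signature = [
    Step("link_down", 0.0, 0),      # first step: constraints are ignored
    Step("bgp_flap", 5.0, 1),
    Step("high_latency", 10.0, 2),
]
events = [
    Event("a", "link_down", 0.0),
    Event("b", "bgp_flap", 2.0),
    Event("c", "high_latency", 8.0),
]
matched = matches(signature, events, adj)
```

The scan above revisits the event list for every step, which is exactly the cost that an index over time and topology would be meant to avoid; the sketch only illustrates what a match means under the stated assumptions.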