Yemanja: a layered event correlation engine for multi-domain server farms
Abstract
Yemanja is a model-based event correlation engine for multi-layer fault diagnosis. It targets complex propagating fault scenarios and can correlate low-level network events with high-level application performance alerts related to quality-of-service violations. Entity models, which represent devices or abstract components, encapsulate entity behavior. Distantly associated entities are not explicitly aware of each other and communicate through event propagation chains. Yemanja's state-based engine supports generic scenario definitions, prioritization of alternate solutions, integrated problem-state and device testing, and simultaneous analysis of overlapping problems. The system of correlation rules was developed from device, layer, and dependency analysis, and reveals the layered structure of computer networks. The primary objectives of this research are the development of reusable, configuration-independent correlation scenarios; adaptability and extensibility of the engine to match the constantly changing topology of a multi-domain server farm; and the development of a concise specification language that is relatively simple yet powerful.
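To make the idea of event propagation chains concrete, the following is a minimal sketch in Python. It is not Yemanja's specification language or engine. The class names (`Event`, `Entity`), the `effects` mapping, and the entity and event names are all hypothetical and chosen only for illustration. The sketch assumes that each entity model knows only its direct dependents and translates an incoming event into a local symptom before passing it on.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    name: str                                  # e.g. "link_down" (hypothetical name)
    source: str                                # entity that originally raised the event
    path: list = field(default_factory=list)   # entities the event has passed through


class Entity:
    """Hypothetical entity model that knows only its direct dependents."""

    def __init__(self, name, effects=None):
        self.name = name
        self.effects = effects or {}   # incoming event name -> local symptom name
        self.dependents = []           # entities that directly rely on this one
        self.records = []              # (symptom, root source, path) seen locally

    def add_dependent(self, other):
        self.dependents.append(other)

    def raise_event(self, name):
        # Start a propagation chain at this entity.
        event = Event(name, self.name, [self.name])
        for dep in self.dependents:
            dep.receive(event)

    def receive(self, event):
        symptom = self.effects.get(event.name)
        if symptom is None:
            return  # this entity is unaffected, so the chain stops here
        self.records.append((symptom, event.source, event.path))
        forwarded = Event(symptom, event.source, event.path + [self.name])
        for dep in self.dependents:
            dep.receive(forwarded)

    def explain(self, symptom):
        # Return the root sources of every chain that produced this symptom.
        return [src for (sym, src, _path) in self.records if sym == symptom]


# Assumed four-layer chain: link -> router -> web server -> application.
link = Entity("link1")
router = Entity("router1", {"link_down": "route_unreachable"})
web = Entity("web1", {"route_unreachable": "server_unreachable"})
app = Entity("shop_app", {"server_unreachable": "qos_violation"})

link.add_dependent(router)
router.add_dependent(web)
web.add_dependent(app)

link.raise_event("link_down")
root_causes = app.explain("qos_violation")
```

In this sketch the application entity never references the link directly, yet `app.explain("qos_violation")` traces the high-level alert back to `link1` through the chain of neighboring entities. A real engine would also need state handling, testing, and prioritization among alternate explanations, none of which is modeled here.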