Decentralized fault-tolerant event correlation

Gregory Aaron Wilkin; Patrick Eugster; Jayaram Kr Kallapalayam Radhakrishnan

doi:10.1145/2633687

ACM TOIT

Paper

07 Aug 2014

Abstract

Despite the anticipated use of event correlation techniques for monitoring critical complex infrastructures or dealing with disasters in the physical world, little work exists on making event correlation systems themselves tolerant to failures. Existing systems either provide no guarantees on event deliveries, do not support multicast and thus provide no guarantees across individual processes, or rely on centralized components or strong assumptions about the infrastructure. The FAIDECS system attempts to reconcile strong guarantees with practical performance in the presence of process crash failures. To that end, FAIDECS uses an overlay network with specific guarantees aligned with its proposed correlation language and the guarantees it offers. However, the proposed language lacks expressivity, and the system itself supports only very specific, rigid semantics that cannot accommodate even fundamental features such as sliding windows. After providing a comprehensive overview of the FAIDECS model and system, this article bridges the gap between strong guarantees and more established correlation languages and systems in several steps. First, we propose alternative semantics for several modules of the FAIDECS matching engine and revisit guarantees. Second, we pinpoint which guarantees are contradicted by which combinations of semantic options. Third, we investigate four correlation languages (StreamSQL, EQL, CEL, and TESLA), showing which semantic options their respective features correspond to in our model and thus, ultimately, which guarantees of FAIDECS are maintained by which language features. © 2014 ACM.
