Publication
Big Data 2020
Conference paper

An End-to-End Context Aware Anomaly Detection System


Abstract

Anomaly detection (AD) is important for many real-world problems in the heavy-industry and Internet-of-Things (IoT) domains. Traditional methods categorize anomaly detection into (a) unsupervised, (b) semi-supervised, and (c) supervised techniques. A relatively unexplored direction is the development of context-aware anomaly detection systems, which can build on any of these three techniques by using side information. Context can be captured from a different modality, such as semantic graphs that encode groupings of sensors governed by the physics of the asset. Process flow diagrams of an operational plant, which depict causal relationships between sensors, can also provide useful context for ML algorithms. Capturing such semantics can itself be challenging. Our paper, however, focuses mainly on (a) designing and implementing effective anomaly detection pipelines using sparse Gaussian Graphical Models with various statistical distance metrics, and (b) differentiating these pipelines by embedding contextual semantics inferred from graphs to obtain better KPIs in practice. The motivation for (b) is given above. Part (a) is motivated by the relatively mediocre performance of highly parametric deep learning methods on small tabular datasets such as IoT sensor data, compared with their performance on images. In contrast to such traditional automated deep learning (AutoAI) techniques, our anomaly detection system is built on semantics-driven, industry-specific ML pipelines that perform scalable computation, evaluating several models to identify the best one. We benchmark our AD method against state-of-the-art AD techniques on publicly available UCI datasets. We also conduct a case study on IoT sensor and semantic data procured from a large thermal energy asset to evaluate how much semantics improve our pipelines.
In addition, we provide explainable insights into our model, giving a reliability engineer a complete perspective.
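
The abstract does not spell out how a sparse Gaussian Graphical Model turns into an anomaly score. The sketch below shows one common way to do it, under stated assumptions. It uses scikit-learn's `GraphicalLasso` as the sparse estimator and the squared Mahalanobis distance as the statistical distance metric, which is only one of the "various" metrics the paper may use. The helper names (`fit_sparse_ggm`, `anomaly_scores`, `fit_threshold`), the synthetic sensor data, `alpha=0.05`, and the 99th-percentile threshold are hypothetical illustrations, not the authors' pipeline or settings.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso


def fit_sparse_ggm(X_train, alpha=0.05):
    """Fit a sparse GGM (graphical lasso) to data from normal operation.

    alpha is an assumed L1 penalty; larger values give a sparser precision matrix.
    """
    model = GraphicalLasso(alpha=alpha)
    model.fit(X_train)
    return model


def anomaly_scores(model, X):
    """Squared Mahalanobis distance of each row of X under the fitted GGM."""
    return model.mahalanobis(X)


def fit_threshold(model, X_train, quantile=0.99):
    """Hypothetical rule: flag scores above a high quantile of training scores."""
    return float(np.quantile(anomaly_scores(model, X_train), quantile))


# Hypothetical "normal" data: 5 sensors with pairwise correlation 0.5.
rng = np.random.default_rng(0)
n_sensors = 5
cov = 0.5 * np.ones((n_sensors, n_sensors)) + 0.5 * np.eye(n_sensors)
X_train = rng.multivariate_normal(np.zeros(n_sensors), cov, size=500)

model = fit_sparse_ggm(X_train)
threshold = fit_threshold(model, X_train)

# A reading that breaks the learned correlation structure scores high,
# while a reading at the typical operating point scores near zero.
suspicious = np.array([[3.0, -3.0, 3.0, -3.0, 0.0]])
typical = np.zeros((1, n_sensors))
print(anomaly_scores(model, suspicious)[0] > threshold)
print(anomaly_scores(model, typical)[0] < threshold)
```

The sparse precision matrix (`model.precision_`) doubles as a learned sensor dependency graph. A context-aware variant could compare or constrain that graph using semantic sensor groupings, though the abstract does not say exactly how the authors embed such context.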

Date

10 Dec 2020