Publication
SOLI 2014
Conference paper

Big Data architecture for IT incident management

Abstract

IT incident management aims to restore normal service quality and availability of IT systems after interruptions. IT incidents often have complicated causes that arise from an IT environment composed of thousands of interdependent components. Diagnosing an incident therefore requires collecting and analyzing large volumes of data about these components, often in real time, to find suspected causes. This requirement is extremely difficult to meet with traditional techniques. In this paper, we propose a new analysis architecture based on Big Data techniques. The architecture uses stream computing and MapReduce to analyze data from various sources, uses NoSQL databases to store incident-related documents and their relationships, and applies further analytical techniques to examine the documents for root causes and to predict failures. We demonstrate the approach with a real-world example and present evaluation results from a recent pilot study.
