Publication
SOLI 2014
Conference paper
Big Data architecture for IT incident management
Abstract
IT incident management aims to restore normal service quality and availability of IT systems after interruptions. IT incidents often have complicated causes that span an IT environment composed of thousands of interdependent components. Incident diagnosis therefore requires collecting and analyzing large volumes of data about these components, often in real time, to find suspected causes. It is extremely difficult to fulfill this requirement using traditional techniques. In this paper, we propose a new analysis architecture using Big Data techniques. This architecture leverages stream computing and MapReduce techniques to analyze data from various data sources, uses NoSQL databases to store incident-related documents and their relationships, and further applies other analytical techniques to examine the documents for root causes and failure prediction. We demonstrate this approach using a real-world example and present evaluation results from a recent pilot study.
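As a rough illustration of the MapReduce step mentioned in the abstract, the sketch below counts error events per component and ranks the components as candidate suspects. It is not the architecture from the paper: the event fields (`component`, `severity`), the sample data, and the function names are all hypothetical, and the map and reduce phases run in a single process rather than on a cluster.

```python
from collections import defaultdict

# Hypothetical event records; the field names are illustrative assumptions,
# not a schema taken from the paper.
EVENTS = [
    {"component": "db-01", "severity": "error"},
    {"component": "web-03", "severity": "warning"},
    {"component": "db-01", "severity": "error"},
    {"component": "db-01", "severity": "warning"},
]


def map_phase(events):
    """Emit a (component, 1) pair for every error-level event."""
    for event in events:
        if event["severity"] == "error":
            yield event["component"], 1


def reduce_phase(pairs):
    """Sum the emitted counts for each component key."""
    totals = defaultdict(int)
    for component, count in pairs:
        totals[component] += count
    return dict(totals)


def rank_suspects(events):
    """Order components by error count, highest first, ties by name."""
    totals = reduce_phase(map_phase(events))
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(rank_suspects(EVENTS))
```

In a real deployment the same map and reduce functions would be distributed across many nodes, with the ranked output stored alongside the incident documents; that wiring is left out here.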