On the adoption and impact of predictive analytics for server incident reduction

Ioana Giurgiu; Dorothea Wiesmann; Jasmina Bogojeska; David Lanyi; George Stark; R.B. Wallace; M.M. Pereira; A.A. Hidalgo

doi:10.1147/JRD.2016.2631400

IBM J. Res. Dev.

Paper

01 Jan 2017

Abstract

The Predictive Analytics for Server Incident Reduction (PASIR) solution developed at IBM has been deployed broadly, to 130 IT environments, since the beginning of 2014. The infrastructures of these IT environments, which belong to various industries around the world, are serviced by IBM support groups. Incidents occurring on servers, including descriptions of the problems, are reported into a ticket management system. The assigned support teams then resolve these tickets and record the resolution steps taken in the system. PASIR first classifies the incident tickets of an IT environment, using their descriptions and resolutions, to identify high-impact incidents that describe server unavailability and performance degradation issues. Second, it correlates the occurrence of these high-impact tickets with server properties and utilization measures through multivariate analysis, to identify troubled server configurations and prescribe improvement actions. In this paper, we present the findings from deploying this two-step machine learning model in the field. In particular, we describe the PASIR methodology, from ticket classification to the recommendation of modernization actions. We also assess the process of manual ticket labeling and the impact of noisy input data on our automatic classifier, and we demonstrate the model's effectiveness by comparing its predictions of the impact of prescriptive actions with the actual system improvements.
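The abstract describes a two-step pipeline but does not name the specific models, so the following is only a minimal sketch of its general shape under stated assumptions, not PASIR's actual implementation. A multinomial Naive Bayes text classifier stands in for the ticket classifier (step 1), and a logistic regression stands in for the multivariate analysis (step 2). All ticket texts, labels, server names (`srv1` to `srv8`), and the two features (an OS end-of-life flag and average CPU utilization) are hypothetical toy data. The final lines score a hypothetical prescriptive action by comparing predicted risk before and after changing one configuration feature.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase alphabetic tokens: a deliberately simple preprocessing step."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTicketClassifier:
    """Step 1 (stand-in): multinomial Naive Bayes with add-one smoothing over ticket text."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total = {lbl: sum(c.values()) for lbl, c in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / n_docs)  # log prior
            denom = self.total[label] + len(self.vocab)
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_risk(w, x):
    """Probability that a server with features x has a high-impact incident."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Step 2 (stand-in): logistic regression by batch gradient descent; w[0] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_risk(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Hypothetical manually labeled training tickets (description + resolution text).
train = [
    ("server down not responding rebooted server", "high"),
    ("server unavailable hung rebooted", "high"),
    ("slow response high cpu killed runaway process", "high"),
    ("performance degradation memory exhausted added swap", "high"),
    ("password reset requested user unlocked", "other"),
    ("disk quota warning cleaned temp files", "other"),
    ("backup job report request sent report", "other"),
    ("user account access request granted", "other"),
]
clf = NaiveBayesTicketClassifier().fit([t for t, _ in train], [lbl for _, lbl in train])

# Hypothetical servers: features [os_end_of_life (0/1), avg_cpu_util], plus their tickets.
servers = {
    "srv1": ([1, 0.9], ["server down rebooted"]),
    "srv2": ([1, 0.7], ["server hung rebooted"]),
    "srv3": ([1, 0.4], ["slow response high cpu"]),
    "srv4": ([0, 0.8], ["memory exhausted performance degradation"]),
    "srv5": ([0, 0.3], ["password reset requested"]),
    "srv6": ([0, 0.2], ["disk quota warning"]),
    "srv7": ([0, 0.5], ["report request sent"]),
    "srv8": ([1, 0.2], ["user access request granted"]),
}
names = list(servers)
X = [servers[s][0] for s in names]
# A server counts as troubled if any of its tickets is classified as high-impact.
y = [int(any(clf.predict(t) == "high" for t in servers[s][1])) for s in names]
w = fit_logistic(X, y)

# Hypothetical prescriptive action: move srv3 off its end-of-life OS, keep CPU unchanged.
before = predict_risk(w, servers["srv3"][0])
after = predict_risk(w, [0, servers["srv3"][0][1]])
print(f"srv3 risk before upgrade: {before:.2f}, after: {after:.2f}")
```

Logistic regression is used here only for compactness: its per-feature coefficients make it easy to score a "what if" configuration change, which loosely mirrors the idea of predicting the impact of a prescriptive action and later checking it against observed improvements.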