Publication
ICDCS 2006
Conference paper

GENESIS: A scalable self-evolving performance management framework for storage systems


Abstract

To manage IO-related performance problems in a data-center environment, the administrator needs to understand the root cause of the issue. The growing trend of system virtualization, combined with the need to support end-to-end performance goals for enterprise applications, has made root-cause analysis a nontrivial problem: administrators are required to manually parse all hardware events, configuration modifications, and changes in access characteristics, across all tiers of the IO path from application servers to the disks. We propose GENESIS, a framework that assists storage administrators with root-cause analysis in distributed systems. GENESIS consists of three key modules: Abnormality Detection, Snapshot Generation, and Diagnosis. The Abnormality Detection module uses clustering algorithms to create and constantly evolve the normality models of measurable parameters in components. The Snapshot Generation module is triggered by a combination of abnormality detection and policies to take compact snapshots of the system state for analysis whenever a significant change occurs. The Diagnosis module parses the snapshots and shortlists likely root causes for the administrator using knowledge about the impact of run-time changes on IO performance. We have implemented an initial proof-of-concept of GENESIS in GPFS (a high-performance distributed file system) and validated its operation for several interesting real-world scenarios. Encouraged by the results, we are currently deploying our prototype in an existing data center environment. © 2006 IEEE.
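The abstract does not say which clustering algorithm the Abnormality Detection module uses, so the following is only a minimal sketch of the general idea, assuming a simple online (leader-style) clustering over one measurable parameter such as IO latency. The names `Cluster`, `NormalityModel`, `radius`, and `min_support` are hypothetical and do not come from the paper. A sample is treated as abnormal when it falls outside every trusted cluster; the model evolves because a sustained new behavior eventually forms a trusted cluster of its own.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    center: float
    count: int = 1

    def absorb(self, value: float) -> None:
        # Running mean keeps the centroid current without storing history.
        self.count += 1
        self.center += (value - self.center) / self.count


@dataclass
class NormalityModel:
    # Hypothetical parameters, not taken from the paper.
    radius: float          # max distance from a centroid to join that cluster
    min_support: int = 3   # samples a cluster needs before it counts as normal
    clusters: list = field(default_factory=list)

    def observe(self, value: float) -> bool:
        """Update the model with one sample; return True if it looks abnormal."""
        nearest = min(
            self.clusters, key=lambda c: abs(c.center - value), default=None
        )
        if nearest is not None and abs(nearest.center - value) <= self.radius:
            nearest.absorb(value)
            # A cluster that is still too small is not yet trusted as normal.
            return nearest.count < self.min_support
        # No cluster is close enough: start a new one and flag the sample.
        self.clusters.append(Cluster(center=value))
        return True


if __name__ == "__main__":
    model = NormalityModel(radius=2.0)
    # Latency samples (ms): a stable regime, then a shift to a new regime.
    for sample in [5.0, 5.5, 4.8, 5.2, 40.0, 40.5, 39.8]:
        print(sample, "abnormal" if model.observe(sample) else "normal")
```

In a design like this, an abnormal flag would be the kind of event that could trigger snapshot generation, while the growing cluster around the new regime is what lets the model stop flagging a change once it has become the norm.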
