Publication
ACM Transactions on Storage
Paper

Challenges and solutions for tracing storage systems: A case study with spectrum scale

View publication

Abstract

IBM Spectrum Scale’s parallel file system General Parallel File System (GPFS) has a 20-year development history with over 100 contributing developers. Its ability to support strict POSIX semantics across more than 10K clients leads to a complex design with intricate interactions between the cluster nodes. Tracing has proven to be a vital tool to understand the behavior and the anomalies of such a complex software product. However, the necessary trace information is often buried in hundreds of gigabytes of by-product trace records. Further, the overhead of tracing can significantly impact running applications and file system performance, limiting the use of tracing in a production system. In this research article, we discuss the evolution of the mature and highly scalable GPFS tracing tool and present the exploratory study of GPFS’ new tracing interface, FlexTrace, which allows developers and users to accurately specify what to trace for the problem they are trying to solve. We evaluate our methodology and prototype, demonstrating that the proposed approach has negligible overhead, even under intensive I/O workloads and with low-latency storage devices.

Date

Publication

ACM Transactions on Storage

Authors

Share