Publication
SYSTOR 2012
Conference paper

Insights for data reduction in primary storage: A practical analysis

Abstract

There has been increasing interest in deploying data reduction techniques in primary storage systems. This paper analyzes large datasets in four typical enterprise data environments to find patterns that can suggest good design choices for such systems. The overall data reduction opportunity is evaluated for deduplication and compression, separately and combined, and an in-depth analysis then focuses on frequency, clustering, and other patterns in the collected data. The results suggest ways to enhance performance and reduce resource requirements and system cost while maintaining data reduction effectiveness. These techniques include deciding which files to compress based on file type and size, using duplication affinity to guide deployment decisions, and optimizing the detection and mapping of duplicate content adaptively when large segments account for most of the opportunity. © 2012 ACM.
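
As a rough illustration of what evaluating deduplication and compression "separately and combined" can mean, the sketch below estimates all three on a set of in-memory byte buffers. It is not the paper's methodology: the fixed 4 KiB chunk size, the use of SHA-256 fingerprints and zlib, and the function name `reduction_estimate` are all assumptions made for this example.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; not taken from the paper


def reduction_estimate(blobs, chunk_size=CHUNK_SIZE):
    """Return (dedup_only, compress_only, combined) as fractions of raw size.

    Each value is stored bytes divided by raw bytes, so lower is better.
    Hypothetical helper for illustration only.
    """
    raw = 0
    compressed_all = 0     # every chunk compressed, duplicates kept
    unique_raw = 0         # duplicate chunks removed, no compression
    unique_compressed = 0  # duplicate chunks removed, remainder compressed
    seen = set()

    for blob in blobs:
        for off in range(0, len(blob), chunk_size):
            chunk = blob[off:off + chunk_size]
            raw += len(chunk)
            # Store a chunk uncompressed if compression would grow it.
            csize = min(len(zlib.compress(chunk)), len(chunk))
            compressed_all += csize
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                unique_raw += len(chunk)
                unique_compressed += csize

    if raw == 0:
        return (1.0, 1.0, 1.0)
    return (unique_raw / raw, compressed_all / raw, unique_compressed / raw)
```

Calling `reduction_estimate([b"a" * 8192, b"a" * 4096])` sees three identical chunks, so the deduplication-only fraction is one third, and the combined fraction can be no larger than either single-technique fraction. Real systems may use variable-size chunking or other compressors, which this sketch does not attempt to model.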
