About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
MSST 2016
Conference paper
A long-term user-centric analysis of deduplication patterns
Abstract
Deduplication has become essential in disk-based backup systems, but there have been few long-term studies of backup workloads. Most past studies either were of a small static snapshot or covered only a short period that was not representative of how a backup system evolves over time. For this paper, we collected 21 months of data from a shared user file system; 33 users and over 4,000 snapshots are covered. We analyzed the data set for a variety of essential characteristics. However, our primary focus was individual user data. Despite apparently similar roles and behavior in all of our users, we found significant differences in their deduplication ratios. Moreover, the data that some users share with others had a much higher deduplication ratio than average. We analyze this behavior and make recommendations for future deduplication systems design.