Publication
MASCOTS 2009
Conference paper
Reliability modeling of RAID storage systems with latent errors
Abstract
The reliability of disk storage systems is adversely affected by the presence of latent sector errors. Disk scrubbing and intradisk redundancy are two schemes proposed to cope with unrecoverable or latent media errors and enhance the reliability of RAID storage systems. Two recent studies have investigated the effectiveness of these schemes, but they have reached opposing conclusions. These studies were conducted using two different modeling approaches. We present a detailed investigation that reveals that this discrepancy originates from the difference in the approach adopted and in the level of detail incorporated by the two models. We show that, as a consequence, these models provide reliability results that may differ by orders of magnitude, thereby leading to contradictory conclusions. We develop a common analytical framework within which we investigate the details, merits, weaknesses, and applicability of each model. We resolve this discrepancy by deriving enhanced models that incorporate inherent characteristics of the latent-error process and provide realistic reliability results that are in good agreement. We subsequently reassess the reliability results and conclusions presented in previous studies regarding the disk scrubbing and intradisk redundancy schemes.