About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
DSN 2009
Conference paper
Evaluating the impact of undetected disk errors in RAID systems
Abstract
Despite the reliability of modern disks, recent studies have made it clear that a new class of faults, UndetectedDisk Errors (UDEs) also known as silent data corruption events, become a real challenge as storage capacity scales. While RAID systems have proven effective in protecting data from traditional disk failures, silent data corruption events remain a significant problem unaddressed by RAID. We present a fault model for UDEs, and a hybrid framework for simulating UDEs in large-scale systems. The framework combines a multi-resolution discrete event simulator with numerical solvers. Our implementation enables us to model arbitrary storage systems and workloads and estimate the rate of undetected data corruptions. We present results for several systems and workloads, from gigascale to petascale. These results indicate that corruption from UDEs is a significant problem in the absence of protection schemes and that such schemes dramatically decrease the rate of undetected data corruption. ©2009 IEEE.