Abstract
In a common form of the RAID 5 architecture, data on a disk array of N + 1 disks is organized into stripes of N data blocks and one parity block, with parity block locations staggered so as to balance the number of parity blocks on each disk. This allows data to be recovered in the event of a single disk failure. Here we consider an extension to this architecture in which parity information applies to arbitrary subsets of the data blocks in each stripe. Under several simplifying assumptions, we present simulation and analytic results that estimate the improvement offered by this approach, measured in total I/O operations, as compared to (1) conventional RAID 5 under a random single-block write workload, and (2) a log-structured file system in which data is written out in full stripes. Results on the reduction of disk recovery costs are also presented. © 1997 IEEE.
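The conventional RAID 5 baseline described above can be illustrated with a minimal sketch. It covers only the standard scheme (one XOR parity block per stripe of N data blocks), not the extended subset-parity scheme studied in the paper. All function names are hypothetical, and the left-rotating parity placement is just one common way of staggering parity, assumed here for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe, n_disks):
    """Disk holding the parity block of a stripe.

    Assumption: parity rotates one disk to the left per stripe, so that
    over n_disks consecutive stripes every disk holds parity exactly once.
    """
    return (n_disks - 1 - stripe) % n_disks


def recover(surviving_blocks):
    """Rebuild the block of a failed disk from the N surviving blocks
    of the same stripe (remaining data blocks plus the parity block)."""
    return xor_blocks(surviving_blocks)


def small_write_parity(old_data, old_parity, new_data):
    """Parity after a single-block update, computed without reading the
    rest of the stripe: new_parity = old_parity XOR old_data XOR new_data."""
    return xor_blocks([old_parity, old_data, new_data])


# One stripe with N = 3 data blocks (toy 2-byte blocks).
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# Simulate losing the disk that held data[1] and rebuilding it.
rebuilt = recover([data[0], data[2], parity])
```

In this sketch, a single-block update touches the old data block and the old parity block (two reads) and then writes the new data and the new parity (two writes). This is the per-write cost that makes a random single-block write workload a natural baseline for comparison.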