Matrix methods for lost data reconstruction in erasure codes
Abstract
Erasures codes, particularly those protecting against multiple failures in RAID disk arrays, provide a code-specific means for reconstruction of lost (erased) data. In the RAID application this is modeled as loss of strips so that reconstruction algorithms are usually optimized to reconstruct entire strips; that is, they apply only to highly correlated sector failures, i.e., sequential sectors on a lost disk. In this paper we address two more general problems: (1) recovery of lost data due to scattered or uncorrelated erasures and (2) recovery of partial (but sequential) data from a single lost disk (in the presence of any number of failures). The latter case may arise in the context of host IO to a partial strip on a lost disk. The methodology we propose for both problems is completely general and can be applied to any erasure code, but is most suitable for XOR-based codes. For the scattered erasures, typically due to hard errors on the disk (or combinations of hard errors and disk loss), our methodology provides for one of two outcomes for the data on each lost sector. Either the lost data is declared unrecoverable (in the information-theoretic sense) or it is declared recoverable and a formula is provided for the reconstruction that depends only on readable sectors. In short, the methodology is both complete and constructive.