Using active disks for failure detection: Two phase commit without blocking

Elizabeth Borowsky; Richard Golding

PDCS 2004

Conference paper

01 Dec 2004


Abstract

Recent advances in network-attached disk technology have inspired a body of research on distributed storage systems [1, 2, 3, 4]. Part of the appeal of such systems is the opportunity they afford for widely replicated data; however, wide data redundancy brings a range of consistency issues. This paper addresses the problem of writing concurrently to multiple network-attached devices with a two-phase commit write protocol. Most work in this area proposes three-phase commit protocols to avoid blocking [5, 6, 2]. We instead introduce a novel reconciliation protocol, managed by the storage devices themselves, that resolves a blocked transaction should one occur. In our system, the set of shared disks implementing a replicated object coordinates access to that object. This approach allows shorter access times in the common case where clients and storage devices do not fail, and reverts to a separate procedure to resolve blocking and maintain data consistency only when failures occur.
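The general shape of this idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the protocol from the paper: the names `Replica`, `client_write`, and `reconcile` are hypothetical, all replicas are assumed reachable, the client is assumed to have already been detected as failed, and the resolution rule (commit if any replica has committed, otherwise abort) is chosen here only to show how devices could unblock a transaction among themselves.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


class Replica:
    """Hypothetical storage device holding one copy of a replicated object."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.pending = {}  # transaction id -> staged value
        self.state = {}    # transaction id -> State

    def prepare(self, txid, value):
        # Phase 1: stage the write and vote yes.
        self.pending[txid] = value
        self.state[txid] = State.PREPARED
        return True

    def commit(self, txid):
        # Phase 2: apply the staged write (no-op unless prepared).
        if self.state.get(txid) is State.PREPARED:
            self.value = self.pending.pop(txid)
            self.state[txid] = State.COMMITTED

    def abort(self, txid):
        # Discard the staged write; never undo a commit.
        if self.state.get(txid) is State.PREPARED:
            self.pending.pop(txid)
        if self.state.get(txid) is not State.COMMITTED:
            self.state[txid] = State.ABORTED


def client_write(replicas, txid, value, crash_after_commits=None):
    """Two-phase commit driven by the client.

    crash_after_commits simulates a client failure after that many
    commit messages have been sent in phase 2.
    """
    if not all(r.prepare(txid, value) for r in replicas):
        for r in replicas:
            r.abort(txid)
        return False
    for i, r in enumerate(replicas):
        if crash_after_commits is not None and i >= crash_after_commits:
            return False  # client crashed; remaining replicas are blocked
        r.commit(txid)
    return True


def reconcile(replicas, txid):
    """Device-side resolution of a blocked transaction (assumed rule)."""
    states = [r.state.get(txid, State.IDLE) for r in replicas]
    decision = State.COMMITTED if State.COMMITTED in states else State.ABORTED
    for r in replicas:
        if decision is State.COMMITTED:
            r.commit(txid)
        else:
            r.abort(txid)
    return decision
```

In the failure-free case only `client_write` runs, which is the two-message-round fast path the abstract contrasts with three-phase commit. `reconcile` stands in for the separate procedure that is invoked only after a failure leaves some replicas prepared but undecided.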
