Reliability of geo-replicated cloud storage systems

Ilias Iliadis; Dmitry Sotnikov; Paula Ta-Shma; Vinodh Venkatesan

doi:10.1109/PRDC.2014.30

PRDC 2014

Conference paper

03 Dec 2014

Abstract

In geo-replicated cloud storage systems, network bandwidth between sites is typically scarcer than bandwidth within a site and can become a bottleneck for recovery operations. We study the reliability of such systems, taking into account the different bandwidths within a site and between sites. We consider a new recovery scheme called staged rebuild and compare it with both a direct scheme and a scheme known as intelligent rebuild. To assess the reliability gains achieved by these schemes, we develop an analytical model that incorporates various relevant aspects of storage systems, such as bandwidths, latent sector errors, and failure distributions. The model applies in the context of OpenStack Swift, a widely deployed cloud storage system. Under certain practical system configurations, we establish that order-of-magnitude improvements in mean time to data loss (MTTDL) can be achieved using these schemes.
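To illustrate why rebuild bandwidth matters for MTTDL, the following is a minimal sketch based on the textbook two-state Markov model for two replicas with exponentially distributed failures and rebuilds. It is not the analytical model developed in the paper: it ignores latent sector errors, general failure distributions, and the staged and intelligent rebuild schemes. The function names, the 4 TB data size, the device MTTF, and the bandwidth figures are all hypothetical values chosen for illustration.

```python
def rebuild_hours(data_tb, bandwidth_mb_per_s):
    """Time to copy data_tb terabytes at the given bandwidth (decimal units)."""
    seconds = data_tb * 1e6 / bandwidth_mb_per_s  # 1 TB = 1e6 MB
    return seconds / 3600.0


def mttdl_two_replicas(mttf_hours, rebuild_time_hours):
    """Textbook MTTDL for two replicas, exponential failure and rebuild times.

    With failure rate lam and rebuild rate mu, solving the Markov chain
    (2 replicas -> 1 replica -> data loss) gives (3*lam + mu) / (2*lam**2).
    """
    lam = 1.0 / mttf_hours
    mu = 1.0 / rebuild_time_hours
    return (3.0 * lam + mu) / (2.0 * lam ** 2)


# Hypothetical inputs, for illustration only.
MTTF_HOURS = 1_000_000.0   # assumed device mean time to failure
DATA_TB = 4.0              # assumed amount of data to rebuild

intra_site = mttdl_two_replicas(MTTF_HOURS, rebuild_hours(DATA_TB, 1000.0))
inter_site = mttdl_two_replicas(MTTF_HOURS, rebuild_hours(DATA_TB, 100.0))

if __name__ == "__main__":
    print(f"MTTDL, rebuild at 1000 MB/s: {intra_site:.3e} hours")
    print(f"MTTDL, rebuild at  100 MB/s: {inter_site:.3e} hours")
    print(f"ratio: {intra_site / inter_site:.2f}")
```

In this simplified setting the rebuild rate dominates the failure rate, so MTTDL scales almost linearly with rebuild bandwidth: a tenfold slower link between sites reduces MTTDL by roughly a factor of ten.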