Maximizing data locality in distributed systems

Fan Chung; Ronald Graham; Ranjita Bhagwan; Stefan Savage; Geoffrey M. Voelker

doi:10.1016/j.jcss.2006.07.001

Journal of Computer and System Sciences

Paper

01 Jan 2006



Abstract

The effectiveness of a distributed system hinges on how tasks and data are assigned to the underlying system resources. Moreover, today's large-scale distributed systems must accommodate heterogeneity both in the offered load and in the makeup of the available storage and compute capacity. The ideal resource assignment must balance the utilization of the underlying system against the loss of locality incurred when individual tasks or data objects are fragmented among several servers. In this paper we describe this locality-maximizing placement problem and show that finding an optimal solution is NP-hard. We then describe a polynomial-time algorithm that generates a placement within an additive constant of two of optimal. © 2006 Elsevier Inc. All rights reserved.
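To make the trade-off between utilization and fragmentation concrete, the following minimal Python sketch shows a simple sequential-fill baseline. It is not the algorithm from the paper, and the names `sequential_fill` and `extra_fragments` are hypothetical. It assumes object sizes and server capacities are positive integers and that every fragment of an object costs the same loss of locality. Objects are placed in order, and an object is split only when the current server runs out of space, so the number of extra fragments is at most the number of servers minus one.

```python
def sequential_fill(sizes, capacities):
    """Place objects on servers in order, splitting only when a server fills up.

    Illustrative baseline only (an assumption, not the paper's method).
    Returns a list of (object_index, server_index, amount) fragments.
    """
    if sum(sizes) > sum(capacities):
        raise ValueError("total demand exceeds total capacity")
    fragments = []
    server = 0
    free = capacities[0] if capacities else 0
    for obj, remaining in enumerate(sizes):
        while remaining > 0:
            if free == 0:
                # Current server is full: move on to the next one.
                server += 1
                free = capacities[server]
                continue
            amount = min(remaining, free)
            fragments.append((obj, server, amount))
            remaining -= amount
            free -= amount
    return fragments


def extra_fragments(fragments, num_objects):
    """Count fragments beyond one per object (0 means no object is split)."""
    return len(fragments) - num_objects


if __name__ == "__main__":
    placement = sequential_fill([4, 3, 5], [6, 6])
    print(placement)
    print(extra_fragments(placement, 3))
```

In this example the second object is split across both servers, which is unavoidable here because no subset of the sizes sums to exactly 6. In general, the difficulty the abstract points to is choosing which objects share a server so that as few as possible end up split.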

Conference paper