Green MapReduce for heterogeneous data centers
Abstract
MapReduce has emerged as one of the key workloads in today's data centers, which constantly strive for an optimal tradeoff between energy consumption and performance. MapReduce alternates between computation-intensive and communication-intensive phases with bursty workloads. The challenge in making MapReduce execution green lies in controlling server and network resources simultaneously. Related work offers various good solutions for homogeneous systems, with the central theme of packing tasks onto as few servers as possible, thereby overlooking the possibility of putting servers and network components to sleep. This paper considers a very bursty MapReduce workload with distinct CPU, memory, and network requirements, executed on heterogeneous data centers where servers have various CPU and memory capacities and execute requests in a process-sharing manner. To reduce energy consumption while maintaining a low task response time, we propose an online energy minimization path algorithm, termed GEMS, that schedules MapReduce tasks in cooperation with sleeping policies on both servers and switches. Using Google MapReduce traces, our simulation experiments show that the proposed solution achieves a significant energy saving of 35% while improving task response times by 35% on heterogeneous data centers, compared to policies that are network agnostic or adopt no sleeping schedule. Overall, we achieve greener and faster MapReduce with (surprisingly) only a slightly higher number of servers, by considering energy consumption rather than following the conventional approach of considering power values only.
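The distinction between energy and power drawn in the closing sentence can be illustrated with a minimal sketch. It is not the GEMS algorithm: the server names, power draws, speeds, and task size below are hypothetical values chosen for illustration, and the model assumes energy is simply active power multiplied by execution time, ignoring network and sleep-state costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    power_watts: float  # hypothetical active power draw
    speed: float        # hypothetical work units processed per second


def execution_time(server: Server, work: float) -> float:
    """Seconds needed to finish `work` units on this server."""
    return work / server.speed


def energy_joules(server: Server, work: float) -> float:
    """Energy = power x time, under the simplified model assumed here."""
    return server.power_watts * execution_time(server, work)


def pick_by_power(servers: list[Server]) -> Server:
    """Conventional choice: the server with the lowest power draw."""
    return min(servers, key=lambda s: s.power_watts)


def pick_by_energy(servers: list[Server], work: float) -> Server:
    """Energy-aware choice: the server that spends the least energy on the task."""
    return min(servers, key=lambda s: energy_joules(s, work))


# Hypothetical heterogeneous pair: a slow low-power server and a fast high-power one.
servers = [Server("small", 100.0, 1.0), Server("large", 250.0, 4.0)]
work = 400.0  # hypothetical task size in work units

by_power = pick_by_power(servers)
by_energy = pick_by_energy(servers, work)
print(by_power.name, energy_joules(by_power, work), execution_time(by_power, work))
print(by_energy.name, energy_joules(by_energy, work), execution_time(by_energy, work))
```

With these assumed numbers, the power-only rule selects the low-power server, whereas the energy rule selects the faster server, which finishes sooner and consumes less energy in total despite its higher power draw.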