Conference paper

Performance analysis of Coupling Scheduler for MapReduce/Hadoop

View publication


For MapReduce/Hadoop, map and reduce phases exhibit fundamentally distinguishing characteristics. Additionally, these two phases admit complicated and tight dependency on each other, causing the repeatedly observed starvation problem with the widely used Fair Scheduler. To mitigate this problem, we design Coupling Scheduler, which, among other new features, jointly schedules map and reduce tasks by coupling their progresses, different from existing ones that treat them separately. This design is based on the intuition that allocating excess resources to reduce tasks without balancing with the map task progress of the same job is likely to result in resource underutilization since a job is deemed done only when both phases complete. In order to analytically understand the performance of this design, we propose a model that captures the fundamental scheduling characteristics for MapReduce. Specifically, the map phase is modeled by a processor sharing queue, and the reduce phase by a "sticky processor sharing" queue. Along with the important dependence between these two types of tasks, we show that, for a class of jobs with regularly varying map service times, the job processing time distribution under Coupling Scheduler can be one order better than Fair Scheduler. These theoretical results are validated through simulations and the improved performance is further illustrated through real experiments on our testbed. © 2012 IEEE.


04 Jun 2012