Publication
Big Data 2013
Conference paper

Correlation-based performance analysis for full-system MapReduce optimization


Abstract

Big Data is changing this world at a surprising speed, and MapReduce plays a critical role in finding insights in Big Data. However, to efficiently extract insights from Big Data, performance optimization of MapReduce applications is a challenging task. To facilitate the full-system optimization of MapReduce applications, we propose a correlation-based performance analysis approach to efficiently identify critical outliers. The basic intuition is that critical outliers are key to the overall performance and they can only be accurately identified by correlating different phases, tasks and resources. Based on the proposed approach, we further implement a correlation-based performance analysis tool, called Sonata. It can efficiently identify critical outliers, and then, recommend optimization suggestions for practitioners based on embedded rules. Since the performance overhead is key to the applicability of a performance tool, we conduct experiments to demonstrate that Sonata is a practical tool with less than 5% overhead and good scalability. To demonstrate the effectiveness of Sonata, we share several cases during the performance tuning of IBM Platform Symphony™ with the help of Sonata. © 2013 IEEE.
