Publication
Big Data 2013
Conference paper
Correlation-based performance analysis for full-system MapReduce optimization
Abstract
Big Data is changing the world at a remarkable pace, and MapReduce plays a critical role in extracting insights from it. However, optimizing the performance of MapReduce applications, which is necessary to extract those insights efficiently, remains a challenging task. To facilitate full-system optimization of MapReduce applications, we propose a correlation-based performance analysis approach that efficiently identifies critical outliers. The basic intuition is that critical outliers are key to overall performance and can be accurately identified only by correlating different phases, tasks, and resources. Based on this approach, we implement a correlation-based performance analysis tool called Sonata, which efficiently identifies critical outliers and then recommends optimization suggestions to practitioners based on embedded rules. Because performance overhead is key to the applicability of a performance tool, we conduct experiments showing that Sonata is practical, with less than 5% overhead and good scalability. To demonstrate Sonata's effectiveness, we present several case studies from the performance tuning of IBM Platform Symphony™ carried out with Sonata's help. © 2013 IEEE.