Auto-tuning Spark Big Data Workloads on POWER8: Prediction-Based Dynamic SMT Threading

Zhen Jia; Chao Xue; Guancheng Chen; Jianfeng Zhan; Lixin Zhang; Yonghua Lin; Peter Hofstee

doi:10.1145/2967938.2967957

PACT 2016

Conference paper

11 Sep 2016

Auto-tuning Spark Big Data Workloads on POWER8: Prediction-Based Dynamic SMT Threading

View publication

Abstract

Much research work devotes to tuning big data analytics in modern data centers, since %the truth that even a small percentage of performance improvement immediately translates to huge cost savings because of the large scale. Simultaneous multithreading (SMT) receives great interest from data center communities, as it has the potential to boost performance of big data analytics by increasing the processor resources utilization. For example, the emerging processor architectures like POWER8 support up to 8-way multithreading. However, as different big data workloads have disparate architectural characteristics, how to identify the most efficient SMT configuration to achieve the best performance is challenging in terms of both complex application behaviors and processor architectures. In this paper, we specifically focus on auto-tuning SMT configuration for Spark-based big data workloads on POWE-R8. However, our methodology could be generalized and extended to other programming software stacks and other architectures. We propose a prediction-based dynamic SMT threading (PBDST) framework to adjust the thread count in SMT cores on POWER8 processors by using versatile machine learning algorithms.Its innovation lies in adopting online SMT configuration predictions derived from micro-architecture level profiling, to regulate the thread counts that could achieve nearly optimal performance. Moreover it is implemented at Spark software stack layer and transparent to user applications. After evaluating a large set of machine learning algorithms, we choose the most efficient ones to perform online predictions. The experimental results demonstrate that our approach can achieve up to 56.3% performance improvement and an average performance gain of 16.2% in comparison with the default configuration-the maximum SMT configuration-SMT8 on our system.

Conference paper