SPARKBENCH: A comprehensive benchmarking suite for in memory data analytic platform Spark

Min Li; Jian Tan; Yandong Wang; Li Zhang; Valentina Salapura

doi:10.1145/2742854.2747283

CF 2015

Conference paper

06 May 2015

Abstract

Spark has been increasingly adopted by industry in recent years for big data analysis because it provides a fault-tolerant, scalable, and easy-to-use in-memory abstraction. Moreover, the community has been actively developing a rich ecosystem around Spark, making it even more attractive. However, the literature does not yet offer a Spark-specific benchmark to guide the development and cluster deployment of Spark so that it better fits the resource demands of user applications. In this paper, we present SPARKBENCH, a Spark-specific benchmarking suite that includes a comprehensive set of applications. SPARKBENCH covers four main categories of applications: machine learning, graph computation, SQL queries, and streaming. We also characterize the resource consumption, data flow, and timing information of each application, and evaluate the performance impact of a key configuration parameter to guide the design and optimization of the Spark data analytics platform.
