BigProvision: A provisioning framework for big data analytics
In the past few years, big data has attracted significant attention, and many analytics platforms, such as Hadoop, have been developed to enable the analysis of massive data sets. Nevertheless, it remains very challenging to provision, let alone optimize, a comprehensive system that spans everything from the computing infrastructure to the analytics programs. To tackle this challenge, in this article, we propose BigProvision, a novel framework for provisioning big data analytics systems. The main idea of the framework is to first evaluate and model the performance of different big data analytics approaches, given a set of sample data and various analytics requirements, such as the expected results, budget, and response time. Based on the evaluation and modeling results, BigProvision generates a provisioning configuration that can be used to configure the whole system for big data analytics. To evaluate the proposed framework, we develop an experimental prototype that supports three analytics platforms: Hadoop, Spark, and GraphLab. Our experiments show that, for the classic PageRank analysis, both GraphLab and Spark can outperform Hadoop under different requirements. Moreover, by modeling the results, our prototype can determine the expected settings, such as the number of machines and the network capacity, for a system that must handle the complete data set. The prototype and experiments demonstrate that the proposed framework has great potential to facilitate the provisioning and optimization of future big data analytics systems.
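The evaluate, model, and provision idea described above can be illustrated with a minimal sketch. This is not the BigProvision implementation: the linear runtime model, the `SampleRun` record, the function names, and the per-machine-hour pricing are all assumptions introduced here for illustration. The sketch fits a simple model to runs on sample data and then extrapolates to pick the smallest machine count that meets a response-time and budget requirement for the complete data set.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SampleRun:
    """One measured run on sample data (hypothetical record format)."""
    data_gb: float    # size of the sample data set
    machines: int     # number of machines used for the run
    runtime_s: float  # measured runtime in seconds


def fit_model(runs: List[SampleRun]) -> Tuple[float, float]:
    """Least-squares fit of the assumed model
    runtime = a * (data_gb / machines) + b,
    where b captures fixed overhead such as job startup."""
    xs = [r.data_gb / r.machines for r in runs]
    ys = [r.runtime_s for r in runs]
    n = len(runs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("runs must cover at least two distinct data/machine ratios")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def choose_machines(
    a: float,
    b: float,
    full_data_gb: float,
    deadline_s: float,
    budget: float,
    price_per_machine_hour: float,
    max_machines: int = 1000,
) -> Optional[int]:
    """Return the smallest machine count whose predicted runtime and cost
    satisfy both requirements, or None if no count up to max_machines does."""
    for m in range(1, max_machines + 1):
        runtime = a * full_data_gb / m + b
        cost = m * price_per_machine_hour * runtime / 3600.0
        if runtime <= deadline_s and cost <= budget:
            return m
    return None


# Usage with made-up sample measurements (not results from the article).
runs = [
    SampleRun(data_gb=10, machines=1, runtime_s=130),
    SampleRun(data_gb=10, machines=2, runtime_s=80),
    SampleRun(data_gb=20, machines=1, runtime_s=230),
]
a, b = fit_model(runs)
machines = choose_machines(a, b, full_data_gb=1000, deadline_s=600,
                           budget=100, price_per_machine_hour=1.0)
```

A real provisioning model would likely need separate fits per platform (for example Hadoop, Spark, and GraphLab) and additional terms for network capacity; the single-feature linear form is used here only to keep the sketch short.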