Characterizing Hadoop applications on microservers for performance and energy efficiency optimizations
Abstract
Traditional low-power embedded processors, such as Intel Atom and ARM, are entering the high-performance server market. At the same time, as data volumes grow, emerging Big Data applications demand ever more server computational power, which makes it challenging to process data energy-efficiently on current high-performance server architectures. Furthermore, physical design constraints, such as power and density, have become the dominant limiting factors for scaling out servers. Numerous Big Data applications rely on the Hadoop MapReduce framework to analyze large-scale datasets. Because Hadoop configuration parameters and architecture parameters both directly affect MapReduce job performance and energy efficiency, tuning system- and architecture-level parameters is vital to maximizing energy efficiency. In this work, through a methodical investigation of performance and power measurements, we demonstrate how the interplay among Hadoop configurations and system- and architecture-level parameters affects performance and energy efficiency across a range of Hadoop applications.