Evaluating the POWER8 architecture through optimizing stencil-based algorithms
Abstract
With continued innovation in IBM POWER processors and the rapid growth of the OpenPOWER ecosystem, some future supercomputers at the 100-petaflops level are expected to adopt POWER as their CPU. Although the hardware features of POWER8 have been studied in recent years, the lack of tuning guidelines leaves a gap between achieved software performance and hardware capability. To evaluate the POWER8 processor thoroughly and to provide tuning guidelines for modern scientific applications running on it, this paper uses four widely used stencil algorithms as target programs for assessing the effectiveness of tuning techniques. Applying a set of optimization methods yields significant speedups for the four stencil-based algorithms (5.95x, 7.62x, 4.34x, and 5.36x, respectively). We also evaluate in detail the impact of each tuning technique, which should benefit the performance tuning of similar scientific applications on IBM POWER processors. Together, these optimizations and the POWER8-specific analysis provide a comprehensive overview of the processor's features along with detailed performance tuning guidance.