Performance of Graph Analytics Applications on Many-Core Processors
Attaining good performance for graph analytics applications on modern-day many-core processors is challenging, because these processors have complex pipelines that manage the out-of-order execution of hundreds of instructions in flight. These pipelines have been optimized for high performance computing (HPC) applications, not for graph analytics. It is preferable to leave the task of attaining good performance to system developers, separating the performance concern from the application programmer's concerns. In this paper, we show that the linear algebra formulation of graph analytics effectively achieves this separation of concerns. This formulation is a better fit for many-core processors because they are optimized for HPC applications, which have a substantial linear algebra component. We show that on POWER8, a many-core processor, an eightfold performance advantage can be attained on the Graph500 benchmark by adopting the linear algebra formulation. We also present a CPI (cycles per instruction) stack analysis of three graph analytics kernels and show that the linear algebra implementations of these kernels make efficient use of the POWER8 core. Finally, we discuss inhibitors to still better performance.
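For readers unfamiliar with the linear algebra formulation, the following is a minimal sketch of level-synchronous breadth-first search (the kernel behind Graph500) written as repeated masked sparse matrix-vector products over the Boolean (OR, AND) semiring. It is not the implementation evaluated in this paper. The function names (`csr_from_edges`, `masked_spmv_bool`, `bfs_levels`) are hypothetical, the graph is assumed to be undirected, and the sketch computes BFS levels rather than the parent array that Graph500 validates.

```python
def csr_from_edges(n: int, edges: list[tuple[int, int]]) -> tuple[list[int], list[int]]:
    """Build a CSR adjacency matrix (row_ptr, col_idx) for an undirected graph."""
    adj: list[list[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    row_ptr = [0]
    col_idx: list[int] = []
    for neighbors in adj:
        col_idx.extend(sorted(neighbors))
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def masked_spmv_bool(row_ptr: list[int], col_idx: list[int],
                     frontier: list[bool], visited: list[bool]) -> list[bool]:
    """Compute y = A^T x over the Boolean (OR, AND) semiring, masked by NOT visited.

    Each frontier vertex u pushes along row u, so y[v] is the OR over u of
    (x[u] AND A[u][v]); vertices already visited are masked out.
    """
    n = len(frontier)
    y = [False] * n
    for u in range(n):
        if frontier[u]:
            for k in range(row_ptr[u], row_ptr[u + 1]):
                v = col_idx[k]
                if not visited[v]:
                    y[v] = True
    return y


def bfs_levels(n: int, edges: list[tuple[int, int]], source: int) -> list[int]:
    """Return the BFS level of every vertex (-1 if unreachable from source)."""
    row_ptr, col_idx = csr_from_edges(n, edges)
    level = [-1] * n
    visited = [False] * n
    frontier = [False] * n
    frontier[source] = visited[source] = True
    level[source] = 0
    depth = 0
    while any(frontier):
        depth += 1
        # One BFS step is one masked sparse matrix-vector product.
        frontier = masked_spmv_bool(row_ptr, col_idx, frontier, visited)
        for v in range(n):
            if frontier[v]:
                visited[v] = True
                level[v] = depth
    return level


if __name__ == "__main__":
    # Vertex 5 is isolated, so it stays at level -1.
    print(bfs_levels(6, [(0, 1), (0, 2), (1, 3), (2, 4)], 0))
```

The point of this structure is that the graph algorithm reduces to one sparse kernel. A tuned library could optimize that kernel for a particular core without any change to the BFS driver, which is the separation of concerns argued for above.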