Blocking optimization techniques for sparse tensor computation

Jee Choi; Xing Liu; Shaden Smith; Tyler Simon

doi:10.1109/IPDPS.2018.00066

IPDPS 2018

Conference paper

03 Aug 2018

Blocking optimization techniques for sparse tensor computation

View publication

Abstract

We present a detailed analysis of the sparse matricized tensor times Khatri-Rao product (MTTKRP) kernel that is the key bottleneck in various sparse tensor computations. By using the well-known roofline model and carefully instrumenting the state-of-The-Art MTTKRP code with the pressure point analysis technique, we show that the performance of MTTKRP is highly sensitive to data traffic like other sparse computations. We also identify key performance bottlenecks that diverge from our prior knowledge of sparse MTTKRP on modern processors. We propose to use blocking optimization techniques to address the bottlenecks identified within the MTTKRP computation. By using a combination of two blocking techniques, we achieve more than 3.5x and 2.0x speedup over the stateof-The-Art MTTKRP implementation in the SPLATT library on four real-world data sets and two synthetically generated data sets, respectively. We also employ a new data partitioning technique for the distributed MTTKRP implementation, which provides an additional dimension of scalability. As a result, our implementation exhibits good strong scaling and a 1.6x speedup on 64 nodes for two real-world data sets, when compared to the SPLATT library.

Conference paper