Optimizing Sparse Matrix-Vector multiplication for large-scale data analytics

Daniele Buono; Fabrizio Petrini; Fabio Checconi; Xing Liu; Xinyu Que; Chris Long; Tai-Ching Tuan

doi:10.1145/2925426.2926278

ICS 2016

Conference paper

01 Jun 2016

Optimizing Sparse Matrix-Vector multiplication for large-scale data analytics

View publication

Abstract

Sparse Matrix-Vector multiplication (SpMV) is a fundamental kernel, used by a large class of numerical algorithms. Emerging big-data and machine learning applications are propelling a renewed interest in SpMV algorithms that can tackle massive amount of unstructured data - rapidly approaching the TeraByte range - with predictable, high performance. In this paper we describe a new methodology to design SpMV algorithms for shared memory multiprocessors (SMPs) that organizes the original SpMV algorithm into two distinct phases. In the first phase we build a scaled matrix, that is reduced in the second phase, providing numerous opportunities to exploit memory locality. Using this methodology, we have designed two algorithms. Our experiments on irregular big-data matrices (an order of magnitude larger than the current state of the art) show a quasi-optimal scaling on a large-scale POWER8 SMP system, with an average performance speedup of 3.8×, when compared to an equally optimized version of the CSR algorithm. In terms of absolute performance, with our implementation, the POWER8 SMP system is comparable to a 256-node cluster. In terms of size, it can process matrices with up to 68 billion edges, an order of magnitude larger than state-of-the-art clusters.

Paper