Efficient implementation of sparse matrix-sparse vector multiplication for large-scale graph analytics
We developed a parallel algorithm that improves the cache behavior and overall performance of multiplying sparse matrices by sparse vectors (SpMSpV), an operation increasingly used in large-scale graph analytics, particularly for dynamic graphs in social-network and homeland-security applications. The proposed algorithm builds on the two-phase approach of splitting the multiplication into a scaling phase and an aggregation phase, which gives each phase a more cache-friendly access pattern. To handle dynamic graphs and to achieve better load balancing in a parallel implementation, however, we use a combination of private and shared bins, with synchronized access to the shared bins to exchange product terms between the two phases. Each thread accumulates product terms in its own private bins; when a private bin becomes full, the thread performs a bulk transfer of its contents into a shared bin. The results are then aggregated from the shared bins. In addition, we employ heuristics that choose the best SpMSpV algorithm based on the number of nonzeros involved in the operation. When the number of nonzeros is large, it may be better to perform the operation as SpMV (sparse matrix times dense vector) despite the added conversion cost; when it is small, a simplified algorithm is advantageous. We compared our algorithm with existing SpMSpV algorithms, and our evaluation shows that execution time is reduced several-fold for large graphs.
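The following is a minimal Python sketch of the ideas described above, not the authors' implementation. It assumes the matrix is stored in compressed sparse column (CSC) form and the sparse vector is a `{column: value}` dict. Products are binned by contiguous row ranges, and each thread fills private bins that it bulk-transfers into shared bins under a per-bin lock when they fill up. All names (`CSCMatrix`, `spmspv_binned`, `bin_width`, `buf_cap`, `low`, `high`) are hypothetical. The dispatcher's criterion (the fraction of the matrix's nonzeros the product touches) and its thresholds are illustrative stand-ins for the paper's heuristics.

```python
import threading
from collections import defaultdict


class CSCMatrix:
    """Compressed sparse column storage: column j owns entries col_ptr[j]..col_ptr[j+1]-1."""
    def __init__(self, n_rows, n_cols, col_ptr, row_idx, vals):
        self.n_rows, self.n_cols = n_rows, n_cols
        self.col_ptr, self.row_idx, self.vals = col_ptr, row_idx, vals

    def column(self, j):
        """Yield (row, value) pairs of column j."""
        for k in range(self.col_ptr[j], self.col_ptr[j + 1]):
            yield self.row_idx[k], self.vals[k]


def spmspv_binned(A, x, n_threads=4, bin_width=1024, buf_cap=256):
    """Two-phase y = A @ x, with x a {col: value} dict; returns {row: value}."""
    n_bins = (A.n_rows + bin_width - 1) // bin_width
    shared = [[] for _ in range(n_bins)]             # shared bins, one per row range
    locks = [threading.Lock() for _ in range(n_bins)]

    def flush(b, buf):
        # Bulk transfer: move a whole private bin into shared bin b under its lock.
        with locks[b]:
            shared[b].extend(buf)
        buf.clear()

    def scale(chunk):
        # Scaling phase: multiply each needed column by x[j] and bin products by row.
        private = [[] for _ in range(n_bins)]        # this thread's private bins
        for j, xj in chunk:
            for i, a in A.column(j):
                b = i // bin_width
                private[b].append((i, a * xj))
                if len(private[b]) >= buf_cap:       # private bin is full
                    flush(b, private[b])
        for b in range(n_bins):                      # push any leftovers
            if private[b]:
                flush(b, private[b])

    # Distribute the nonzeros of x round-robin across threads.
    items = list(x.items())
    threads = [threading.Thread(target=scale, args=(items[t::n_threads],))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Aggregation phase: each bin only touches a bin_width-long slice of y.
    # (Bins are independent, so this loop could also be split across threads.)
    y = {}
    for b, terms in enumerate(shared):
        base = b * bin_width
        acc = [0] * bin_width
        for i, v in terms:
            acc[i - base] += v
        for off, v in enumerate(acc):
            if v != 0:
                y[base + off] = v
    return y


def spmspv_simple(A, x):
    """Low-nonzero path: accumulate straight into a hash map, no binning."""
    y = defaultdict(int)
    for j, xj in x.items():
        for i, a in A.column(j):
            y[i] += a * xj
    return {i: v for i, v in y.items() if v != 0}


def spmv_dense(A, x):
    """High-nonzero path: densify x (the conversion cost), then run a column-wise SpMV."""
    xd = [0] * A.n_cols
    for j, xj in x.items():
        xd[j] = xj
    y = [0] * A.n_rows
    for j in range(A.n_cols):
        for i, a in A.column(j):
            y[i] += a * xd[j]
    return {i: v for i, v in enumerate(y) if v != 0}


def spmspv(A, x, low=0.05, high=0.5):
    """Pick a kernel from the fraction of A's nonzeros touched (hypothetical thresholds)."""
    work = sum(A.col_ptr[j + 1] - A.col_ptr[j] for j in x)
    frac = work / max(1, A.col_ptr[-1])
    if frac >= high:
        return spmv_dense(A, x)
    if frac <= low:
        return spmspv_simple(A, x)
    return spmspv_binned(A, x)
```

The threads here only illustrate the private/shared bin protocol. Under CPython's global interpreter lock they will not speed up the arithmetic, so a performance-oriented version would use a compiled language with native threads. Larger `buf_cap` values mean fewer lock acquisitions per product term, at the cost of more per-thread memory.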