CuMF-SGD: Parallelized stochastic gradient descent for matrix factorization on GPUs

Xiaolong Xie; Wei Tan; Liana Fong; Yun Liang

doi:10.1145/3078597.3078602

HPDC 2017

Conference paper

26 Jun 2017

Abstract

Stochastic gradient descent (SGD) is widely used by many machine learning algorithms. It is efficient for big data applications due to its low algorithmic complexity. However, SGD is inherently serial, and parallelizing it is not trivial. Parallelizing SGD efficiently on many-core architectures (e.g., GPUs) is a major challenge. In this paper, we present CuMF-SGD, a parallelized SGD solution for matrix factorization on GPUs. We first design high-performance GPU computation kernels that accelerate individual SGD updates by exploiting model parallelism. We then design efficient schemes that parallelize SGD updates by exploiting data parallelism. Finally, we scale CuMF-SGD to large data sets that cannot fit into one GPU's memory. Evaluations on three public data sets show that CuMF-SGD outperforms existing solutions, including a 64-node CPU system, by a large margin using only one GPU card.
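For readers unfamiliar with the underlying computation, the following is a minimal serial sketch of a textbook SGD update for matrix factorization. It is not the paper's GPU kernel or its parallelization scheme. The function names (`sgd_update`, `train`, `rmse`), the learning rate, the regularization weight, and the toy ratings are all assumptions chosen for illustration. The loop over the latent dimensions inside `sgd_update` is the per-update work, and the loop over rating samples in `train` is the sequence of updates; the abstract's model and data parallelism presumably target these two levels, though the sketch does not implement either.

```python
import random

def sgd_update(p_u, q_v, rating, lr=0.05, reg=0.02):
    """Apply one SGD step for one observed rating; return the prediction error."""
    err = rating - sum(a * b for a, b in zip(p_u, q_v))
    # Work over the k latent dimensions of a single update.
    for k in range(len(p_u)):
        pu_k, qv_k = p_u[k], q_v[k]  # keep old values so both updates use them
        p_u[k] += lr * (err * qv_k - reg * pu_k)
        q_v[k] += lr * (err * pu_k - reg * qv_k)
    return err

def train(ratings, n_users, n_items, k=4, epochs=200, seed=0):
    """Factorize a sparse rating list into user factors P and item factors Q."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    samples = list(ratings)
    for _ in range(epochs):
        rng.shuffle(samples)
        # Serial pass over samples; each update reads and writes P[u] and Q[v].
        for u, v, r in samples:
            sgd_update(P[u], Q[v], r)
    return P, Q

def rmse(ratings, P, Q):
    """Root-mean-square error of the factorization on the given ratings."""
    total = 0.0
    for u, v, r in ratings:
        pred = sum(a * b for a, b in zip(P[u], Q[v]))
        total += (r - pred) ** 2
    return (total / len(ratings)) ** 0.5

# Hypothetical toy data: (user, item, rating) triples.
RATINGS = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
```

Calling `train(RATINGS, 3, 3)` returns the two factor matrices, and `rmse(RATINGS, P, Q)` reports how well they reproduce the observed ratings. Two updates that share a user or an item touch the same factor row, which illustrates why the abstract describes SGD as inherently serial.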
