Publication
HPDC 2017
Conference paper
CuMF-SGD: Parallelized stochastic gradient descent for matrix factorization on GPUs
Abstract
Stochastic gradient descent (SGD) is widely used by many machine learning algorithms. Its low algorithmic complexity makes it efficient for big-data applications, but SGD is inherently serial, and parallelizing it efficiently on many-core architectures such as GPUs is challenging. In this paper, we present cuMF-SGD, a parallelized SGD solution for matrix factorization on GPUs. We first design high-performance GPU computation kernels that accelerate individual SGD updates by exploiting model parallelism. We then design efficient schemes that parallelize SGD updates by exploiting data parallelism. Finally, we scale cuMF-SGD to large data sets that cannot fit into one GPU's memory. Evaluations on three public data sets show that, using only one GPU card, cuMF-SGD outperforms existing solutions, including a 64-node CPU system, by a large margin.
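To make the kind of update the abstract describes concrete, here is a minimal CUDA sketch of an SGD update kernel for matrix factorization. It assumes K = 32 latent factors so that one warp handles one rating, with the 32 threads splitting the factor dimensions (model parallelism) and different warps processing different ratings lock-free in Hogwild style (a stand-in for data parallelism). The kernel name, the COO rating layout, and the scheduling scheme are illustrative assumptions, not the paper's actual design.

```cuda
// Sketch only: one warp per rating, one thread per latent factor (K = 32).
// Launch as sgd_update<<<num_blocks, K>>>(...). Hogwild-style lock-free
// updates are an assumption here, not necessarily cuMF-SGD's exact scheme.
#include <cuda_runtime.h>

constexpr int K = 32;  // latent dimension, one thread per factor

__global__ void sgd_update(const int   *rows,   // user index per rating (COO)
                           const int   *cols,   // item index per rating (COO)
                           const float *vals,   // observed rating values
                           float *P, float *Q,  // user/item factors, row-major, K wide
                           int n_ratings, float lr, float lambda)
{
    int lane = threadIdx.x;                       // factor index within the warp
    for (int r = blockIdx.x; r < n_ratings; r += gridDim.x) {
        int   u = rows[r], v = cols[r];
        float p = P[u * K + lane];
        float q = Q[v * K + lane];

        // dot product p_u . q_v via warp shuffle reduction
        float prod = p * q;
        for (int off = 16; off > 0; off >>= 1)
            prod += __shfl_down_sync(0xffffffff, prod, off);
        float err = vals[r] - __shfl_sync(0xffffffff, prod, 0);

        // SGD step on both factor vectors, using the old values of p and q
        P[u * K + lane] = p + lr * (err * q - lambda * p);
        Q[v * K + lane] = q + lr * (err * p - lambda * q);
    }
}
```

The warp-level shuffle reduction keeps the dot product entirely in registers, which is the main reason per-update cost on a GPU can be kept low; how updates are batched, scheduled across SMs, and spread over multiple GPUs is where the paper's actual contributions lie.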