Matrix factorization on GPUs with memory optimization and approximate computing

Wei Tan; Cheng Li; Shiyu Chang; Zijun Wang; Liana Fong; Liangliang Cao

doi:10.1145/3225058.3225096

ICPP 2018

Conference paper

13 Aug 2018

Abstract

Matrix factorization (MF) discovers latent features from observations and has shown great promise in the fields of collaborative filtering, data compression, feature extraction, word embedding, etc. While many problem-specific optimization techniques have been proposed, alternating least squares (ALS) remains popular due to its general applicability (e.g., it easily handles positive-unlabeled inputs), fast convergence, and parallelization capability. Current MF implementations are either optimized for a single machine or require a large computer cluster, yet both remain insufficient: a single machine provides limited compute power for large-scale data, while multiple machines suffer from the network communication bottleneck. To address this challenge, accelerating ALS on graphics processing units (GPUs) is a promising direction. We propose a novel approach to improving MF efficiency through both memory optimization and approximate computing. The former exploits the GPU memory hierarchy to increase data reuse, while the latter reduces unnecessary computation without hurting the convergence of the learning algorithm. Extensive experiments on large-scale datasets show that our solution not only outperforms competing CPU solutions by a large margin but also achieves a 2x-4x performance gain over state-of-the-art GPU solutions. Our implementations are open-sourced and publicly available.
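
The abstract names ALS without spelling out its update rule, so a minimal sketch of the textbook algorithm may help orient readers. It assumes a plain L2-regularized least-squares objective over the observed entries and runs on the CPU with NumPy; it is not the authors' GPU implementation and does not reproduce the paper's memory optimization or approximate computing. The names `als`, `rmse`, `mask`, `rank`, and `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np

def als(R, mask, rank=2, lam=0.1, iters=20, seed=0):
    """Textbook ALS sketch: fit R ~= X @ Theta.T on entries where mask is True.

    Assumption: plain L2 regularization with weight lam (not taken from the paper).
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    X = rng.normal(scale=0.1, size=(m, rank))      # row (e.g. user) factors
    Theta = rng.normal(scale=0.1, size=(n, rank))  # column (e.g. item) factors
    reg = lam * np.eye(rank)
    for _ in range(iters):
        # Fix Theta: each row of X is an independent small ridge regression.
        for u in range(m):
            obs = mask[u]
            T = Theta[obs]
            X[u] = np.linalg.solve(T.T @ T + reg, T.T @ R[u, obs])
        # Fix X: each row of Theta is solved the same way.
        for v in range(n):
            obs = mask[:, v]
            A = X[obs]
            Theta[v] = np.linalg.solve(A.T @ A + reg, A.T @ R[obs, v])
    return X, Theta

def rmse(R, mask, X, Theta):
    """Root-mean-square error over the observed entries only."""
    err = (R - X @ Theta.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))
```

Because every row update is an independent `rank` x `rank` linear solve, the two inner loops are naturally parallel, which is the property that makes ALS a candidate for GPU acceleration.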
