Publication
ICPP 2017
Conference paper

High-Performance Recommender System Training Using Co-Clustering on CPU/GPU Clusters

Abstract

Recommender systems are becoming the crystal ball of the Internet because they can anticipate what users may want, even before the users know they want it. However, the machine-learning algorithms typically involved in training such systems can be computationally expensive and may often require several days for retraining. Here, we present a distributed approach for load-balancing the training of a recommender system based on state-of-the-art non-negative matrix factorization principles. The approach can exploit a cluster of mixed CPUs and GPUs, and it achieves a 466-fold performance improvement over the serial CPU implementation and a 15-fold performance improvement over the best previously reported results for the popular Netflix data set.
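
To make the idea of load-balancing across mixed hardware concrete, here is a minimal sketch of one generic strategy: dividing a batch of work items among workers in proportion to their relative throughput. The abstract does not describe the paper's actual scheme, so this is an illustration under stated assumptions only; the function name `split_work` and the speed figures are hypothetical.

```python
def split_work(num_items, speeds):
    """Split num_items across workers in proportion to their relative speeds.

    Hypothetical helper for illustration; not the method from the paper.
    `speeds` holds one positive number per worker (for example, measured
    items processed per second on a CPU node or a GPU node).
    """
    total = sum(speeds)
    # Ideal (fractional) share of the work for each worker.
    ideal = [num_items * s / total for s in speeds]
    # Start from the rounded-down share.
    shares = [int(x) for x in ideal]
    # Give the leftover items to the workers with the largest remainders.
    leftover = num_items - sum(shares)
    order = sorted(range(len(speeds)),
                   key=lambda i: ideal[i] - shares[i],
                   reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


# Assumed example: two CPU workers and one GPU worker that is 8x faster.
print(split_work(100, [1, 1, 8]))
```

With a proportional split like this, faster devices receive more items so that all workers would finish at roughly the same time, assuming the per-item cost is uniform.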
