High-Performance Recommender System Training Using Co-Clustering on CPU/GPU Clusters
Abstract
Recommender systems are becoming the crystal ball of the Internet because they can anticipate what users may want, even before the users themselves know it. However, the machine-learning algorithms typically involved in training such systems are computationally expensive and may require several days for retraining. Here, we present a distributed approach for load-balancing the training of a recommender system based on state-of-the-art non-negative matrix factorization principles. The approach exploits a heterogeneous cluster of CPUs and GPUs, and achieves a 466-fold performance improvement over the serial CPU implementation and a 15-fold improvement over the best previously reported results for the popular Netflix data set.