Highly scalable parallel collaborative filtering algorithm
Abstract
Collaborative filtering (CF) based recommender systems are widely used by Internet companies and services such as Amazon, Netflix, and Google News. These systems automatically predict a user's interests by drawing on information about like-minded users. Performing CF in real time on massive, highly sparse datasets while achieving high prediction accuracy is computationally challenging. In this paper, we present the design of a soft real-time (around 1 min) parallel CF algorithm based on the Concept Decomposition technique [1]. Our parallel algorithm is optimized for multicore and many-core architectures while maintaining a prediction accuracy of 0.84 RMSE. Using the Netflix dataset, we demonstrate the performance and scalability of our algorithm, in both batch mode and online mode, on a 32-core Power6-based SMP system. Our parallel algorithm delivers a training time of 64 s on the full Netflix dataset and a prediction time of 4.5 s on 1.4M ratings (3.2 μs per rating prediction). This is 12.6x faster than the best known sequential training time and around 33x faster than the best known sequential prediction time [2], at high accuracy (0.84 RMSE). To the best of our knowledge, this is also the best known parallel performance at such high accuracy. ©2010 IEEE.
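
For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how an RMSE figure and a per-rating prediction latency are typically computed. It is not the paper's implementation: the function names `rmse` and `per_rating_latency_us` and the toy ratings are hypothetical, introduced only for illustration. Only the 4.5 s and 1.4M figures come from the abstract.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))


def per_rating_latency_us(total_seconds, num_ratings):
    """Average time per rating prediction, in microseconds."""
    return total_seconds / num_ratings * 1e6


# Toy ratings (hypothetical, not taken from the Netflix dataset).
toy_predicted = [3.5, 4.0, 2.0, 5.0]
toy_actual = [4.0, 4.0, 1.0, 5.0]
toy_error = rmse(toy_predicted, toy_actual)

# Figures from the abstract: 4.5 s of prediction time over 1.4M ratings.
latency = per_rating_latency_us(4.5, 1.4e6)
print(f"toy RMSE = {toy_error:.3f}, latency = {latency:.1f} microseconds per rating")
```

Dividing the reported prediction time by the number of ratings gives roughly 3.2 μs per prediction, consistent with the figure quoted in the abstract.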