Scalable nonparametric multiway data analysis
Abstract
Multiway data analysis deals with multiway arrays, i.e., tensors, and the goal is twofold: predicting missing entries by modeling the interactions between array elements, and discovering hidden patterns, such as clusters or communities, in each mode. Despite the success of existing tensor factorization approaches, they are either unable to capture nonlinear interactions or too computationally expensive to handle massive data. In addition, most existing methods lack a principled way to discover latent clusters, which is important for a better understanding of the data. To address these issues, we propose a scalable nonparametric tensor decomposition model. It employs a Dirichlet process mixture (DPM) prior to model the latent clusters, and it uses local Gaussian processes (GPs) to capture nonlinear relationships and to improve scalability. An efficient online variational Bayes Expectation-Maximization algorithm is proposed to learn the model. Experiments on both synthetic and real-world data show that the proposed model is able to discover latent clusters and achieves higher prediction accuracy than competitive methods. Furthermore, the proposed model obtains significantly better predictive performance than the state-of-the-art large-scale tensor decomposition algorithm, GigaTensor, on two large datasets with billions of entries.
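As a rough illustration of how a Dirichlet process prior can yield an unbounded number of latent clusters, the following minimal sketch draws cluster weights from a truncated stick-breaking construction and assigns the entities of one tensor mode to clusters. The stick-breaking representation, the truncation level, the concentration value, and the names `stick_breaking_weights` and `assign_clusters` are assumptions made for illustration; the abstract does not specify the construction or inference details, and the sketch does not include the local Gaussian processes or the variational algorithm.

```python
import random


def stick_breaking_weights(alpha, truncation, seed=0):
    """Draw truncated stick-breaking weights for a Dirichlet process.

    alpha:      concentration parameter (hypothetical value chosen by the caller)
    truncation: number of components kept (an approximation, not part of the DP itself)
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for _ in range(truncation - 1):
        frac = rng.betavariate(1.0, alpha)  # fraction broken off the remaining stick
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # last component absorbs the leftover mass
    return weights


def assign_clusters(num_items, weights, seed=0):
    """Sample a cluster label for each entity in one mode of the tensor."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=num_items)


weights = stick_breaking_weights(alpha=1.0, truncation=10)
labels = assign_clusters(num_items=20, weights=weights)
print("cluster weights:", [round(w, 3) for w in weights])
print("cluster labels: ", labels)
```

Smaller values of `alpha` concentrate the mass on a few clusters, while larger values spread it over many; in a full model, the number of occupied clusters would be inferred from the data rather than fixed in advance.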