Publication
AAAI 2016
Conference paper

DinTucker: Scaling up Gaussian process models on large multidimensional arrays

Abstract

Tensor decomposition methods are effective tools for modelling multidimensional array data (i.e., tensors). Among them, nonparametric Bayesian models, such as Infinite Tucker Decomposition (InfTucker), are more powerful than multilinear factorization approaches, including Tucker and PARAFAC, and usually achieve better predictive performance. However, they struggle to handle massive data due to a prohibitively high training cost. To address this limitation, we propose Distributed Infinite Tucker (DinTucker), a new hierarchical Bayesian model that enables local learning of InfTucker on subarrays and global integration of the local results. We further develop a distributed stochastic gradient descent algorithm, coupled with variational inference, for model estimation. In addition, we reveal the connection between DinTucker and InfTucker in terms of model evidence. Experiments demonstrate that DinTucker maintains the predictive accuracy of InfTucker and is scalable to massive data: on multidimensional arrays with billions of elements from two real-world applications, DinTucker achieves significantly higher prediction accuracy with less training time, compared with the state-of-the-art large-scale tensor decomposition method, GigaTensor.
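The local-learning / global-integration scheme sketched in the abstract can be illustrated with a minimal toy example: observed tensor entries are split into subarrays, each worker takes a few stochastic gradient steps on its own copy of the latent factors, and the local results are averaged back into a global estimate. The sketch below is an assumption-laden simplification, using a CP-style multilinear factorization as a stand-in for the paper's GP-based infinite Tucker model; the function names, block counts, and learning rate are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-mode tensor with low-rank structure (stand-in for real array data).
I, J, K, R = 20, 20, 20, 3
A, B, C = rng.normal(size=(I, R)), rng.normal(size=(J, R)), rng.normal(size=(K, R))
tensor = np.einsum('ir,jr,kr->ijk', A, B, C) + 0.01 * rng.normal(size=(I, J, K))

# Observed entries, partitioned into "subarrays" (random blocks of index triples),
# mimicking the local-learning / global-integration idea at a very small scale.
obs = np.array([(i, j, k) for i in range(I) for j in range(J) for k in range(K)])
rng.shuffle(obs)
blocks = np.array_split(obs, 8)  # 8 local subarrays, one per hypothetical worker

def local_sgd(U, V, W, block, lr=0.01, epochs=3):
    """Take a few SGD steps on one subarray; return locally updated factor copies."""
    U, V, W = U.copy(), V.copy(), W.copy()
    for _ in range(epochs):
        for i, j, k in block:
            pred = (U[i] * V[j] * W[k]).sum()
            err = tensor[i, j, k] - pred
            U[i] += lr * err * V[j] * W[k]
            V[j] += lr * err * U[i] * W[k]
            W[k] += lr * err * U[i] * V[j]
    return U, V, W

# Global factors, refined by averaging the local results over several rounds.
U = rng.normal(scale=0.1, size=(I, R))
V = rng.normal(scale=0.1, size=(J, R))
W = rng.normal(scale=0.1, size=(K, R))
for rnd in range(5):
    results = [local_sgd(U, V, W, b) for b in blocks]  # would run in parallel
    U = np.mean([u for u, _, _ in results], axis=0)
    V = np.mean([v for _, v, _ in results], axis=0)
    W = np.mean([w for _, _, w in results], axis=0)
    recon = np.einsum('ir,jr,kr->ijk', U, V, W)
    print(f"round {rnd}: RMSE = {np.sqrt(np.mean((tensor - recon) ** 2)):.4f}")
```

In the actual paper the local models are InfTucker instances estimated with variational inference, and the aggregation is governed by a hierarchical Bayesian prior rather than simple averaging; the sketch only conveys the subarray-splitting and parameter-synchronization pattern.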
