# Optimal principal component analysis in distributed and streaming models

## Abstract

This paper studies the Principal Component Analysis (PCA) problem in the distributed and streaming models of computation. Given a matrix A ∈ ℝm×n, a rank parameter k < rank(A), and an accuracy parameter 0 < ϵ < 1, we want to output an m × k orthonormal matrix U for which (Equation Presented) where Aκ ∈ ℝm×n is the best rank-k approximation to A. Our contributions are summarized as follows: 1. In the arbitrary partition distributed model of Kannan et al. (COLT 2014), each of s machines holds a matrix A1 and A = Σsi=1 Ai. Each machine should output U. Kannan et al. achieve O(skm/ϵ) + poly(sk/ϵ) words (of O(log(nm)) bits) communication. We obtain the improved bound of O (skm)+ poly(sk/ϵ) words, and show an optimal (up to low order terms) Ω(skm) lower bound. This resolves an open question in the literature. A poly(ϵ-1) dependence is known to be required, but we separate this dependence from m. 2. In a more specific distributed model where each server receives a subset of columns of A, we bypass the above lower bound when A is φ-sparse in each column. Here we obtain an O(skφ/ϵ) + poly(sk/ϵ) word protocol. Our communication is independent of the matrix dimensions, and achieves the guarantee that each server, in addition to outputting U, outputs a subset of O(k/ϵ) columns of A containing a U in its span (that is, for the first time, we solve distributed column subset selection). Additionally, we show a matching Ω(skφ/ϵ) lower bound for distributed column subset selection. Achieving our communication bound when A is sparse in general but not sparse in each column, is impossible. 3. In the streaming model of computation, in which the columns of the matrix A arrive one at a time, an algorithm of Liberty (KDD, 2013) with an improved analysis by Ghashami and Phillips (SODA, 2014) achieves O(km/ϵ) "real numbers" space complexity. We improve this result, since our one-pass streaming PCA algorithm achieves an O(km/ϵ) + poly(k/ϵ) word space upper bound. This almost matches a known Ω(km/ϵ) bit lower bound of Woodruff (NIPS, 2014). We show that with two passes over the columns of A one can achieve an O(km) + poly(k/ϵ) word space upper bound; another lower bound of Woodruff (NIPS, 2014) shows that this is optimal for any constant number of passes (up to the poly (k/ϵ) term and the distinction between words versus bits). 4. Finally, in turnstile streams, in which we receive entries of A one at a time in an arbitrary order, we describe an algorithm with O((m + n)kϵ-1) words of space. This improves the O((m + n)ϵ-2)kϵ-2) upper bound of Clarkson and Woodruff (STOC 2009), and matches their Ω((m + n)kϵ-1) word lower bound. Notably, our results do not depend on the condition number or any singular value gaps of A.