Black-Box k-to-1-PCA Reductions: Theory and Applications

Arun Jambulapati; Syamantak Kumar; Jerry Li; Shourya Pandey; Ankit Pensia; Kevin Tian

COLT 2024

Conference paper

30 Jun 2024

Black-Box k-to-1-PCA Reductions: Theory and Applications

Abstract

The k-principal component analysis (k-PCA) problem is a fundamental algorithmic primitive that is widely-used in data analysis and dimensionality reduction applications. In statistical settings, the goal of k-PCA is to identify a top eigenspace of the covariance matrix of a distribution, which we only have black-box access to via samples. Motivated by these settings, we analyze black-box deflation methods as a framework for designing k-PCA algorithms, where we model access to the unknown target matrix via a black-box 1-PCA oracle which returns an approximate top eigenvector, under two popular notions of approximation. Despite being arguably the most natural reduction-based approach to k-PCA algorithm design, such black-box methods, which recursively call a 1-PCA oracle k times, were previously poorly-understood. Our main contribution is significantly sharper bounds on the approximation parameter degradation of deflation methods for k-PCA. For a quadratic form notion of approximation we term ePCA (energy PCA), we show deflation methods suffer no parameter loss. For an alternative well-studied approximation notion we term cPCA (correlation PCA), we tightly characterize the parameter regimes where deflation methods are feasible. Moreover, we show that in all feasible regimes, k-cPCA deflation algorithms suffer no asymptotic parameter loss for any constant k. We apply our framework to obtain state-of-the-art k-PCA algorithms robust to dataset contamination, improving prior work in sample complexity by a poly(k) factor.

Paper