The fast cauchy transform and faster robust linear regression

Kenneth L. Clarkson; Petros Drineas; Malik Magdon-Ismail; Michael W. Mahoney; Xiangrui Meng; David P. Woodruff

doi:10.1137/1.9781611973105.34

SODA 2013

Conference paper

06 Jan 2013

The fast cauchy transform and faster robust linear regression

View publication

Abstract

We provide fast algorithms for overconstrained ℓp regression and related problems: for an n x d input matrix A and vector b ∈ ℝn, in O(nd log n) time we reduce the problem min x∈ℝd |Ax - b|p to the same problem with input matrix Ã of dimension s x d and corresponding b̃ of dimension s x 1. Here, Ã and b̃ are a coreset for the problem, consisting of sampled and rescaled rows of A and b; and s is independent of n and polynomial in d. Our results improve on the best previous algorithms when n ≫ d, for all p ∈ [1, ∞) except p = 2; in particular, they improve the O(nd1.376+) running time of Sohler and Woodruff (STOC, 2011) for p = 1, that uses asymptotically fast matrix multiplication, and the O(nd 5 log n) time of Dasgupta et al. (SICOMP, 2009) for general p, that uses ellipsoidal rounding. We also provide a suite of improved results for finding well-conditioned bases via ellipsoidal rounding, illustrating tradeoffs between running time and conditioning quality, including a one-pass conditioning algorithm for general ℓp problems. To complement this theory, we provide a detailed empirical evaluation of implementations of our algorithms for p = 1, comparing them with several related algorithms. Among other things, our empirical results clearly show that, in the asymptotic regime, the theory is a very good guide to the practical performance of these algorithms. Our algorithms use our faster constructions of well-conditioned bases for ℓp spaces and, for p = 1, a fast subspace embedding of independent interest that we call the Fast Cauchy Transform: a matrix Π : ℝn → ℝO(d log d), found obliviously to A, that approximately preserves the ℓ1 norms: that is, |Ax| 1 ≈ |Π|1, for all x, with distortion O(d 2+η log d), for an arbitrarily small constant η > 0; and, moreover, ΠA can be computed in O(nd log d) time. The techniques underlying our Fast Cauchy Transform include fast Johnson-Lindenstrauss transforms, low-coherence matrices, and rescaling by Cauchy random variables. Copyright © SIAM.

Conference paper