The Fast Cauchy Transform and Faster Robust Linear Regression
Abstract
We provide fast algorithms for overconstrained ℓ_p regression and related problems: for an n × d input matrix A and vector b ∈ ℝ^n, in O(nd log n) time we reduce the problem min_{x∈ℝ^d} ∥Ax − b∥_p to the same problem with input matrix Ã of dimension s × d and corresponding vector b̃ of dimension s × 1. Here, Ã and b̃ are a coreset for the problem, consisting of sampled and rescaled rows of A and b; and s is independent of n and polynomial in d. Our results improve on the best previous algorithms when n ≫ d for all p ∈ [1, ∞) except p = 2; in particular, they improve the O(nd^{1.376+}) running time of Sohler and Woodruff [Proceedings of the 43rd Annual ACM Symposium on Theory of Computing, 2011, pp. 755–764] for p = 1, which uses asymptotically fast matrix multiplication, and the O(nd^5 log n) time of Dasgupta et al. [SIAM J. Comput., 38 (2009), pp. 2060–2078] for general p, which uses ellipsoidal rounding. We also provide a suite of improved results for finding well-conditioned bases via ellipsoidal rounding, illustrating tradeoffs between running time and conditioning quality, including a one-pass conditioning algorithm for general ℓ_p problems. To complement this theory, we provide a detailed empirical evaluation of implementations of our algorithms for p = 1, comparing them with several related algorithms. Among other things, our empirical results clearly show that, in the asymptotic regime, the theory is a very good guide to the practical performance of these algorithms. Our algorithms use our faster constructions of well-conditioned bases for ℓ_p spaces and, for p = 1, a fast subspace embedding of independent interest that we call the Fast Cauchy transform: a distribution over matrices Π : ℝ^n → ℝ^{O(d log d)}, found obliviously to A (in the sense that its construction does not depend on A), that approximately preserves the ℓ_1 norms; that is, with large probability, simultaneously for all x, ∥ΠAx∥_1 ≈ ∥Ax∥_1, with distortion O(d^{2+η}) for an arbitrarily small constant η > 0; and, moreover, ΠA can be computed in O(nd log d) time. The techniques underlying our Fast Cauchy transform include Fast Johnson–Lindenstrauss transforms, low-coherence matrices, and rescaling by Cauchy random variables.
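To make the reduction concrete, the following is a minimal Python sketch of the sampling pipeline for p = 1, under simplifying assumptions: it uses a dense matrix of i.i.d. Cauchy random variables as the ℓ_1 subspace embedding (the slower predecessor that the Fast Cauchy transform accelerates), and the sketch size r and sample size s are illustrative constants rather than the paper's exact parameters. The helper names (cauchy_sketch, l1_regression_coreset) are ours, not the paper's.

```python
# A hedged sketch of the coreset reduction for l1 regression: sketch [A b]
# with a Cauchy embedding, derive an (approximately) well-conditioned basis,
# sample rows proportionally to its l1 row norms, and rescale the sample.
import numpy as np

def cauchy_sketch(M, r, rng):
    """Apply a dense r x n matrix of i.i.d. standard Cauchy variables to M.
    This costs O(n * r * cols) time; the Fast Cauchy transform replaces it."""
    return rng.standard_cauchy(size=(r, M.shape[0])) @ M

def l1_regression_coreset(A, b, s, rng):
    """Return a coreset (A_tilde, b_tilde) of s sampled, rescaled rows."""
    n, d = A.shape
    Ab = np.column_stack([A, b])
    # Sketch down to a small number of rows; r = 8*(d+1) is illustrative.
    SA = cauchy_sketch(Ab, 8 * (d + 1), rng)
    # U = Ab @ inv(R) is well-conditioned for l1 with high probability.
    _, R = np.linalg.qr(SA)
    U = Ab @ np.linalg.inv(R)
    # l1 analogue of leverage scores: sample proportionally, then rescale.
    p = np.abs(U).sum(axis=1)
    p /= p.sum()
    idx = rng.choice(n, size=s, replace=True, p=p)
    scale = 1.0 / (s * p[idx])
    return A[idx] * scale[:, None], b[idx] * scale
```

The small problem min_x ∥Ãx − b̃∥_1 can then be handed to any LP solver; since its size s depends only on d, this is the source of the speedup when n ≫ d.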
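The Fast Cauchy transform itself can be illustrated, very loosely, by the sketch below, which combines the three ingredients named above: FJLT-style Hadamard mixing within blocks, rescaling by Cauchy random variables, and a sparse random-sign aggregation. This is not the paper's exact construction of Π (which fixes the block structure, duplication, and constants needed for the O(d^{2+η}) distortion guarantee); the parameters block and r here are illustrative.

```python
# A loose, illustrative FCT-style sketch (not the paper's exact construction):
# blockwise fast Hadamard mixing, Cauchy rescaling, sparse +/-1 aggregation.
import numpy as np

def fwht(x):
    """In-place Fast Walsh-Hadamard transform along axis 0 (length a power of 2)."""
    n, h = x.shape[0], 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            x[i:i + h] = a + x[i + h:i + 2 * h]
            x[i + h:i + 2 * h] = a - x[i + h:i + 2 * h]
        h *= 2
    return x

def fct_like_sketch(A, r, block, rng):
    """Map A (n x d) to an r x d sketch; block must be a power of 2, and
    r would be O(d log d) to match the regime described in the abstract."""
    n, d = A.shape
    n_pad = ((n + block - 1) // block) * block     # zero-pad to a block multiple
    X = np.zeros((n_pad, d))
    X[:n] = A
    for i in range(0, n_pad, block):
        fwht(X[i:i + block])                       # FJLT-style low-coherence mixing
    X /= np.sqrt(block)                            # keep the Hadamard blocks orthonormal
    X *= rng.standard_cauchy(size=n_pad)[:, None]  # rescale rows by Cauchy variables
    rows = rng.integers(0, r, size=n_pad)          # each row hits one output row...
    signs = rng.choice([-1.0, 1.0], size=n_pad)    # ...with a random sign (sparse B)
    PiA = np.zeros((r, d))
    np.add.at(PiA, rows, signs[:, None] * X)
    return PiA
```

With such a sketch, ∥ΠAx∥_1 serves as a surrogate for ∥Ax∥_1 simultaneously over all x, which is what lets the conditioning and sampling steps above operate on an O(d log d) × d matrix instead of A itself.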