Publication
NeurIPS 2022
Conference paper

Approximate Euclidean lengths and distances beyond Johnson-Lindenstrauss

Abstract

A classical result of Johnson and Lindenstrauss states that a set of n high dimensional data points can be projected down to O(log n/ε^2) dimensions such that the square of their pairwise distances will be preserved up to some small distortion ε ∈ (0,1). This work aims to improve this1/ε^2 dependency based on techniques inspired by the Hutch++ Algorithm [23], which, remarkably, reduced1/ε^2 to1/ε for the related problem of implicit matrix trace estimation. Forε = 0.01, for example, this translates to 100 times less matrix-vector products in the matrix-vector query model to achieve the same accuracy. We first present an algorithm for estimating the Euclidean lengths of the rows of a matrix for which we prove (i) element-wise probabilistic bounds which are at least as good as standard JL estimations in the worst case, but are asymptotically better for matrices with rapidly decaying spectrum and (ii) for any matrix, regardless its spectrum, the algorithm can achieve εaccuracy for the total, Frobenius norm-wise relative error using only O(1/ε) queries, which is a quadratic improvement over standard JL approximations. We show that these results can also be extended for two other important problems, namely for estimating the Euclidean distances between data points, as well as for approximating the statistical leverage scores of a tall-and-skinny data matrix. We also provide indicative numerical experiments validating our theoretical analysis.

Date

26 Nov 2022

Publication

NeurIPS 2022

Authors

Topics

Share