# Fast approximation of matrix coherence and statistical leverage

## Abstract

The statistical leverage scores of a matrix A are the squared row norms of the matrix containing its (top) left singular vectors, and the coherence is the largest leverage score. These quantities are of interest in recently popular problems such as matrix completion and Nyström-based low-rank matrix approximation, as well as in large-scale statistical data analysis applications more generally. Moreover, they matter because they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary n×d matrix A, with n > d, and returns as output relative-error approximations to all n of the statistical leverage scores. The proposed algorithm runs (under assumptions on the precise values of n and d) in O(nd log n) time, as opposed to the O(nd²) time required by the naive algorithm that involves computing an orthogonal basis for the range of A. Our analysis may be viewed in terms of computing a relative-error approximation to an underconstrained least-squares approximation problem or, relatedly, as an application of Johnson-Lindenstrauss type ideas. Several practically important extensions of our basic result are also described, including the approximation of so-called cross-leverage scores, the extension of these ideas to matrices with n ≈ d, and the extension to streaming environments.

© 2012 Petros Drineas, Malik Magdon-Ismail, Michael W. Mahoney and David P. Woodruff.
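To make the definitions concrete, here is a minimal NumPy sketch. It first computes exact leverage scores and the coherence by the naive O(nd²) route, using a thin QR factorization and assuming A has full column rank. It then computes a two-stage random-projection estimate in the spirit of the Johnson-Lindenstrauss ideas mentioned above.

The function names, the sketch sizes `r1` and `r2`, and the use of dense Gaussian matrices are illustrative assumptions. Dense Gaussians do not reach the O(nd log n) running time, which would require a fast structured random transform. Treat this as a correctness demonstration, not the authors' algorithm.

```python
import numpy as np


def leverage_scores(A):
    """Exact leverage scores: squared row norms of an orthonormal basis for range(A).

    Assumes A (n x d, n > d) has full column rank, so the thin QR factor Q spans
    the same subspace as the left singular vectors U. Costs O(n d^2) time.
    """
    Q, _ = np.linalg.qr(A, mode="reduced")
    return np.sum(Q**2, axis=1)


def coherence(A):
    """Coherence: the largest leverage score."""
    return float(np.max(leverage_scores(A)))


def approx_leverage_scores(A, r1, r2, rng):
    """Two-stage random-projection estimate of the leverage scores (illustrative).

    Dense Gaussian sketches stand in for a fast structured transform, so this
    version is not fast. r1 (>= d) and r2 are hypothetical sketch sizes.
    """
    n, d = A.shape
    # Stage 1: compress the n rows of A to r1 rows, then take R from a thin QR.
    # A @ inv(R) then has nearly orthonormal columns spanning range(A).
    Pi1 = rng.standard_normal((r1, n)) / np.sqrt(r1)
    _, R = np.linalg.qr(Pi1 @ A, mode="reduced")
    # Stage 2: a JL projection Pi2 (d x r2) approximately preserves each row norm
    # of A @ inv(R). When r2 is much smaller than d, this reduces the cost of
    # the final n-row product.
    Pi2 = rng.standard_normal((d, r2)) / np.sqrt(r2)
    X = np.linalg.solve(R, Pi2)  # X = inv(R) @ Pi2, shape d x r2
    Omega = A @ X                # shape n x r2
    return np.sum(Omega**2, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((3000, 8))
    exact = leverage_scores(A)
    approx = approx_leverage_scores(A, r1=1500, r2=1000, rng=rng)
    print("sum of leverage scores:", exact.sum())  # equals rank(A) = 8 up to rounding
    print("coherence:", coherence(A))
    print("max relative error:", np.max(np.abs(approx - exact) / exact))
```

The demo sizes favor a reliable check over speed. In particular, r2 exceeds d here, so the second projection saves no work in this toy setting. The estimate works because R is computed from a sketch that approximately preserves the geometry of range(A). As a result, A R⁻¹ is close to an orthonormal basis for that range, and its squared row norms are close to the leverage scores.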