Publication
SIGMOD/PODS/ 2010
Conference paper

Fast Manhattan sketches in data streams

View publication

Abstract

The ℓ1-distance, also known as the Manhattan or taxicab distance, between two vectors x, y in ℝn is ∑ i=1n|xi-yi|. Approximating this distance is a fundamental primitive on massive databases, with applications to clustering, nearest neighbor search, network monitoring, regression, sampling, and support vector machines. We give the first 1-pass streaming algorithm for this problem in the turnstile model with O*(ε-2) space and O*(1) update time. The O* notation hides polylogarithmic factors in ε, n, and the precision required to store vector entries. All previous algorithms either required Ω(ε-3) space or Ω(ε-2) update time and/or could not work in the turnstile model (i.e., support an arbitrary number of updates to each coordinate). Our bounds are optimal up to O*(1) factors. © 2010 ACM.

Date

23 Jul 2010

Publication

SIGMOD/PODS/ 2010

Authors

Share