About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
SIGMOD/PODS/ 2010
Conference paper
Fast Manhattan sketches in data streams
Abstract
The ℓ1-distance, also known as the Manhattan or taxicab distance, between two vectors x, y in ℝn is ∑ i=1n|xi-yi|. Approximating this distance is a fundamental primitive on massive databases, with applications to clustering, nearest neighbor search, network monitoring, regression, sampling, and support vector machines. We give the first 1-pass streaming algorithm for this problem in the turnstile model with O*(ε-2) space and O*(1) update time. The O* notation hides polylogarithmic factors in ε, n, and the precision required to store vector entries. All previous algorithms either required Ω(ε-3) space or Ω(ε-2) update time and/or could not work in the turnstile model (i.e., support an arbitrary number of updates to each coordinate). Our bounds are optimal up to O*(1) factors. © 2010 ACM.