Tight bounds for distributed functional monitoring

David P. Woodruff; Qin Zhang

doi:10.1145/2213977.2214063

STOC 2012

Conference paper

26 Jun 2012

Abstract

We resolve several fundamental questions in the area of distributed functional monitoring, initiated by Cormode, Muthukrishnan, and Yi (SODA, 2008), and receiving recent attention. In this model there are $k$ sites each tracking their input streams and communicating with a central coordinator. The coordinator's task is to continuously maintain an approximate output to a function computed over the union of the $k$ streams. The goal is to minimize the number of bits communicated. Let the $p$-th frequency moment be defined as $F_p = \sum_i f_i^p$, where $f_i$ is the frequency of element $i$. We show the randomized communication complexity of estimating the number of distinct elements (that is, $F_0$) up to a $1+\varepsilon$ factor is $\Omega(k/\varepsilon^2)$, improving upon the previous $\Omega(k + 1/\varepsilon^2)$ bound and matching known upper bounds. For $F_p$, $p > 1$, we improve the previous $\Omega(k + 1/\varepsilon^2)$ communication bound to $\Omega(k^{p-1}/\varepsilon^2)$. We obtain similar improvements for heavy hitters, empirical entropy, and other problems. Our lower bounds are the first of any kind in distributed functional monitoring to depend on the product of $k$ and $1/\varepsilon^2$. Moreover, the lower bounds are for the static version of the distributed functional monitoring model where the coordinator only needs to compute the function at the time when all $k$ input streams end; surprisingly they almost match what is achievable in the (dynamic version of) distributed functional monitoring model where the coordinator needs to keep track of the function continuously at any time step. We also show that we can estimate $F_p$, for any $p > 1$, using $\tilde{O}(k^{p-1}\,\mathrm{poly}(\varepsilon^{-1}))$ communication. This drastically improves upon the previous $\tilde{O}(k^{2p+1} N^{1-2/p}\,\mathrm{poly}(\varepsilon^{-1}))$ bound of Cormode, Muthukrishnan, and Yi for general $p$, and their $\tilde{O}(k^2/\varepsilon + k^{1.5}/\varepsilon^3)$ bound for $p = 2$. For $p = 2$, our bound resolves their main open question. Our lower bounds are based on new direct sum theorems for approximate majority, and yield improvements to classical problems in the standard data stream model.
First, we improve the known lower bound for estimating $F_p$, $p > 2$, in $t$ passes from $\Omega(n^{1-2/p}/(\varepsilon^{2/p} t))$ to $\Omega(n^{1-2/p}/(\varepsilon^{4/p} t))$, giving the first bound that matches what we expect when $p = 2$ for any constant number of passes. Second, we give the first lower bound for estimating $F_0$ in $t$ passes with $\Omega(1/(\varepsilon^2 t))$ bits of space that does not use the hardness of the gap-hamming problem. © 2012 ACM.
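
To make the definition of the frequency moments concrete, the following is a minimal sketch that computes F_p exactly over the union of k site streams. It is not the paper's protocol: it ignores communication cost and approximation entirely, and the function name `frequency_moment` and the toy streams are assumptions introduced here for illustration only.

```python
from collections import Counter


def frequency_moment(streams, p):
    """Exact p-th frequency moment F_p = sum_i f_i^p over the union of streams.

    Illustrative only: a real monitoring protocol would approximate this
    value while limiting the bits sent from the sites to the coordinator.
    """
    freq = Counter()
    for stream in streams:
        freq.update(stream)  # f_i counts occurrences of i across all sites
    if p == 0:
        # F_0 is the number of distinct elements (using the convention 0^0 = 0)
        return len(freq)
    return sum(f ** p for f in freq.values())


# Hypothetical toy input with k = 3 sites
streams = [["a", "b", "a"], ["b", "c"], ["a"]]
f0 = frequency_moment(streams, 0)  # distinct elements
f1 = frequency_moment(streams, 1)  # total stream length
f2 = frequency_moment(streams, 2)  # sum of squared frequencies
```

Here the combined frequencies are a: 3, b: 2, c: 1, so F_0 counts the three distinct elements, F_1 is the total number of items, and F_2 is the sum of the squared frequencies.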
