# BPTree: An ℓ_{2} heavy hitters algorithm using constant memory

## Abstract

The task of finding heavy hitters is one of the best known and well studied problems in the area of data streams. One is given a list i1, i2, ⋯, im ϵ [n] and the goal is to identify the items among [n] that appear frequently in the list. In sub-polynomial space, the strongest guarantee available is the ℓ2 guarantee, which requires finding all items that occur at least ϵ||f||2 times in the stream, where the vector f ϵ Rn is the count histogram of the stream with ith coordinate equal to the number of times i appears fi:= #{j ϵ [m]: ij = i}. The first algorithm to achieve the ℓ2 guarantee was the CountSketch of [11], which requires O(ϵ-2 logn) words of memory and O (logn) update time and is known to be space-optimal if the stream allows for deletions. The recent work of [7] gave an improved algorithm for insertion-only streams, using only O(ϵ-2 logϵ-1 log logn) words of memory. In this work, we give an algorithm BPTree for ℓ2 heavy hitters in insertion-only streams that achieves O(ϵ-2 logϵ-1) words of memory and O(logϵ-) update time, which is the optimal dependence on n and m. In addition, we describe an algorithm for tracking ||f||2 at all times with O(ϵ-) memory and update time. Our analyses rely on bounding the expected supremum of a Bernoulli process involving Rademachers with limited independence, which we accom-plish via a Dudley-like chaining argument that may have applications elsewhere.