Importance-aware Bloom Filter for managing set membership queries on streaming data

Ravi Bhoraskar; Vijay Gabale; Purushottam Kulkarni; Dhananjay Kulkarni

doi:10.1109/COMSNETS.2013.6465564

COMSNETS 2013

Conference paper

13 Mar 2013

Importance-aware Bloom Filter for managing set membership queries on streaming data

View publication

Abstract

In this work1, we consider a set of networking applications which generate or process a continuous stream of data items, for example, a web-cache which processes a stream of web-objects. These applications often require to answer membership queries for duplicate detection on an unbounded set of data items. Two key challenges to answer such membership queries are the limited space to store the entire stream and the different importance-levels associated with different data items. For instance, a web-caches are of finite sizes and the cost of fetching objects into the cache is proportional to the size of objects. Motivated by such examples, our work focuses on developing a time and space efficient indexing and membership query scheme which takes into account importance levels of objects in a data stream. We propose Importance-aware Bloom Filter (IBF) which provides a set of insertion and deletion algorithms to handle membership queries on a data stream. Our evaluation of IBF for a synthetic as well as a real data set of a stream of Youtube videos, shows that IBF has close to 0% error in answering membership queries on important data items, and it results in 4-fold better performance in comparison to other importance-agnostic Bloom filter-based schemes. Importantly, we also find properties of IBF analytically via a Markov model based analysis. Thus, we believe, IBF provides a practical framework to balance the application-specific requirements to index and query data streams based on the data semantics. © 2013 IEEE.

Conference paper