Distributed execution of continuous queries

Rajeev Gupta; Krithi Ramamritham

doi:10.1109/ICDE.2014.6816767

ICDE 2014

Conference paper

31 Mar 2014

Abstract

Data delivered over the internet is increasingly used to provide dynamic and personalized user experiences. To achieve this, queries are executed over fast-changing data from distributed sources. Because these queries require data from multiple sources, they are executed at an intermediate proxy or data aggregator. Typically, users of these queries are not interested in every data update. Query results may be associated with an imprecision bound or threshold, which can be used to limit the number of refresh messages. These queries can be categorized by the type of result required: in an entity-based query, the user is interested only in the ids of the data items (or entities) satisfying a certain selection condition; in a value-based query, the user is interested in the value of some aggregation over distributed data items; and in a threshold query, the user wants to know whether a Boolean condition, expressed as a threshold over an aggregation of data items, is true. We methodically present techniques for executing all these categories of continuous aggregation queries over distributed data so that the number of message exchanges between data sources, aggregators, and users is minimized. The value of an individual data item can be uncertain, with an associated probability. A data aggregator can execute the query either by getting all the required data or by sending appropriate sub-queries to the distributed data sources. To get the data, the aggregator can use either push-based or pull-based mechanisms. Each of these methods has different ways of minimizing the number of message exchanges. We present various algorithms for doing so. © 2014 IEEE.
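
The role of an imprecision bound in limiting refresh messages can be illustrated with a minimal sketch. This is not one of the paper's algorithms. It assumes a simple push-based filter in which a source sends a new value only when that value differs from the last pushed value by more than the bound. The names `PushSource`, `update`, and `run_demo`, and the sample values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PushSource:
    """Hypothetical source that pushes a value only when it drifts past a bound."""
    bound: float
    last_pushed: Optional[float] = None
    messages_sent: int = 0

    def update(self, value: float) -> Optional[float]:
        # Push the first value, or any value that differs from the last
        # pushed value by more than the imprecision bound.
        if self.last_pushed is None or abs(value - self.last_pushed) > self.bound:
            self.last_pushed = value
            self.messages_sent += 1
            return value
        # Update suppressed: the aggregator keeps using its cached value.
        return None


def run_demo():
    # Assumed sample stream of seven updates at one source.
    source = PushSource(bound=1.0)
    cached = None
    for value in [10.0, 10.4, 10.9, 11.2, 11.5, 12.5, 12.4]:
        pushed = source.update(value)
        if pushed is not None:
            cached = pushed  # aggregator refreshes its copy
    return source.messages_sent, cached


if __name__ == "__main__":
    sent, cached = run_demo()
    print(f"messages sent: {sent}, aggregator's cached value: {cached}")
```

In this sketch only three of the seven updates produce a message, and the aggregator's cached value stays within the bound of the source's latest value. A pull-based variant would instead have the aggregator decide when to request the value.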
