Interleaving a Join Sequence with Semijoins in Distributed Query Processing
Abstract
In distributed query processing, the conventional approach to reduce the amount of data transmission is to first apply a sequence of semijoins as “reducers” and then ship the resultant relations to the final site to carry out the join operations. Recently, it has been shown that the approach of applying a combination of joins and semijoins as reducers can lead to substantially larger reduction on data transmission required. In this paper, we develop an efficient heuristic approach to determine an effective sequence of semijoin and join reducers. Semijoins whose execution will reduce the amount of data transmission required to perform a join sequence are termed beneficial semijoins for that join sequence. Note that beneficial semijoins include the conventional profitable semijoins and the gainful semijoins that are not profitable themselves but become beneficial due to the inclusion of join reducers. This type of dependency between semijoin and join reducers complicates the identification of beneficial semijoins and the ordering in the reducer sequence. We first obtain a sequence of join reducers and map it into a join sequence tree. In light of the join sequence tree, we derive important properties of beneficial semijoins. These properties are then applied to develop an efficient algorithm to determine the beneficial semijoins which can be inserted into the join sequence. Our results show that the approach of interleaving a join sequence with beneficial semijoins are not only efficient but also effective in reducing the total amount of data transmission required to process distributed queries. © 1992 IEEE