Distributed query optimization: An engineering approach
We present a novel preprocessing technique for distributed query optimization which achieves nearly all of the benefits that full database reduction by semijoin achieves, but for a small fraction of the cost. Most previous researchers have approached the problem of distributed query optimization with the idea of achieving maximum database reduction at any cost. We take an engineering approach-that, past a point of diminishing returns, database reduction is not cost-effective. We introduce a technique called 'partial reduction', which uses a sequence of 'bucket semijoins' to eliminate nearly all of the irrelevant tuples eliminated by a fully reducing sequence of semijoins. We provide a performance analysis for a simple binary semijoin compared to a bucket semijoin, and show that a bucket semijoin can achieve, say, 90% of the reduction that a semijoin can, but for, say, 10% of the cost. Although we used some simplifying assumptions in our analysis, we argue that relaxing these assumptions does not diminish our results in any way. We discuss the sensitivity of our technique to its parameters, and we show that our results improve for multiple join and inequality join queries.