Using combination of join and semijoin operations for distributed query processing
Abstract
A combination of join and semijoin operations is applied to minimize the communication cost of distributed query processing. A formula is developed to estimate the cardinality of a relation resulting from the join operations specified by a query graph. Two important concepts that arise when join operations are used as reducers in query processing are studied and exploited: gainful semijoins and pure join attributes. Some semijoins, though not profitable by themselves, may benefit from the execution of subsequent join operations and become profitable when joins are used as reducers; such a semijoin is termed a gainful semijoin. Join attributes that are not among the output attributes are referred to as pure join attributes. A formula to estimate the cardinality of a relation resulting from a projection operation is also derived. The results show the attractiveness of applying a combination of joins and semijoins as reducers in distributed query processing.
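To make the reducer terminology concrete, the following is a minimal Python sketch, not the paper's method. It models relations as lists of dicts and uses a toy cost model in which shipping one attribute value costs one unit. The function names (`semijoin`, `join`, `semijoin_profit`, `project`), the relations `R` and `S`, and the cost model itself are all illustrative assumptions. The sketch covers only the baseline notion of a profitable semijoin and the removal of a pure join attribute by projection; it does not model gainful semijoins or the cardinality formulas mentioned above.

```python
def semijoin(r, s, attr):
    """Tuples of r whose value on attr also appears in s (r semijoin s)."""
    keys = {t[attr] for t in s}
    return [t for t in r if t[attr] in keys]


def join(r, s, attr):
    """Equijoin of r and s on a single shared attribute."""
    return [{**a, **b} for a in r for b in s if a[attr] == b[attr]]


def semijoin_profit(r, s, attr):
    """Benefit minus cost of reducing r by s, under the assumed toy model.

    Cost: the distinct join values of s shipped to r's site (1 unit each).
    Benefit: attribute values of r that no longer need to be shipped.
    A positive result means the semijoin is profitable in this model.
    """
    shipped = len({t[attr] for t in s})
    reduced = semijoin(r, s, attr)
    width = len(r[0]) if r else 0
    saved = (len(r) - len(reduced)) * width
    return saved - shipped


def project(rel, attrs):
    """Projection with duplicate elimination, returned as a set of tuples."""
    return {tuple(t[x] for x in attrs) for t in rel}


# Hypothetical relations stored at two different sites.
R = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}, {"a": 4, "b": 40}]
S = [{"a": 1, "c": "x"}, {"a": 2, "c": "y"}]

reduced_R = semijoin(R, S, "a")       # keeps the tuples with a = 1 and a = 2
profit = semijoin_profit(R, S, "a")   # saved 4 units, shipped 2 units

# If the query outputs only b and c, then a is a pure join attribute:
# it is needed for the join but dropped from the result.
result = project(join(reduced_R, S, "a"), ["b", "c"])
```

In this toy instance the semijoin removes two of the four tuples of `R` at the price of shipping two join values, so it is profitable under the assumed cost model. A semijoin with a non-positive result here would be "not profitable" in the sense used above.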