Publication
IEEE Transactions on Knowledge and Data Engineering
Paper
Combining join and semi-join operations for distributed query processing
Abstract
In this paper, we explore combining join and semi-join operations to minimize the amount of data transmission required for distributed query processing. Specifically, we identify and exploit two important concepts that arise when join operations are used as reducers in query processing, namely, gainful semi-joins and pure join attributes. Some semi-joins, though not profitable by themselves, may benefit the execution of subsequent join operations and thus become profitable when join operations are used as reducers. Such a semi-join is termed a gainful semi-join. In addition, join attributes that are not part of the output attributes are referred to as pure join attributes. We not only exploit the usefulness of gainful semi-joins, but also utilize the removability of pure join attributes to reduce the amount of data transmission required for query processing. Moreover, in light of these two concepts, heuristic searches are developed to determine a sequence of join and semi-join reducers for query processing. Our results show the importance of combining joins and semi-joins for distributed query processing.
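To make the terminology concrete, the following is a minimal sketch of a semi-join used as a reducer and of dropping a pure join attribute after a join. It is not the paper's algorithm or cost model. The relations `R` and `S`, the attribute names, and the cost measure (number of attribute values shipped) are all assumptions made for illustration, and the sketch does not attempt to model gainful semi-joins or the heuristic search.

```python
# Illustrative sketch only: relations are lists of dicts, and "cost" is the
# number of attribute values shipped (rows x columns), an assumed cost model.

def project(rel, attrs):
    """Distinct projection of rel onto attrs, preserving first-seen order."""
    seen, out = set(), []
    for row in rel:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def semi_join(r, s, attr):
    """Rows of r whose attr value also occurs in s."""
    values = {row[attr] for row in s}
    return [row for row in r if row[attr] in values]

def join(r, s, attr):
    """Join of r and s on a single shared attribute."""
    return [{**a, **b} for a in r for b in s if a[attr] == b[attr]]

def cost(rel):
    """Assumed transmission cost: total attribute values shipped."""
    return sum(len(row) for row in rel)

# Hypothetical data: R lives at site 1, S at site 2; the query outputs (name, item).
R = [{"cid": 1, "name": "Ann"}, {"cid": 2, "name": "Bob"}, {"cid": 3, "name": "Cy"}]
S = [{"cid": 1, "item": "pen"}, {"cid": 1, "item": "ink"}]

# Semi-join reducer: ship only the distinct cid values of S to site 1,
# then keep the rows of R that can actually participate in the join.
s_keys = project(S, ["cid"])
R_reduced = semi_join(R, s_keys, "cid")

# cid is a pure join attribute in this query (it is not an output attribute),
# so it can be removed once the join has been performed.
joined = join(R_reduced, S, "cid")
result = project(joined, ["name", "item"])

ship_whole_R = cost(R)                               # shipping R unreduced
ship_with_semijoin = cost(s_keys) + cost(R_reduced)  # keys out, reduced R back
```

In this toy instance, shipping the reduced relation plus the key column costs fewer attribute values than shipping `R` whole, and the projected result is smaller than the raw join output because the pure join attribute `cid` has been removed.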