Journal of Computer and System Sciences
Selectivity and cost estimation for joins based on random sampling
Abstract
We compare the performance of sampling-based procedures for estimating the selectivity of a join. While some of the procedures have been proposed in the database literature, their relative performance has never been analyzed. A main result of this paper is a partial ordering that compares the variability of the estimators for the different procedures after an arbitrary fixed number of sampling steps. Prior to the current work, it was also unknown whether these fixed-step procedures could be extended to fixed-precision procedures that are both asymptotically consistent and asymptotically efficient. Our second main result is a general method for such an extension and a proof that the method is valid for all the procedures under consideration. We show that, under plausible assumptions on sampling costs, the partial ordering of the fixed-step procedures with respect to variability of the selectivity estimator implies a partial ordering of the corresponding fixed-precision procedures with respect to sampling cost. Our final result is a collection of fixed-step and fixed-precision procedures for estimating the cost of processing a join query according to a fixed join plan. © 1996 Academic Press, Inc.
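The abstract does not spell out the individual procedures. As a rough illustration only, the sketch below shows two simple sampling estimators for the selectivity of an equijoin, |R ⋈ S| / (|R|·|S|). It also shows a sequential stopping rule that turns a fixed-step estimator into a fixed-precision one. The function names, the cross-product and index-probe sampling schemes, and the normal-approximation stopping rule (parameters `eps`, `z`, `min_steps`) are illustrative assumptions. They are not the procedures or the stopping rule analyzed in the paper.

```python
import math
import random
from collections import Counter
from statistics import mean


def true_selectivity(r_keys, s_keys):
    """Exact equijoin selectivity |R join S| / (|R| * |S|), used only for checking."""
    s_counts = Counter(s_keys)
    matches = sum(s_counts[k] for k in r_keys)
    return matches / (len(r_keys) * len(s_keys))


def cross_product_observation(r_keys, s_keys, rng):
    """One step: draw a tuple from R and one from S independently (with replacement).
    Returns 1.0 if their keys match, else 0.0; its expectation is the selectivity."""
    return 1.0 if rng.choice(r_keys) == rng.choice(s_keys) else 0.0


def index_observation(r_keys, s_counts, s_size, rng):
    """One step: draw a tuple from R and count its matches in S (as an index probe would).
    Returns matches / |S|. This is the conditional mean of the cross-product
    observation given the R tuple, so it is also unbiased and its variance is no larger."""
    return s_counts[rng.choice(r_keys)] / s_size


def fixed_step(observe, n):
    """Fixed-step procedure: average exactly n observations."""
    return mean(observe() for _ in range(n))


def fixed_precision(observe, eps=0.1, z=1.96, min_steps=30, max_steps=1_000_000):
    """Hypothetical fixed-precision rule: sample until the normal-approximation
    half-width z * s / sqrt(n) is at most eps times the running estimate.
    Returns (estimate, number_of_steps)."""
    n, total, total_sq = 0, 0.0, 0.0
    while n < max_steps:
        x = observe()
        n += 1
        total += x
        total_sq += x * x
        if n >= max(min_steps, 2):  # need n >= 2 for a sample variance
            m = total / n
            var = max(total_sq - n * m * m, 0.0) / (n - 1)  # sample variance
            if m > 0 and z * math.sqrt(var / n) <= eps * m:
                return m, n
    return total / n, n


if __name__ == "__main__":
    rng = random.Random(42)
    r_keys = [rng.randrange(100) for _ in range(5000)]
    s_keys = [rng.randrange(100) for _ in range(8000)]
    s_counts = Counter(s_keys)

    def cp():
        return cross_product_observation(r_keys, s_keys, rng)

    def ix():
        return index_observation(r_keys, s_counts, len(s_keys), rng)

    print("exact selectivity:", true_selectivity(r_keys, s_keys))
    print("fixed-step, 2000 steps:", fixed_step(cp, 2000), fixed_step(ix, 2000))
    print("fixed-precision (estimate, steps):", fixed_precision(cp), fixed_precision(ix))
```

The index-probe observation is the conditional mean of the cross-product indicator given the sampled R tuple. Its variance is therefore no larger, and in this sketch the lower variance shows up directly as fewer steps before the fixed-precision rule stops. The sketch counts steps, not cost. An index probe usually costs more than a single tuple draw, so a step count alone does not settle which procedure is cheaper.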