Publication
SIGMOD/PODS 1994
Conference paper
On the relative cost of sampling for join selectivity estimation
Abstract
We compare the cost of estimating the selectivity of a 'star join' using the sampling procedure t-cross with the cost of simply computing the join to obtain the exact answer. Our bounds on, and approximation for, the relative cost of sampling show how this cost depends on the sizes of the input relations, the number of input relations, and the precision criterion used by the estimation procedure. We also demonstrate the deleterious effect of dangling tuples and the mixed effect of data skew on the relative cost of sampling. These results provide insight into when sampling should, or should not, be used for join selectivity estimation.
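To make the comparison concrete, the sketch below contrasts exact selectivity computation with a simple cross-product sampling estimate for a two-relation equijoin. It is a minimal illustration under stated assumptions, not the paper's procedure t-cross. The precision criterion and the multiway star-join case are omitted. Each relation is modeled as a plain list of join-attribute values, and the helper names `exact_selectivity` and `cross_product_estimate` are hypothetical. Selectivity is taken to be |R ⋈ S| / (|R| · |S|).

```python
import random
from collections import Counter


def exact_selectivity(r, s):
    """Exact equijoin selectivity |R join S| / (|R| * |S|).

    Hypothetical helper: relations are lists of join-attribute values.
    Every tuple of both relations is read once (hash-count join size).
    """
    counts_s = Counter(s)
    join_size = sum(counts_s[v] for v in r)  # missing keys count as 0
    return join_size / (len(r) * len(s))


def cross_product_estimate(r, s, n, rng):
    """Cross-product sampling estimate of the same selectivity.

    Hypothetical helper: draws n tuples with replacement from each relation
    and counts matching pairs among the n * n pairs of the two samples.
    Each sampled pair is uniform over R x S, so the fraction of matching
    pairs is an unbiased estimate of the selectivity. No stopping rule or
    precision criterion is applied here.
    """
    sample_r = [rng.choice(r) for _ in range(n)]
    sample_s = [rng.choice(s) for _ in range(n)]
    counts_s = Counter(sample_s)
    matches = sum(counts_s[v] for v in sample_r)
    return matches / (n * n)


if __name__ == "__main__":
    r = [1, 1, 2, 3]
    s = [1, 2, 2, 4]
    print(exact_selectivity(r, s))  # 4 matching pairs out of 16
    print(cross_product_estimate(r, s, n=50, rng=random.Random(0)))
```

The exact computation reads every tuple, while the estimate reads only 2n sampled tuples. If a tight precision target forces n to grow large, or if dangling tuples make matches rare, the sampling advantage shrinks. This is the kind of trade-off the abstract's relative-cost analysis quantifies.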