About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
Information Systems
Paper
Approximating the number of unique values of an attribute without sorting
Abstract
Counts of unique values are frequently needed information in database systems. Especially, they are essential in query optimization and physical database design. Traditionally, exact counts were obtained by sorting, which is an expensive operation. In this paper we present three algorithms for counting unique values by probabilistic methods. These algorithms require only one pass over the data, and produce approximations to the true count with certain standard deviations. For deviations acceptable in practical environments (~10%), the algorithms require only modest amounts of memory space and computation time. We have implemented all three algorithms in System R. We also present the results of the experiments on accuracy and performance of these algorithms. © 1987.