Approximating the number of unique values of an attribute without sorting

Morton M. Astrahan; Mario Schkolnick; Whang Kyu-Young

doi:10.1016/0306-4379(87)90014-7

Information Systems

Paper

01 Jan 1987

Approximating the number of unique values of an attribute without sorting

View publication

Abstract

Counts of unique values are frequently needed information in database systems. Especially, they are essential in query optimization and physical database design. Traditionally, exact counts were obtained by sorting, which is an expensive operation. In this paper we present three algorithms for counting unique values by probabilistic methods. These algorithms require only one pass over the data, and produce approximations to the true count with certain standard deviations. For deviations acceptable in practical environments (~10%), the algorithms require only modest amounts of memory space and computation time. We have implemented all three algorithms in System R. We also present the results of the experiments on accuracy and performance of these algorithms. © 1987.

Conference paper