About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
KDD 2002
Conference paper
A new two-phase sampling based algorithm for discovering association rules
Abstract
This paper introduces FAST, a novel two-phase sampling-based algorithm for discovering association rules in large databases. In Phase I a large initial sample of transactions is collected and used to quickly and accurately estimate the support of each individual item in the database. In Phase II these estimated supports are used to either trim "outlier" transactions or select "representative" transactions from the initial sample, thereby forming a small final sample that more accurately reflects the statistical characteristics (i.e., itemset supports) of the entire database. The expensive operation of discovering association rules is then performed on the final sample. In an empirical study, FAST was able to achieve 90-95% accuracy using a final sample having a size of only 15-33% of that of a comparable random sample. This efficiency gain resulted in a speedup by roughly a factor of 10 over previous algorithms that require expensive processing of the entire database - even efficient algorithms that exploit sampling. Our new sampling technique can be used in conjunction with almost any standard association-rule algorithm, and can potentially render scalable other algorithms that mine "count" data.