About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
CIKM 1995
Conference paper
Efficient parallel data mining for association rules
Abstract
In this paper, we develop an algorithm, called PDM, to conduct parallel data mining for association rules. Consider a transaction as a collection of items, and a large itemset is a set of items such that the number of transactions containing it exceeds a pre-specified threshold. PDM is so designed that the global set of large itemsets can be identified efficiently and the amount of inter-node data exchange required is minimized. Specifically, with a given database partition, each processing node will collect (count) information on each itemset from its local database efficiently via a hashing method. The information discovered by each node is next shared with other nodes via some communication schemes. Then, PDM employs a technique, called clue-and-poll, to address the uncertainty due to the partial knowledge collected at each node by judiciously selecting a small fraction of the itemsets for the exchange of count information among nodes, thus reducing the communication cost. The global set of large itemsets can hence be determined based on the aggregate count of itemsets. It is experimentally shown that PDM not only attains very good parallelization efficiencies, but also provides robust performance for various input patterns.