Publication
CyberC 2009
Conference paper

An efficient parallel FP-Growth algorithm

View publication

Abstract

FP-Growth algorithm recursively generates huge amounts of conditional pattern bases and conditional FP-trees when the dataset is huge. In such a case, both the memory usage and computational cost are expensive, such that, the FP-tree can not meet the memory requirement. In this work, we propose a novel parallel FP-Growth algorithm, which is designed to run on the computer cluster. To avoid memory overflow, this algorithm finds all the conditional pattern bases of frequent items by the projection method without constructing an FP-tree. Hereafter, it splits the mining task into number of independent sub-tasks, executes these sub-tasks in parallel on nodes and then aggregates the results back for the final result. Our algorithm works independently at each node. As a result, it can efficiently reduce the inter-node communication cost. Experiments show that this parallel algorithm not only avoids the memory overflow but accelerate the computational speed. In addition, it achieves much better scalability than that of the FP-Growth algorithm. © 2009 IEEE.

Date

Publication

CyberC 2009

Authors

Topics

Share