Delta-SimRank computing on MapReduce

Liangliang Cao; Hyun Duk Kim; Min-Hsuan Tsai; Brian Cho; Zhen Li; Indranil Gupta

doi:10.1145/2351316.2351321

KDD 2012

Workshop paper

28 Sep 2012

Abstract

Based on the intuition that "two objects are similar if they are related to similar objects", SimRank (proposed by Jeh and Widom in 2002) has become a famous measure to compare the similarity between two nodes using network structure. Although SimRank is applicable to a wide range of areas such as social networks, citation networks, link prediction, etc., it suffers from heavy computational complexity and space requirements. Most existing efforts to accelerate SimRank computation work only for static graphs and on single machines. This paper considers the problem of computing SimRank efficiently in a distributed system while handling dynamic networks which grow with time. We first consider an abstract model called Harmonic Field on Node-pair Graph. We use this model to derive SimRank and the proposed Delta-SimRank, which is demonstrated to fit the nature of distributed computing and can be efficiently implemented using Google's MapReduce paradigm. Delta-SimRank can effectively reduce the computational cost and can also benefit the applications with non-static network structures. Our experimental results on four real-world networks show that Delta-SimRank is much more efficient than the distributed SimRank algorithm, and leads to up to 30 times speed-up in the best case. Copyright © 2012 ACM.
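To make the intuition in the abstract concrete, the following is a minimal single-machine sketch in Python, not the MapReduce implementation described in the paper. The function `simrank` follows the standard Jeh and Widom iteration, where the similarity of two distinct nodes is the decayed average similarity of their in-neighbor pairs. The function `simrank_by_deltas` is an illustrative assumption: it relies only on the fact that this update is linear, so the change in scores between iterations can be propagated instead of the full scores. All names, the decay value `C=0.8`, and the toy graph are hypothetical and are not taken from the paper.

```python
from itertools import product

def in_neighbors(nodes, edges):
    """Map each node to the list of nodes that point to it."""
    preds = {n: [] for n in nodes}
    for src, dst in edges:
        preds[dst].append(src)
    return preds

def simrank(nodes, edges, C=0.8, iterations=10):
    """Plain SimRank iteration: recompute every pair score each round."""
    preds = in_neighbors(nodes, edges)
    sim = {(a, b): 1.0 if a == b else 0.0
           for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new[(a, b)] = 1.0
            elif not preds[a] or not preds[b]:
                new[(a, b)] = 0.0
            else:
                total = sum(sim[(i, j)] for i in preds[a] for j in preds[b])
                new[(a, b)] = C * total / (len(preds[a]) * len(preds[b]))
        sim = new
    return sim

def simrank_by_deltas(nodes, edges, C=0.8, iterations=10):
    """Assumed delta form: propagate only the change from the last round.

    Because the update is linear, applying it to the previous change
    yields the next change; pairs with zero change contribute nothing.
    """
    preds = in_neighbors(nodes, edges)
    sim = {(a, b): 1.0 if a == b else 0.0
           for a, b in product(nodes, repeat=2)}
    # Initial change: the identity scores themselves (only nonzero entries).
    delta = {(a, a): 1.0 for a in nodes}
    for _ in range(iterations):
        new_delta = {}
        for a, b in product(nodes, repeat=2):
            if a == b or not preds[a] or not preds[b]:
                continue  # diagonal stays 1, empty in-neighbor sets stay 0
            total = sum(delta.get((i, j), 0.0)
                        for i in preds[a] for j in preds[b])
            if total != 0.0:
                new_delta[(a, b)] = C * total / (len(preds[a]) * len(preds[b]))
        for pair, change in new_delta.items():
            sim[pair] += change
        delta = new_delta
    return sim

# Hypothetical toy graph for illustration.
nodes = ["univ", "profA", "profB", "studentA", "studentB"]
edges = [("univ", "profA"), ("univ", "profB"), ("profA", "studentA"),
         ("studentA", "univ"), ("profB", "studentB"), ("studentB", "profB")]

full = simrank(nodes, edges)
incremental = simrank_by_deltas(nodes, edges)
```

In this sketch the two functions return the same scores up to floating-point rounding. The delta version touches fewer nonzero values per round as the changes shrink, which is one plausible reading of why a delta formulation reduces cost.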
