Data Mining and Knowledge Discovery

SONIC: streaming overlapping community detection

View publication


A community within a graph can be broadly defined as a set of vertices that exhibit high cohesiveness (relatively high number of edges within the set) and low conductance (relatively low number of edges leaving the set). Community detection is a fundamental graph processing analytic that can be applied to several application domains, including social networks. In this context, communities are often overlapping, as a person can be involved in more than one community (e.g., friends, and family); and evolving, since the structure of the network changes. We address the problem of streaming overlapping community detection, where the goal is to maintain communities in the presence of streaming updates. This way, the communities can be updated more efficiently. To this end, we introduce SONIC—a find-and-merge type of community detection algorithm that can efficiently handle streaming updates. SONIC first detects when graph updates yield significant community changes. Upon the detection, it updates the communities via an incremental merge procedure. The SONIC algorithm incorporates two additional techniques to speed-up the incremental merge; min-hashing and inverted indexes. Results show that SONIC can provide high quality overlapping communities, while handling streaming updates several orders of magnitude faster than the alternatives performing from-scratch computation.