GIN: A clustering model for capturing dual heterogeneity in networked data
Abstract
Networked data often consists of interconnected multi-typed nodes and links. A common assumption behind such heterogeneity is the shared clustering structure. However, existing network clustering approaches oversimplify the heterogeneity by either treating nodes or links in a homogeneous fashion, resulting in massive loss of information. In addition, these studies are more or less restricted to specific network schémas or applications, losing generality. In this paper, we introduce a flexible model to explain the process of forming heterogeneous links based on shared clustering information of heterogeneous nodes. Specifically, we categorize the link generation process into binary and weighted cases and model them respectively. We show these two cases can be seamlessly integrated into a unified model. We propose to maximize a joint log-likelihood function to infer the model efficiently with Expectation Maximization (EM) algorithms. Experiments on real-world networked data sets demonstrate the effectiveness and flexibility of the proposed method in fully capturing the dual heterogeneity of both nodes and links.