SDM 2016
Conference paper

RelSim: Relation similarity search in schema-rich heterogeneous information networks



Recent studies have demonstrated the power of modeling real-world data as heterogeneous information networks (HINs) consisting of multiple types of entities and relations. Unfortunately, most such studies (e.g., on similarity search) confine their discussion to networks with only a few entity and relationship types, such as DBLP. In the real world, however, the network schema can be rather complex, as in Freebase. In such schema-rich HINs, it is often too much of a burden to ask users to provide explicit guidance in selecting relations for similarity search. In this paper, we study the problem of relation similarity search in schema-rich HINs. Under our problem setting, users are only asked to provide some simple relation instance examples (e.g., (Barack Obama, John Kerry) and (George W. Bush, Condoleezza Rice)) as a query, and we automatically detect the latent semantic relation (LSR) implied by the query (e.g., "president vs. secretary-of-state"). Such an LSR helps to find other similar relation instances (e.g., (Bill Clinton, Madeleine Albright)). To solve the problem, we first define a new meta-path-based relation similarity measure, RelSim, to measure the similarity between relation instances in schema-rich HINs. Then, given a query, we propose an optimization model that efficiently learns the LSR implied by the query through linear programming, and we perform fast relation similarity search using RelSim based on the learned LSR. Experiments on real-world datasets derived from Freebase demonstrate the effectiveness and efficiency of our approach.
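
To make the idea of a weighted, meta-path-based similarity between relation instances concrete, the following is a minimal sketch and not the paper's definition of RelSim. It assumes each relation instance (an entity pair) is summarized by counts of the meta-paths connecting its two entities, and that an LSR can be represented as non-negative weights over meta-paths. The function name `weighted_overlap`, the overlap formula, the meta-path labels, and all numbers are hypothetical and chosen only for illustration; learning the weights from the query (the linear programming step) is not shown.

```python
from typing import Dict

# Hypothetical representation: meta-path label -> number of path instances
# connecting the two entities of a relation instance.
PathCounts = Dict[str, float]


def weighted_overlap(a: PathCounts, b: PathCounts, weights: Dict[str, float]) -> float:
    """Weighted overlap of two meta-path count vectors, in [0, 1].

    Assumed form for illustration only: twice the weighted shared count
    divided by the weighted total count of both instances.
    """
    paths = set(a) | set(b)
    shared = sum(weights.get(p, 0.0) * min(a.get(p, 0.0), b.get(p, 0.0)) for p in paths)
    total = sum(weights.get(p, 0.0) * (a.get(p, 0.0) + b.get(p, 0.0)) for p in paths)
    return 2.0 * shared / total if total > 0 else 0.0


# Illustrative weights standing in for a learned LSR (made-up values).
lsr_weights = {"appoints": 0.8, "same_party": 0.2, "married_to": 0.0}

# Made-up meta-path counts for a query instance and two candidate instances.
query = {"appoints": 1, "same_party": 1}
candidate_a = {"appoints": 1, "same_party": 1}
candidate_b = {"same_party": 1, "married_to": 1}

score_a = weighted_overlap(query, candidate_a, lsr_weights)
score_b = weighted_overlap(query, candidate_b, lsr_weights)
print(score_a > score_b)  # candidate_a shares the heavily weighted meta-path
```

Under these assumed weights, a candidate that shares the heavily weighted meta-path with the query ranks above one that shares only a lightly weighted one, which is the behavior a learned LSR is meant to induce during search.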


05 May 2016

