Graph indexing: Tree + Delta &amp;gt;= Graph

Peixiang Zhao; Jeffrey Xu Yu; Philip S. Yu

VLDB 2007

Conference paper

23 Sep 2007

Graph indexing: Tree + Delta >= Graph

Abstract

Recent scientific and technological advances have witnessed an abundance of structural patterns modeled as graphs. As a result, it is of special interest to process graph containment queries effectively on large graph databases. Given a graph database G, and a query graph q, the graph containment query is to retrieve all graphs in G which contain q as subgraph(s). Due to the vast number of graphs in G and the nature of complexity for subgraph isomorphism testing, it is desirable to make use of high-quality graph indexing mechanisms to reduce the overall query processing cost. In this paper, we propose a new cost-effective graph indexing method based on frequent tree-features of the graph database. We analyze the effectiveness and efficiency of tree as indexing feature from three critical aspects: feature size, feature selection cost, and pruning power. In order to achieve better pruning ability than existing graph-based indexing methods, we select, in addition to frequent tree-features (Tree), a small number of discriminative graphs (Δ) on demand, without a costly graph mining process beforehand. Our study verifies that (Tree+Δ) is a better choice than graph for indexing purpose, denoted (Tree+Δ ≥Graph), to address the graph containment query problem. It has two implications: (1) the index construction by (Tree+Δ) is efficient, and (2) the graph containment query processing by (Tree+Δ) is efficient. Our experimental studies demonstrate that (Tree+Δ) has a compact index structure, achieves an order of magnitude better performance in index construction, and most importantly, outperforms up-to-date graph-based indexing methods: gIndex and C-Tree, in graph containment query processing.

Paper