Virtual documents and answer priors in keyword search over data graphs
Abstract
In keyword search over data graphs, an answer is a nonredundant subtree that contains the keywords of the query. Ranking of answers should take into account both their textual relevance and the significance of their semantic structure. A novel method for answers priors is developed and used in conjunction with query-dependent features. Since the space of all possible answers is huge, efficiency is also a major problem. A new algorithm that drastically cuts down the search space is presented. It generates candidate answers by first selecting top-n roots and top-n nodes for each query keyword. The selection is by means of a novel concept of virtual documents with weighted term frequencies. Markov random field models are used for ranking the virtual documents and then the generated answers. The proposed approach outperforms existing systems on a standard evaluation framework.