SIGIR Forum (ACM Special Interest Group on Information Retrieval)

Automatic query refinement using lexical affinities with maximal information gain


This work describes an automatic query refinement technique, which focuses on improving precision of the top ranked documents. The terms used for refinement are lexical affinities (LAs), pairs of closely related words which contain exactly one of the original query terms. Adding these terms to the query is equivalent to re-ranking search results, thus, precision is improved while recall is preserved. We describe a novel method that selects the most "informative" LAs for refinement, namely, those LAs that best separate relevant documents from irrelevant documents in the set of results. The information gain of candidate LAs is determined using unsupervised estimation that is based on the scoring function of the search engine. This method is thus fully automatic and its quality depends on the quality of the scoring function. Experiments we conducted with TREC data clearly show a significant improvement in the precision of the top ranked documents.