Selectivity estimation for extraction operators over text data
Daisy Zhe Wang, Long Wei, et al.
ICDE 2011
This paper explores how the synergy between document clustering and phrasal analysis can be exploited to automatically construct a context-based retrieval system. A context consists of two components: a cluster of logically related articles (its extension) and a small set of salient concepts, represented by words and phrases and organized by the cluster's key terms (its intension). At run time, the system presents the contexts that best match the result list of a user's natural language query. The user can then choose a context and manipulate its intensional component both to browse the context's extension and to launch new searches over the entire database. We argue that the focused relevance feedback provided by contexts, at a level of abstraction higher than individual documents and lower than the database as a whole, gives users a natural way to refine vague information needs and helps blur the distinction between searching and browsing. The Paraphrase interface, running over a database of business-related news articles, is used to illustrate the advantages of such a context-based retrieval paradigm. Copyright 1997 ACM.