A. Aaron, S. Chen, et al.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Previous work on the distribution of words in documents has shown that word repetitiveness is an important indicator of a word's content-bearing characteristics. In this paper we propose a simple method that uses a measure of the tendency of words to repeat within a document to separate words that have similar document frequencies but different topic-discriminating characteristics. We describe how the new measure is applied in query-document relevance scoring. Experiments on the TREC Ad Hoc and Spoken Document Retrieval tasks show useful performance improvements.
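The abstract does not give the exact repetition measure or how it enters the relevance score. The following is only a minimal sketch under stated assumptions, not the authors' method.

It assumes a hypothetical repetition measure: the mean within-document count of a term over the documents that contain it (collection frequency divided by document frequency). It also assumes a hypothetical scoring rule in which a plain tf * idf weight is multiplied by `1 + log(repeat)`. The names `repetition_stats` and `score` are illustrative. The example shows how two terms with the same document frequency can still receive different weights.

```python
import math
from collections import Counter


def repetition_stats(docs):
    """Return per-term document frequency and a repetition measure.

    Hypothetical measure (an assumption, not from the paper): the mean
    number of occurrences of a term in the documents that contain it,
    i.e. collection frequency / document frequency. A value of 1.0 means
    the term never repeats within any document that contains it.
    """
    df = Counter()  # number of documents containing the term
    cf = Counter()  # total occurrences of the term across all documents
    for doc in docs:
        for term, n in Counter(doc).items():
            df[term] += 1
            cf[term] += n
    repeat = {term: cf[term] / df[term] for term in df}
    return df, repeat


def score(query, doc, df, repeat, n_docs):
    """Hypothetical relevance score: tf * idf, with idf scaled by 1 + log(repeat).

    Since repeat >= 1, the scale factor is >= 1, so terms that tend to
    repeat within documents get extra weight over equally common terms
    that do not.
    """
    counts = Counter(doc)
    total = 0.0
    for term in set(query):
        if term not in counts or term not in df:
            continue  # term absent from this document or unseen in the collection
        idf = math.log(n_docs / df[term])
        total += counts[term] * idf * (1.0 + math.log(repeat[term]))
    return total


if __name__ == "__main__":
    # Toy collection: "cat" and "bank" each occur in 2 of 4 documents,
    # but "bank" repeats inside a document while "cat" does not.
    docs = [
        ["cat", "sat", "mat"],
        ["dog", "chased", "cat"],
        ["bank", "bank", "bank", "loans"],
        ["river", "bank", "fishing"],
    ]
    df, repeat = repetition_stats(docs)
    print(repeat["cat"], repeat["bank"])  # 1.0 2.0
    print(score(["bank"], docs[3], df, repeat, len(docs)))
    print(score(["cat"], docs[1], df, repeat, len(docs)))
```

Document frequency alone gives "cat" and "bank" the same idf here. The repetition measure is what separates them. Other choices would be equally plausible under the abstract's description, for example a probability-of-reoccurrence ratio or a different combination with idf.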
A. Aaron, S. Chen, et al.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
S. Dharanipragada, Martin Franz, et al.
INTERSPEECH - Eurospeech 1999
Y. Al-Onaizan, R. Florian, et al.
NAACL-HLT 2003
S. Dharanipragada, Martin Franz, et al.
ICSLP 2000