Two-stage hashing for fast document retrieval

Hao Li; Wei Liu; Heng Ji

doi:10.3115/v1/p14-2081

ACL 2014

Conference paper

22 Jun 2014

Abstract

This work achieves sublinear-time Nearest Neighbor Search (NNS) in massive-scale document collections. The primary contribution is a two-stage unsupervised hashing framework that integrates two state-of-the-art hashing algorithms, Locality Sensitive Hashing (LSH) and Iterative Quantization (ITQ). LSH prunes the set of neighbor candidates, while ITQ provides efficient and effective reranking over the candidate pool captured by LSH. Furthermore, the proposed framework exploits both term and topic similarity among documents, leading to precise document retrieval. The experimental results show that our hashing-based document retrieval approach closely approximates the conventional Information Retrieval (IR) method in retrieving semantically similar documents, while achieving a speedup of over one order of magnitude in query time. © 2014 Association for Computational Linguistics.
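The two-stage pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Stage 1 is assumed to use sign-random-projection LSH with several hash tables. Stage 2 uses the standard ITQ procedure, a PCA projection followed by an iteratively learned orthogonal rotation. All function names (`lsh_tables`, `train_itq`, `itq_encode`, `two_stage_search`) and all parameter values are hypothetical. The sketch takes one dense feature vector per document and omits the paper's combination of term and topic similarity.

```python
import numpy as np


def lsh_tables(X, n_tables, n_bits, rng):
    """Stage 1 index (assumed LSH family): sign-random-projection keys in n_tables tables."""
    planes = rng.standard_normal((n_tables, X.shape[1], n_bits))
    tables = []
    for t in range(n_tables):
        keys = X @ planes[t] > 0  # (n_docs, n_bits) boolean sign bits
        buckets = {}
        for doc_id, key in enumerate(keys):
            buckets.setdefault(key.tobytes(), []).append(doc_id)
        tables.append(buckets)
    return planes, tables


def train_itq(X, n_bits, n_iter, rng):
    """Stage 2 codes: PCA to n_bits dimensions, then learn an orthogonal rotation R (ITQ)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Top principal directions of the centered data (requires n_bits <= min(X.shape)).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_bits].T  # (dim, n_bits) projection matrix
    V = Xc @ W         # projected data
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))  # random orthogonal init
    for _ in range(n_iter):
        B = np.where(V @ R >= 0, 1.0, -1.0)  # fix R: B = sign(V R)
        U, _, Zt = np.linalg.svd(V.T @ B)     # fix B: orthogonal Procrustes, R = U Z^T
        R = U @ Zt
    return mean, W, R


def itq_encode(X, mean, W, R):
    """Binary ITQ codes as a boolean array of shape (n, n_bits)."""
    return (X - mean) @ W @ R >= 0


def two_stage_search(q, planes, tables, codes, itq_model, top_k):
    # Stage 1: union of the LSH buckets the query falls into (candidate pruning).
    candidates = set()
    for t, buckets in enumerate(tables):
        key = (q @ planes[t] > 0).tobytes()
        candidates.update(buckets.get(key, []))
    if not candidates:
        return []
    cand = np.array(sorted(candidates))
    # Stage 2: rerank only the candidate pool by Hamming distance between ITQ codes.
    q_code = itq_encode(q[None, :], *itq_model)[0]
    dists = np.count_nonzero(codes[cand] != q_code, axis=1)
    order = np.argsort(dists, kind="stable")[:top_k]
    return cand[order].tolist()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    docs = rng.standard_normal((1000, 64))  # random stand-in for document feature vectors
    planes, tables = lsh_tables(docs, n_tables=8, n_bits=10, rng=rng)
    model = train_itq(docs, n_bits=32, n_iter=50, rng=rng)
    codes = itq_encode(docs, *model)
    print(two_stage_search(docs[42], planes, tables, codes, model, top_k=5))
```

The split reflects the division of labour described in the abstract. Bucket lookups in stage 1 avoid scanning the whole collection. Hamming distances in stage 2 are computed only for the pruned candidate set, not for every document.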