Variable Latent Semantic Indexing

Anirban Dasgupta; Prabhakar Raghavan; Ravi Kumar; Andrew Tomkins

doi:10.1145/1081870.1081876

KDD 2005

Conference paper

01 Dec 2005

Abstract

Latent Semantic Indexing is a classical method to produce optimal low-rank approximations of a term-document matrix. However, in the context of a particular query distribution, the approximation thus produced need not be optimal. We propose VLSI, a new query-dependent (or "variable") low-rank approximation that minimizes approximation error for any specified query distribution. With this tool, it is possible to tailor the LSI technique to particular settings, often resulting in vastly improved approximations at much lower dimensionality. We validate this method via a series of experiments on classical corpora, showing that VLSI typically performs similarly to LSI with an order of magnitude fewer dimensions. Copyright 2005 ACM.
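The contrast the abstract draws can be illustrated with a small numerical sketch. This is not the authors' algorithm, which the abstract does not spell out. It assumes a term-document matrix `A` with terms as rows, a query distribution summarized by its second-moment matrix `C = E[q qᵀ]`, and an error measure `E‖qᵀ(A − B)‖²`. All function names, the synthetic data, and the weighting scheme are hypothetical. Under those assumptions, plain LSI is the rank-k truncated SVD of `A`, while a query-weighted alternative takes the truncated SVD of `C^{1/2} A` and maps it back.

```python
import numpy as np

def truncated_svd(M, k):
    """Best rank-k approximation of M in Frobenius norm (plain LSI)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def expected_query_error(A, B, C):
    """E_q ||q^T (A - B)||^2 = trace((A - B)^T C (A - B)), with C = E[q q^T]."""
    D = A - B
    return float(np.trace(D.T @ C @ D))

def query_weighted_low_rank(A, C, k):
    """Rank-k B minimizing ||C^(1/2) (A - B)||_F (illustrative assumption)."""
    # Symmetric square root of the positive semidefinite matrix C.
    w, V = np.linalg.eigh(C)
    root = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T
    # Approximate the reweighted matrix, then undo the weighting.
    return np.linalg.pinv(root) @ truncated_svd(root @ A, k)

# Synthetic data: 30 terms, 20 documents, queries concentrated on early terms.
rng = np.random.default_rng(0)
A = rng.random((30, 20))
term_weights = np.linspace(1.0, 0.05, 30)
Q = rng.random((200, 30)) * term_weights      # 200 sampled query vectors
C = Q.T @ Q / Q.shape[0]                      # empirical E[q q^T]

k = 3
lsi = truncated_svd(A, k)
weighted = query_weighted_low_rank(A, C, k)

lsi_query_err = expected_query_error(A, lsi, C)
weighted_query_err = expected_query_error(A, weighted, C)
lsi_frob_err = float(np.linalg.norm(A - lsi))
weighted_frob_err = float(np.linalg.norm(A - weighted))
```

In this sketch `weighted_query_err` cannot exceed `lsi_query_err`, and `lsi_frob_err` cannot exceed `weighted_frob_err`: each approximation is optimal only for the error measure it was built for, which is the point the abstract makes about LSI under a particular query distribution.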
