Clustering and Singular Value Decomposition for Approximate Indexing in High Dimensional Spaces

Alexander Thomasian; Vittorio Castelli; Chung-Sheng Li

doi:10.1145/288627.288658

CIKM 1998

Workshop paper

03 Nov 1998

Clustering and Singular Value Decomposition for Approximate Indexing in High Dimensional Spaces

View publication

Abstract

High-dimensionality indexing of feature spaces is critical for many data-intensive applications such as content-based retrieval of images or video from multimedia databases and similarity retrieval of patterns in data mining. Unfortunately, the performance of nearest neighbor (NN) queries, which are required for similarity search, deteriorates rapidly with the increase in the number of dimensions. We propose the Clustering with Singular Value Decomposition (CSVD) method, which combines clustering and singular value decomposition (SVD) to reduce the number of index dimensions, while maintaining a reasonably high precision for a given value of recall. In the proposed CSVD method, homogeneous points are grouped into clusters such that the points in each cluster are more amenable to dimensionality reduction than the original dataset. Experiments with texture vectors extracted from satellite images show that CSVD achieves significantly higher dimensionality reduction than SVD for the same fraction of total variance preserved. Conversely, for the same compression ratio CSVD results in an increase in preserved total variance with respect to SVD (e.g., a 70% increase for a 20 compression ratio). This translates to a higher efficiency in processing approximate NN queries, as quantified through experimental results.

Conference paper