Photonics East 1998
Conference paper

CSVD: Approximate similarity searches in high dimensional spaces using clustering and singular value decomposition

View publication


Many data-intensive applications, such as content-based retrieval of images or video from multimedia databases and similarity retrieval of patterns in data mining, require the ability of efficiently performing similarity queries. Unfortunately, the performance of nearest neighbor (NN) algorithms, the basis for similarity search, quickly deteriorates with the number of dimensions. In this paper we propose a method called Clustering with Singular Value Decomposition (CSVD), combining clustering and singular value decomposition (SVD) to reduce the number of index dimensions. With CSVD, points are grouped into clusters that are more amenable to dimensionality reduction than the original dataset. Experiments with texture vectors extracted from satellite images show that CSVD achieves significantly higher dimensionality reduction than SVD alone for the same fraction of total variance preserved. Conversely, for the same compression ratio CSVD results in an increase in preserved total variance with respect to SVD (e.g., a 70% increase for a 20:1 compression ratio). Then, approximate NN queries are more efficiently processed, as quantified through experimental results.