About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
CIKM 1998
Workshop paper
Clustering and Singular Value Decomposition for Approximate Indexing in High Dimensional Spaces
Abstract
High-dimensionality indexing of feature spaces is critical for many data-intensive applications such as content-based retrieval of images or video from multimedia databases and similarity retrieval of patterns in data mining. Unfortunately, the performance of nearest neighbor (NN) queries, which are required for similarity search, deteriorates rapidly with the increase in the number of dimensions. We propose the Clustering with Singular Value Decomposition (CSVD) method, which combines clustering and singular value decomposition (SVD) to reduce the number of index dimensions, while maintaining a reasonably high precision for a given value of recall. In the proposed CSVD method, homogeneous points are grouped into clusters such that the points in each cluster are more amenable to dimensionality reduction than the original dataset. Experiments with texture vectors extracted from satellite images show that CSVD achieves significantly higher dimensionality reduction than SVD for the same fraction of total variance preserved. Conversely, for the same compression ratio CSVD results in an increase in preserved total variance with respect to SVD (e.g., a 70% increase for a 20:1 compression ratio). This translates to a higher efficiency in processing approximate NN queries, as quantified through experimental results.