Publication
VLDB 2000
Conference paper
What is the nearest neighbor in high dimensional spaces?
Abstract
Nearest neighbor search in high dimensional spaces is an interesting and important problem which is relevant for a wide variety of novel database applications. As recent results show, however, the problem is a very difficult one, not only with regard to the performance issue but also to the quality issue. In this paper, we discuss the quality issue and identify a new generalized notion of nearest neighbor search as the relevant problem in high dimensional space. In contrast to previous approaches, our new notion of nearest neighbor search does not treat all dimensions equally but uses a quality criterion to select relevant dimensions (projections) with respect to the given query. As an example of a useful quality criterion, we rate how well the data is clustered around the query point within the selected projection. We then propose an efficient and effective algorithm to solve the generalized nearest neighbor problem. Our experiments, based on a number of real and synthetic data sets, show that our new approach provides new insights into the nature of nearest neighbor search on high dimensional data.
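To make the idea of query-dependent projections concrete, the following is a minimal illustrative sketch and not the algorithm proposed in the paper. It assumes axis-parallel projections only and a simple stand-in quality criterion: for each dimension, the mean distance from the query to its k closest values, divided by that dimension's value range (lower means tighter clustering around the query). The function names, the parameter `k`, and the toy data are all hypothetical.

```python
import math


def dimension_quality(data, query, dim, k):
    """Assumed criterion: mean of the k smallest |x - q| in one dimension,
    normalized by that dimension's value range. Lower is better."""
    values = [row[dim] for row in data]
    spread = max(values) - min(values)
    if spread == 0:
        return float("inf")  # assumption: a constant dimension is uninformative
    diffs = sorted(abs(v - query[dim]) for v in values)
    return (sum(diffs[:k]) / k) / spread


def select_dimensions(data, query, num_dims, k):
    """Pick the num_dims dimensions with the best (lowest) quality score."""
    scored = sorted((dimension_quality(data, query, d, k), d)
                    for d in range(len(query)))
    return sorted(d for _, d in scored[:num_dims])


def projected_nearest_neighbor(data, query, num_dims, k=3):
    """Return (index of nearest row, selected dimensions), using Euclidean
    distance restricted to the selected dimensions."""
    dims = select_dimensions(data, query, num_dims, k)

    def dist(row):
        return math.sqrt(sum((row[d] - query[d]) ** 2 for d in dims))

    best = min(range(len(data)), key=lambda i: dist(data[i]))
    return best, dims


# Toy data: dimensions 0 and 1 cluster around the query, dimension 2 is noise.
data = [
    [1.0, 1.0, 90.0],
    [1.1, 0.9, 10.0],
    [0.9, 1.1, 50.0],
    [5.0, 5.0, 31.0],
    [9.0, 8.0, 29.0],
    [7.0, 9.0, 70.0],
]
query = [1.0, 1.0, 30.0]

projected = projected_nearest_neighbor(data, query, num_dims=2)
full_space = projected_nearest_neighbor(data, query, num_dims=3)
```

On this toy data, the projected search selects dimensions 0 and 1 and returns row 0, while the search over all three dimensions returns row 3, whose closeness is driven mainly by the noisy third dimension. This only illustrates why treating all dimensions equally can change the answer; the paper's actual quality criterion and search procedure may differ substantially.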