About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
Theoretical Computer Science
Paper
The black-box complexity of nearest-neighbor search
Abstract
We define a natural notion of efficiency for approximate nearest-neighbor (ANN) search in general n-point metric spaces, namely the existence of a randomized algorithm which answers (1+ε)-ANN queries in polylog(n) time using only polynomial space. We then study which families of metric spaces admit efficient ANN schemes in the black-box model, where only oracle access to the distance function is given, and any query consistent with the triangle inequality may be asked. For ε<25, we offer a complete answer to this problem. Using the notion of metric dimension defined in [A. Gupta, et al., Bounded geometries, fractals, and low-distortion embeddings, in: 44th Annu. IEEE Symp. on Foundations of Computer Science, 2003, pp. 534-543] (ǎ la [P. Assouad, Plongements lipschitziens dans Rn, Bull. Soc. Math. France 111 (4) (1983) 429-448]), we show that a metric space X admits an efficient (1+ε)-ANN scheme for any ε<25 if and only if dim(X)=O(loglogn). For coarser approximations, clearly the upper bound continues to hold, but there is a threshold at which our lower bound breaks down - this is precisely when points in the "ambient space" may begin to affect the complexity of "hard" subspaces S⊆X. Indeed, we give examples which show that dim(X) does not characterize the black-box complexity of ANN above the threshold. Our scheme for ANN in low-dimensional metric spaces is the first to yield efficient algorithms without relying on any additional assumptions on the input. In previous approaches (e.g., [K.L. Clarkson, Nearest neighbor queries in metric spaces, Discrete Comput. Geom. 22(1) (1999) 63-93; D. Karger, M. Ruhl, Finding nearest neighbors in growth-restricted metrics, in: 34th Annu. ACM Symp. on the Theory of Computing, 2002, pp. 63-66; R. Krauthgamer, J.R. Lee, Navigating nets: simple algorithms for proximity search, in: 15th Annu. ACM-SIAM Symp. on Discrete Algorithms, 2004, pp. 791-801; K. Hildrum, et al., A note on finding nearest neighbors in growth-restricted metrics, in: Proc. of the 15th Annu. ACM-SIAM Symp. on Discrete Algorithms, 2004, pp. 560-561]), even spaces with dim(X)=O(1) sometimes required Ω(n) query times. © 2005 Elsevier B.V. All rights reserved.