Navigating massive data sets via local clustering

Michael E. Houle

doi:10.1145/956750.956817

KDD 2003

Conference paper

01 Dec 2003

Abstract

This paper introduces a scalable method for feature extraction and navigation of large data sets by means of local clustering, where clusters are modeled as overlapping neighborhoods. Under the model, intra-cluster association and external differentiation are both assessed in terms of a natural confidence measure. Minor clusters can be identified even when they appear in the intersection of larger clusters. Scalability of local clustering derives from recent generic techniques for efficient approximate similarity search. The cluster overlap structure gives rise to a hierarchy that can be navigated and queried by users. Experimental results are provided for two large text databases. Copyright 2003 ACM.
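As a rough illustration of the idea of scoring overlapping neighborhoods with a confidence measure, the following minimal sketch may help. It rests on several assumptions that the abstract does not confirm: neighborhoods are taken to be k-nearest-neighbor sets under Euclidean distance, confidence is taken in the association-rule sense (the fraction of one set that also lies in another), and the helper names `knn`, `confidence`, and `self_confidence` are hypothetical. Exact brute-force search stands in for the approximate similarity search mentioned in the abstract.

```python
from math import dist

def knn(points, q, k):
    """Indices of the k points nearest to points[q], q itself included.
    Brute force; an assumed stand-in for approximate similarity search."""
    order = sorted(range(len(points)), key=lambda i: dist(points[q], points[i]))
    return set(order[:k])

def confidence(a, b):
    """Assumed association-rule-style confidence: fraction of a also in b."""
    return len(a & b) / len(a)

def self_confidence(points, q, k):
    """Mean confidence between q's neighborhood and the neighborhoods of
    its members; higher values suggest a more cohesive candidate cluster."""
    hood = knn(points, q, k)
    return sum(confidence(hood, knn(points, v, k)) for v in hood) / len(hood)

# Toy data: a tight unit square, a second small group, and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (5, 5)]
tight = self_confidence(points, 0, 4)  # neighborhood of a square corner
loose = self_confidence(points, 7, 4)  # neighborhood of the isolated point
print(tight, loose)
```

In this toy data the neighborhood around a corner of the square scores higher than the neighborhood around the isolated point, because the corner's neighbors all share the same neighborhood while the isolated point's neighbors do not include it in theirs.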
