Initialization-independent spectral clustering with applications to automatic video analysis
Abstract
Popular clustering algorithms, such as K-Means (KM) and Expectation Maximization (EM), are sensitive to the initialization of cluster centers. In contrast, recently proposed K-Harmonic Means (KHM) algorithm is more robust to the randomness of the initialization, However, KHM works best when the dimensionality of the data (N) is small (usually less than 8). Because the dimensionality of features that are used for many clustering problems in image/video and speech processing is large, the benefits of KHM cannot be exploited. Based upon this observation, this paper proposes a novel method to employ KHM for high-dimensional data so as to realize initialization-independent clustering. The proposed method employs efficient spectral clustering techniques whereby the affinity matrix of the data is decomposed into its eigenvectors and k (total number of clusters) eigenvectors corresponding to the k largest eigenvalues are retained. That is, we represent N-D data by k-D transformed data when k < N and propose to employ KHM over this k-D transformed data. The assumption of k < N indeed encompasses a large number of significant video processing and computer vision problems where the use of KHM was not beneficial before. We demonstrate the effectiveness and the efficiency of the proposed algorithm for face clustering in the domains where the number of persons (k) is not large, such as anchorperson grouping, video-based speaker clustering in videoconferencing, and identity-based tracking in small office environments.