Categorization using semi-supervised clustering

Jianying Hu; Moninder Singh; Aleksandra Mojsilovic

doi:10.1109/icpr.2008.4761253

ICPR 2008

Conference paper

08 Dec 2008

Abstract

Many applications require matching objects to a predefined yet highly dynamic set of categories, each accompanied by a category description. We present a novel approach to this class of categorization problems by formulating it in a semi-supervised clustering framework. Text-based matching is performed to generate "soft" seeds, which are then used to guide clustering in the basic feature space. We introduce a new variation of the k-means algorithm, called Soft Seeded k-means, which can effectively incorporate seeds of varying degrees of confidence while allowing for incomplete coverage of the predefined categories. The algorithm is applied to real-world data from a business analytics application, and we demonstrate that it leads to superior performance compared to previous approaches. © 2008 IEEE.
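
The abstract does not spell out the Soft Seeded k-means update rules, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each object may carry a seed category with a confidence in [0, 1] (produced, for example, by text matching), that seeded categories start from the confidence-weighted mean of their seeds, that unseeded categories start from a random data point, and that moving a seeded object out of its seed category costs a penalty proportional to its confidence. The function name `soft_seeded_kmeans` and the penalty weight `lam` are hypothetical.

```python
import numpy as np

def soft_seeded_kmeans(X, k, seed_labels, seed_conf, lam=1.0, n_iter=50, rng_seed=0):
    """Illustrative confidence-weighted seeded k-means (an assumption-based sketch).

    X           : (n, d) feature matrix
    seed_labels : (n,) int array, seed category index, or -1 if the object has no seed
    seed_conf   : (n,) float array in [0, 1], confidence of each seed
    lam         : hypothetical penalty weight for leaving the seed category
    """
    rng = np.random.default_rng(rng_seed)
    n, d = X.shape
    centroids = np.empty((k, d))
    for c in range(k):
        mask = seed_labels == c
        if mask.any() and seed_conf[mask].sum() > 0:
            # Seeded category: start from the confidence-weighted mean of its seeds.
            centroids[c] = np.average(X[mask], axis=0, weights=seed_conf[mask])
        else:
            # Category without seeds (incomplete coverage): start from a random point.
            centroids[c] = X[rng.integers(n)]

    seeded = seed_labels >= 0
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Squared Euclidean distance from every object to every centroid.
        cost = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # Soft constraint: leaving the seed category costs lam * confidence.
        penalty = lam * seed_conf[:, None] * np.ones((1, k))
        penalty[np.arange(n)[seeded], seed_labels[seeded]] = 0.0
        penalty[~seeded] = 0.0
        new_labels = (cost + penalty).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            if (labels == c).any():
                centroids[c] = X[labels == c].mean(axis=0)
    return labels, centroids

# Toy usage: two well-separated groups, one soft seed per category.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
seed_labels = np.array([0, -1, -1, 1, -1, -1])
seed_conf = np.array([0.9, 0.0, 0.0, 0.6, 0.0, 0.0])
labels, centroids = soft_seeded_kmeans(X, k=2, seed_labels=seed_labels, seed_conf=seed_conf)
print(labels)
```

Because the seed constraint enters as a penalty rather than a hard assignment, a low-confidence seed can still be overridden when the feature-space evidence points to another cluster; that is the sense in which the seeds in this sketch are "soft".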
