Publication
ICPR 2008
Conference paper

Categorization using semi-supervised clustering

View publication

Abstract

Many applications require matching objects to a predefined, yet highly dynamic set of categories accompanied by category descriptions. We present a novel approach to solving this class of categorization problems by formulating it in a semi-supervised clustering framework. Text-based matching is performed to generate "soft" seeds, which are then used to guide clustering in the basic feature space. We introduce a new variation of the k-means algorithm, called Soft Seeded k-means, which can effectively incorporate seeds that are of varying degrees of confidence, while allowing for incomplete coverage of the pre-defined categories. The algorithm is applied to real-world data from a business analytics application, and we demonstrate that it leads to superior performance compared to previous approaches. © 2008 IEEE.

Date

Publication

ICPR 2008

Authors

Topics

Share