Publication
CIKM 1995
Conference paper
Extensible classifier for semi-structured documents
Abstract
In this paper, we present a vector space classifier for determining the type of semi-structured documents. Our goal was to design a classifier that performs well in terms of accuracy (recall and precision), speed, and flexibility. The ability to dynamically extend a classifier with user-specific classes is crucial for many applications. Unfortunately, the training data of the existing classes is often unavailable, which makes the extended classifier imprecise. We focus on this issue. First, we evaluate how to create class abstracts that can serve as a replacement for the missing training data. Second, we introduce relevance feedback learning strategies to overcome the remaining flaws in the classifier.
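To make the ideas in the abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes that a class abstract can be approximated by the centroid of normalized term-frequency vectors, that classification picks the class with the highest cosine similarity, and that relevance feedback is a Rocchio-style adjustment. The function names (`vectorize`, `make_abstract`, `classify`, `feedback`), the whitespace tokenizer, and the `rate` parameter are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into an L2-normalized term-frequency vector (assumed weighting)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def make_abstract(docs):
    """Summarize a class as the centroid of its document vectors.

    Assumption: the centroid stands in for the 'class abstract', so the
    original training documents can be discarded afterwards.
    """
    centroid = Counter()
    for doc in docs:
        for term, weight in vectorize(doc).items():
            centroid[term] += weight / len(docs)
    return dict(centroid)


def classify(abstracts, text):
    """Return the class whose abstract is most similar to the document."""
    vec = vectorize(text)
    return max(abstracts, key=lambda name: cosine(abstracts[name], vec))


def feedback(abstracts, text, correct, predicted, rate=0.5):
    """Rocchio-style update (assumed): pull the correct class toward the
    document and, on a misclassification, push the predicted class away."""
    for term, weight in vectorize(text).items():
        good = abstracts[correct]
        good[term] = good.get(term, 0.0) + rate * weight
        if predicted != correct:
            bad = abstracts[predicted]
            bad[term] = max(0.0, bad.get(term, 0.0) - rate * weight)
```

Under these assumptions, extending the classifier with a user-specific class only requires adding one more entry to the `abstracts` dictionary via `make_abstract`; the stored abstracts of the existing classes are reused as they are, and `feedback` is applied whenever a user corrects a prediction.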