Publication
ICPR 2012
Conference paper
Business email classification using incremental subspace learning
Abstract
We consider a new text classification task: classifying enterprise email messages into sensitive business topics. Identifying sensitive topics in email messages helps enterprises safeguard critical data such as intellectual property and trade secrets. We apply incremental PCA (Principal Component Analysis) to email representation; it learns a feature subspace incrementally and effectively reduces the feature dimensionality. A linear SVM (Support Vector Machine) is then used to learn the classification models. We validate our approach on 5,000 emails extracted from the Enron email dataset. Experimental results show that SVM outperforms other classification methods, and that incremental PCA yields a substantial reduction in processing time and a slight increase in classification accuracy compared to SVM with all the features. © 2012 ICPR Org Committee.
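The pipeline described in the abstract (incremental subspace learning followed by a linear SVM) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's `IncrementalPCA`, `HashingVectorizer`, and `LinearSVC` as stand-ins for the paper's components, and the toy corpus, the helper names (`make_corpus`, `fit_pipeline`, `predict`), and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.svm import LinearSVC

# Hypothetical toy messages; the paper's Enron subset is not reproduced here.
SENSITIVE = [
    "confidential patent draft for the new turbine design",
    "trade secret pricing model attached do not forward",
    "unreleased merger terms and valuation spreadsheet",
    "proprietary source code for the trading algorithm",
]
ROUTINE = [
    "lunch on friday at the usual place",
    "reminder that the parking garage closes early today",
    "team offsite agenda and travel booking details",
    "please submit your timesheet before monday",
]


def make_corpus(repeats=10):
    """Return (texts, labels); label 1 marks a sensitive topic (assumed scheme)."""
    texts = (SENSITIVE + ROUTINE) * repeats
    labels = ([1] * len(SENSITIVE) + [0] * len(ROUTINE)) * repeats
    return texts, np.array(labels)


def fit_pipeline(texts, labels, n_components=4, batch_size=20, n_features=256):
    """Learn the subspace batch by batch, then train a linear SVM on it."""
    # A stateless hashing vectorizer needs no vocabulary pass over the corpus.
    vectorizer = HashingVectorizer(n_features=n_features, alternate_sign=False)
    ipca = IncrementalPCA(n_components=n_components)
    # Each batch must contain at least n_components samples.
    for start in range(0, len(texts), batch_size):
        batch = vectorizer.transform(texts[start:start + batch_size]).toarray()
        ipca.partial_fit(batch)  # update the subspace with this batch only
    reduced = ipca.transform(vectorizer.transform(texts).toarray())
    clf = LinearSVC().fit(reduced, labels)
    return vectorizer, ipca, clf


def predict(vectorizer, ipca, clf, texts):
    """Project new messages into the learned subspace and classify them."""
    features = ipca.transform(vectorizer.transform(texts).toarray())
    return clf.predict(features)


if __name__ == "__main__":
    texts, labels = make_corpus()
    vectorizer, ipca, clf = fit_pipeline(texts, labels)
    print(predict(vectorizer, ipca, clf, ["forward the patent draft to legal"]))
```

Because `partial_fit` sees one batch at a time, the full term matrix never has to be held in memory while the subspace is learned, and the SVM trains on `n_components` features rather than the full hashed vocabulary. That is the kind of processing-time saving the abstract reports, although the numbers in this sketch say nothing about the paper's actual results.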