Combining global and personal anti-spam filtering

Richard Segal

CEAS 2007

Conference paper

02 Aug 2007

Abstract

Many of the first successful applications of statistical learning to anti-spam filtering were personalized classifiers trained on an individual user's spam and ham e-mail. Proponents of personalized filters argue that statistical text learning is effective because it can identify the unique aspects of each individual's e-mail. On the other hand, a single classifier learned for a large population of users can leverage the data provided by each individual user across hundreds or even thousands of users. This paper investigates the trade-off between globally- and personally-trained anti-spam classifiers. We find that globally-trained text classification easily outperforms personally-trained classification under realistic settings. This result does not imply that personalization is without value. We show that the two techniques can be combined to produce a modest improvement in overall performance.
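The abstract does not say how the global and personal classifiers are combined. As a purely illustrative sketch, and not the paper's method, the Python below trains one multinomial naive Bayes model on mail pooled across all users (global) and one model per user (personal), then blends their spam log-odds with a fixed interpolation weight. The class `NaiveBayes`, the function `combined_score`, its `weight` parameter, and the toy messages are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch: NOT the combination method used in the paper.


class NaiveBayes:
    """Multinomial naive Bayes over lowercase whitespace tokens."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Accumulate token counts and document counts for the given label.
        self.counts[label].update(text.lower().split())
        self.docs[label] += 1

    def log_odds(self, text):
        """Return log P(spam | text) - log P(ham | text) with add-one smoothing."""
        vocab_size = len(set(self.counts["spam"]) | set(self.counts["ham"])) + 1
        n_docs = self.docs["spam"] + self.docs["ham"]
        # Smoothed class priors.
        score = math.log((self.docs["spam"] + 1) / (n_docs + 2))
        score -= math.log((self.docs["ham"] + 1) / (n_docs + 2))
        n_spam = sum(self.counts["spam"].values())
        n_ham = sum(self.counts["ham"].values())
        for tok in text.lower().split():
            p_spam = (self.counts["spam"][tok] + 1) / (n_spam + vocab_size)
            p_ham = (self.counts["ham"][tok] + 1) / (n_ham + vocab_size)
            score += math.log(p_spam) - math.log(p_ham)
        return score


def combined_score(global_nb, personal_nb, text, weight=0.5):
    """Blend log-odds: `weight` on the personal model, the rest on the global one."""
    return weight * personal_nb.log_odds(text) + (1 - weight) * global_nb.log_odds(text)


# Toy data (hypothetical): each user's labeled mail.
users = {
    "alice": [("cheap pills now", "spam"), ("lunch at noon", "ham")],
    "bob": [("win money now", "spam"), ("project meeting notes", "ham")],
}

global_nb = NaiveBayes()  # trained on everyone's mail pooled together
personal = {}             # one model per user, trained only on that user's mail
for user, messages in users.items():
    personal[user] = NaiveBayes()
    for text, label in messages:
        global_nb.train(text, label)
        personal[user].train(text, label)

print(combined_score(global_nb, personal["alice"], "win cheap money") > 0)  # prints True
```

A positive score means the blended model leans toward spam. Here the global model recognizes "win" and "money" from Bob's mail even though Alice never received them, which mirrors the abstract's point about pooling data across users; the personal model contributes what is specific to Alice. In practice the weight would presumably be tuned on held-out mail rather than fixed at 0.5.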