On classifier behavior in the presence of mislabeling noise
Machine learning algorithms perform differently under varying levels of mislabeling noise in the training set, so choosing the right algorithm for a particular learning problem is crucial. This paper addresses two dual problems: first, comparing the behavior of learning algorithms; and second, choosing learning algorithms for noisy settings. We present the “sigmoid rule” framework, which can be used to choose the most appropriate learning algorithm depending on the properties of the noise in a classification problem. The framework builds on an existing model that describes the expected performance of a learning algorithm as a sigmoid function of the signal-to-noise ratio in the training instances. We study the characteristics of the sigmoid function using five representative non-sequential classifiers, namely Naïve Bayes, kNN, SVM, a decision tree classifier, and a rule-based classifier, and three widely used sequential classifiers based on hidden Markov models, conditional random fields, and recursive neural networks. Based on the sigmoid parameters, we define a set of intuitive criteria for comparing the behavior of learning algorithms in the presence of noise. Furthermore, we show that these parameters are connected to the characteristics of the underlying dataset, which means that the expected performance over a dataset can be estimated regardless of the underlying algorithm. The framework is applicable to concept drift scenarios, including modeling user behavior over time and mining noisy time series of an evolving nature.
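To make the idea of describing a classifier by sigmoid parameters more concrete, the following is a minimal sketch in Python. The abstract does not state the exact functional form, so the four-plus-one parameter sigmoid used here (with `m`, `M`, `b`, `c`, `d`) and the derived quantities in `describe` are illustrative assumptions, not the paper's own definitions. The input `z` is assumed to be a (log) signal-to-noise ratio, and `b` and `c` are assumed to be positive.

```python
import math


def sigmoid_performance(z, m, M, b, c, d):
    """Expected performance at signal-to-noise level z.

    Hypothetical parametrization (an assumption, not taken from the abstract):
    m and M are the lower and upper performance bounds, c controls steepness,
    and b and d shift the curve along the z axis. Requires b > 0.
    """
    return m + (M - m) / (1.0 + b * math.exp(-c * (z - d)))


def describe(m, M, b, c, d):
    """Derive simple descriptive quantities from the sigmoid parameters.

    These quantities are illustrative only; the names are hypothetical.
    """
    # The curve is steepest where b * exp(-c * (z - d)) == 1.
    inflection_z = d + math.log(b) / c
    return {
        "min_performance": m,
        "max_performance": M,
        "performance_range": M - m,
        "inflection_z": inflection_z,
        # Derivative of the sigmoid at its inflection point.
        "slope_at_inflection": (M - m) * c / 4.0,
    }


if __name__ == "__main__":
    # Made-up parameter values for two hypothetical classifiers.
    params_a = dict(m=0.50, M=0.90, b=1.0, c=2.0, d=0.0)
    params_b = dict(m=0.55, M=0.85, b=1.0, c=0.5, d=0.0)
    for name, p in (("A", params_a), ("B", params_b)):
        print(name, describe(**p))
```

Under these assumptions, two classifiers could be compared by their performance range (how much noise can cost them) and by the slope at the inflection point (how abruptly performance changes as the signal-to-noise ratio varies). In practice the parameters would be estimated by fitting the curve to measured performance at several noise levels.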