Practical ensemble classification error bounds for different operating points
Abstract
Classification algorithms used to support the decisions of human analysts are often used in settings in which zero-one loss is not the appropriate indication of performance. The zero-one loss corresponds to the operating point with equal costs for false alarms and missed detections, and no option for the classifier to leave uncertain test samples unlabeled. A generalization bound for ensemble classification at the standard operating point has been developed based on two interpretable properties of the ensemble: strength and correlation, using the Chebyshev inequality. Such generalization bounds for other operating points have not been developed previously and are developed in this paper. Significantly, the bounds are empirically shown to have much practical utility in determining optimal parameters for classification with a reject option, classification for ultralow probability of false alarm, and classification for ultralow probability of missed detection. Counter to the usual guideline of large strength and small correlation in the ensemble, different guidelines are recommended by the derived bounds in the ultralow false alarm and missed detection probability regimes. © 2013 IEEE.