The class imbalance problem

Fadel Megahed; Ying-Ju Chen; Aly Megahed; Yuya Jeremy Ong; Naomi Altman; Martin Krzywinski

doi:10.1038/s41592-021-01302-4

Nature Methods

Paper

15 Oct 2021

Abstract

We previously discussed how classifiers based on logistic regression and decision trees can be used for predicting the class of an observation. Unfortunately, when such classifiers are trained on a dataset in which one of the response classes is rare, they can underestimate the probability of observing a rare event — the greater the imbalance, the greater this small-sample bias. This month, we illustrate how to mitigate the negative effect of class imbalance on the training of classifiers.
