Learning interpretable classification rules using sequential rowsampling
In our previous work we have presented an approach to learn interpretable classification rules using a Boolean compressed sensing formulation. Our approach uses a linear programming (LP) relaxation and allows us to find interpretable (sparse) classification rules that achieve good generalization accuracy. However, the resulting LP representation for problems with either a large number of samples or large number of continuous features tends to become challenging for off-the-shelf LP solvers. We have explored a screening approach which allows us to dramatically reduce the number of active features without sacrificing optimality. In this work we explore reducing the number of samples in a sequential setting where we can certify reaching a near-optimal solution while only solving the LP on a small fraction of the available data points. In a batch setting this approach can dramatically reduce the computational complexity of the rule-learning LP formulation. In an online setting we derive stochastic upper and lower bounds on the the LP objective for unseen samples. This allows early stopping when we detect that the classifier will not change significantly with additional samples. The upper bounds are related to the learning curve literature in machine learning, and our lower bounds appear not to have been explored. Finally, we discuss a quick approach to compute the complete regularization path balancing rule interpretability versus accuracy.