Heavy Sets with Applications to Interpretable Machine Learning Diagnostics
Abstract
ML models take on a new life after deployment, raising a host of new challenges: data drift, model recalibration, and monitoring. If performance erodes over time, the engineers in charge may ask what changed: did the data distribution shift, or did the model get worse after retraining? We propose a flexible paradigm for answering a variety of model-diagnosis questions by finding heaviest-weight interpretable regions, which we call heavy sets. We associate with each datapoint a local weight describing model mismatch, and find a simple region maximizing the sum (or average) of these weights. Specific choices of weights can find regions where two models differ the most, where a single model makes unusually many errors, or where two datasets have large differences in density. The premise is that a region with overall elevated errors (weights) may reveal statistically significant effects even when no individual error stands out from the noise. We focus on interpretable regions defined by sparse AND-rules (conjunctive rules using a small subset of the available features). We first describe an exact integer programming (IP) formulation applicable to smaller datasets. As the exact IP is NP-hard, we also develop a greedy coordinate-wise heuristic based on dynamic programming. On smaller datasets the heuristic often comes close to the IP in objective value, yet it scales to datasets with millions of examples and thousands of features. We further address the statistical significance of the detected regions, handling the multiple-hypothesis-testing and spatial-dependence challenges that arise in model diagnostics. We evaluate the proposed approach both on synthetic data (with known ground truth) and on well-known public ML datasets.
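As a minimal formalization of the objective (the notation here is ours, not fixed by the abstract; in particular, writing each conjunct as an interval on a feature is an illustrative assumption, since AND-rules may also involve categorical conditions): given datapoints x_1, ..., x_n with mismatch weights w_1, ..., w_n, the sum variant of the heavy-set problem can be sketched as

% Sketch of the heavy-set objective (sum variant), under the assumptions above.
% R ranges over conjunctive rules touching at most k of the d features;
% w_i is the per-point mismatch weight (e.g., an error indicator, a
% model-disagreement score, or a density-ratio term).
\[
  \max_{R \in \mathcal{R}_k} \; \sum_{i=1}^{n} w_i \,\mathbf{1}\{x_i \in R\},
  \qquad
  \mathcal{R}_k = \Big\{ \bigwedge_{j \in S} \big[\ell_j \le x_j \le u_j\big] \;:\; S \subseteq \{1,\dots,d\},\ |S| \le k \Big\}.
\]

The average variant normalizes by the number of covered points; the exact IP and the greedy coordinate-wise heuristic described above are two ways of searching over \(\mathcal{R}_k\).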