NeurIPS 2023
Workshop paper

Cost-Aware Counterfactuals for Black Box Explanations


Counterfactual explanations provide actionable insight into the minimal change in a system that would lead to a more desirable prediction from a black box model. We address the challenge of finding valid, low-cost counterfactuals in the setting where each feature carries a different cost or preference for perturbation. We propose a multiplicative weight applied to the perturbation, and show that this simple approach can be easily adapted to obtain multiple diverse counterfactuals, as well as to integrate the feature importances obtained from other state-of-the-art explainers when generating counterfactual examples. Additionally, we discuss the computation of valid counterfactuals with numerical gradient-based methods when the black box model has flat regions with no reliable gradient. In this scenario, sampling approaches, as well as those that rely on available data, sometimes return counterfactuals that are not close to the decision boundary. We show that a simple long-range guidance approach, which consists of sampling from a sphere of larger radius in search of a direction of change for the black box predictor when no gradient is available, improves the quality of the counterfactual explanation. We discuss existing approaches and show that our proposed alternatives compare favorably across different datasets and metrics.
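To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a scalar black box score `f`, a per-feature weight vector that scales each perturbation (a smaller weight standing for a costlier feature), a central finite-difference gradient, and a fallback that samples weighted perturbations on a sphere whenever the gradient vanishes. All function names, the doubling of the radius, the step size, and the toy `black_box` are hypothetical choices made for illustration.

```python
import numpy as np


def numerical_gradient(f, x, eps=1e-4):
    """Central finite-difference gradient of a scalar black box f at x."""
    grad = np.zeros(x.size)
    for i in range(x.size):
        step = np.zeros(x.size)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad


def long_range_direction(f, x, weights, radius, n_samples, rng):
    """Sample unit directions, probe f at x + radius * weights * d, and
    return the direction with the largest gain, or None if nothing improves."""
    dirs = rng.normal(size=(n_samples, x.size))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    base = f(x)
    gains = np.array([f(x + radius * weights * d) for d in dirs]) - base
    best = int(np.argmax(gains))
    return dirs[best] if gains[best] > 0 else None


def find_counterfactual(f, x0, weights, threshold=0.5, lr=0.1,
                        radius=1.0, n_samples=64, max_iter=500, seed=0):
    """Move x0 until f(x) >= threshold, scaling every step by `weights`."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    w = np.asarray(weights, dtype=float)
    for _ in range(max_iter):
        if f(x) >= threshold:
            return x
        g = numerical_gradient(f, x)
        norm = np.linalg.norm(g)
        if norm < 1e-8:
            # Flat region: no usable gradient, so look further away.
            d = long_range_direction(f, x, w, radius, n_samples, rng)
            if d is None:
                radius *= 2.0  # assumption: widen the search and retry
                continue
        else:
            d = g / norm
        # Multiplicative weights applied to the perturbation itself.
        x = x + lr * w * d
    return None


def black_box(x):
    """Toy score that is exactly flat (zero) wherever x[0] + x[1] < 2."""
    return float(np.clip(x[0] + x[1] - 2.0, 0.0, 1.0))


x0 = np.array([0.0, 0.0])
weights = np.array([1.0, 0.2])  # feature 1 is treated as costlier to change
cf = find_counterfactual(black_box, x0, weights)
print("counterfactual:", cf, "score:", black_box(cf))
```

In this toy setting the starting point lies in a flat region, so the first moves come from the long-range probe; once the score starts to respond, the finite-difference gradient takes over. Because every step is multiplied by `weights`, most of the change is placed on the cheaper feature.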