Jiaqi Zhang, Chandler Squires, et al.
ICML 2023
There is a rich and growing literature on producing local contrastive/counterfactual explanations for black-box models (e.g., neural networks). In these methods, the explanation for an input takes the form of a contrast point that differs from the original input in very few features and lies in a different class. Other works build globally interpretable models, such as decision trees and rule lists, trained either on the data with its actual labels or on the black-box model's predictions. Although these interpretable global models can be useful, they may not be consistent with the local explanations of a specific black-box model of choice. In this work, we explore the question: can we produce a transparent global model that is simultaneously accurate and consistent with the local (contrastive) explanations of the black-box model? We introduce a local consistency metric that quantifies whether the local explanations for the black-box model also apply to the globally transparent proxy/surrogate model. Based on a key insight, we propose a novel method in which we create custom Boolean features from the local contrastive explanations of the black-box model and then train a globally transparent model that, in addition to being accurate, has higher local consistency than other known strategies.
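The pipeline described above (contrast points, Boolean features derived from them, a transparent surrogate, and a local consistency score) can be illustrated with a minimal sketch. This is not the authors' algorithm: the toy `black_box`, the brute-force `contrast_point` search, the midpoint-threshold features, the one-rule surrogate, and the particular consistency formula are all assumptions chosen only to make the idea concrete.

```python
from itertools import product

GRID = [i / 4 for i in range(5)]  # candidate feature values: 0, 0.25, ..., 1

def black_box(x):
    # Hypothetical stand-in for a black-box classifier.
    return int(x[0] + x[1] > 1.0)

def contrast_point(model, x):
    # Assumed explainer: smallest single-feature change that flips the class.
    best = None
    for j, v in product(range(len(x)), GRID):
        z = list(x)
        z[j] = v
        if model(z) != model(x):
            cost = abs(v - x[j])
            if best is None or cost < best[0]:
                best = (cost, j, tuple(z))
    return best  # None when no single-feature change flips the class

X = [tuple(p) for p in product(GRID, repeat=2)]
y = [black_box(x) for x in X]
explanations = {x: contrast_point(black_box, x) for x in X}

# Boolean features "x[j] > t", with t halfway between input and contrast value.
features = set()
for x, e in explanations.items():
    if e is not None:
        _, j, z = e
        features.add((j, (x[j] + z[j]) / 2))

def agreement(j, t, positive):
    # Fraction of inputs where the rule matches the black-box label.
    preds = [int((x[j] > t) == positive) for x in X]
    return sum(p == label for p, label in zip(preds, y)) / len(X)

# Transparent surrogate: the single Boolean rule that best mimics the black box.
j_s, t_s, pos_s = max(
    ((j, t, pos) for (j, t) in sorted(features) for pos in (True, False)),
    key=lambda c: agreement(*c),
)

def surrogate(x):
    return int((x[j_s] > t_s) == pos_s)

accuracy = agreement(j_s, t_s, pos_s)

# Assumed consistency score: share of explained inputs whose black-box
# contrast point also changes the surrogate's prediction.
explained = [(x, e[2]) for x, e in explanations.items() if e is not None]
consistency = sum(surrogate(x) != surrogate(z) for x, z in explained) / len(explained)
```

A one-rule surrogate is deliberately crude; a decision tree or rule list over the same Boolean features would be the more natural transparent model, at the cost of a longer example.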
Djallel Bouneffouf, Charu Aggarwal, et al.
IJCNN 2020
Balaji Ganesan, Srinivas Parkala, et al.
NeurIPS 2020
Sahil Suneja, Yufan Zhuang, et al.
EuroS&P 2023