Prospective Explanations: An Interactive Mechanism for Model Understanding
Abstract
We demonstrate a system for prospective explanations of black-box models for regression and classification tasks on structured data. Prospective explanations aim to show how a model works by highlighting likely changes in model outcomes under changes in input. This is in contrast to most post-hoc explainability methods, which aim to justify a decision retrospectively. Our system is designed to provide fast estimates of changes in outcomes for arbitrary exploratory queries from users. Such queries are typically partial, i.e., they involve only a subset of the features, so outcome labels are reported as likelihoods. Repeated queries can therefore indicate which aspects of the feature space are most likely to influence the target variable. Fast interactive exploration is made possible by a surrogate Bayesian network model trained on model labels, under some reasonable assumptions on architecture. The main advantages of our approach are that (a) inference is very fast, supporting the real-time feedback needed for interactivity, (b) inference can be performed with partial information on features, and (c) indirect effects are also accounted for when estimating target class distributions.
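To illustrate the core idea of partial-evidence queries against a surrogate, the following is a minimal sketch. It assumes a toy surrogate with two binary features and a binary target, with illustrative probabilities standing in for the conditionals a surrogate Bayesian network would learn from the black-box model's labels; for simplicity the feature priors are taken as independent here, whereas a real Bayesian network would also capture feature dependencies. All names and numbers are hypothetical, not from the system itself.

```python
from itertools import product

# Hypothetical tiny surrogate: two binary features (x1, x2) and a binary
# target y. P(y=1 | x1, x2) stands in for conditionals learned from the
# black-box model's labels; the feature priors stand in for statistics of
# the training data. All numbers are illustrative.
p_x1 = {0: 0.6, 1: 0.4}
p_x2 = {0: 0.5, 1: 0.5}
p_y_given = {  # P(y=1 | x1, x2)
    (0, 0): 0.1, (0, 1): 0.4,
    (1, 0): 0.7, (1, 1): 0.9,
}

def query(evidence):
    """Return P(y=1) given a *partial* assignment of features.

    Unobserved features are marginalized out using their priors, so the
    answer is a likelihood rather than a hard label, and effects that
    flow through unspecified features are still accounted for.
    """
    num = 0.0
    den = 0.0
    for x1, x2 in product((0, 1), (0, 1)):
        # Skip assignments inconsistent with the observed evidence.
        if evidence.get("x1", x1) != x1 or evidence.get("x2", x2) != x2:
            continue
        w = p_x1[x1] * p_x2[x2]
        num += w * p_y_given[(x1, x2)]
        den += w
    return num / den

# Partial query: only x1 is specified; x2 is marginalized out.
print(query({"x1": 1}))   # → 0.8
# Full query: both features specified, conditional is returned directly.
print(query({"x1": 0, "x2": 1}))  # → 0.4
```

Repeatedly issuing such queries with different partial assignments, as the abstract describes, reveals which features move the target likelihood the most; the summation here is exhaustive enumeration, which a Bayesian network inference engine replaces with efficient exact or approximate marginalization.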