Conference paper

Manifold-Aligned Counterfactual Explanations for Neural Networks


We study the problem of finding optimal manifold-aligned counterfactual explanations for neural networks. Existing approaches that involve solving a complex mixed-integer optimization (MIP) problem frequently suffer from scalability issues, limiting their practical usefulness. Furthermore, the solutions are not guaranteed to follow the data manifold, resulting in unrealistic counterfactual explanations. To address these challenges, we first present an MIP formulation where we explicitly enforce manifold alignment by re-formulating the highly nonlinear Local Outlier Factor (LOF) metric as mixed-integer constraints. To address the computational challenge, we leverage the geometry of a trained neural network and propose an efficient decomposition scheme that reduces the initial large, hard-to-solve optimization problem into a series of significantly smaller, easier-to-solve problems by constraining the search space to “live” polytopes, i.e., regions that contain at least one actual data point. Experiments on real-world datasets demonstrate the efficacy of our approach in producing both optimal and realistic counterfactual explanations, as well as computational tractability.