ConceptExplainer: Interactive Explanation for Deep Neural Networks from a Concept Perspective
Abstract
Traditional deep learning interpretability methods which are suitable for model users cannot explain network behaviors at the global level and are inflexible at providing fine-grained explanations. As a solution, concept-based explanations are gaining attention due to their human intuitiveness and their flexibility to describe both global and local model behaviors. Concepts are groups of similarly meaningful pixels that express a notion, embedded within the network's latent space and have commonly been hand-generated, but have recently been discovered by automated approaches. Unfortunately, the magnitude and diversity of discovered concepts makes it difficult to navigate and make sense of the concept space. Visual analytics can serve a valuable role in bridging these gaps by enabling structured navigation and exploration of the concept space to provide concept-based insights of model behavior to users. To this end, we design, develop, and validate CONCEPTEXPLAINER, a visual analytics system that enables people to interactively probe and explore the concept space to explain model behavior at the instance/class/global level. The system was developed via iterative prototyping to address a number of design challenges that model users face in interpreting the behavior of deep learning models. Via a rigorous user study, we validate how CONCEPTEXPLAINER supports these challenges. Likewise, we conduct a series of usage scenarios to demonstrate how the system supports the interactive analysis of model behavior across a variety of tasks and explanation granularities, such as identifying concepts that are important to classification, identifying bias in training data, and understanding how concepts can be shared across diverse and seemingly dissimilar classes.