Shared Interest: Measuring Human-AI Alignment to Identify Recurring Patterns in Model Behavior
Saliency methods - techniques to identify the importance of input features on a model's output - are a common step in understanding neural network behavior. However, interpreting saliency requires tedious manual inspection to identify and aggregate patterns in model behavior, resulting in ad hoc or cherry-picked analysis. To address these concerns, we present Shared Interest: metrics for comparing model reasoning (via saliency) to human reasoning (via ground truth annotations). By providing quantitative descriptors, Shared Interest enables ranking, sorting, and aggregating inputs, thereby facilitating large-scale systematic analysis of model behavior. We use Shared Interest to identify eight recurring patterns in model behavior, such as cases where contextual features or a subset of ground truth features are most important to the model. Working with representative real-world users, we show how Shared Interest can be used to decide if a model is trustworthy, uncover issues missed in manual analyses, and enable interactive probing.
This paper has been awarded a Best of CHI Honorable Mention at ACM CHI ’22 (CHI Conference on Human Factors in Computing Systems). We (MIT and IBM researchers) introduce Shared Interest: a method for comparing model saliencies with human-generated ground truth annotations.
As machine learning continues to be deployed in real-world applications, it is increasingly important to understand the reasoning behind model decisions. A common first step for doing so is to compute the model's saliency. In this setting, saliency is the output of any function that, given an input instance (e.g., an image), computes a score representing the importance of each input feature (e.g., pixel) to the model's output.
While saliency methods provide the much-needed ability to inspect model behavior, making sense of their output can still place a non-trivial burden on analysts. In particular, saliencies are often visualized as solitary heatmaps, which do not provide any additional structure or higher-level visual abstractions to aid interpretation. As a result, analysts must rely solely on their visual perception and priors to generate hypotheses about model behavior. Saliency methods also operate on individual instances, making it difficult to conduct large-scale analyses of model behavior and uncover recurring patterns. Analysts must therefore choose between time-consuming (often infeasible) manual analysis of all instances or ad hoc (often biased) selection of meaningful subsets of instances.
The Shared Interest approach presented in our paper quantifies the alignment between a model's saliency and human-generated ground truth annotations by measuring three types of coverage: Ground Truth Coverage (GTC), or the proportion of ground truth features identified by the saliency method; Saliency Coverage (SC), or the proportion of saliency features that are also ground truth features; and IoU Coverage (IoU), or the similarity between the saliency and ground truth feature sets. These coverage metrics enable a richer and more structured interactive analysis process by allowing analysts to sort, rank, and aggregate input instances based on model behavior.
The IBM Research and MIT approach and results
We introduce three metrics that measure the relationship between saliency and ground truth annotations. Mathematically, we use S to represent the set of input features important for a model's decision as determined by a saliency method and G to represent the set of input features important to a human's decision as indicated by a ground truth annotation. For example, in a CV classification task, G might represent the pixels within an object-level bounding box, and S might represent the set of pixels salient to the model's decision as determined by a saliency method. Using these features we compute three metrics: IoU Coverage (IoU), Ground Truth Coverage (GTC), and Saliency Coverage (SC).
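Written in terms of the sets S and G, the three definitions above correspond to IoU = |S ∩ G| / |S ∪ G|, GTC = |S ∩ G| / |G|, and SC = |S ∩ G| / |S|. The following is a minimal sketch of that computation, not the released implementation. It assumes the saliency map has already been reduced to a binary mask (how S is obtained from continuous saliency scores is not covered here), and the function name shared_interest and the convention of returning 0.0 for an empty set are our own illustrative choices.

```python
import numpy as np


def shared_interest(saliency_mask, ground_truth_mask):
    """Compute IoU, GTC, and SC for one instance.

    Both arguments are boolean arrays of the same shape:
    True marks a feature (e.g., pixel) that belongs to S or G.
    """
    s = np.asarray(saliency_mask, dtype=bool)
    g = np.asarray(ground_truth_mask, dtype=bool)

    intersection = np.logical_and(s, g).sum()  # |S ∩ G|
    union = np.logical_or(s, g).sum()          # |S ∪ G|

    # Assumption: an empty denominator yields a score of 0.0.
    iou = intersection / union if union else 0.0
    gtc = intersection / g.sum() if g.sum() else 0.0
    sc = intersection / s.sum() if s.sum() else 0.0
    return {"iou": float(iou), "gtc": float(gtc), "sc": float(sc)}


# Toy 4x4 "image": G is a 2x2 bounding box, S is the same box shifted right by one pixel.
G = np.zeros((4, 4), dtype=bool)
G[1:3, 1:3] = True
S = np.zeros((4, 4), dtype=bool)
S[1:3, 2:4] = True

scores = shared_interest(S, G)
print(scores)  # two shared pixels out of six in the union: IoU = 1/3, GTC = SC = 0.5
```

In this toy case the saliency covers half of the ground truth (GTC = 0.5) and half of the saliency falls inside the ground truth (SC = 0.5), so neither set contains the other.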
Using these three metrics, we identified recurring patterns of model behavior by applying them to the inputs of a given model. For example, a high IoU score indicates the saliency and ground truth feature sets are very similar (IoU = 1 implies S = G), meaning the features that are critical to human reasoning are also important to the model's decision. Correctly classified instances with high IoU scores indicate the model was correct in ways that tightly align with human reasoning. Incorrectly classified instances with high IoU scores, on the other hand, are often challenging for the model, such as the image of a snowplowing truck that is labeled as snowplow but predicted as pickup.
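To illustrate how per-instance scores support this kind of analysis, here is a small sketch that ranks instances by IoU, pulls out the misclassified instances that still align with the ground truth, and aggregates IoU by correctness. The records, the score values, and the HIGH_IOU cutoff are all made up for illustration; they are not results from the paper.

```python
from statistics import mean

# Hypothetical per-instance records with precomputed IoU scores (illustrative values only).
instances = [
    {"id": "img_001", "label": "snowplow", "prediction": "pickup", "iou": 0.82},
    {"id": "img_002", "label": "snowplow", "prediction": "snowplow", "iou": 0.91},
    {"id": "img_003", "label": "pickup", "prediction": "pickup", "iou": 0.12},
    {"id": "img_004", "label": "pickup", "prediction": "snowplow", "iou": 0.05},
]

HIGH_IOU = 0.75  # assumed cutoff for "high" alignment; pick one that suits your data


def is_correct(instance):
    return instance["label"] == instance["prediction"]


# Rank: most human-aligned instances first.
ranked = sorted(instances, key=lambda inst: inst["iou"], reverse=True)

# Filter: wrong predictions whose saliency still closely matches the ground truth.
aligned_but_wrong = [
    inst["id"] for inst in ranked if inst["iou"] >= HIGH_IOU and not is_correct(inst)
]

# Aggregate: mean IoU for correct vs. incorrect predictions.
mean_iou = {
    outcome: mean(inst["iou"] for inst in instances if is_correct(inst) == outcome)
    for outcome in (True, False)
}

print([inst["id"] for inst in ranked])
print(aligned_but_wrong)
print(mean_iou)
```

The aligned_but_wrong list is where one would look for cases like the snowplow example: the model attends to the same features a human would, yet still predicts the wrong class.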
We demonstrate how Shared Interest can be used for real-world analysis through case studies of three interactive interpretability workflows of deep learning models. The first case study follows a domain expert (a dermatologist) using Shared Interest to determine the trustworthiness of a melanoma prediction model. The second case study follows a machine learning expert analyzing the faithfulness of their model and saliency method. The final case study examines how Shared Interest can analyze model behavior even without pre-existing ground truth annotations.
We developed visual prototypes for each case study to make the Shared Interest method explorable and accessible to all users, regardless of machine learning background. The computer vision and natural language processing prototypes focus on sorting and ranking input instances so users can examine model behavior.
Each input instance (image or review) is annotated with its ground truth features (highlighted in yellow) and its saliency features (highlighted in orange) and is shown alongside its Shared Interest scores, label, and prediction. The interface enables sorting and filtering based on Shared Interest score, Shared Interest case, label, and prediction. The human annotation interface is designed for interactive probing. The interface enables users to select and annotate an image with a ground truth region and returns the top classes with the highest Shared Interest scores for that ground truth region.
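The interactive probing workflow can be sketched as follows: given a user-drawn ground truth region and one saliency mask per candidate class, score each class against the region and return the best matches. This is a simplified illustration under our own assumptions. It takes precomputed binary saliency masks as input, uses IoU as the ranking score, and the names iou and top_classes are hypothetical rather than part of the prototype.

```python
import numpy as np


def iou(saliency_mask, region_mask):
    """Intersection over union of two boolean masks (0.0 if both are empty)."""
    s = np.asarray(saliency_mask, dtype=bool)
    g = np.asarray(region_mask, dtype=bool)
    union = np.logical_or(s, g).sum()
    return float(np.logical_and(s, g).sum() / union) if union else 0.0


def top_classes(region_mask, saliency_masks_by_class, k=3):
    """Return the k classes whose saliency best matches the annotated region."""
    scores = {
        name: iou(mask, region_mask)
        for name, mask in saliency_masks_by_class.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy example: the user annotates the top-left 2x2 corner of a 4x4 image.
region = np.zeros((4, 4), dtype=bool)
region[:2, :2] = True

# Hypothetical per-class saliency masks for the same image.
dog = np.zeros((4, 4), dtype=bool)
dog[:2, :2] = True   # matches the region exactly
cat = np.zeros((4, 4), dtype=bool)
cat[0, :] = True     # top row only: partial overlap
car = np.zeros((4, 4), dtype=bool)
car[2:, 2:] = True   # bottom-right corner: no overlap

results = top_classes(region, {"dog": dog, "cat": cat, "car": car}, k=2)
print(results)
```

Swapping IoU for GTC or SC in the scoring step would answer a slightly different question, for example which classes rely on all of the annotated region versus which classes rely on nothing outside it.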
Shared Interest opens the door to several promising directions for future work. One straightforward path is applying Shared Interest to tabular data - a standard format used to train models, particularly in healthcare applications. Tabular data is often more semantically complex than image or text data and thus allows us to bring further nuance to the recurring behavior patterns we have identified in this paper. Another avenue for future work is using Shared Interest to compare the fidelity of different saliency methods.
Shared Interest started as an internship project by Angie Boggust (MIT) at IBM Research and continued as a collaboration between the Visual AI Lab at IBM and the Visualization Group at MIT, supported by the MIT-IBM Watson AI Lab.
Human-Centered AI Research @ CHI 2022
IBM Research participants presented recent advances related to our Human-Centered AI research agenda, including four full papers, two co-organized workshops, one co-organized SIG, and two workshop papers. Our contributions span many areas of human-computer interaction and data visualization.
Explore all our contributions at CHI 2022