
Explainable AI

Explanations go a long way toward building trust in AI systems. We’re creating tools to help debug AI by having systems explain what they’re doing. This work includes training highly optimized, directly interpretable models, as well as explaining black-box models and visualizing neural network information flows.

Our work

Debugging LLMs to improve their credibility
Research
Kim Martineau
30 Jul 2025
Teaching AI models to improve themselves
Research
Peter Hess
14 Aug 2024
IBM and RPI researchers demystify in-context learning in large language models
News
Peter Hess
25 Jul 2024
The latest AI safety method is a throwback to our maritime past
Research
Kim Martineau
16 Nov 2023
Find and fix IT glitches before they crash the system
News
Kim Martineau
28 Sep 2023
What is retrieval-augmented generation?
Explainer
Kim Martineau
22 Aug 2023

Publications

Toward a Coherent Virtual Cell Model: Probing Biological World-Model Coherence in Transcriptomic Foundation Models
- Noa Moriel, Yishai Shimoni, et al.
- 2025
- NeurIPS 2025

Deferring Concept Bottleneck Models: Learning to Defer Interventions to Inaccurate Experts
- Andrea Pugnana, Riccardo Massidda, et al.
- 2025
- NeurIPS 2025

Specifying exact circuit algorithms in universal transformers
- Taku Ito, Ruchir Puri, et al.
- 2025
- NeurIPS 2025

Multi-Domain Explainability of Preferences
- Nitay Calderon, Liat Ein-Dor, et al.
- 2025
- EMNLP 2025

XABPs: Towards eXplainable Autonomous Business Processes
- Peter Fettke, Fabiana Fournier, et al.
- 2025
- ECAI 2025

Highlight All the Phrases: Enhancing LLM Transparency through Visual Factuality Indicators
- Hyo Jin Do, Rachel Ostrand, et al.
- 2025
- AIES 2025
