AI Testing
We’re designing tools to help ensure that AI systems are trustworthy, reliable, and effective at optimizing business processes. We create tests that simulate real-life scenarios and localize faults in AI systems, and we’re working to automate the testing, debugging, and repair of AI models across a wide range of use cases.
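To make the idea concrete, here is a minimal sketch of what a scenario-based behavioral test for an AI model can look like. Everything in it is illustrative: `predict_sentiment` is a hypothetical stand-in for a model under test, and the checks are generic metamorphic-style assertions, not part of any IBM tool.

```python
# Minimal sketch of scenario-based behavioral testing for an AI model.
# NOTE: predict_sentiment is a hypothetical stand-in for a real model;
# the checks below are generic illustrations, not an IBM test suite.
import string


def predict_sentiment(text: str) -> str:
    """Hypothetical model under test: returns 'positive' or 'negative'."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    positive_words = {"good", "great", "excellent", "love"}
    return "positive" if positive_words & set(cleaned.split()) else "negative"


def test_invariance_to_surface_noise() -> None:
    # Real-life scenario: users add punctuation and filler words that
    # should not flip the prediction. A failure here localizes the fault
    # to robustness against surface noise, not core classification logic.
    base = "The service was great"
    expected = predict_sentiment(base)
    for variant in ("The service was great!!!", "honestly, the service was great"):
        assert predict_sentiment(variant) == expected, f"prediction flipped on: {variant!r}"


def test_directional_expectation() -> None:
    # Strengthening the positive evidence should keep the label positive
    # (a simple directional metamorphic relation).
    assert predict_sentiment("I love this excellent product") == "positive"


if __name__ == "__main__":
    test_invariance_to_surface_noise()
    test_directional_expectation()
    print("behavioral checks passed")
```

In practice, the stub would be replaced by calls to the deployed model, and failing variants would be grouped to help localize the underlying fault.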
Our work
- ASTER: Natural and multi-language unit test generation with LLMs (Technical note, by Rangeet Pan, Rahul Krishna, Raju Pavuluri, and Saurabh Sinha)
- Tiny benchmarks for large language models (News, by Kim Martineau)
- What is red teaming for generative AI? (Explainer, by Kim Martineau)
- An open-source toolkit for debugging AI models of all data types (Technical note, by Kevin Eykholt and Taesung Lee)
- AI diffusion models can be tricked into generating manipulated images (News, by Kim Martineau)
- DOFramework: A testing framework for decision optimization model learners (Technical note, by Orit Davidovich)
- See more of our work on AI Testing
Publications
Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In
- Itay Nakash
- George Kour
- et al.
- 2025
- NAACL 2025
Exploring Straightforward Methods for Automatic Conversational Red-Teaming
- George Kour
- Naama Zwerdling
- et al.
- 2025
- NAACL 2025
ASTER: Natural and Multi-language Unit Test Generation with LLMs
- Rangeet Pan
- Myeongsoo Kim
- et al.
- 2025
- ICSE 2025
Workshop on Neuro-Symbolic Software Engineering
- Christian Medeiros Adriano
- Sona Ghahremani
- et al.
- 2025
- ICSE 2025
Combinatorial Test Design Model Creation using Large Language Models
- Debbie Furman
- Eitan Farchi
- et al.
- 2025
- IWCT 2025
Evolution of catalysis at IBM: From microelectronics to biomedicine to sustainability with AI-driven innovation
- James Hedrick
- Tim Erdmann
- et al.
- 2025
- ACS Spring 2025