AI Testing
We’re designing tools to help ensure that AI systems are trustworthy, reliable, and effective at optimizing business processes. We create tests that simulate real-life scenarios and localize faults in AI systems, and we’re working to automate the testing, debugging, and repair of AI models across a wide range of use cases.
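To make the idea concrete, here is a minimal sketch of what a scenario-based behavioral test for an AI model can look like. Everything in it is illustrative: `predict_sentiment` is a hypothetical stand-in for a model under test, and the checks are generic metamorphic-style assertions, not part of any IBM tool.

```python
# Minimal sketch of scenario-based behavioral testing for an AI model.
# NOTE: predict_sentiment is a hypothetical stand-in for a real model;
# the checks below are generic illustrations, not an IBM test suite.
import string


def predict_sentiment(text: str) -> str:
    """Hypothetical model under test: returns 'positive' or 'negative'."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    positive_words = {"good", "great", "excellent", "love"}
    return "positive" if positive_words & set(cleaned.split()) else "negative"


def test_invariance_to_surface_noise() -> None:
    # Real-life scenario: users add punctuation and filler words that
    # should not flip the prediction. A failure here localizes the fault
    # to robustness against surface noise, not core classification logic.
    base = "The service was great"
    expected = predict_sentiment(base)
    for variant in ("The service was great!!!", "honestly, the service was great"):
        assert predict_sentiment(variant) == expected, f"prediction flipped on: {variant!r}"


def test_directional_expectation() -> None:
    # Strengthening the positive evidence should keep the label positive
    # (a simple directional metamorphic relation).
    assert predict_sentiment("I love this excellent product") == "positive"


if __name__ == "__main__":
    test_invariance_to_surface_noise()
    test_directional_expectation()
    print("behavioral checks passed")
```

In practice, the stub would be replaced by calls to the deployed model, and failing variants would be grouped to help localize the underlying fault.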
Our work
- ASTER: Natural and multi-language unit test generation with LLMs (Technical note, by Rangeet Pan, Rahul Krishna, Raju Pavuluri, and Saurabh Sinha)
- Tiny benchmarks for large language models (News, by Kim Martineau)
- What is red teaming for generative AI? (Explainer, by Kim Martineau)
- An open-source toolkit for debugging AI models of all data types (Technical note, by Kevin Eykholt and Taesung Lee)
- AI diffusion models can be tricked into generating manipulated images (News, by Kim Martineau)
- DOFramework: A testing framework for decision optimization model learners (Technical note, by Orit Davidovich)
- See more of our work on AI Testing
Publications
Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In
- Itay Nakash
- George Kour
- et al.
- 2025
- NAACL 2025
Exploring Straightforward Methods for Automatic Conversational Red-Teaming
- George Kour
- Naama Zwerdling
- et al.
- 2025
- NAACL 2025
ASTER: Natural and Multi-language Unit Test Generation with LLMs
- Rangeet Pan
- Myeongsoo Kim
- et al.
- 2025
- ICSE 2025
Workshop on Neuro-Symbolic Software Engineering
- Christian Medeiros Adriano
- Sona Ghahremani
- et al.
- 2025
- ICSE 2025
Combinatorial Test Design Model Creation using Large Language Models
- Debbie Furman
- Eitan Farchi
- et al.
- 2025
- IWCT 2025
Evolution of catalysis at IBM: From microelectronics to biomedicine to sustainability with AI-driven innovation
- James Hedrick
- Tim Erdmann
- et al.
- 2025
- ACS Spring 2025