An open-source toolkit for debugging AI models of all data types | Technical note | Kevin Eykholt and Taesung Lee | 08 Sep 2023 | Adversarial Robustness and Privacy; AI Testing; Data and AI Security
AI diffusion models can be tricked into generating manipulated images | News | Kim Martineau | 05 Jun 2023 | AI; AI Testing; Data and AI Security; Foundation Models; Generative AI; Security
DOFramework: A testing framework for decision optimization model learners | Technical note | Orit Davidovich | 02 Feb 2023 | AI; AI Testing; Mathematical Sciences
Managing the risk in AI: Spotting the “unknown unknowns” | Research | Orna Raz, Sam Ackerman, and Marcel Zalmanovici | 06 Jun 2021 | 5 minute read | AI; AI Testing
IBM researchers check AI bias with counterfactual text | Research | Inkit Padhi, Nishtha Madaan, Naveen Panwar, and Diptikalyan Saha | 05 Feb 2021 | 5 minute read | AI Testing; Fairness, Accountability, Transparency
Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking | Gabriel Rioux, Apoorva Nitsure, et al. | 2024 | NeurIPS 2024
A Novel Metric for Measuring the Robustness of Large Language Models in Non-adversarial Scenarios | Samuel Ackerman, Ella Rabinovich, et al. | 2024 | EMNLP 2024
Towards a Benchmark for Causal Business Process Reasoning with LLMs | Fabiana Fournier, Lior Limonad, et al. | 2024 | BPM 2024
Data Contamination Report from the 2024 CONDA Shared Task | Oscar Sainz, Iker García-ferrero, et al. | 2024 | ACL 2024