WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia. Yufang Hou, Alessandra Pascale, et al. 2024. NeurIPS 2024.
Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI. Ambrish Rawat, Stefan Schoepf, et al. 2024. NeurIPS 2024.
Language Models in Dialogue: Conversational Maxims for Human-AI Interactions. Erik Miehling, Manish Nagireddy, et al. 2024. EMNLP 2024.
Cookie Consent Has Disparate Impact on Estimation Accuracy. Erik Miehling, Rahul Nair, et al. 2023. NeurIPS 2023.
Explaining knock-on effects of bias mitigation. Svetoslav Nizhnichenkov, Rahul Nair, et al. 2023. NeurIPS 2023.
AIMEE: An Exploratory Study of How Rules Support AI Developers to Explain and Edit Models. David Piorkowski, Inge Vejsbjerg, et al. 2023. PACM HCI.