Multi-Level Explanations for Generative Language ModelsLucas Monteiro PaesDennis Weiet al.2025ACL 2025
DAMAGeR: Deploying Automatic and Manual Approaches to GenAI Red-teamingManish NagireddyMichael Fefferet al.2025NAACL 2025
DAMAGeR: Deploying Automatic and Manual Approaches to GenAI Red-teamingManish NagireddyMichael Fefferet al.2025AAAI 2025
Value Alignment from Unstructured TextInkit PadhiKarthikeyan Natesan Ramamurthyet al.2024NeurIPS 2024
SocialStigmaQA Spanish and Japanese - Towards Multicultural Adaptation of Social Bias BenchmarksClara Higuera CabañesRyo Iwakiet al.2024NeurIPS 2024
Language Models in Dialogue: Conversational Maxims for Human-AI InteractionsErik MiehlingManish Nagireddyet al.2024EMNLP 2024