Who Sees the Risk? Stakeholder Conflicts and Explanatory Policies in LLM-based Risk Assessment. Srishti Yadav, Jasmina Gajcin, et al. 2026. AAAI 2026. Workshop paper.
Synthetic Data for Evaluation: Supporting LLM-as-a-Judge Workflows with EvalAssist. Elizabeth Daly, Erik Miehling, et al. 2025. EMNLP 2025. Demo paper.
Localizing Persona Representations in LLMs. Celia Cintas, Miriam Rateike, et al. 2025. AIES 2025. Conference paper.
Localizing Persona Representations in LLMs. Celia Cintas, Miriam Rateike, et al. 2025. COLM 2025. Workshop paper.
Granite Guardian: Comprehensive LLM Safeguarding. Inkit Padhi, Manish Nagireddy, et al. 2025. NAACL 2025. Conference paper.
Programming Refusal with Conditional Activation Steering. Bruce Lee, Inkit Padhi, et al. 2025. ICLR 2025. Conference paper.
Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI. Ambrish Rawat, Stefan Schoepf, et al. 2024. NeurIPS 2024. Workshop.
Language Models in Dialogue: Conversational Maxims for Human-AI Interactions. Erik Miehling, Manish Nagireddy, et al. 2024. EMNLP 2024. Paper.