Value Alignment from Unstructured Text. Inkit Padhi, Karthikeyan Natesan Ramamurthy, et al. 2024. NeurIPS 2024.
SocialStigmaQA Spanish and Japanese - Towards Multicultural Adaptation of Social Bias Benchmarks. Clara Higuera Cabañes, Ryo Iwaki, et al. 2024. NeurIPS 2024.
Prompt Templates: A Methodology for Improving Manual Red Teaming Performance. Brandon Dominique, David Piorkowski, et al. 2024. CHI 2024.
Influence Based Approaches to Algorithmic Fairness: A Closer Look. Soumya Ghosh, Prasanna Sattigeri, et al. 2023. NeurIPS 2023.
Simulating Iterative Human-AI Interaction in Programming with LLMs. Hussein Mozannar, Valerie Chen, et al. 2023. NeurIPS 2023.
DAMAGeR: Deploying Automatic and Manual Approaches to GenAI Red-teaming. Manish Nagireddy, Michael Feffer, et al. 2025. AAAI 2025.
Language Models in Dialogue: Conversational Maxims for Human-AI Interactions. Erik Miehling, Manish Nagireddy, et al. 2024. EMNLP 2024.
DARE to Diversify: DAta Driven and Diverse LLM REd Teaming. Manish Nagireddy, Bernat Guillen Pegueroles, et al. 2024. KDD 2024.