When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails. Manish Nagireddy, Inkit Padhi, et al. 2025. AIES 2025.
DAMAGeR: Deploying Automatic and Manual Approaches to GenAI Red-teaming. Manish Nagireddy, Michael Feffer, et al. 2025. NAACL 2025.
DAMAGeR: Deploying Automatic and Manual Approaches to GenAI Red-teaming. Manish Nagireddy, Michael Feffer, et al. 2025. AAAI 2025.
SocialStigmaQA Spanish and Japanese - Towards Multicultural Adaptation of Social Bias Benchmarks. Clara Higuera Cabañes, Ryo Iwaki, et al. 2024. NeurIPS 2024.
Value Alignment from Unstructured Text. Inkit Padhi, Karthikeyan Natesan Ramamurthy, et al. 2024. NeurIPS 2024.
Language Models in Dialogue: Conversational Maxims for Human-AI Interactions. Erik Miehling, Manish Nagireddy, et al. 2024. EMNLP 2024.
DARE to Diversify: DAta Driven and Diverse LLM REd Teaming. Manish Nagireddy, Bernat Guillen Pegueroles, et al. 2024. KDD 2024.