Building a Foundational Guardrail for General Agentic Systems via Synthetic Data. Yue Huang, Hang Hua, et al. 2026. ICLR 2026. Conference paper.
When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails. Manish Nagireddy, Inkit Padhi, et al. 2025. AIES 2025. Conference paper.
Granite Guardian: Comprehensive LLM Safeguarding. Inkit Padhi, Manish Nagireddy, et al. 2025. NAACL 2025. Conference paper.
DAMAGeR: Deploying Automatic and Manual Approaches to GenAI Red-teaming. Manish Nagireddy, Michael Feffer, et al. 2025. NAACL 2025. Tutorial.
Programming Refusal with Conditional Activation Steering. Bruce Lee, Inkit Padhi, et al. 2025. ICLR 2025. Conference paper.
DAMAGeR: Deploying Automatic and Manual Approaches to GenAI Red-teaming. Manish Nagireddy, Michael Feffer, et al. 2025. AAAI 2025. Tutorial.
SocialStigmaQA Spanish and Japanese - Towards Multicultural Adaptation of Social Bias Benchmarks. Clara Higuera Cabañes, Ryo Iwaki, et al. 2024. NeurIPS 2024. Workshop paper.
Value Alignment from Unstructured Text. Inkit Padhi, Karthikeyan Natesan Ramamurthy, et al. 2024. NeurIPS 2024. Workshop paper.
Value Alignment from Unstructured Text. Inkit Padhi, Karthikeyan Natesan Ramamurthy, et al. 2024. EMNLP 2024. Conference paper.
Language Models in Dialogue: Conversational Maxims for Human-AI Interactions. Erik Miehling, Manish Nagireddy, et al. 2024. EMNLP 2024. Paper.