Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods. Dennis Wei, Inkit Padhi, et al. 2025. NeurIPS 2025. Conference paper.
When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails. Manish Nagireddy, Inkit Padhi, et al. 2025. AIES 2025. Conference paper.
Granite Guardian: Comprehensive LLM Safeguarding. Inkit Padhi, Manish Nagireddy, et al. 2025. NAACL 2025. Conference paper.
Programming Refusal with Conditional Activation Steering. Bruce Lee, Inkit Padhi, et al. 2025. ICLR 2025. Conference paper.
WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia. Yufang Hou, Alessandra Pascale, et al. 2024. NeurIPS 2024. Conference paper.
Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods. Dennis Wei, Inkit Padhi, et al. 2024. NeurIPS 2024. Workshop paper.
Value Alignment from Unstructured Text. Inkit Padhi, Karthikeyan Natesan Ramamurthy, et al. 2024. NeurIPS 2024. Workshop paper.
Value Alignment from Unstructured Text. Inkit Padhi, Karthikeyan Natesan Ramamurthy, et al. 2024. EMNLP 2024. Conference paper.
ComVas: Contextual Moral Values Alignment System. Inkit Padhi, Pierre Dognin, et al. 2024. IJCAI 2024. Conference paper.