Publications

38 results for Inkit Padhi

Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
- - Yue Huang
  - Hang Hua
  - et al.
- 2026
- ICLR 2026
Conference paper

Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods
- - Dennis Wei
  - Inkit Padhi
  - et al.
- 2025
- NeurIPS 2025
Conference paper

When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails
- - Manish Nagireddy
  - Inkit Padhi
  - et al.
- 2025
- AIES 2025
Conference paper

Granite Guardian: Comprehensive LLM Safeguarding
- - Inkit Padhi
  - Manish Nagireddy
  - et al.
- 2025
- NAACL 2025
Conference paper

Programming Refusal with Conditional Activation Steering
- - Bruce Lee
  - Inkit Padhi
  - et al.
- 2025
- ICLR 2025
Conference paper

Contextual Value Alignment
- - Kush Varshney
  - Miao Liu
  - et al.
- 2025
- ICASSP 2025
Conference paper

WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia
- - Yufang Hou
  - Alessandra Pascale
  - et al.
- 2024
- NeurIPS 2024
Conference paper

Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods
- - Dennis Wei
  - Inkit Padhi
  - et al.
- 2024
- NeurIPS 2024
Workshop paper

Value Alignment from Unstructured Text
- - Inkit Padhi
  - Karthikeyan Natesan Ramamurthy
  - et al.
- 2024
- NeurIPS 2024
Workshop paper

Value Alignment from Unstructured Text
- - Inkit Padhi
  - Karthikeyan Natesan Ramamurthy
  - et al.
- 2024
- EMNLP 2024
Conference paper