Publications

87 results for Amit Dhurandhar

Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods
- - Dennis Wei
  - Inkit Padhi
  - et al.
- 2025
- NeurIPS 2025
Representation Similarity Reveals Implicit Layer Grouping in Neural Networks
- - Tian Gao
  - Amit Dhurandhar
  - et al.
- 2025
- NeurIPS 2025
LLM ethics benchmark: a three-dimensional assessment system for evaluating moral reasoning in large language models
- - Junfeng Jiao
  - Saleh Afroogh
  - et al.
- 2025
- Scientific Reports
Multi-Level Explanations for Generative Language Models
- - Lucas Monteiro Paes
  - Dennis Wei
  - et al.
- 2025
- ACL 2025
Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents
- - Ivoline Ngong
  - Swanand Ravindra Kadhe
  - et al.
- 2025
- ACL 2025
Programming Refusal with Conditional Activation Steering
- - Bruce Lee
  - Inkit Padhi
  - et al.
- 2025
- ICLR 2025
Programming Refusal with Conditional Activation Steering
- - Bruce W. Lee
  - Inkit Padhi
  - et al.
- 2025
- ICLR 2025
Large Language Model Confidence Estimation via Black-Box Access
- - Tejaswini Pedapati
  - Amit Dhurandhar
  - et al.
- 2025
- TMLR
Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents
- - Ivoline Ngong
  - Swanand Ravindra Kadhe
  - et al.
- 2024
- NeurIPS 2024
Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods
- - Dennis Wei
  - Inkit Padhi
  - et al.
- 2024
- NeurIPS 2024