Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods. Dennis Wei, Inkit Padhi, et al. 2025. NeurIPS 2025.
Representation Similarity Reveals Implicit Layer Grouping in Neural Networks. Tian Gao, Amit Dhurandhar, et al. 2025. NeurIPS 2025.
LLM ethics benchmark: a three-dimensional assessment system for evaluating moral reasoning in large language models. Junfeng Jiao, Saleh Afroogh, et al. 2025. Scientific Reports.
Multi-Level Explanations for Generative Language Models. Lucas Monteiro Paes, Dennis Wei, et al. 2025. ACL 2025.
Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents. Ivoline Ngong, Swanand Ravindra Kadhe, et al. 2025. ACL 2025.
Large Language Model Confidence Estimation via Black-Box Access. Tejaswini Pedapati, Amit Dhurandhar, et al. 2025. TMLR.
Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents. Ivoline Ngong, Swanand Ravindra Kadhe, et al. 2024. NeurIPS 2024.
Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods. Dennis Wei, Inkit Padhi, et al. 2024. NeurIPS 2024.