CoP: Agentic Red-teaming for Large Language Models using Composition of Principles. Chen Xiong, Pin-Yu Chen, et al. 2025. NeurIPS 2025.
Deferring Concept Bottleneck Models: Learning to Defer Interventions to Inaccurate Experts. Andrea Pugnana, Riccardo Massidda, et al. 2025. NeurIPS 2025.
BenchmarkCards: Standardized Documentation for Large Language Model Benchmarks. Anna Sokol, Elizabeth Daly, et al. 2025. NeurIPS 2025.
Optimal Estimation of the Best Mean in Multi-Armed Bandits. Takayuki Osogami, Junya Honda, et al. 2025. NeurIPS 2025.
Causal LLM Routing: End-to-End Regret Minimization from Observational Data. Asterios Tsiourvas, Wei Sun, et al. 2025. NeurIPS 2025.
Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods. Dennis Wei, Inkit Padhi, et al. 2025. NeurIPS 2025.
Adaptive Distraction: Probing LLM Contextual Robustness with Automated Tree Search. Yanbo Wang, Zixiang Xu, et al. 2025. NeurIPS 2025.