Evaluating LLM-based Agents: Foundations, Best Practices and Open ChallengesRoy Bar-HaimArman Cohanet al.2025IJCAI 2025
Towards a Re-evaluation of Data Forging Attacks In PracticeMohamed SulimanAnisa Halimiet al.2025USENIX Security 2025
Property-driven localization and characterization in molecular representationsCelia CintasPayel Daset al.2025Nature Scientific Reports
Conceptual Diagnostics for Knowledge Graphs and Large Language ModelsRosario Uceda-SosaMaria Changet al.2025ACL 2025
Combining Domain and Alignment Vectors Provides Better Knowledge-Safety Trade-offs in LLMsMegh ThakkarQuentin Fournieret al.2025ACL 2025
Multi-Level Explanations for Generative Language ModelsLucas Monteiro PaesDennis Weiet al.2025ACL 2025
NGQA: A Nutritional Graph Question Answering Benchmark for Personalized Health-aware Nutritional ReasoningZheyuan ZhangYiyang Liet al.2025ACL 2025
Query-driven Document-level Scientific Evidence Extraction from Biomedical StudiesMassimiliano PronestiJoao Bettencourt-Silvaet al.2025ACL 2025
BI-Bench : A Comprehensive Benchmark Dataset and Unsupervised Evaluation for BI SystemsAnkush GuptaAniya Aggarwalet al.2025ACL 2025