Synthetic Data for Evaluation: Supporting LLM-as-a-Judge Workflows with EvalAssist. Elizabeth Daly, Erik Miehling, et al. 2025. EMNLP 2025.
Hide or Highlight: Understanding the Impact of Factuality Expression on User Trust. Hyo Jin Do, Werner Geyer. 2025. AIES 2025.
Highlight All the Phrases: Enhancing LLM Transparency through Visual Factuality Indicators. Hyo Jin Do, Rachel Ostrand, et al. 2025. AIES 2025.
EvalAssist: Insights on Task-Specific Evaluations and AI-assisted Judgement Strategy Preferences. Zahra Ashktorab, Michael Desmond, et al. 2025. UIST 2025.
Multi-Level Explanations for Generative Language Models. Lucas Monteiro Paes, Dennis Wei, et al. 2025. ACL 2025.
NGQA: A Nutritional Graph Question Answering Benchmark for Personalized Health-aware Nutritional Reasoning. Zheyuan Zhang, Yiyang Li, et al. 2025. ACL 2025.
A Case Study Investigating the Role of Generative AI in Quality Evaluations of Epics in Agile Software Development. Werner Geyer, Jessica He, et al. 2025. CHIWORK 2025.
Building Appropriate Mental Models: What Users Know and Want to Know about an Agentic AI Chatbot. Michelle Brachman, Siya Kunde, et al. 2025. IUI 2025.