MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems. Yannis Katsis, Sara Rosenthal, et al. ACL 2025.
InspectorRAGet: An Introspection Platform for RAG Evaluation. Benjamin Sznajder, Kshitij Fadnis, et al. NAACL 2025.
Diagnosing and Prioritizing Issues in Automated Order-Taking Systems: A Machine-Assisted Error Discovery Approach. Maeda Hanafi, Frederick Reiss, et al. CHI 2025.
Creating Conversational Datasets for Retrieval-Augmented Generation Applications is Hard: Challenges & Research Opportunities. Maeda Hanafi, Kshitij Fadnis, et al. CHI 2025.
Leveraging Large Language Models to Enhance Domain Expert Inclusion in Data Science Workflows. Jasmine Shih, Vishal Mohanty, et al. CHI 2024.
Machine-Assisted Error Discovery in Conversational AI Systems. Maeda Hanafi, Frederick Reiss, et al. CHI 2024.
Zero-shot Topical Text Classification with LLMs - an Experimental Study. Avishai Gretz, Alon Halfon, et al. EMNLP 2023.
Label Sleuth: From Unlabeled Text to a Classifier in a Few Hours. Eyal Shnarch, Alon Halfon, et al. EMNLP 2022.
Knowledge-augmented Risk Assessment (KaRA): a hybrid-intelligence framework for supporting knowledge-intensive risk assessment of prospect candidates. Maeda Hanafi, Yannis Katsis, et al. EMNLP 2022.
InteractEva: A Simulation-Based Evaluation Framework for Interactive AI Systems. Yannis Katsis, Maeda Hanafi, et al. AAAI 2022.