Conference paper
Short paper
Creating Conversational Datasets for Retrieval-Augmented Generation Applications is Hard: Challenges & Research Opportunities
Abstract
Retrieval-augmented generation (RAG) has been shown to help mitigate hallucinations from large language models (LLMs). However, as more domains adopt this method, the need for human-created conversational data increases, because conversations written by humans tend to flow more naturally. Yet when we tasked several annotators with creating such data for RAG applications, we found the task to be hard: in some batches we had to reject 35% of the data. In this paper, we interview a group of annotators to understand what makes this task challenging. We distill insights from this formative study and outline potential future directions at the intersection of RAG applications and HCI.