ConvKT: Conversation-Level Knowledge Transfer for Context Aware End-to-End Spoken Language Understanding

Vishal Sunder; Eric Fosler-Lussier; Samuel Thomas; Hong-Kwang J. Kuo; Brian Kingsbury

doi:10.21437/Interspeech.2023-2018

INTERSPEECH 2023

Conference paper

20 Aug 2023

Abstract

Dialog history enhances downstream classification performance in both speech- and text-based dialog systems. However, a gap remains between how a fully end-to-end (E2E) spoken dialog system (SDS) and a textual dialog system integrate dialog history. Text-based dialog systems use large language models (LLMs) to encode long-range dependencies by attending to the entire conversation as a contiguous token sequence. This is not possible in an E2E SDS, as speech sequences can be intractably long. We propose a convolution subsampling approach to make the speech sequence of a conversation tractable and use a conformer to attend to the speech-based conversation in a fine-grained manner. This model is further enhanced via conversation-level knowledge transfer from an LLM using a token-level alignment strategy. Fine-tuning the E2E model pretrained in this way yields significant gains of up to 8% over strong non-contextual baselines on the E2E dialog act classification task on two datasets.
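To give a rough sense of why subsampling makes a conversation-length speech sequence tractable, the sketch below computes how many frames remain after a stack of strided convolutions. It is only an illustration under stated assumptions: the frame rate, the turn lengths, and the kernel and stride settings are hypothetical values chosen for the arithmetic, not the configuration used in the paper, and `conv_out_len` and `subsampled_len` are names introduced here.

```python
from typing import Sequence, Tuple


def conv_out_len(n_frames: int, kernel: int, stride: int) -> int:
    """Output length of a 1-D convolution with no padding and no dilation."""
    if n_frames < kernel:
        return 0
    return (n_frames - kernel) // stride + 1


def subsampled_len(n_frames: int, layers: Sequence[Tuple[int, int]]) -> int:
    """Apply each (kernel, stride) layer in turn and return the final length."""
    for kernel, stride in layers:
        n_frames = conv_out_len(n_frames, kernel, stride)
    return n_frames


# Assumption: 100 acoustic frames per second (a common 10 ms frame shift).
FRAMES_PER_SECOND = 100

# Hypothetical conversation: 10 turns of 5 seconds each.
turn_frames = [5 * FRAMES_PER_SECOND] * 10

# Assumption: two convolution layers, each with kernel 3 and stride 2,
# giving roughly 4x reduction. The paper's actual settings may differ.
LAYERS = [(3, 2), (3, 2)]

raw_total = sum(turn_frames)
sub_total = sum(subsampled_len(n, LAYERS) for n in turn_frames)

print(f"frames before subsampling: {raw_total}")  # 5000
print(f"frames after subsampling:  {sub_total}")  # 1240
```

Because self-attention cost grows quadratically with sequence length, a roughly 4x shorter sequence reduces the attention cost by about 16x in this toy setting, which is the kind of saving that makes attending over a whole spoken conversation feasible.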
