On designing context-sensitive language models for spoken dialog systems
Abstract
In this paper we describe our approach to building dialog-context-sensitive language models that improve recognition performance in spoken dialog systems. These methods were developed and successfully tested in a large-scale, commercially deployed system that handles over ten million calls each month. Dialog-context-sensitive language models are typically built by clustering dialog histories into groups that elicit similar responses. A key innovation of this paper is the use of an EM clustering procedure in place of the k-means clustering procedure that is typically used. The EM procedure yields clusters with higher log-probability, which we argue leads to better recognition performance. In addition, we observe empirically that the EM approach has much better worst-case behavior than the k-means approach with respect to local optima.
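To make the clustering step concrete, the sketch below shows one way the two procedures might be set up in Python with scikit-learn. The feature representation of a dialog history, the cluster count, and the Gaussian-mixture form of EM are illustrative assumptions, not the formulation used in the paper.

```python
# A minimal sketch, not the authors' implementation: it assumes each dialog
# history has already been mapped to a fixed-length feature vector (e.g. counts
# of caller responses observed at that dialog state) and uses a Gaussian
# mixture fitted by EM as a stand-in for the paper's EM clustering procedure,
# with k-means as the usual baseline.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical data: 1000 dialog histories, each a 20-dimensional feature vector.
X = rng.normal(size=(1000, 20))
n_clusters = 8

# Baseline: hard clustering of dialog histories with k-means.
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
kmeans_labels = kmeans.labels_

# EM alternative: a mixture model fitted by EM, which directly maximizes the
# log-probability of the data rather than a squared-distance criterion.
gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(X)
em_labels = gmm.predict(X)
print("Average per-history log-likelihood under the EM clustering:", gmm.score(X))

# Downstream (not shown): pool the transcripts assigned to each cluster and
# train one language model per cluster; at run time the current dialog state
# selects which cluster-specific LM the recognizer uses.
```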