Speech Communication

Leveraging Word Confusion Networks for Named Entity modeling and detection from Conversational Telephone Speech

View publication


Named Entity (NE) detection from Conversational Telephone Speech (CTS) is important from business aspects. However, results of Automatic Speech Recognition (ASR) inevitably contain errors and this makes NE detection from CTS more difficult than from written text. One of the options to detect NEs is to use a statistical NE model. In order to capture the nature of ASR errors, the NE model is usually trained with the ASR one-best results instead of manually transcribed text and then is applied to the ASR one-best results of speech that contain NEs. To make NE detection more robust to ASR errors, we propose using Word Confusion Networks (WCNs), sequences of bundled words, for both NE modeling and detection by regarding the word bundles as units instead of the independent words. We realize this by clustering similar word bundles that may originate from the same word. We trained the NE models that predict the NE tag sequences from the sequence of the word bundles with the maximum entropy principle. Note that clustering of word bundles is conducted in advance of NE modeling and thus our proposed method can combine with any NE modeling method. We conducted experiments using real-life call-center data. The experimental results showed that by using the WCNs, the accuracy of NE detection improved regardless of the NE modeling method. © 2011 Elsevier B.V. All rights reserved.