About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
INTERSPEECH 2008
Conference paper
An empirical analysis of word error rate and keyword error rate
Abstract
This paper studies the relationship between word error rate (WER) and keyword error rate (KER) in speech transcripts and their effect on the performance of speech analytics applications. Automatic speech recognition (ASR) systems are increasingly used as input for speech analytics, which raises the question of whether WER or KER is the more suitable performance metric for calibrating the ASR system. ASR systems are typically evaluated in terms of WER. Many speech analytics applications, however, rely on identifying keywords in the transcripts - thus their performance can be expected to be more sensitive to keyword errors than regular word errors. To study this question, we conduct a case study using an experimental data set comprising 100 calls to a contact center. We first automatically extract domain-specific words from the manual transcription and use this set of words to calculate keyword error rates in the following experiments. We then generate call transcripts with the IBM Attila speech recognition system, using different training for each repetition to generate transcripts with a range of word error rates. The transcripts are then processed with two speech analytics applications, call section segmentation and topic categorization. The results show similar WER and KER in high-accuracy transcripts, but KER increases more rapidly than WER as the accuracy of the transcription deteriorates. Neither speech analytics application showed significant sensitivity to the increase in KER for low-accuracy transcripts. Thus this case study did not identify a significant difference between using WER and KER. Copyright © 2008 ISCA.