About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
ICASSP 2010
Conference paper
Speaking rate adaptation using continuous frame rate normalization
Abstract
This paper describes a speaking rate adaptation technique for automatic speech recognition. The technique aims to reduce speaking rate variations by applying temporal warping in front-end processing so that the average phone duration in terms of feature frames remains constant. Speaking rate estimates are given by timing information from unadapted decoding outputs. We implement the proposed continuous frame rate normalization (CFRN) technique on a state-of-the-art speech recognition architecture, and evaluate it on the most recent GALE broadcast transcription tasks. Results show that CFRN gives consistent improvement on all four separate systems and two different languages. In fact, the reported numbers represent the best decoding error rates of the corresponding test sets. It is further shown that the technique is effective without retraining, and adds little overhead to the multi-pass recognition pipeline found in state-of-the-art transcription systems. ©2010 IEEE.