About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
INTERSPEECH - Eurospeech 2003
Conference paper
An architecture for rapid decoding of large vocabulary conversational speech
Abstract
This paper addresses the question of howto design a large vocabulary recognition system so that it can simultaneously handle a sophisticated language model, perform state-of-the-art speaker adaptation, and run in one times real time1 (1×RT). The architecture we propose is based on classical HMMViterbi decoding, but uses an extremely fast initial speaker-independent decoding to estimate VTL warp factors, feature-space andmodel-spaceMLLR transformations that are used in a final speaker-adapted decoding. We present results on past Switchboard evaluation data that indicate that this strategy compares favorably to published unlimited-time systems (running in several hundred times real-time). Coincidentally, this is the system that IBM fielded in the 2003 EARS Rich Transcription evaluation.