Publication
ICSLP 2000
Conference paper
Decision tree based rate of speech modeling for speech recognition
Abstract
A real-world speech recognition system encounters several speaking styles and speaking rates, and its accuracy depends highly on the speaking rate, i.e., it degrades sharply with very fast or very slow speech (including hyperarticulated speech). In this paper, we propose a generic modeling scheme that uses decision trees to capture a range of speaking rates from very slow to very fast. This approach improves recognition performance on fast and slow speech without degrading performance on normal speech. The main idea behind this scheme is to model the context-dependent HMM state likelihoods differently for different speaking rates, as the joint probability of observing the sequence of durations given the sequence of acoustic states, without having to rely on any explicit duration computation at run time.