Publication
ICSLP 1998
Conference paper
WAVELET-BASED ENERGY BINNING CEPSTRAL FEATURES FOR AUTOMATIC SPEECH RECOGNITION
Abstract
Speech production models, coding methods, and text-to-speech technology often lead to modulation models, which represent speech signals in terms of primary components: amplitude- and phase-modulated sine functions. Parallels between the properties of the wavelet transform of primary components and algorithmic representations of speech signals derived from auditory nerve models, such as the EIH, lead to the introduction of synchrosqueezing measures. In automatic speech (and speaker) recognition, meanwhile, cepstral features have established themselves almost universally as the acoustic characterization of speech utterances. This paper analyses the cepstral representation in the context of the synchrosqueezed representation: the wastrum. It discusses wastra derived by energy accumulation, as opposed to classical MEL- and LPC-derived cepstra. In the former method, the primary components and formants play a primary role. Recognition results are presented on the Wall Street Journal database using the IBM continuous decoder.