About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
ICASSP 2014
Conference paper
Improvements to filterbank and delta learning within a deep neural network framework
Abstract
Many features used in speech recognition tasks are hand-crafted and are not always related to the objective at hand, that is minimizing word error rate. Recently, we showed that replacing a perceptually motivated mel-filter bank with a filter bank layer that is learned jointly with the rest of a deep neural network was promising. In this paper, we extend filter learning to a speaker-adapted, state-of-the-art system. First, we incorporate delta learning into the filter learning framework. Second, we incorporate various speaker adaptation techniques, including VTLN warping and speaker identity features. On a 50-hour English Broadcast News task, we show that we can achieve a 5% relative improvement in word error rate (WER) using the filter and delta learning, compared to having a fixed set of filters and deltas. Furthermore, after speaker adaptation, we find that filter and delta learning allows for a 3% relative improvement in WER compared to a state-of-the-art CNN. © 2014 IEEE.