Publication
ICASSP 2019
Conference paper
CycleGAN Bandwidth Extension Acoustic Modeling for Automatic Speech Recognition
Abstract
Although narrowband (NB) and wideband (WB) speech data primarily differ in sampling rate, these two common input sources are difficult to model simultaneously for automatic speech recognition (ASR). Meanwhile, cycle-consistent generative adversarial networks (CycleGANs) have shown value in a number of acoustic tasks, such as mapping between domains, owing to their powerful generators. We apply CycleGANs to the task of bandwidth extension (BWE) and test a variety of architectures. The CycleGANs produce encouraging losses and reconstructed spectrograms. To further reduce word error rate (WER), we add an additional discriminative loss to the CycleGAN BWE architecture. This more closely matches our ASR objective, and we show WER gains compared to a standard BWE model discriminatively trained only to map from upsampled narrowband (UNB) to WB data.
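To illustrate the kind of objective the abstract describes, below is a minimal sketch (not the paper's exact architecture) of a CycleGAN-style loss for mapping UNB features to WB features, with an extra discriminative ASR term added to the generator objective. The network sizes, feature dimension, senone count, and loss weights are illustrative assumptions, not values from the paper.

```python
# Sketch of a CycleGAN-style BWE objective with an added discriminative ASR loss.
# All module shapes, layer sizes, and loss weights below are assumptions.
import torch
import torch.nn as nn

FEAT_DIM = 40       # assumed log-mel feature dimension
NUM_SENONES = 1000  # assumed number of frame-level ASR targets

def mlp(in_dim, out_dim):
    # Tiny stand-in for the paper's generator/discriminator networks.
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))

G_unb2wb = mlp(FEAT_DIM, FEAT_DIM)      # generator: UNB -> WB
G_wb2unb = mlp(FEAT_DIM, FEAT_DIM)      # generator: WB -> UNB
D_wb = mlp(FEAT_DIM, 1)                 # discriminator for the WB domain
D_unb = mlp(FEAT_DIM, 1)                # discriminator for the UNB domain
asr_head = mlp(FEAT_DIM, NUM_SENONES)   # discriminative ASR branch (assumed)

adv = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()
ce = nn.CrossEntropyLoss()

def generator_loss(x_unb, x_wb, senone_targets, lam_cyc=10.0, lam_asr=1.0):
    """Combined generator objective: adversarial + cycle-consistency + ASR loss.
    senone_targets are frame-level labels for the UNB frames (assumed)."""
    fake_wb = G_unb2wb(x_unb)
    fake_unb = G_wb2unb(x_wb)

    # Adversarial terms: generators try to make the discriminators say "real".
    d_fake_wb = D_wb(fake_wb)
    d_fake_unb = D_unb(fake_unb)
    loss_adv = adv(d_fake_wb, torch.ones_like(d_fake_wb)) + \
               adv(d_fake_unb, torch.ones_like(d_fake_unb))

    # Cycle consistency: UNB -> WB -> UNB and WB -> UNB -> WB should reconstruct.
    loss_cyc = l1(G_wb2unb(fake_wb), x_unb) + l1(G_unb2wb(fake_unb), x_wb)

    # Extra discriminative loss: bandwidth-extended features should still be
    # classifiable into the correct senones, aligning training with the WER goal.
    loss_asr = ce(asr_head(fake_wb), senone_targets)

    return loss_adv + lam_cyc * loss_cyc + lam_asr * loss_asr

# Toy usage with random feature frames (batch of 8).
x_unb = torch.randn(8, FEAT_DIM)
x_wb = torch.randn(8, FEAT_DIM)
targets = torch.randint(0, NUM_SENONES, (8,))
print(generator_loss(x_unb, x_wb, targets).item())
```

In this sketch the ASR term only penalizes the UNB-to-WB generator, reflecting the stated goal of producing bandwidth-extended features that are useful for recognition rather than merely realistic; the actual branch used in the paper may differ.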