Mixed Bandwidth Acoustic Modeling Leveraging Knowledge Distillation
Training of mixed bandwidth acoustic models have recently been realized by incorporating special Mel filterbanks. To fit information into every filterbank bin available across both narrowband and wideband data, these filterbanks pad zeros at high frequency ranges of narrowband data. Although these methods succeed in decreasing word error rates (WER) on broadband data, they fail to improve on narrowband signals. In this paper, we propose methods to mitigate these effects with generalized knowledge distillation. In our method, specialized teacher networks are first trained on lossless acoustic features with full scale Mel filterbanks. While training student networks, privileged knowledge from these teacher networks is then used to compensate for missing information at high frequencies introduced by the special Mel filterbanks. We show the benefit of the proposed technique for both narrowband (10% relative WER improvement) and wideband data (7.5% relative WER improvement) on the Aurora 4 task over traditional methods.