Publication
INTERSPEECH 2022
Conference paper

Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing


Abstract

In this paper, we introduce two techniques, length perturbation and n-best based label smoothing, to improve the generalization of deep neural network (DNN) acoustic models for automatic speech recognition (ASR). Length perturbation is a data augmentation algorithm that randomly drops or inserts frames of an utterance to alter the length of the speech feature sequence. N-best based label smoothing randomly injects noise into the ground-truth labels during training to avoid overfitting, where the noisy labels are generated from n-best hypotheses. We evaluate the two techniques extensively on the 300-hour Switchboard (SWB300) dataset and an in-house 500-hour Japanese (JAP500) dataset using recurrent neural network transducer (RNNT) acoustic models for ASR. We show that each technique individually improves the generalization of RNNT models and that the two can be complementary. In particular, their combination yields good improvements over a strong SWB300 baseline and gives the state-of-the-art result on SWB300 with RNNT models.
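To make the two techniques concrete, the sketch below illustrates one plausible reading of the abstract: per-frame drop/insert decisions for length perturbation (with insertion done here by duplicating the current frame) and probabilistic replacement of the reference transcript by an n-best hypothesis for label smoothing. The probabilities, the duplication choice, and the function names are illustrative assumptions, not the paper's actual settings.

```python
import random
import numpy as np

def length_perturb(features, drop_prob=0.1, insert_prob=0.1):
    """Randomly drop or insert frames of an utterance to alter the
    length of the speech feature sequence.

    features: (T, D) array of feature frames.
    drop_prob / insert_prob: per-frame probabilities (illustrative values;
    the abstract does not specify the paper's settings).
    """
    out = []
    for frame in features:
        if random.random() < drop_prob:
            continue                      # drop this frame
        out.append(frame)
        if random.random() < insert_prob:
            out.append(frame)             # insert an extra (duplicated) frame
    # Guard against dropping every frame of a very short utterance.
    return np.stack(out) if out else features[:1].copy()

def nbest_label_smoothing(reference, nbest_hyps, noise_prob=0.2):
    """With some probability, replace the ground-truth label sequence
    with a hypothesis drawn from the utterance's n-best list."""
    if nbest_hyps and random.random() < noise_prob:
        return random.choice(nbest_hyps)  # noisy label from n-best hypotheses
    return reference                      # otherwise keep the ground truth
```

For example, `length_perturb(np.random.randn(200, 40))` returns a feature matrix whose number of frames differs slightly from 200, which is the kind of length variation the augmentation is meant to introduce during RNNT training.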

Date

18 Sep 2022
