Publication
ICASSP 2022
Conference paper

Adaptive Discounting of Implicit Language Models in RNN-Transducers

Abstract

RNN-Transducer (RNN-T) models have become synonymous with streaming end-to-end ASR systems. While they perform competitively on a number of evaluation categories, rare words pose a serious challenge to RNN-T models. One main reason for the degradation in performance on rare words is that the language model (LM) internal to RNN-Ts can become overconfident and lead to hallucinated predictions that are acoustically inconsistent with the underlying speech. To address this issue, we propose a lightweight adaptive LM discounting technique, ADAPTLMD, that can be used with any RNN-T architecture without requiring any external resources or additional parameters. ADAPTLMD uses a two-pronged approach: 1. Randomly mask the LM output to encourage the RNN-T not to be overly reliant on LM outputs. 2. Dynamically choose when to discount the LM based on the rarity of recently predicted tokens and the divergence between LM and acoustic model scores. Comparing ADAPTLMD to a competitive RNN-T baseline, we obtain up to 4% and 14% relative reductions in overall WER and rare word PER, respectively, on a conversational, code-mixed Hindi-English ASR task.
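The abstract describes the two prongs only at a high level. Below is a minimal NumPy sketch of how such a scheme could look; it is not the paper's implementation. All names, the KL-based divergence signal, the `recent_token_rarity` score, and the additive joint combination are illustrative assumptions.

```python
# Illustrative sketch (not the authors' code) of the two-pronged idea:
# (1) randomly mask the LM output, (2) adaptively discount the LM based on
# recent-token rarity and LM/AM divergence. All details are assumptions.

import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mask_lm_output(lm_logits, p_mask=0.1, training=True):
    """Prong 1: with probability p_mask, zero out the LM contribution during
    training so the joint network cannot rely solely on the implicit LM."""
    if training and rng.random() < p_mask:
        return np.zeros_like(lm_logits)
    return lm_logits

def adaptive_discount(lm_logits, am_logits, recent_token_rarity):
    """Prong 2: scale down the LM logits when recently predicted tokens are
    rare AND the LM disagrees strongly with the acoustic model.
    recent_token_rarity in [0, 1], 1 = very rare (hypothetical scoring)."""
    lm_probs, am_probs = softmax(lm_logits), softmax(am_logits)
    # KL(LM || AM) as a divergence signal (one of several possible choices).
    kl = float(np.sum(lm_probs * np.log((lm_probs + 1e-9) / (am_probs + 1e-9))))
    divergence = 1.0 - np.exp(-kl)           # squash to [0, 1)
    discount = 1.0 - 0.5 * recent_token_rarity * divergence
    return discount * lm_logits

# Toy usage: combine AM scores with the masked, discounted LM scores.
vocab = 8
am_logits = rng.normal(size=vocab)
lm_logits = rng.normal(size=vocab) * 3.0    # an overconfident LM
lm_logits = mask_lm_output(lm_logits, training=True)
lm_logits = adaptive_discount(lm_logits, am_logits, recent_token_rarity=0.9)
joint = softmax(am_logits + lm_logits)      # simplistic additive joint
print(joint.round(3))
```

In this sketch, the discount factor shrinks the LM logits only when both signals fire together, which matches the abstract's claim that discounting is applied dynamically rather than uniformly.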

Date

21 May 2022
