Transliteration based data augmentation for training multilingual ASR acoustic models in low resource settings

Samuel Thomas; Kartik Audhkhasi; Brian Kingsbury

doi:10.21437/Interspeech.2020-2593

INTERSPEECH 2020

Conference paper

25 Oct 2020

Abstract

Multilingual acoustic models are often used to build automatic speech recognition (ASR) systems for low-resource languages. We propose a novel data augmentation technique to improve the performance of an end-to-end (E2E) multilingual acoustic model by transliterating data into the various languages that are part of the multilingual training set. Combined with two metrics for data selection, this technique can also improve the model's recognition performance on unsupervised and cross-lingual data. On a set of four low-resource languages, we show that word error rate (WER) can be reduced by up to 12% and 5% relative, compared to monolingual and multilingual baselines, respectively. We also demonstrate how a multilingual network constructed within this framework can be extended to a new training language. With the proposed methods, the new model achieves WER reductions of up to 24% and 13% compared to monolingual and multilingual baselines, respectively.
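Because the results above are stated as relative WER reductions, a minimal sketch of how such figures are conventionally computed may help. It assumes the standard word-level edit-distance definition of WER; the function names and the numbers in the usage lines are hypothetical illustrations and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers, for illustration only (not results from the paper):
# a baseline WER of 50.0% improved to 44.0% is a 12% relative reduction.
example_reduction = relative_wer_reduction(50.0, 44.0)
example_wer = word_error_rate("a b c d", "a x c")  # one substitution, one deletion
```

Note that a relative reduction differs from an absolute one: in the hypothetical case above, the absolute improvement is 6 WER points, while the relative improvement is 12%.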