Publication
ICASSP 2017
Conference paper

Active learning for low-resource speech recognition: Impact of selection size and language modeling data

Abstract

Active learning aims to reduce the time and cost of developing speech recognition systems by selecting highly informative subsets of large audio pools for transcription. Previous evaluations at OpenKWS and IARPA BABEL investigated data selection for low-resource languages in very constrained scenarios, with 2-hour data selections given a 1-hour seed set. We extend this work to larger selections and fewer constraints on language modeling data. Our results on four languages from the final BABEL OP3 period show that active learning is helpful at larger selection sizes, with consistent gains up to 14 hours. We also find that the impact of additional language modeling data is orthogonal to the impact of the active learning selection criteria.
