Semi-Supervised Training and Data Augmentation for Adaptation of Automatic Broadcast News Captioning Systems

Yinghui Huang; Samuel Thomas; Masayuki Suzuki; Zoltan Tuske; Larry Sansone; Michael Picheny

doi:10.1109/ASRU46091.2019.9003943

ASRU 2019

Conference paper

01 Dec 2019

Abstract

In this paper we present a comprehensive study on building and adapting deep neural network-based speech recognition systems for automatic closed captioning. We develop the proposed systems by first building base automatic speech recognition (ASR) systems that are not specific to any particular show or station. These models are trained on nearly 6000 hours of broadcast news (BN) data using conventional hybrid and more recent attention-based end-to-end acoustic models. We then employ various adaptation and data augmentation strategies to further improve the trained base models. We use 535 hours of data from two independent BN sources to study how the base models can be customized. We observe up to 32% relative improvement using the proposed techniques on test sets related to, but independent of, the adaptation data. At these low word error rates (WERs), we believe the customized BN ASR systems can be used effectively for automatic closed captioning.
