On the importance of event detection for ASR

David Haws; Dimitrios Dimitriadis; George Saon; Samuel Thomas; Michael Picheny

doi:10.1109/ICASSP.2016.7472770

ICASSP 2016

Conference paper

18 May 2016

Abstract

The performance of modern large vocabulary continuous speech recognition (LVCSR) systems depends heavily on accurate segment boundaries, correct speaker identification of the segments, and removal of spurious data. We propose to use Long Short-Term Memory (LSTM) recurrent neural networks to partition audio into speech segments and to track speaker turns. Additionally, we train an LSTM to identify music segments. We show that accurate detection of events, along with removal of silence and music, using our LSTM yields a 9-10% relative improvement in ASR performance. Secondary processing by speaker clustering provides an additional boost in accuracy. Event detection accuracy of the LSTM approach is also described.
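As an illustration of the segmentation step described above, the following is a minimal sketch of how per-frame event labels might be collapsed into speech segments while silence and music are dropped. It is not the authors' implementation: the function name `frames_to_segments`, the label strings `"sil"`, `"speech"`, and `"music"`, and the frame-index output format are all assumptions made for this example, and the per-frame labels are assumed to come from some upstream classifier such as the LSTM.

```python
from itertools import groupby


def frames_to_segments(labels, keep=("speech",)):
    """Collapse per-frame labels into (start_frame, end_frame, label) runs.

    Hypothetical helper: `labels` is a sequence of per-frame event labels
    (assumed to come from a frame-level classifier). Only runs whose label
    is in `keep` are returned; end_frame is exclusive.
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        if label in keep:
            segments.append((start, start + length, label))
        start += length
    return segments


# Illustrative input: silence, speech, music, then a short speech run.
labels = ["sil"] * 3 + ["speech"] * 4 + ["music"] * 2 + ["speech"] * 1
segments = frames_to_segments(labels)
print(segments)
```

Only the retained segments would then be passed to the recognizer. Frame indices can be converted to seconds by multiplying by the frame shift, which is not stated in the abstract.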
