Publication
INTERSPEECH - Eurospeech 2005
Conference paper

Exploiting large quantities of spontaneous speech for unsupervised training of acoustic models

Abstract

While large amounts of manually transcribed acoustic training data are available for well-known large-vocabulary speech recognition tasks such as the transcription of broadcast news and Switchboard conversations, significantly less is available for several large spoken collections such as the MALACH corpus (in multiple languages), meeting recordings, conference presentations, and call center conversations. However, these collections offer vast quantities of untranscribed spontaneous speech that can be used to improve recognition accuracy. Several narrow-band and broadband speech collections are currently available, and carefully tuned speech recognition systems trained on several hundred hours of manually transcribed data are now able to achieve word error rates between 10% and 40%, depending on the difficulty of the collection. This paper studies the use of automatically recognized transcriptions at several levels of recognition accuracy to train acoustic models and the performance improvements obtained with such unsupervised training. It also proposes a recipe for selecting feature vectors at the utterance, word, or fragment level for training acoustic models that provides the maximum gain in recognition accuracy. The paper demonstrates that a reduction in overall word error rate of up to 20% relative can be obtained with careful selection of acoustic training data.
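
The abstract does not spell out the selection recipe, so the sketch below is only an illustration of the general idea: filtering automatically recognized transcripts by decoder confidence at the utterance and word level before using them as unsupervised acoustic training data. The data structures, confidence scores, and thresholds shown here are assumptions for the example, not details taken from the paper.

    # Illustrative sketch (not the paper's recipe): keep automatically
    # recognized utterances whose average decoder confidence is high,
    # and salvage individual high-confidence words from the rest.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class RecognizedWord:
        text: str
        confidence: float  # decoder confidence/posterior, assumed in [0, 1]

    @dataclass
    class RecognizedUtterance:
        audio_id: str
        words: List[RecognizedWord]

    def select_training_data(
        utts: List[RecognizedUtterance],
        utt_threshold: float = 0.9,   # assumed utterance-level cutoff
        word_threshold: float = 0.7,  # assumed word/fragment-level cutoff
    ) -> Tuple[List[RecognizedUtterance], List[Tuple[str, List[RecognizedWord]]]]:
        """Split recognized speech into utterance-level and word-level
        training selections based on confidence thresholds."""
        utterance_level, word_level = [], []
        for utt in utts:
            if not utt.words:
                continue
            avg_conf = sum(w.confidence for w in utt.words) / len(utt.words)
            if avg_conf >= utt_threshold:
                # Whole utterance is reliable enough to train on directly.
                utterance_level.append(utt)
            else:
                # Otherwise keep only the confidently recognized words
                # as word/fragment-level training material.
                kept = [w for w in utt.words if w.confidence >= word_threshold]
                if kept:
                    word_level.append((utt.audio_id, kept))
        return utterance_level, word_level

In a real system the confidence scores would come from the recognizer's lattices or word posteriors, and the thresholds would be tuned on held-out data; the values above are placeholders.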
