Recent improvements to IBM's speech recognition system for automatic transcription of broadcast news
Abstract
We describe recent extensions and improvements to IBM's system for automatic transcription of broadcast news. The speech recognizer uses a total of 160 hours of acoustic training data, 80 hours more than the system described in [6]. In addition to the improvements obtained in 1997, we made a number of changes and algorithmic enhancements. These include changing the acoustic vocabulary, reducing the number of phonemes, inserting short pauses, using mixture models with non-Gaussian components, pronunciation networks, factor analysis (FACILT), and the Bayesian Information Criterion (BIC) applied to choosing the number of components in a Gaussian mixture model. The resulting models were combined into a single system using NIST's voting script, known as ROVER.
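As a rough illustration of how BIC can be used to choose the number of components in a Gaussian mixture model, the following minimal sketch scores several candidate mixture sizes and keeps the one with the lowest BIC. It is not the system's implementation. The diagonal-covariance parameter count, the standard form BIC = -2 log L + lambda * (#params) * log N, and the names `gmm_param_count`, `bic`, `select_num_components`, and `penalty_weight` are assumptions made for this example; the log-likelihood values are made up.

```python
import math


def gmm_param_count(num_components, dim):
    """Free parameters of a diagonal-covariance GMM (assumed model form):
    one mean and one variance vector per component, plus mixture weights
    (which sum to one, hence num_components - 1 free weights)."""
    return num_components * 2 * dim + (num_components - 1)


def bic(log_likelihood, num_params, num_samples, penalty_weight=1.0):
    """Standard BIC score; lower is better.
    penalty_weight is a hypothetical tuning knob (1.0 gives the usual BIC)."""
    return -2.0 * log_likelihood + penalty_weight * num_params * math.log(num_samples)


def select_num_components(log_likelihoods, dim, num_samples, penalty_weight=1.0):
    """Pick the component count with the lowest BIC.

    log_likelihoods maps a candidate component count to the total
    training-data log-likelihood of the GMM fitted with that count.
    """
    scores = {
        k: bic(ll, gmm_param_count(k, dim), num_samples, penalty_weight)
        for k, ll in log_likelihoods.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Illustrative (made-up) log-likelihoods for 1, 2, and 4 components
# fitted to 1000 two-dimensional samples.
candidates = {1: -5000.0, 2: -4800.0, 4: -4790.0}
best_k, bic_scores = select_num_components(candidates, dim=2, num_samples=1000)
print(best_k)
```

In this toy case, going from two to four components raises the likelihood only slightly, so the parameter penalty outweighs the gain and the smaller model is selected.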