Qinghua Daxue Xuebao/Journal of Tsinghua University

IBM GALE Mandarin transcription system


An automatic transcription system for Mandarin broadcast speech was developed at IBM under the DARPA GALE program. The system combines discriminative acoustic model training with a new topic-adaptive language modeling technique, and uses multi-pass decoding to achieve its best recognition performance. Results are given for three GALE test sets designed to cover both the broadcast news and the broadcast conversation domains. The transcription system achieves satisfactory performance on these datasets. Recognition errors depend strongly on speaking style, speech overlap, and accent, a finding that can help steer future research.