IBM Arabic-to-English Translation for IWSLT 2006

Young-Suk Lee

IWSLT 2006

Conference paper

27 Nov 2006

Abstract

We present techniques for improving domain-specific translation quality when the test data sets have a relatively high out-of-vocabulary (OOV) ratio. The key idea is to maximize vocabulary coverage without degrading translation quality. We maximize vocabulary coverage by segmenting each word into a sequence of morphemes, prefix*-stem-suffix*, and by adding a large amount of out-of-domain training corpora. To preserve the domain-specific meaning of words that occur in both the domain-specific and the out-of-domain training corpora, we assign a higher weight to the domain-specific corpus than to the out-of-domain corpora. IBM Arabic-to-English spoken language translation systems using these techniques achieved the best performance in the Open Data Track of the IWSLT 2006 Evaluation Campaign.
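The two ideas in the abstract can be illustrated with a toy sketch. This is not the IBM system: the affix lists, the greedy splitting rule, the minimum stem length, the word pairs, and the way the corpus weight is applied (scaling co-occurrence counts) are all assumptions made for illustration, and the function names are hypothetical. Words are written in a Buckwalter-style transliteration.

```python
from collections import Counter

# Hypothetical affix inventories (illustrative only, not the paper's).
PREFIXES = ("w", "l", "Al")
SUFFIXES = ("hm", "hA", "h", "p")
MIN_STEM = 3  # assumed guard against over-segmentation


def segment(word):
    """Greedily split a word into prefix* - stem - suffix*."""
    prefixes, suffixes, stem = [], [], word
    stripped = True
    while stripped:  # peel prefixes from the left
        stripped = False
        for p in PREFIXES:
            if stem.startswith(p) and len(stem) - len(p) >= MIN_STEM:
                prefixes.append(p + "+")
                stem = stem[len(p):]
                stripped = True
                break
    stripped = True
    while stripped:  # peel suffixes from the right
        stripped = False
        for s in SUFFIXES:
            if stem.endswith(s) and len(stem) - len(s) >= MIN_STEM:
                suffixes.insert(0, "+" + s)
                stem = stem[:-len(s)]
                stripped = True
                break
    return prefixes + [stem] + suffixes


def oov_ratio(train_tokens, test_tokens):
    """Fraction of test tokens never seen in training."""
    vocab = set(train_tokens)
    return sum(t not in vocab for t in test_tokens) / len(test_tokens)


def weighted_counts(corpora):
    """Sum item counts over (items, weight) corpora, scaling by weight."""
    counts = Counter()
    for items, weight in corpora:
        for item in items:
            counts[item] += weight
    return counts


# Coverage: whole words vs. morphemes on a tiny made-up data set.
train_words = ["ktAb", "wAlbyt"]
test_words = ["wAlktAb", "bythm"]
train_morphs = [m for w in train_words for m in segment(w)]
test_morphs = [m for w in test_words for m in segment(w)]
word_oov = oov_ratio(train_words, test_words)
morph_oov = oov_ratio(train_morphs, test_morphs)

# Weighting: the in-domain corpus counts 3x (an assumed weight).
in_domain = [("mktb", "office")]
out_of_domain = [("mktb", "bureau"), ("mktb", "bureau")]
counts = weighted_counts([(in_domain, 3), (out_of_domain, 1)])
best_pair = counts.most_common(1)[0][0]
```

In this toy data every test word is unseen as a whole word, while most of its morphemes appear in training, so the morpheme-level OOV ratio is lower. With the assumed weight of 3, the in-domain pairing outranks the more frequent out-of-domain one.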
