A comparison of statistical methods and features for the prediction of prosodic structures

Qin Shi; Volker Fischer

ICSLP 2004

Conference paper

04 Oct 2004

Abstract

Prosodic structure prediction plays an important role in text-to-speech (TTS) conversion systems, where it precedes parametric prosody prediction. Dynamic programming (DP) and decision tree (DT) based methods are widely used for this purpose, but both have well-known limitations. In this paper, we present a combination of both methods, explore the relationship between corpus size and accuracy for three different prediction tasks, and report on the use of various lexical features. We show that a combination of dynamic programming and decision trees is the best choice for prosodic word boundary prediction, while decision trees alone give the best results for the prediction of prosodic phrase boundaries. Although the methods were originally developed for Chinese, we finally demonstrate their transfer to two other languages, Korean and German, where similar results are achieved.
