Publication
ICSLP 2004
Conference paper
A comparison of statistical methods and features for the prediction of prosodic structures
Abstract
Prosody structure prediction plays an important role in text-to-speech (TTS) conversion systems, where it precedes parametric prosody prediction. Dynamic programming (DP) and decision tree (DT) based methods are widely used for this purpose, but both have well-known limitations. In this paper, we present a combination of both methods, explore the relationship between corpus size and accuracy for three different prediction tasks, and report on the use of various lexical features. It is shown that a combination of dynamic programming and decision trees is the best choice for prosodic word boundary prediction, while decision trees alone give the best results for the prediction of prosodic phrase boundaries. Finally, we demonstrate that the methods, originally developed for Chinese, transfer to two other languages, Korean and German, where similar results are achieved.