Publication
IEEE Workshop on Speech Synthesis 2002
Conference paper
Statistic prosody structure prediction
Abstract
Hierarchical prosody structure generation is a key component of a speech synthesis system. This paper presents a statistical method that predicts the prosody structure for a Chinese text-to-speech (TTS) system by combining dynamic programming with rules. The method is based on a manually annotated corpus extracted from natural speech (IBM Mandarin TTS Corpus for Female 02). Experimental results show that prosodic structure can be predicted with an accuracy of 91.2%. A state-of-the-art Mandarin TTS system was built on the hierarchical prosody structure, and listening tests show that the predicted prosody structure works well.
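The abstract does not spell out the algorithm, so the following is only an illustrative sketch of how dynamic programming and rules might be combined to place prosodic phrase boundaries; it is not the paper's method. The phrase-length probabilities in `LENGTH_LOGPROB`, the rule set `NO_BREAK_BEFORE`, and the function name `segment` are all hypothetical and chosen for illustration.

```python
import math

# Hypothetical log-probabilities of a prosodic phrase containing k words.
# A real system would estimate these from an annotated corpus.
LENGTH_LOGPROB = {
    1: math.log(0.1),
    2: math.log(0.4),
    3: math.log(0.4),
    4: math.log(0.1),
}

# Hypothetical rule: never place a phrase boundary directly before these words.
NO_BREAK_BEFORE = {"的", "了"}


def segment(words):
    """Split a word list into phrases, maximizing the summed length
    log-probability while respecting the no-break rule."""
    n = len(words)
    best = [-math.inf] * (n + 1)  # best[i]: best score for words[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last phrase
    best[0] = 0.0

    for end in range(1, n + 1):
        # Rule filter: a boundary at `end` is forbidden if the next word
        # must attach to the preceding material.
        if end < n and words[end] in NO_BREAK_BEFORE:
            continue
        for length, logp in LENGTH_LOGPROB.items():
            start = end - length
            if start < 0 or best[start] == -math.inf:
                continue
            score = best[start] + logp
            if score > best[end]:
                best[end] = score
                back[end] = start

    if best[n] == -math.inf:
        raise ValueError("no segmentation satisfies the rules")

    # Follow the back-pointers from the end to recover the phrases.
    phrases = []
    end = n
    while end > 0:
        start = back[end]
        phrases.append(words[start:end])
        end = start
    phrases.reverse()
    return phrases


example = ["我", "的", "朋友", "来", "了"]
result = segment(example)
```

In this sketch the rules act as hard constraints that remove candidate boundaries, and the dynamic program then picks the highest-scoring segmentation among those that remain. Other combinations, such as rules that adjust scores instead of forbidding boundaries, would be equally plausible.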