Combining length distribution model with decision tree in prosodic phrase prediction
Abstract
In Text-to-Speech (TTS) systems, prosodic phrase prediction is important for the naturalness and intelligibility of the synthesized voice. Statistical methods such as dynamic programming (DP), decision trees (DT), and maximum entropy (ME) have been applied to the task, and features based on syntactic and lexical information are widely used. However, the predicted prosodic phrases often have unrealistic lengths because the distribution of phrase lengths is not modeled. This paper proposes a novel algorithm that incorporates a length distribution model into prosodic phrase prediction. Rather than using phrase length directly as a feature of the DT or ME model, the algorithm exploits the correlation between phrase length and the probability given by a decision tree. Experiments show that recall and precision improve by 16.37% and 14.05% (relative), respectively, with the proposed algorithm.
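The abstract does not spell out how the length model and the decision tree are combined. As a purely illustrative sketch, and not the algorithm proposed in this paper, one generic way to make predicted phrases respect a length distribution is a dynamic-programming search that scores each candidate segmentation by the product of per-word boundary probabilities and a prior over phrase lengths. The function name `segment` and the inputs `boundary_probs` (assumed decision-tree outputs) and `length_probs` (assumed empirical length distribution) are hypothetical.

```python
import math


def segment(boundary_probs, length_probs):
    """Choose phrase boundaries under a boundary model and a length prior.

    boundary_probs[i]: assumed decision-tree probability that a phrase
        boundary follows word i (0-based). The last word always ends a phrase.
    length_probs[k]: assumed probability that a phrase contains k words.
    Returns the sorted word indices after which a boundary is placed.
    """
    n = len(boundary_probs)
    eps = 1e-12  # floor to avoid log(0)
    # best[j]: best log score for words 0..j-1, with a boundary after word j-1
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):
            p_len = length_probs.get(j - i, 0.0)
            if p_len <= 0.0 or best[i] == -math.inf:
                continue  # phrase length not allowed, or prefix unreachable
            score = best[i] + math.log(p_len)
            # Words i..j-2 sit inside the phrase: no boundary after them.
            for w in range(i, j - 1):
                score += math.log(max(1.0 - boundary_probs[w], eps))
            # Word j-1 closes the phrase; the sentence-final boundary is forced.
            if j < n:
                score += math.log(max(boundary_probs[j - 1], eps))
            if score > best[j]:
                best[j] = score
                back[j] = i
    if n > 0 and best[n] == -math.inf:
        raise ValueError("no segmentation is compatible with length_probs")
    # Follow back-pointers from the end of the sentence.
    bounds = []
    j = n
    while j > 0:
        bounds.append(j - 1)
        j = back[j]
    return sorted(bounds)


if __name__ == "__main__":
    probs = [0.1, 0.6, 0.2, 0.55, 0.1, 0.1]        # made-up tree outputs
    lengths = {1: 0.05, 2: 0.25, 3: 0.4, 4: 0.3}   # made-up length prior
    print(segment(probs, lengths))
```

In this sketch the length prior acts as a separate score term rather than as an input feature of the classifier, which loosely mirrors the distinction the abstract draws. The probabilities in the usage example are invented for illustration only.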