Maximum-Likelihood Dynamic Intonation Model for Concatenative Text to Speech System

Slava Shechtman

SSW 2007

Conference paper

22 Aug 2007

Abstract

In this work we present a Maximum Likelihood (ML) joint pitch curve model inspired by the HMM-based TTS synthesis concept. The model provides an optimal solution for the coarse target intonation curve (3 points per syllable) and incorporates both static and dynamic pitch values for better modeling of utterance intonation. The coarse intonation curve may optionally be combined with the original pitch extracted from the concatenated units, using a technique called microprosody preservation, which is also described. This technique is intended to reduce the pitch modification ratio and improve the naturalness of the sound in large-scale concatenative TTS systems. The proposed model was successfully applied to IBM’s trainable concatenative TTS system, improving the subjective intonation quality.
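To illustrate how static and dynamic pitch values can jointly determine a maximum-likelihood curve, the sketch below uses the closed-form solution familiar from HMM-based parameter generation. It is not the paper's implementation: the diagonal Gaussian statistics, the simple first-difference delta operator, and all names (`ml_pitch_trajectory`, `mu_static`, `var_delta`, and so on) are assumptions made for this example, and the numbers are invented.

```python
import numpy as np

def ml_pitch_trajectory(mu_static, var_static, mu_delta, var_delta):
    """Return the pitch curve c that maximizes a diagonal Gaussian likelihood
    over static values c[t] and first differences c[t] - c[t-1].

    Assumed setup (hypothetical, for illustration only):
      mu_static, var_static: length n (e.g. n = 3 points per syllable)
      mu_delta,  var_delta:  length n - 1
    """
    mu_static = np.asarray(mu_static, dtype=float)
    var_static = np.asarray(var_static, dtype=float)
    mu_delta = np.asarray(mu_delta, dtype=float)
    var_delta = np.asarray(var_delta, dtype=float)
    n = len(mu_static)

    # Delta operator D: row t-1 computes c[t] - c[t-1].
    D = np.zeros((n - 1, n))
    for t in range(1, n):
        D[t - 1, t - 1] = -1.0
        D[t - 1, t] = 1.0

    # Stack static (identity) and dynamic (D) observations: o = W c.
    W = np.vstack([np.eye(n), D])
    mu = np.concatenate([mu_static, mu_delta])
    precision = np.concatenate([1.0 / var_static, 1.0 / var_delta])

    # Normal equations: (W^T P W) c = W^T P mu, with P = diag(precision).
    A = W.T @ (precision[:, None] * W)
    b = W.T @ (precision * mu)
    return np.linalg.solve(A, b)

# Example: two syllables, 3 points each (values in Hz, made up for the sketch).
curve = ml_pitch_trajectory(
    mu_static=[110, 120, 115, 105, 100, 95],
    var_static=[25, 25, 25, 25, 25, 25],
    mu_delta=[8, -4, -8, -4, -4],
    var_delta=[4, 4, 4, 4, 4],
)
print(curve)
```

The weighting by inverse variance means that tightly modeled dynamic values pull the curve toward a smooth shape, while tightly modeled static values pin it to the per-point targets.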
