SYLLABLE-LEVEL DURATION DETERMINATION.

W.N. Campbell

INTERSPEECH - Eurospeech 1989

Conference paper

27 Sep 1989

Abstract

Accurate prediction of duration in a text-to-speech system is essential to natural-sounding intonation. Klatt [1] proposed a set of phoneme-based rules to perform this task, but an adaptation of the rule-set to British English [2] accounted for only 68% of the variance in the durations observed in a 4000-syllable test text. Modification of these rules to incorporate foot-level effects [3,4] improved the prediction slightly, to account for 71% of the variance. A similar degree of prediction can be attained, with minimal reference to segment specifics, by modelling duration at the level of the syllable, with sensitivity to stress, position in phrase and foot, and number of segments in onset, peak and coda. This supposes that micro-durational features, such as shortening of segments in clusters and lengthening of vowels to cue voicing, operate at a phonetic level, within the constraints of a syllable frame, and that higher-level features determine factors of lengthening or compression for the framework into which they are to fit. In support of this view, a connectionist implementation is described, with eight input features, one layer of hidden units and one analog output unit, that accounts for an equivalent 70% of the variance in duration.
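
As an illustration only, the network shape described above (eight inputs, one hidden layer, one analog output) can be sketched in a few lines of Python. The abstract does not give the hidden-layer size, the activation function, the encoding of the eight features, or the exact statistic behind "variance accounted for", so `N_HIDDEN`, the sigmoid hidden units, the random initial weights and the 1 - SS_resid/SS_total measure below are all assumptions, and the function names are hypothetical. This is a minimal sketch of a forward pass and an evaluation measure, not the implementation reported in the paper.

```python
import math
import random

N_INPUTS = 8   # from the abstract: eight input features per syllable
N_HIDDEN = 4   # assumption: the abstract does not state the hidden-layer size


def sigmoid(x):
    """Logistic activation (assumed; the abstract names no activation)."""
    return 1.0 / (1.0 + math.exp(-x))


def init_network(seed=0):
    """Return small random weights for one hidden layer and one output unit."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)]
                for _ in range(N_HIDDEN)]
    b_hidden = [0.0] * N_HIDDEN
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
    b_out = 0.0
    return w_hidden, b_hidden, w_out, b_out


def predict(net, features):
    """Forward pass: eight features -> sigmoid hidden units -> one linear output."""
    if len(features) != N_INPUTS:
        raise ValueError("expected %d input features" % N_INPUTS)
    w_hidden, b_hidden, w_out, b_out = net
    hidden = [sigmoid(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # The single output is left linear so it can take any analog value.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def variance_accounted_for(observed, predicted):
    """Proportion of variance explained, computed as 1 - SS_resid / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    ss_resid = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total
```

With trained weights, `predict` would be applied to the feature vector of each syllable and `variance_accounted_for` would compare the predicted durations against the observed ones; training itself is left out of the sketch.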