Publication
INTERSPEECH 2014
Conference paper
Refined inter-segment joining in multi-form speech synthesis
Abstract
In multi-form speech synthesis, speech output is constructed by splicing waveform segments and parametric speech segments that are generated from statistical models. The decision whether to use the waveform or the statistical parametric form is made per segment. This approach faces certain challenges in the context of inter-segment joining. In this work, we present a novel method whereby all non-contiguous joints are represented by statistically generated speech frames without compromising naturalness. Speech frames surrounding non-contiguous joints between the waveform segments are re-generated from the models and optimized for concatenation. In addition, a novel pitch smoothing algorithm that preserves the original intonation trajectory while maintaining smoothness is applied. We implemented the spectrum and the pitch smoothing algorithms within a multi-form speech synthesis framework that employs a uniform parametric representation for the natural and statistically modeled speech segments. This framework facilitates pitch modification in natural segments. Subjective evaluation results reveal that the proposed smoothing methods significantly improve the perceived speech quality.
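To make the pitch smoothing idea concrete, the sketch below shows one simple way to remove a log-F0 discontinuity at a non-contiguous joint while approximately preserving the local contour shape: the step between the two segments is split in half and bled back into each side with a decaying ramp. This is a minimal illustration under assumed inputs (fully voiced log-F0 contours as NumPy arrays, a hypothetical `fade_frames` window), not the paper's exact algorithm.

```python
import numpy as np

def smooth_joint_pitch(logf0_left, logf0_right, fade_frames=20):
    """Illustrative pitch smoothing at a non-contiguous joint.

    The step between the last frame of the left segment and the first
    frame of the right segment is split in half and distributed back
    into each side with a linearly decaying weight, so the joint-level
    discontinuity is removed while frame-to-frame deltas away from the
    joint change only slightly. Assumes fully voiced contours.
    """
    left = logf0_left.astype(float).copy()
    right = logf0_right.astype(float).copy()

    # Discontinuity at the joint (positive if the right segment starts higher).
    step = right[0] - left[-1]

    # Correction ramps: largest at the joint, decaying to zero away from it.
    n_l = min(fade_frames, len(left))
    n_r = min(fade_frames, len(right))
    ramp_l = np.linspace(0.0, 0.5, n_l)   # applied to the tail of the left segment
    ramp_r = np.linspace(0.5, 0.0, n_r)   # applied to the head of the right segment

    left[-n_l:] += step * ramp_l          # raise/lower the left tail toward the joint
    right[:n_r] -= step * ramp_r          # lower/raise the right head toward the joint

    return np.concatenate([left, right])
```

As a quick check, calling the function on two contours with a visible offset (e.g. `smooth_joint_pitch(np.full(50, 4.8), np.full(50, 5.0))`) yields a contour that meets in the middle at the joint and returns to the original values within `fade_frames` frames on each side.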