Publication
INTERSPEECH 2018
Conference paper

Word emphasis prediction for expressive text to speech

Abstract

Word emphasis prediction is an important part of expressive prosody generation in modern Text-To-Speech (TTS) systems. We present a method for predicting emphasized words for expressive TTS, based on a Deep Neural Network (DNN). We show that the presented method outperforms machine learning methods based on hand-crafted features in terms of objective metrics such as precision and recall. Using a listening test, we further demonstrate that the contribution of the predicted emphasized words to the expressiveness of the synthesized speech is subjectively perceivable.

Date

02 Sep 2018
