Word emphasis prediction for expressive text to speech

Yosi Mass; Slava Shechtman; Moran Mordechay; Ron Hoory; Oren Sar Shalom; Guy Lev; David Konopnicki

doi:10.21437/Interspeech.2018-1159

INTERSPEECH 2018

Conference paper

02 Sep 2018

Abstract

Word emphasis prediction is an important part of expressive prosody generation in modern Text-To-Speech (TTS) systems. We present a method for predicting emphasized words for expressive TTS, based on a Deep Neural Network (DNN). We show that the presented method outperforms machine learning methods based on hand-crafted features in terms of objective metrics such as precision and recall. Using a listening test, we further demonstrate that the contribution of the predicted emphasized words to the expressiveness of the synthesized speech is subjectively perceivable.
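Precision and recall for emphasis prediction can be read as word-level binary classification scores. In this framing, each word is labeled 1 if emphasized and 0 otherwise, and the predicted labels are compared against a reference annotation. The sketch below is an illustration only. The function name `emphasis_precision_recall` and the 0/1 per-word label format are assumptions, not the authors' evaluation code.

```python
def emphasis_precision_recall(predicted, reference):
    """Word-level precision and recall for binary emphasis labels.

    Hypothetical helper: `predicted` and `reference` are equal-length
    sequences of 0/1 values, one per word (1 = emphasized).
    """
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    # Count true positives, false positives and false negatives word by word.
    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)
    fn = sum(1 for p, r in zip(predicted, reference) if not p and r)
    # Define the score as 0.0 when its denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative sentence: "I never said she stole my money"
reference = [0, 1, 0, 0, 0, 0, 1]  # assumed annotation: "never", "money"
predicted = [0, 1, 0, 1, 0, 0, 1]  # assumed prediction: "never", "she", "money"
p, r = emphasis_precision_recall(predicted, reference)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.667 recall=1.000
```

Here the prediction finds both reference emphases, so recall is 1.0. It also marks one extra word, which lowers precision to 2/3.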
