Conversational Text to Speech


The voice channel is a crucial element in customer-care scenarios, especially over the phone, and text-to-speech (TTS) systems play a fundamental role in establishing and maintaining a positive customer experience.

We are developing a low-latency, expressive text-to-speech system for conversational voice agents in customer care. By designing and recording a speech corpus with conversational content, expressive speaking styles, and interjections, and by employing innovative deep learning and data augmentation techniques, our conversational TTS system can produce human-sounding, expressive spoken machine responses in a variety of voices.

Furthermore, we have enabled the technology to synthesize expressive speech while the text is still being generated by a large language model (LLM), with only minimal latency between text generation and speech generation. This makes it compatible with generative conversational AI systems.
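
To illustrate the idea of synthesizing speech while an LLM is still producing text, here is a minimal sketch of one common approach: buffering streamed text fragments and handing them to the synthesizer at punctuation boundaries. This is not the system described above. The function name chunk_for_tts, the BOUNDARIES set, and the min_chars threshold are all hypothetical, chosen only for illustration.

```python
from typing import Iterable, Iterator

# Hypothetical boundary set: punctuation after which a chunk is released.
BOUNDARIES = (".", "!", "?", ";", ":", ",")


def chunk_for_tts(tokens: Iterable[str], min_chars: int = 12) -> Iterator[str]:
    """Group streamed text fragments into chunks a TTS engine could speak.

    A chunk is released as soon as the buffered text ends in a boundary
    character and is at least min_chars long, so synthesis can start
    before the full response has been generated.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        candidate = buffer.strip()
        # Release the chunk at a punctuation boundary once it is long enough.
        if len(candidate) >= min_chars and candidate.endswith(BOUNDARIES):
            yield candidate
            buffer = ""
    # Release whatever is left when the text stream ends.
    remainder = buffer.strip()
    if remainder:
        yield remainder


# Example: fragments as they might arrive from a streaming LLM response.
stream = ["Sure", ",", " I", " can", " help", " with", " that", ".",
          " One", " moment", "!"]
for chunk in chunk_for_tts(stream):
    print(chunk)  # each chunk would be passed to the synthesizer here
```

A smaller min_chars lowers the delay before the first audio can be produced, but gives the synthesizer less context for prosody. This simple rule would also split on a decimal point if the stream happened to break there, so a real system would likely need a more careful boundary detector.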