Properly Using Speech Synthesis and Voice Transformation for Audiovisual Content Generation

Title: Properly Using Speech Synthesis and Voice Transformation for Audiovisual Content Generation
Publication Type: Conference Paper
Year of Publication: 2009
Conference Name: International Broadcasting Conference (IBC2009)
Authors: Monzo, C., Formiga L., Adell J., Mayor O., Bonada J., Janer J., & Iriondo I.
Conference Start Date: 10/09/2009
Publisher: IBC
Conference Location: Amsterdam, The Netherlands
Abstract: During the creation process, scriptwriters might want to quickly preview the result of what they are creating. Text-to-Speech (TTS) systems offer the opportunity to deliver speech in a short amount of time. In addition, information might be dynamically generated by intelligent systems, in which case TTS is crucial for delivering it as speech. The main drawback of using TTS in audiovisual productions is that commercial systems offer only a few different voices, whereas productions need a different voice for each character involved. Voice Transformation (VT) techniques can be used to overcome this limitation, allowing the user to personalize the voice for each character. In this paper, we briefly explain the technologies involved in TTS and VT systems and how they can be combined. Finally, we present a study of the most efficient way to combine them: either convert the synthesized speech, or generate a new synthetic voice by converting the original speech database used in the TTS system.