Properly Using Speech Synthesis and Voice Transformation for Audiovisual Content Generation

Monzo, C.; Formiga, Ll.; Adell, J.; Mayor, O.; Bonada, J.; Janer, J.; Iriondo, I.

Note: This bibliographic page is archived and will no longer be updated. For an up-to-date list of publications from the Music Technology Group, see the Publications list.

Title	Properly Using Speech Synthesis and Voice Transformation for Audiovisual Content Generation
Publication Type	Conference Paper
Year of Publication	2009
Conference Name	International Broadcasting Conference (IBC2009)
Authors	Monzo, C., Formiga, L., Adell, J., Mayor, O., Bonada, J., Janer, J., & Iriondo, I.
Conference Start Date	10/09/2009
Publisher	IBC
Conference Location	Amsterdam, The Netherlands
Abstract	During the creation process, scriptwriters may want to quickly preview the result of what they are writing. Text-to-Speech (TTS) systems make it possible to deliver speech in a short time. In addition, information may be generated dynamically by intelligent systems, in which case TTS is crucial for delivering it as speech. The main drawback of using TTS in audiovisual productions is that commercial systems offer only a few different voices, whereas productions need a different voice for each character involved. Voice Transformation (VT) techniques can overcome this limitation by allowing the user to personalize the voice for each character. In this paper, we briefly explain the technologies involved in TTS and VT systems and how they can be combined. Finally, we present a study of the most efficient way to combine them: either converting the synthesized speech, or generating a new synthetic voice by converting the original speech database used in the TTS system.