Singing-driven Interfaces for Sound Synthesizers

Title: Singing-driven Interfaces for Sound Synthesizers
Publication Type: PhD Thesis
Year of Publication: 2008
University: Universitat Pompeu Fabra
Authors: Janer, J.
Advisor: Serra, X.
Academic Department: Department of Information and Communication Technologies
City: Barcelona
Abstract: Together with the sound synthesis engine, the user interface, or controller, is a basic component of any digital music synthesizer and the primary focus of this dissertation. Under the title of singing-driven interfaces, we study the design of systems that, taking the singing voice as input, can control the synthesis of musical sounds. From a number of preliminary experiments and studies, we identify the principal issues involved in voice-driven synthesis. We propose one approach for controlling a singing voice synthesizer and another for controlling the synthesis of other musical instruments. In the former, the input and output signals are of the same nature, and control-to-signal mappings can be direct. In the latter, mappings become more complex, depending on the phonetics of the input voice and the characteristics of the synthesized instrument sound. For the latter case, we present a study on vocal imitation of instruments showing that these voice signals consist of syllables with musical meaning. We also suggest linking the characteristics of voice signals to instrumental gestures, describing these signals as vocal gestures. Within the wide scope of voice-driven synthesis, this dissertation studies the relationship between the human voice and the sound of musical instruments by addressing the automatic description of the voice and the mapping strategies for meaningful control of the synthesized sounds. The contributions of the thesis include several voice analysis methods for using the voice as a control input: a) a phonetic alignment algorithm based on dynamic programming; b) a segmentation algorithm to isolate vocal gestures; c) a formant tracking algorithm; and d) a breathiness characterization algorithm. We also propose a general framework for defining the mappings from vocal gestures to synthesizer parameters, configured according to the instrumental sound being synthesized. To demonstrate the results obtained, two real-time prototypes were implemented: the first controls the synthesis of a singing voice, and the second is a generic controller for other instrumental sounds.
Preprint/postprint document: files/publications/Tesi_jjaner_2008.pdf
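
As a minimal sketch of the dynamic-programming alignment idea mentioned in the abstract, the following illustrates generic dynamic time warping over 1-D feature sequences. The feature representation and the absolute-difference cost are placeholder assumptions for illustration; this is not the thesis's phonetic alignment algorithm.

import numpy as np

def dtw_align(query, reference):
    """Return the accumulated cost and warping path aligning two
    1-D feature sequences via dynamic programming."""
    n, m = len(query), len(reference)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(query[i - 1] - reference[j - 1])  # local distance (assumed)
            cost[i, j] = d + min(cost[i - 1, j - 1],  # match
                                 cost[i - 1, j],      # insertion
                                 cost[i, j - 1])      # deletion
    # Backtrack the lowest-cost warping path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return cost[n, m], path[::-1]

# Example: align a short sung pitch contour to a reference contour
total_cost, path = dtw_align([1.0, 2.0, 3.0, 2.0], [1.0, 3.0, 2.0])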