Sample-based singing voice synthesizer by spectral concatenation

Title: Sample-based singing voice synthesizer by spectral concatenation
Publication Type: Conference Paper
Year of Publication: 2003
Authors: Bonada, J., & Loscos, A.
Abstract: The singing synthesis system we present generates a performance of an artificial singer from the musical score and the phonetic transcription of a song, using a frame-based frequency-domain technique. This performance mimics the real singing of a singer who has been previously recorded, analyzed, and stored in a database, in which we store his voice characteristics (phonetics) and his low-level expressivity (attacks, releases, note transitions, and vibratos). To synthesize such a performance, the system concatenates a set of elemental synthesis units (phonetic articulations and stationaries). These units are obtained by transposing and time-scaling the database samples. The transformed samples are concatenated by spreading the spectral shape and phase discontinuities at the boundaries across a set of transition frames surrounding the joint frames. The expression of the singing is applied through a Voice Model built on top of a Spectral Peak Processing (SPP) technique. SPP treats the spectrum as a set of regions. Each region consists of one spectral peak and its surroundings, and can be shifted in both frequency and amplitude. The Voice Model is based on an improved version of the traditional excitation/filter approach. The system will be demonstrated with several performances.
Full Document: files/publications/SMAC2003-aloscos.pdf
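
The abstract says that concatenation spreads the spectral shape discontinuity at a unit boundary across transition frames around the joint. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes each frame is a list of spectral envelope values in dB, that the gap is split equally between the two units, and that the correction follows a linear ramp. The function name `spread_discontinuity` and the parameter `n_transition` are hypothetical, and phase handling is omitted.

```python
def spread_discontinuity(left, right, n_transition):
    """Smooth the joint between two units by spreading the envelope gap.

    left, right: lists of frames; each frame is a list of envelope values (dB).
    n_transition: number of frames on each side that absorb the correction.
    Assumption: equal split of the gap and a linear ramp (illustrative only).
    """
    k_left = min(n_transition, len(left))
    k_right = min(n_transition, len(right))

    # Per-bin gap between the first frame of `right` and the last frame of `left`.
    gap = [r - l for l, r in zip(left[-1], right[0])]

    # Copy the frames so the inputs are left unmodified.
    out_left = [frame[:] for frame in left]
    out_right = [frame[:] for frame in right]

    # Left side: the correction grows linearly and reaches +gap/2 at the joint.
    for i in range(k_left):
        w = 0.5 * (i + 1) / k_left
        idx = len(left) - k_left + i
        out_left[idx] = [v + w * g for v, g in zip(left[idx], gap)]

    # Right side: the correction is -gap/2 at the joint and shrinks linearly.
    for i in range(k_right):
        w = 0.5 * (k_right - i) / k_right
        out_right[i] = [v - w * g for v, g in zip(right[i], gap)]

    return out_left, out_right


if __name__ == "__main__":
    left = [[0.0, 0.0] for _ in range(4)]
    right = [[4.0, 8.0] for _ in range(4)]
    new_left, new_right = spread_discontinuity(left, right, 2)
    print(new_left)
    print(new_right)
```

With this split, the last frame of the left unit and the first frame of the right unit meet at the midpoint of the original gap, while frames outside the transition region are unchanged.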