Real-Time Audio-to-Score Alignment of Singing Voice Based on Melody and Lyric Information

Rong Gong; Philippe Cuvillier; Nicolas Obin; Arshia Cont

Note: This bibliographic page is archived and will no longer be updated. For an up-to-date list of publications from the Music Technology Group, see the Publications list.

Title	Real-Time Audio-to-Score Alignment of Singing Voice Based on Melody and Lyric Information
Publication Type	Conference Paper
Year of Publication	2015
Conference Name	Interspeech 2015
Authors	Gong, R., Cuvillier, P., Obin, N., & Cont, A.
Conference Start Date	06/09/2015
Conference Location	Dresden, Germany
Abstract	Singing voice is specific in music: a vocal performance conveys both music (melody/pitch) and lyrics (text/phoneme) content. This paper aims at exploiting the advantages of melody and lyric information for real-time audio-to-score alignment of singing voice. First, lyrics are added as a separate observation stream into a template-based hidden semi-Markov model (HSMM), whose observation model is based on the construction of vowel templates. Second, early and late fusion of melody and lyric information are processed during real-time audio-to-score alignment. An experiment conducted with two professional singers (male/female) shows that the performance of a lyrics-based system is comparable to that of melody-based score following systems. Furthermore, late fusion of melody and lyric information substantially improves the alignment performance. Finally, maximum a posteriori adaptation (MAP) of the vowel templates from one singer to the other suggests that lyric information can be efficiently used for any singer.
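
The abstract does not say how the late fusion of the melody and lyric streams is computed. As a rough illustration of the general idea only, the sketch below combines per-state log-likelihoods from two observation streams with a log-linear weighting and normalizes the result over score states. The function name late_fusion, the weight parameter, and the weighting scheme itself are assumptions made for this example and are not taken from the paper.

```python
import math


def late_fusion(melody_loglik, lyric_loglik, weight=0.5):
    """Fuse two streams of per-state log-likelihoods (hypothetical helper).

    weight is the share given to the melody stream; 1 - weight goes to the
    lyric stream. This log-linear rule is an assumption for illustration,
    not the method described in the paper.
    """
    if len(melody_loglik) != len(lyric_loglik):
        raise ValueError("both streams must cover the same score states")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    # Weighted sum in the log domain, i.e. a weighted product of likelihoods.
    fused = [
        weight * m + (1.0 - weight) * l
        for m, l in zip(melody_loglik, lyric_loglik)
    ]

    # Subtract the maximum before exponentiating for numerical stability,
    # then normalize so the values sum to one over the score states.
    peak = max(fused)
    exps = [math.exp(f - peak) for f in fused]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Toy values for three score states; not data from the experiment.
    melody = [-1.0, -2.0, -3.0]
    lyrics = [-3.0, -0.5, -4.0]
    print(late_fusion(melody, lyrics, weight=0.5))
```

With weight set to 1.0 the result depends on the melody stream alone, and with 0.0 on the lyric stream alone, so a single parameter covers both single-stream cases and any mixture between them.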

Interspeech_RG_2015.pdf