Score-informed Syllable Segmentation for A Cappella Singing Voice with Convolutional Neural Networks

Jordi Pons; Rong Gong; Xavier Serra

Note: This bibliographic page is archived and will no longer be updated. For an up-to-date list of publications from the Music Technology Group, see the Publications list.

Title	Score-informed Syllable Segmentation for A Cappella Singing Voice with Convolutional Neural Networks
Publication Type	Conference Paper
Year of Publication	2017
Conference Name	The 18th International Society for Music Information Retrieval Conference
Authors	Pons, J., Gong, R., & Serra, X.
Conference Start Date	23/10/2017
Conference Location	Suzhou, China
Abstract	This paper introduces a new score-informed method for segmenting jingju a cappella singing phrases into syllables. The proposed method estimates the most likely sequence of syllable boundaries given the estimated syllable onset detection function (ODF) and the musical score of the phrase. We first examine the structure of jingju syllables and propose a definition of the term “syllable onset”. Then, we identify the challenges that jingju a cappella singing poses. Further, we investigate how to improve the syllable ODF estimation with convolutional neural networks (CNNs). We propose a novel CNN architecture that efficiently captures different time-frequency scales for estimating syllable onsets. In addition, we propose using a score-informed Viterbi algorithm instead of thresholding the onset function, because the available musical knowledge (the score) can inform the Viterbi algorithm in order to overcome the identified challenges. The proposed method outperforms the state of the art in syllable segmentation for jingju a cappella singing. We further provide an analysis of the segmentation errors, which points to possible research directions.
preprint/postprint document	https://arxiv.org/pdf/1707.03544.pdf
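
The contrast the abstract draws between thresholding the ODF and a score-informed decoding can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Gaussian duration prior, and the `sigma_ratio` parameter are assumptions made for illustration only. The sketch takes a frame-level ODF and relative syllable durations read from a score, and uses Viterbi-style dynamic programming to pick the boundary sequence that best balances ODF evidence against the expected durations.

```python
import numpy as np


def threshold_onsets(odf, threshold=0.5):
    """Baseline: every frame whose ODF value exceeds the threshold is an onset."""
    return np.flatnonzero(np.asarray(odf) > threshold).tolist()


def segment_with_score(odf, rel_durations, sigma_ratio=0.3, eps=1e-6):
    """Pick len(rel_durations) - 1 boundary frames by dynamic programming.

    Hypothetical scoring (an assumption, not the paper's exact model):
    each boundary contributes log(ODF) at its frame, and each syllable
    contributes a Gaussian log-penalty on the deviation of its length
    from the length the score predicts.
    """
    odf = np.asarray(odf, dtype=float)
    n = len(odf)
    k = len(rel_durations)
    expected = np.asarray(rel_durations, dtype=float)
    expected = expected / expected.sum() * n  # expected syllable lengths in frames
    log_odf = np.log(odf + eps)

    def dur_score(length, j):
        sigma = sigma_ratio * expected[j]
        return -0.5 * ((length - expected[j]) / sigma) ** 2

    # best[j, t]: best score when syllable j ends (exclusive) at frame t
    best = np.full((k, n + 1), -np.inf)
    back = np.zeros((k, n + 1), dtype=int)
    for t in range(1, n + 1):
        best[0, t] = dur_score(t, 0)  # the first syllable starts at frame 0
    for j in range(1, k):
        for t in range(j + 1, n + 1):
            starts = np.arange(j, t)  # candidate boundary frames for syllable j
            cand = best[j - 1, starts] + log_odf[starts] + dur_score(t - starts, j)
            i = int(np.argmax(cand))
            best[j, t] = cand[i]
            back[j, t] = starts[i]

    # Backtrack from the end of the phrase to recover the boundaries.
    boundaries = []
    t = n
    for j in range(k - 1, 0, -1):
        t = back[j, t]
        boundaries.append(int(t))
    return boundaries[::-1]
```

On a toy ODF with true onsets at frames 10 and 25 and a stronger spurious peak at frame 18, thresholding reports all three peaks, while the score-informed decoding, told that the phrase has three syllables with relative durations 1 : 1.5 : 1.5, keeps only the two boundaries consistent with the score.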

Additional material:

Code: https://github.com/ronggong/jingjuSyllabicSegmentaion/tree/v0.1.0