Singing Phoneme Class Detection In Polyphonic Music Recordings

Publication Type: Master Thesis
Year of Publication: 2008
Authors: Vagia, O.
Preprint/postprint document: files/publications/Ourania-Vaggia-Master-Thesis.pdf

Automatic singing detection and singing phoneme recognition are two music information retrieval (MIR) research topics that have attracted considerable attention in recent years. The first approaches borrowed techniques widely used in Automatic Speech Recognition (ASR), since speech and singing are produced by the same vocal apparatus and share similar acoustic features. Moving from monophonic to polyphonic audio makes the problem considerably harder, because the instrumental accompaniment acts as a noise source that has to be attenuated.

This thesis addresses the problem of singing phoneme detection in polyphonic audio with English lyrics. Specifically, we build statistical classification models that automatically distinguish sung consonants and sung vowels from purely instrumental music in polyphonic music recordings.

The approach begins with the creation of a database used to train, test and evaluate the models. Several sets of extracted low-level features are used in the classification process. Different classifiers are compared, namely support vector machines (SVM), multilayer perceptrons (MLP) and logistic regression, as are different classification schemes (3-class classifiers, and binary classifiers arranged in series and in parallel). The best model achieves an overall accuracy of 78% in distinguishing between the three classes.
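The abstract does not specify how the binary classifiers are combined. The following is a minimal Python sketch under stated assumptions: the "series" scheme is taken to be a cascade (singing vs. instrumental first, then vowel vs. consonant), and the "parallel" scheme is taken to be one-vs-rest with an argmax over per-class scores. The two-element feature vector, the functions `singing_score` and `vowel_score`, and all other names are hypothetical toy stand-ins for trained SVM, MLP or logistic regression models, not the thesis's actual implementation.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical feature vector: (vocal_presence, voicing), both scaled to [0, 1].
Features = Sequence[float]
# A binary classifier is modelled as a function returning P(positive class | features).
BinaryClassifier = Callable[[Features], float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Toy stand-ins for trained binary models (assumption, not the thesis's models).
def singing_score(x: Features) -> float:
    """P(singing | x), driven by the hypothetical vocal_presence feature."""
    return sigmoid(4.0 * (x[0] - 0.5))


def vowel_score(x: Features) -> float:
    """P(vowel | x, singing), driven by the hypothetical voicing feature."""
    return sigmoid(4.0 * (x[1] - 0.5))


def classify_series(x: Features, singing_vs_instr: BinaryClassifier,
                    vowel_vs_consonant: BinaryClassifier,
                    threshold: float = 0.5) -> str:
    """Cascade: decide singing vs. instrumental first, then vowel vs. consonant."""
    if singing_vs_instr(x) < threshold:
        return "instrumental"
    return "vowel" if vowel_vs_consonant(x) >= threshold else "consonant"


def classify_parallel(x: Features, detectors: Dict[str, BinaryClassifier]) -> str:
    """One-vs-rest: each per-class detector runs independently; the highest score wins."""
    scores = {label: detect(x) for label, detect in detectors.items()}
    return max(scores, key=scores.get)


# Per-class detectors for the parallel scheme, built here from the same toy scores.
PARALLEL_DETECTORS: Dict[str, BinaryClassifier] = {
    "vowel": lambda x: singing_score(x) * vowel_score(x),
    "consonant": lambda x: singing_score(x) * (1.0 - vowel_score(x)),
    "instrumental": lambda x: 1.0 - singing_score(x),
}


def overall_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of segments whose predicted class matches the reference label."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


if __name__ == "__main__":
    x = (0.52, 0.6)  # borderline evidence of singing
    print(classify_series(x, singing_score, vowel_score))  # vowel
    print(classify_parallel(x, PARALLEL_DETECTORS))        # instrumental
```

With the borderline input `(0.52, 0.6)` the cascade commits to "singing" on a score barely above the threshold and returns `vowel`, while the parallel scheme lets the `instrumental` detector win. In a cascade an early decision cannot be revised by later stages, which is one reason the two schemes can yield different accuracies on the same features.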