Singing Voice Nasality Detection in Polyphonic Audio

Title: Singing Voice Nasality Detection in Polyphonic Audio
Publication Type: Master's Thesis
Year of Publication: 2009
Authors: Ramesh, A.
Preprint/postprint document: static/media/Ramesh-Anandhi-Master-Thesis-2009.pdf
Abstract: This thesis proposes a method for characterising the singing voice in polyphonic commercial recordings; the specific feature characterised is the nasality of the singer's voice. The main contribution of the thesis is a set of nasality descriptors that can be applied under the constraints of polyphony. A segment of the recording in which the voice is the predominant source is selected manually. The source is a stereo recording in which the voice is assumed to be a mono track panned to the centre. This two-channel segment is converted to mono by directly averaging the left and right channels. As an alternative, the thesis also explores reducing or eliminating the accompaniment by extracting the perceptual centre of the mix, which in popular recordings usually contains the voice. Next, the harmonic frames in the segment are identified for descriptor extraction, since nasal quality is normally perceived in voiced vowels and the voice is a strongly harmonic source. The descriptors are chosen on the basis of prior research on nasality in speech, together with some spectral features of the nasal consonants. Once the descriptors have been computed for the entire segment, they are fed to a one-class classifier to obtain a model of the nasal voice. The model is evaluated for different descriptor sets and for the effectiveness of the centre-track extraction. The results are also compared against a standard descriptor set for voice timbre characterisation, consisting of the MFCCs and some spectral descriptors. The two approaches perform comparably, and the proposed descriptor set outperforms the generic feature vector in some cases. In addition, the carefully selected descriptors yield a shorter feature vector.
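The processing chain described in the abstract (stereo-to-mono averaging, harmonic-frame selection, one-class modelling) can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis implementation: all function names are hypothetical; harmonic frames are detected here with a normalised autocorrelation peak, which is an assumed technique because the abstract does not say how the thesis detects them; and the one-class classifier is replaced by a simple diagonal-Gaussian stand-in. The centre-extraction variant and the nasality descriptors themselves are not shown.

```python
import math

# --- Step 1: stereo-to-mono by direct averaging of the two channels ---
def downmix_to_mono(left, right):
    """Average left and right samples; a source identical in both channels is kept at full level."""
    if len(left) != len(right):
        raise ValueError("left and right channels must have the same length")
    return [(l + r) / 2.0 for l, r in zip(left, right)]


# --- Step 2: keep only harmonic (strongly periodic) frames ---
# Assumption: periodicity is measured as a normalised autocorrelation peak;
# this is NOT necessarily how the thesis identifies harmonic frames.
def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames, dropping an incomplete final frame."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def periodicity(frame, min_lag, max_lag):
    """Highest autocorrelation over lags [min_lag, max_lag], divided by frame energy."""
    energy = sum(s * s for s in frame)
    if energy == 0.0:
        return 0.0  # silent frame: treat as non-harmonic
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        best = max(best, r / energy)
    return best


def harmonic_frames(x, sr, frame_len=1024, hop=512,
                    fmin=80.0, fmax=1000.0, threshold=0.5):
    """Indices of frames whose periodicity (pitch between fmin and fmax) exceeds threshold."""
    min_lag = max(1, int(sr / fmax))
    max_lag = min(frame_len - 1, int(sr / fmin))
    return [i for i, f in enumerate(frame_signal(x, frame_len, hop))
            if periodicity(f, min_lag, max_lag) > threshold]


# --- Step 3: one-class model of the nasal voice ---
# Hypothetical stand-in for the thesis's one-class classifier: a diagonal
# Gaussian fitted only to nasal examples; a descriptor vector is accepted
# when its mean squared z-score is at most the threshold.
def fit_one_class(train):
    n, dims = len(train), len(train[0])
    means = [sum(v[d] for v in train) / n for d in range(dims)]
    variances = [max(sum((v[d] - means[d]) ** 2 for v in train) / n, 1e-12)
                 for d in range(dims)]
    return means, variances


def is_target(model, x, threshold=3.0):
    means, variances = model
    score = sum((xi - m) ** 2 / v for xi, m, v in zip(x, means, variances)) / len(x)
    return score <= threshold


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(2048)]
    channel = tone + [0.0] * 2048  # voiced segment followed by silence
    mono = downmix_to_mono(channel, channel)  # same signal in both channels = centre-panned
    print(harmonic_frames(mono, sr))
```

Running the script prints `[0, 1, 2, 3]`: the four frames that overlap the 200 Hz tone are flagged as harmonic, while the all-silent frames are not. Direct averaging preserves a source that is identical in both channels (such as a centre-panned voice) but passes a source present in only one channel at half amplitude, which is why the centre-panning assumption matters for this step.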