Pitch Estimation of the Predominant Vocal Melody from Heterophonic Music Audio Recordings

Title: Pitch Estimation of the Predominant Vocal Melody from Heterophonic Music Audio Recordings
Publication Type: Master Thesis
Year of Publication: 2014
Authors: Ishwar, V.
Abstract: Music is an industry with a vast digital presence, and today we have access to a large number of audio music recordings, both online and stored locally on computers, cell phones, iPads, and other devices. Many non-Western music cultures have also established a large digital presence over the last two decades. This opens up many possibilities for applications such as state-of-the-art archiving, automatic tagging, lyrics-to-audio alignment, and automatic indexing of music using a vast number of cues from user inputs. It also opens up many avenues for meaningful computational analysis of various music traditions. Since melody is one of the most basic musical entities, predominant pitch is one of the fundamental representations used in all these tasks. In this work we deal with pitch estimation of the predominant vocal melody from heterophonic music audio recordings. We provide a novel approach to pitch estimation that combines the present state of the art with the timbral characteristics of the various melodic sources in the recording. We perform a detailed review of the state of the art in computational analysis of music and of the use of predominant melody pitch as a basic representation in a number of tasks. We also review the state of the art in predominant melody estimation and singing voice detection, both of which are highly relevant to this task since we aim at characterizing the singing voice. The proposed approach is a classification-based approach to pitch estimation of vocal melodies. The approach is applied to Indian art music, and for this reason the musical and cultural aspects of the music have been considered in its design. We first extract candidate pitch contours using a state-of-the-art predominant melody extraction algorithm. After this, timbral features corresponding to the source of each pitch contour are extracted from the audio signal. 
These features are derived not from the spectrum of the audio but from a representation of the extracted harmonics of the candidate pitch contours. The candidate pitch contours are then classified into vocal and non-vocal classes. Music-specific information is incorporated through a contour selection methodology that uses the tonic pitch, a fundamental aspect of Indian art music, which is the test case for this approach. A detailed explanation of the entire approach, including implementation details, is provided in this thesis. The approach is evaluated on a database of Karṇāṭik music for which the ground truth was manually curated. A novel evaluation methodology is proposed, based on adaptive thresholding that incorporates the properties of the music at hand. The evaluation results surpass the state of the art for predominant melody estimation. This reinforces the hypothesis that combining salience-based methods with the timbral properties of the singing voice aids pitch estimation of the singing voice. A detailed analysis of the results, with plausible explanations, is presented. The thesis concludes with a summary of the work, the main conclusions, and the contributions made in the course of this work.
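As a rough illustration of the final stages described in the abstract (vocal/non-vocal classification of candidate contours, followed by tonic-aware contour selection), the following is a minimal sketch and not the thesis's actual method. The `Contour` class, the `vocal_prob` score, the 0.5 threshold, and the two-octave range around the tonic are all hypothetical choices made for the example. Only the cents formula, 1200·log2(f / tonic), is standard.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Contour:
    """A candidate pitch contour (hypothetical structure for illustration)."""
    start_frame: int          # index of the first frame the contour covers
    pitches_hz: List[float]   # one pitch value per frame
    vocal_prob: float         # assumed output of a vocal/non-vocal classifier


def hz_to_cents(f_hz: float, tonic_hz: float) -> float:
    """Express a frequency in cents relative to the tonic."""
    return 1200.0 * math.log2(f_hz / tonic_hz)


def select_vocal_melody(
    contours: List[Contour],
    n_frames: int,
    tonic_hz: float,
    threshold: float = 0.5,      # assumed decision threshold
    max_cents: float = 2400.0,   # assumed plausible range around the tonic
) -> List[Optional[float]]:
    """Pick, per frame, the pitch of the most vocal-like contour.

    Frames with no accepted contour are left as None (unvoiced).
    """
    melody: List[Optional[float]] = [None] * n_frames
    best_prob = [0.0] * n_frames
    for c in contours:
        if c.vocal_prob < threshold:
            continue  # treated as non-vocal, discard the whole contour
        for i, f in enumerate(c.pitches_hz):
            t = c.start_frame + i
            if not 0 <= t < n_frames:
                continue
            if abs(hz_to_cents(f, tonic_hz)) > max_cents:
                continue  # too far from the tonic under the assumed range
            if c.vocal_prob > best_prob[t]:
                melody[t] = f
                best_prob[t] = c.vocal_prob
    return melody
```

In this sketch the tonic only serves to bound the accepted pitch range. The selection methodology in the thesis may use the tonic quite differently.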