Towards Automatic Music Structural Analysis: Identifying Characteristic Within-Song Excerpts in Popular Music

Title: Towards Automatic Music Structural Analysis: Identifying Characteristic Within-Song Excerpts in Popular Music
Publication Type: Master Thesis
Year of Publication: 2005
Authors: Ong, B.
Preprint/postprint document: files/publications/330410-DEA-BeeSuan2005.pdf
Abstract: Automatic audio content analysis is a general research area in which algorithms are developed to allow computer systems to understand the content of digital audio signals for further exploitation. Its main focus is on practical applications for audio file management, such as automatic labeling, efficient browsing, or the retrieval of relevant files from a large database with little effort. Automatic music structural analysis is a specific subset of audio content analysis in which the domain is restricted to semantically meaningful descriptions of audio in a musical context. Its main task is to discover the structure of music by analyzing audio signals, in order to facilitate better handling of the rapidly expanding amounts of audio data available in digital collections.

In this research work, we focus our investigation on two areas that are part of audio-based music structural analysis. First, we propose a unique framework and method for temporal audio segmentation at the semantic level. The system aims to detect structural changes in music in order to separate the different “sections” of a piece according to their structural titles (e.g. intro, verse, chorus, bridge). We present a two-phase music segmentation system together with a combined set of low-level audio descriptors to be extracted from the music audio signals. Unlike existing approaches, we consider the applicability of image processing methods to audio content analysis. A database of 54 audio files (songs by The Beatles) is used to evaluate the proposed approach on a mainstream popular music collection. The experimental results show that our proposed algorithm achieves 71% accuracy and 79% reliability in a practical application for identifying structural boundaries in music audio signals.

Second, we present our proposed framework and approach for the identification of representative excerpts from music audio signals. The system aims to extract a short abstract that serves as a ‘hook’ or thumbnail of the music and to generate a retrieval cue from the original audio files. Rather than simply following the existing literature, which mainly emphasizes the repetitiveness of audio excerpts in the identification task, we also investigate the potential of audio descriptors for capturing specific characteristics of the representative excerpts. A database of 28 music tracks comprising popular songs from various artists is used to evaluate the performance of our identification system. Preliminary quantitative evaluation results show that, when musical knowledge is integrated into the selection of appropriate audio descriptors for the identification task, content-based approaches achieve a higher overall performance than repetition-based approaches.