Nonlinear Audio Recurrence Analysis with Application to Music Genre Classification

Title: Nonlinear Audio Recurrence Analysis with Application to Music Genre Classification
Publication Type: Master Thesis
Year of Publication: 2010
Authors: de los Santos, C. A.
Preprint/postprint document: static/media/DeLosSantos-Carlos-Master-Thesis-2010.pdf
Abstract: Audio classification is an area of interest in Music Information Retrieval (MIR), dedicated to extracting key features from music by automatic means. In this research, nonlinear time series analysis techniques are used to process audio waveforms. The use of nonlinear time series analysis in audio classification tasks is relatively new. These techniques rest on the assumption that the temporal evolution of audio signals can be analyzed in a multidimensional space, with the intention of finding additional information that usual audio analysis tools, such as the Fourier transform, might not provide. In particular, the additional information sought is iterative or recurrent patterns of audio signals in that multidimensional space. Some initial evidence shows that these tools can be sensitive to audio signal analysis. In this thesis, two complementary sources for feature extraction based on nonlinear time series analysis are presented. The process consists of performing a recurrence analysis over framed audio signals and representing the output in two different formats: the first is a histogram of the recurrences found at different times in the audio frame; the second is a frequency histogram obtained by transforming and fitting the recurrence time histogram into frequency values with the same resolution as the corresponding frequency spectrum. A specific set of spectral features is then extracted from both representations and used for classifier training and testing. The reliability of the new data obtained through these sources is tested by comparison with a common automatic classification methodology, with music genre as the classification target. Among other results described, combining features extracted from the Fourier frequency spectrum with features extracted from the histograms yielded an increase of 5.5 percentage points in the highest classification accuracy, raising it from 66.0% with the common methodology to 71.5%.
Moreover, the creation of new features specific to these histograms and the optimization of the parameters used to perform the nonlinear analysis are suggested as future work on this research.
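The two representations described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not specify the embedding, the distance measure, or the mapping to frequency, so the time-delay embedding, the Euclidean threshold `eps`, the parameters `dim` and `tau`, the lag-to-bin mapping `n_fft / lag`, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Time-delay embedding (assumed): row i is (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    n = len(x) - (dim - 1) * tau
    return np.stack([x[k * tau : k * tau + n] for k in range(dim)], axis=1)

def recurrence_time_histogram(frame, dim=3, tau=5, eps=0.1):
    """Count, for every lag, the embedded points that return within eps of themselves."""
    pts = delay_embed(np.asarray(frame, dtype=float), dim, tau)
    n = len(pts)
    hist = np.zeros(n, dtype=int)  # hist[0] stays 0: lag 0 is not a recurrence
    for lag in range(1, n):
        dist = np.linalg.norm(pts[lag:] - pts[:-lag], axis=1)
        hist[lag] = np.count_nonzero(dist < eps)
    return hist

def to_frequency_histogram(hist, n_fft):
    """Map each lag to the nearest FFT bin (assumed mapping).

    A lag of `lag` samples corresponds to sr / lag Hz, and FFT bin k to
    k * sr / n_fft Hz, so the nearest bin is round(n_fft / lag).
    """
    freq_hist = np.zeros(n_fft // 2 + 1)
    lags = np.arange(1, len(hist))
    bins = np.rint(n_fft / lags).astype(int)
    valid = bins <= n_fft // 2  # drops lag 1, which lies above the Nyquist bin
    np.add.at(freq_hist, bins[valid], hist[1:][valid])
    return freq_hist

# Usage on a synthetic frame: a 100 Hz sine sampled at 8 kHz (period = 80 samples).
sr, n_fft = 8000, 1024
frame = np.sin(2 * np.pi * 100 * np.arange(n_fft) / sr)
hist = recurrence_time_histogram(frame)
freq_hist = to_frequency_histogram(hist, n_fft)
print("strongest recurrence lag:", int(np.argmax(hist)))
```

For this pure tone the recurrence time histogram peaks at the period of the signal. The frequency histogram has the same number of bins as a one-sided spectrum of length `n_fft`, but because multiples of the period also recur, sub-harmonic bins accumulate counts as well; how the thesis handles this is not stated in the abstract.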