Nonlinear Audio Recurrence Analysis with Application to Music Genre Classification

de los Santos, C. A.

Note: This bibliographic page is archived and will no longer be updated. For an up-to-date list of publications from the Music Technology Group see the Publications list.

Title	Nonlinear Audio Recurrence Analysis with Application to Music Genre Classification
Publication Type	Master Thesis
Year of Publication	2010
Authors	de los Santos, C. A.
preprint/postprint document	static/media/DeLosSantos-Carlos-Master-Thesis-2010.pdf
Abstract	Audio classification is an area of interest in Music Information Retrieval (MIR), dedicated to extracting key features from music by automatic means. In this research, nonlinear time series analysis techniques are used to process audio waveforms. The use of nonlinear time series analysis in audio classification tasks is relatively new. These techniques are implemented under the assumption that the temporal evolution of audio signals can be analyzed in a multidimensional space, with the intention of finding additional information that common audio analysis tools, such as the Fourier transform, might not provide. In particular, the additional information sought consists of iterative or recurrent patterns of audio signals in a multidimensional space. Some initial evidence shows that these tools can be sensitive to the properties of audio signals. In this thesis, two complementary sources for feature extraction based on nonlinear time series analysis are presented. The process consists of performing a recurrence analysis over framed audio signals and representing the output in two different formats. The first is a histogram of the recurrences found at different times in the audio frame. The second is a frequency histogram obtained by transforming and fitting the recurrence time histogram into frequency values with the same resolution as the corresponding frequency spectrum. A specific set of spectral features is then extracted from both representations and used for classifier training and testing. The reliability of the new data obtained through these sources is tested by comparison with a common automatic classification methodology, with music genre as the classification target. Among other results described, combining features extracted from the Fourier frequency spectrum with features extracted from the histograms increased the highest classification accuracy by 5.5 percentage points, from 66.0% with the common methodology to 71.5%. Moreover, the creation of new features specific to these histograms and the optimization of the parameters used to perform the nonlinear analysis are suggested as future work.
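
The two representations described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the time-delay embedding, the Euclidean distance threshold, the lag-to-frequency mapping (frequency = sample rate / lag), and all names and defaults (`delay_embed`, `recurrence_time_histogram`, `to_frequency_histogram`, `dim`, `tau`, `eps`, `n_fft`) are assumptions chosen for illustration.

```python
import numpy as np


def delay_embed(x, dim, tau):
    """Time-delay embedding: row i is (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    n = len(x) - (dim - 1) * tau
    return np.stack([x[k * tau:k * tau + n] for k in range(dim)], axis=1)


def recurrence_time_histogram(frame, dim=3, tau=2, eps=0.1):
    """For each lag (in samples), count embedded point pairs closer than eps.

    dim, tau and eps are illustrative defaults, not values from the thesis.
    """
    pts = delay_embed(np.asarray(frame, dtype=float), dim, tau)
    n = len(pts)
    hist = np.zeros(n, dtype=int)
    for lag in range(1, n):
        # Distance between every embedded point and the point `lag` steps later.
        dist = np.linalg.norm(pts[lag:] - pts[:-lag], axis=1)
        hist[lag] = np.count_nonzero(dist < eps)
    return hist


def to_frequency_histogram(hist, sr, n_fft):
    """Accumulate recurrence counts into bins with the resolution of an n_fft-point spectrum.

    Assumed mapping: a recurrence at lag L samples corresponds to sr / L Hz.
    """
    n_bins = n_fft // 2 + 1
    freq_hist = np.zeros(n_bins)
    lags = np.arange(1, len(hist))
    freqs = sr / lags
    bins = np.rint(freqs * n_fft / sr).astype(int)
    valid = bins < n_bins  # drop lags that map above the Nyquist bin
    np.add.at(freq_hist, bins[valid], hist[1:][valid])
    return freq_hist
```

Under these assumptions, a 512-sample frame of a 250 Hz sine sampled at 8 kHz (period 32 samples), embedded with `dim=3` and `tau=8`, shows its strongest recurrence at a lag of 32 samples, which the second function maps to the spectral bin centred on 250 Hz. Longer lags at multiples of the period fall into lower bins, so the frequency histogram is not a copy of the Fourier spectrum.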