Improving Audio Retrieval through Content and Metadata Categorization

Title: Improving Audio Retrieval through Content and Metadata Categorization
Publication Type: Master Thesis
Year of Publication: 2015
Authors: Parekh, S.
Abstract: Audio content sharing on online platforms has become increasingly popular. This necessitates the development of techniques to better organize and retrieve this data. In this thesis we look to improve audio retrieval through content and metadata categorization in the context of Freesound. For content, we focus on organization through morphological description. In particular, we propose a taxonomy and a thresholding-based classification approach for loudness profiles. The approach can be generalized to structure information about the temporal evolution of other sound attributes. To this end, we also discuss our preliminary findings from extending this methodology to pitch profiles. Metadata systematization, on the other hand, has been approached through a topic model known as Latent Dirichlet Allocation (LDA). Here, automatic clustering of tag information is performed to achieve a higher-level representation of each audio file in terms of 'topics'. We evaluate our approach for both tasks through several experiments conducted over two datasets. This thesis finds immediate application in online audio sharing platforms and opens up several interesting avenues for future research. Specifically, evaluation indicates that our methods can be immediately applied to improve Freesound's similarity and context-based search. Moreover, we believe our work on content categorization makes it possible to include an advanced content-based search facility in Freesound.