Retrieving Ambiguous Sounds Using Perceptual Timbral Attributes in Audio Production Environments

Title: Retrieving Ambiguous Sounds Using Perceptual Timbral Attributes in Audio Production Environments
Publication Type: Master Thesis
Year of Publication: 2017
Authors: Correya, A. A.
Abstract: For over a decade, one of the well-identified problems in audio production environments has been the effective retrieval and management of sound libraries. Most self-recorded and commercially produced sound libraries are well structured in terms of metadata and textual descriptions, which allows traditional text-based retrieval approaches to obtain satisfactory results. However, traditional information retrieval techniques are limited in retrieving ambiguous sound collections (i.e., sounds with no identifiable origin, Foley sounds, synthesized sound effects, abstract sounds) because such sounds are difficult to describe textually and have a complex psychoacoustic nature. Early psychoacoustic studies propose perceptual acoustic qualities as an effective way of describing this category of sounds [1]. In Music Information Retrieval (MIR) studies, this problem has mostly been studied and explored in the context of content-based audio retrieval. However, we observed that most commercially available systems have integrated neither advanced content-based sound descriptions nor the visualization and interface design approaches that have evolved in recent years. Our research mainly aimed to investigate two things: (1) the development of an audio retrieval system that incorporates high-level timbral features as search parameters, and (2) a user-centered approach to integrating these features into audio production pipelines, using expert-user studies. In this project, we present a prototype that is similar to traditional sound browsers (list-based browsing), with the added functionality of filtering and ranking sounds by perceptual timbral features such as brightness, depth, roughness and hardness. Our main focus was on the retrieval process by timbral features.
Inspired by the recent focus on user-centered systems ([2], [3]) in the MIR community, we conducted in-depth interviews and a qualitative evaluation of the system with expert users in order to identify the underlying problems. Using a probe system and expert-user studies, we observed potential applications of high-level perceptual timbral features in audio production pipelines. We also outline future guidelines and possible improvements to the system based on the outcomes of this research.
Keywords: Audio Production, AudioCommons, freesound, High-level Perceptual Timbral Features, Sound Browsers, Content-Based Audio Retrieval, Music Information Retrieval, Sound Databases, User Study
Final publication: https://doi.org/10.5281/zenodo.1098587