Musical Mosaicing with High Level Descriptors

John O'Connell

Note: This bibliographic page is archived and will no longer be updated. For an up-to-date list of publications from the Music Technology Group, see the Publications list.

Title	Musical Mosaicing with High Level Descriptors
Publication Type	Master Thesis
Year of Publication	2011
Authors	O'Connell, J.
preprint/postprint document	http://mtg.upf.edu/system/files/publications/OConnell-John-Master-thesis-2011_0.pdf
Abstract	This thesis investigates the use of high level descriptors (such as genre, mood, instrumentation and singer's gender) in audio mosaicing, a form of data-driven concatenative sound synthesis (CSS). The document begins by discussing the advances made in the field of music content description over the last 10 years, explaining the meaning of high level music content description and highlighting the relevance of automatic music content description in general to the field of audio mosaicing. It then traces the origins of mosaicing from its beginnings as a time-consuming manual process through to modern efforts to automate mosaicing and enhance the productivity of artists seeking to create mosaics. The essential components of a mosaicing system are described. Existing mosaicing systems are dissected and categorised into a taxonomy based on their potential application area. The time resolution of high level descriptors is investigated, and a new hierarchical framework for incorporating high level descriptors into mosaicing applications is introduced and evaluated. This framework is written in Python and uses Pure Data as both user interface and audio engine. Descriptors stemming from Music Information Retrieval (MIR) research are calculated using an in-house analysis extraction tool. In-house audio-matching software is used as the similarity search engine. Several other libraries have also been integrated to aid the research, in particular Aubio for note detection and Rubberband for time stretching. The high level descriptors included in this project are: mood (happy, sad or relaxed), gender (male or female), key, scale (major or minor), instrumental and vocal. A mini application for augmenting audio loops with mosaics is presented. This is used to show how the framework can be extended to cater for a given mosaicing paradigm. The musical applications of mosaics in traditional song-based composition are also explored.
Finally, conclusions are drawn and directions for future work postulated.

OConnell-John-Master-thesis-2011.pdf