| Abstract | This thesis investigates the use of high-level descriptors (such as genre, mood, instrumentation,
singer's gender, etc.) in audio mosaicing, a form of data-driven
concatenative sound synthesis (CSS). The document begins by discussing the
advances made in the field of music content description over the last 10 years, explaining
the meaning of high-level music content description and highlighting the
relevance of automatic music content description in general to the field of audio
mosaicing. It then traces the origins of mosaicing from its beginnings as
a time-consuming manual process through to modern efforts to automate mosaicing
and enhance the productivity of artists seeking to create mosaics. The
essential components of a mosaicing system are described. Existing mosaicing
systems are dissected and categorised into a taxonomy based on their potential
application area. The time resolution of high-level descriptors is investigated
and a new hierarchical framework for incorporating high-level descriptors into
mosaicing applications is introduced and evaluated. This framework is written
in Python and utilises Pure Data as both user interface and audio engine.
Descriptors stemming from Music Information Retrieval (MIR) research are
calculated using an in-house analysis extraction tool. In-house audio-matching
software is used as the similarity search engine. Many other libraries have also
been integrated to aid the research, in particular Aubio for note detection and
Rubberband for time stretching. The high-level descriptors included in this
project are: mood (happy, sad, relaxed or happy), gender (male or female), key,
scale (major or minor), instrumental and vocal. A mini application for augmenting
audio loops with mosaics is presented. This is used to show how the framework
can be extended to cater for a given mosaicing paradigm. The musical
applications of mosaics in traditional song-based composition are also explored.
Finally, conclusions are drawn and directions for future work are proposed.
|