Tonal Description of Music Audio Signals

Emilia Gómez



Title	Tonal Description of Music Audio Signals
Publication Type	PhD Thesis
Year of Publication	2006
University	Universitat Pompeu Fabra
Authors	Gómez, E.
Advisor	Serra, X.
Academic Department	Department of Information and Communication Technologies
Abstract	This dissertation is about tonality. More precisely, it is concerned with the problems that arise when computer programs try to automatically extract tonal descriptors from musical audio signals. It proposes and evaluates a computational approach for the automatic description of tonal aspects of music from the analysis of polyphonic audio signals. In this context, we define tonal description at different levels of abstraction, distinguishing between low-level signal descriptors (e.g. tuning frequency or pitch class distribution) and high-level textual labels (e.g. chords or keys). These high-level labels require musical analysis and the use of tonality cognition models. We also establish different temporal scales for description: some instantaneous features are attached to a given time instant, while other, global descriptors relate to a wider segment (e.g. a section of a song). Throughout this thesis, we propose a number of algorithms that directly process digital audio recordings of acoustic instruments in order to extract tonal descriptors. These algorithms focus on the computation of pitch class distribution descriptors, the estimation of the key of a piece, the visualization of the evolution of its tonal center, and the measurement of the similarity between two musical pieces. These algorithms have been validated and evaluated quantitatively. First, we evaluated low-level descriptors, such as pitch class distribution features and the estimation of the tuning frequency (with respect to 440 Hz), and their independence from timbre, dynamics, and other factors external to tonal characteristics. Second, we evaluated the key-finding method, obtaining an accuracy of around 80%. This evaluation was carried out on a music collection of 1400 pieces with different characteristics.
We have studied the influence of different aspects, such as the tonal model employed, the advantage of using a cognition-inspired model versus machine learning methods, the location of the tonality within a musical piece, and the influence of musical genre on the definition of a tonal center. Third, we have proposed the extracted features as a tonal representation of an audio signal, useful for measuring the similarity between two pieces and for establishing the structure of a musical piece. To this end, we have evaluated the use of tonal descriptors to identify versions of the same song, obtaining an improvement of 55% over the baseline. From a more general standpoint, this dissertation makes the following contributions to the field of computational tonal description: it provides a multidisciplinary review of tonal induction systems, including signal processing methods and models for tonality induction; it defines a set of requirements for low-level tonal features; it provides a quantitative evaluation of the proposed methods for audio key finding with respect to similar ones, divided into different stages so that the influence of each stage can be analyzed; it supports the idea that some application contexts do not need an accurate symbolic transcription, thus bridging the gap between audio-oriented and symbolic-oriented methods without requiring a perfect transcription; it extends the current literature, which deals mainly with classical music, to other musical genres; it shows the usefulness of tonal descriptors for music similarity; and it provides an optimized method that is used in a real system for music visualization and retrieval, working with over a million musical pieces.
Final publication	http://hdl.handle.net/10803/7537
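The abstract mentions estimating the tuning frequency with respect to 440 Hz, computing a pitch class distribution, and estimating the key with a tonal model. The following Python sketch chains these three steps under several assumptions that are not taken from the thesis. The input is assumed to be a list of spectral peaks given as (frequency in Hz, magnitude) pairs. Each peak is assigned to its nearest semitone and weighted by its energy. The Krumhansl-Kessler key profiles are used as a stand-in tonal model and are compared with the distribution by Pearson correlation. All function names are hypothetical.

```python
import math

# Pitch-class names, index 0 = C.
PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Krumhansl-Kessler probe-tone profiles, index 0 = tonic.
# Assumption: a stand-in for the tonal model actually used in the thesis.
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def estimate_tuning_frequency(peak_freqs):
    """Estimate the reference tuning (Hz) from spectral peak frequencies.

    Each peak's position on the 440 Hz equal-tempered grid is mapped to an
    angle that wraps once per semitone, so the integer semitone part drops out.
    A circular mean treats offsets of +49 and -49 cents as neighbours instead
    of averaging them to 0.
    """
    s = c = 0.0
    for f in peak_freqs:
        angle = 2 * math.pi * 12 * math.log2(f / 440.0)
        s += math.sin(angle)
        c += math.cos(angle)
    offset = math.atan2(s, c) / (2 * math.pi)  # semitones, within [-0.5, 0.5]
    return 440.0 * 2 ** (offset / 12)


def pitch_class_distribution(peaks, f_ref=440.0):
    """Fold (frequency_hz, magnitude) peaks into 12 bins (index 0 = C),
    weighting by energy and normalizing so the largest bin equals 1."""
    pcd = [0.0] * 12
    for f, mag in peaks:
        semitones_from_a = round(12 * math.log2(f / f_ref))
        pcd[(semitones_from_a + 9) % 12] += mag ** 2  # A is 9 semitones above C
    top = max(pcd)
    return [v / top for v in pcd] if top > 0 else pcd


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def estimate_key(pcd):
    """Return (tonic, mode, correlation) of the best-matching rotated key profile."""
    best = None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
            # Rotate the profile so that its tonic weight lands on pitch class `tonic`.
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            r = pearson(pcd, rotated)
            if best is None or r > best[0]:
                best = (r, PITCH_NAMES[tonic], mode)
    return best[1], best[2], best[0]
```

For example, energy-weighted peaks from a C major scale with an emphasized tonic triad, all played 20 cents sharp, should give a tuning estimate near 445 Hz and the key C major. Estimating the tuning first lets the semitone rounding in `pitch_class_distribution` stay centred on the actual pitches rather than on a 440 Hz grid they do not follow.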
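The abstract also proposes tonal descriptors for measuring the similarity between pieces and for identifying versions of the same song. Because a version may be transposed to a different key, a comparison between pitch class distributions should not depend on transposition. The following Python sketch shows one simple way to achieve this. It is an illustrative assumption rather than the method described in the thesis. The sketch compares two global 12-bin distributions with cosine similarity under all 12 circular shifts and keeps the best match. The function names are hypothetical.

```python
import math


def cosine(x, y):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return num / den if den else 0.0


def transposition_invariant_similarity(pcd_a, pcd_b):
    """Return (best_shift, similarity), where best_shift is the number of
    semitones pcd_b must be transposed up to best match pcd_a."""
    best_shift, best_sim = 0, -1.0
    for k in range(12):
        # shifted is pcd_b transposed up by k semitones.
        shifted = [pcd_b[(pc - k) % 12] for pc in range(12)]
        sim = cosine(pcd_a, shifted)
        if sim > best_sim:
            best_shift, best_sim = k, sim
    return best_shift, best_sim
```

Suppose a second piece has the same distribution as the first, but transposed up by two semitones. The function should then report a shift of 10 semitones up, which is equivalent to 2 down, with a similarity of 1. Comparing only global distributions ignores temporal order, which a practical version-identification system may also need to take into account.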