New IEEE/ACM TASLP paper on multi-feature beat tracking

Our article on multi-feature beat tracking for the IEEE/ACM Transactions on Audio, Speech, and Language Processing is now available online! This work was led by Jose R. Zapata as part of his PhD thesis, in collaboration with Matthew Davies from the SMC group in Porto. It is based on the idea of combining different experts for beat estimation, where each expert is represented by the periodicity of a different onset detection function. This simple and clever idea has already been used to combine different beat tracking algorithms and to evaluate the difficulty of the task; here it is integrated into a single method.

Zapata, J. R., Davies, M. E. P., & Gómez, E. (2014). Multi-feature beat tracking. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(4), 816-825.

Abstract: A recent trend in the field of beat tracking for musical audio signals has been to explore techniques for measuring the level of agreement and disagreement between a committee of beat tracking algorithms. By using beat tracking evaluation methods to compare all pairwise combinations of beat tracker outputs, it has been shown that selecting the beat tracker which most agrees with the remainder of the committee, on a song-by-song basis, leads to improved performance which surpasses the accuracy of any individual beat tracker used on its own. In this paper we extend this idea towards presenting a single, standalone beat tracking solution which can exploit the benefit of mutual agreement without the need to run multiple separate beat tracking algorithms. In contrast to existing work, we re-cast the problem as one of selecting between the beat outputs resulting from a single beat tracking model with multiple, diverse input features. Through extended evaluation on a large annotated database, we show that our multi-feature beat tracker can outperform the state of the art, and thereby demonstrate that there is sufficient diversity in input features for beat tracking, without the need for multiple tracking models.
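
To make the committee idea from the abstract concrete, here is a minimal Python sketch of selection by mutual agreement. It is an illustration under stated assumptions, not the method from the paper: the agreement measure (a simple F-measure with an assumed 70 ms tolerance window), the feature names and the beat times are all hypothetical. Each committee member stands for the beat output obtained from one input feature; every pair of outputs is compared, and the output with the highest mean agreement with the rest is kept.

```python
from itertools import combinations


def f_measure(est, ref, tol=0.07):
    """Agreement between two beat sequences (in seconds).

    A beat in `est` counts as a hit if an unmatched beat in `ref`
    lies within +/- `tol` seconds. The 70 ms default is an assumption.
    """
    if not est or not ref:
        return 0.0
    unmatched = list(ref)
    hits = 0
    for beat in est:
        if not unmatched:
            break
        closest = min(unmatched, key=lambda r: abs(r - beat))
        if abs(closest - beat) <= tol:
            hits += 1
            unmatched.remove(closest)  # each beat can be matched only once
    if hits == 0:
        return 0.0
    precision = hits / len(est)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


def select_by_mutual_agreement(committee):
    """Return the member that agrees most with the others, plus all mean scores.

    `committee` maps a (hypothetical) feature name to its list of beat times.
    """
    names = list(committee)
    if len(names) < 2:
        raise ValueError("need at least two committee members")
    totals = {name: 0.0 for name in names}
    for a, b in combinations(names, 2):
        score = f_measure(committee[a], committee[b])
        totals[a] += score
        totals[b] += score
    mean = {name: totals[name] / (len(names) - 1) for name in names}
    best = max(names, key=mean.get)
    return best, mean


# Hypothetical beat outputs (seconds) from one tracker fed with four features.
committee = {
    "energy_flux": [0.50, 1.00, 1.50, 2.00],
    "complex_domain": [0.51, 1.01, 1.49, 2.02],
    "mel_bands": [0.52, 1.02, 1.50, 2.08],
    "spectral_flux": [0.75, 1.25, 1.75, 2.25],  # locked to the off-beat
}
best, mean = select_by_mutual_agreement(committee)
print(best, mean)
```

In this toy example the off-beat output agrees with no other member and is never selected, while the output closest to the consensus wins. No ground-truth annotation is needed for the selection, which is what makes the approach usable on a song-by-song basis.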

Melody Extraction Review published in the IEEE Signal Processing Magazine

Our review article on melody extraction algorithms for the IEEE Signal Processing Magazine is finally available online! The printed edition will be coming out in March 2014. This article provides an overview of approaches, challenges and applications for melody extraction from polyphonic music signals.

J. Salamon, E. Gómez, D. P. W. Ellis and G. Richard, “Melody Extraction from Polyphonic Music Signals: Approaches, Applications and Challenges”, IEEE Signal Processing Magazine, 31(2):118-134, Mar. 2014.

Abstract: Melody extraction algorithms aim to produce a sequence of frequency values corresponding to the pitch of the dominant melody from a musical recording. Over the past decade melody extraction has emerged as an active research topic, comprising a large variety of proposed algorithms spanning a wide range of techniques. This article provides an overview of these techniques, the applications for which melody extraction is useful, and the challenges that remain. We start with a discussion of ‘melody’ from both musical and signal processing perspectives, and provide a case study which interprets the output of a melody extraction algorithm for specific excerpts. We then provide a comprehensive comparative analysis of melody extraction algorithms based on the results of an international evaluation campaign. We discuss issues of algorithm design, evaluation and applications which build upon melody extraction. Finally, we discuss some of the remaining challenges in melody extraction research in terms of algorithmic performance, development, and evaluation methodology.

Participation in the AES 53rd Conference on Semantic Audio

Frederic Font, Jordi Janer and Xavier Serra are participating in the 53rd Conference on Semantic Audio of the Audio Engineering Society, which takes place in London from January 26th to 29th, 2014.

Xavier has been invited to give a talk on CompMusic, entitled "Creating Research Corpora for the Computational Study of Music: the case of the CompMusic Project". Frederic is giving a talk on his recent PhD research, "Audio clip classification using social tags and the effect of tag expansion", and Jordi is presenting a paper co-authored with David S. Blancas, "Sound Retrieval from Voice Imitation Queries in Collaborative Databases".

TechTransfer position at the MTG through TECNIOspring

The MTG is part of a Catalan initiative named TECNIO, and through it there is a call to incorporate an experienced researcher interested in carrying out TechTransfer activities.

TECNIOspring is a fellowship programme that provides financial support to individual mobility proposals presented by experienced researchers in liaison with a TECNIO centre (like our research group). Host institutions will offer fellows a stimulating and multidisciplinary scientific environment in which to develop their applied research projects with focus on technology transfer.

Fellows will be offered 2-year employment contracts in order to develop their applied research projects. Please note that this call has a strong focus on TechTransfer, so candidates are required to have at least 1 year of experience in applied research and/or technology transfer activities.

There are two types of fellowships:

  • Incoming - Mobility for experienced researchers of any nationality willing to join our centre for 2 years. Candidates must either hold a PhD and have four additional years of full-time equivalent research experience, or have eight years of full-time equivalent research experience.
  • Outgoing + return - Mobility outside Spain for experienced researchers of any nationality who reside in Catalonia and are willing to join a research or technology centre or the R&D department of a private company for one year. This scheme includes a return phase of one more year at the MTG. Candidates must hold a PhD.

Further details about the funding involved per fellowship, eligibility criteria and evaluation process are available in the programme leaflet. Those of you interested in applying, please send us (mtg [at] upf [dot] edu, subject: tecniospring) a briefing about the project you propose, together with your CV.


Seminar by Julián Urbano on Evaluation in MIR
16 Jan 2014

Julián Urbano, postdoc at the MTG, will give a seminar on "Evaluation in (Music) Information Retrieval through the Audio Music Similarity task" on January 16th at 3:30pm in room 52.321.

Abstract: Test-collection based evaluation in (Music) Information Retrieval has been used for half a century now as the means to evaluate and compare retrieval techniques and advance the state of the art. However, this paradigm makes certain assumptions that remain a research problem and that may invalidate our experimental results. In this talk I will approach this paradigm as an estimator of certain probability distributions that describe the final user experience. These distributions are estimated with a test collection, computing system-related distributions assumed to reliably correlate with the target user-related distributions. Using the Audio Music Similarity task as an example, I will talk about issues with our current evaluation methods, the degree to which they are problematic, how to analyze them and improve the situation. In terms of validity, we will see how the measured system distributions correspond to the target user distributions, and how this correspondence affects the conclusions we draw from an experiment. In terms of reliability, we will discuss optimal characteristics of test collections and statistical procedures. In terms of efficiency, we discuss models and methods to greatly reduce the annotation cost of an evaluation experiment.

MAIKA, the new Vocaloid singer by Voctro Labs, is now available!

Voctro Labs' Christmas gift has arrived! MAIKA, the new female Vocaloid 3 Voice Library, is a virtual singer that allows you to create vocal parts on your computer without having to record a real singer. By simply entering melody, lyrics and expression parameters you'll be able to create lead vocals, vocal accompaniment, demo vocals and vocal effects; the possibilities are endless. MAIKA is designed to sing in Spanish, but contains a wide range of phonemes that also cover parts of other languages such as Portuguese, Italian, Catalan, English and Japanese.

MAIKA has a powerful feminine voice. In the lower registers she has a softer, more airy voice, while in the higher registers she has a more intense timbre. She has an extraordinarily broad pitch range, which switches from a chest voice to a head voice in the highest registers. This makes her voice suited for a large range of musical genres and styles.

You can download the edition directly or, if you prefer, order the boxed limited edition from Voctro Labs' website.

Applications open for the Master's and PhD programmes of the UPF
19 Nov 2013 - 27 Jun 2014

From November 19th 2013 to June 27th 2014, applications are open for all the master's and doctoral programmes of the UPF for the 2014-2015 academic year.

For the Master in Sound and Music Computing, you can find the information here. To do a PhD at the MTG you have to enrol in the PhD programme in Information and Communication Technologies; you can find the information here.


Introduction to music therapy and use of ICT in music therapy diagnostic
27 Nov 2013

Wednesday, November 27th, 2013, at 16h in room 52.S29 (in front of the canteen)

ABSTRACT: During the last 50 years, European and European-influenced music therapy has made great strides towards becoming an established part of health and social services and sciences. Selected examples from clinical practice will give a brief overview of the scope of the very different music therapy approaches applied in very different fields of clinical practice. Moreover, one field of interdisciplinary research and development combining ICT and music therapy is microanalysis in music therapy. Over the last 10 years, qualitative research and standardized observations have been used to study therapeutic processes in music therapy. These methods are very time consuming, yet they are very important for diagnosis and assessment in music therapy. First developments, e.g. the Music Therapy Toolbox built on MIR, are very promising for future use as diagnostic tools in music therapy. The state of the art of this field will be presented and discussed.

BIO: Thomas Wosch is professor of music therapy at the University of Applied Sciences of Wuerzburg and Schweinfurt in Germany. He is director of the Master in Music Therapy for clients with special needs and for clients with dementia, and head of the final-year specialisation in music therapy in the BA in Social Work. He worked for 10 years as a music therapist in acute adult psychiatry, focusing on the treatment of schizophrenia, depression, anxiety disorders and borderline disorder. His special field of research is microanalysis in music therapy (measurement of minimal changes in music therapy processes; see also: Wosch & Wigram (2007) (eds.): Microanalysis in Music Therapy. London & Philadelphia: JKP.). He has research collaborations and teaches internationally all over Europe, in the US, Down Under and South America. He is Co-editor of and of "Musik und Gesundsein" (music and health).

Tan Özaslan defends his PhD Thesis on November 29th
29 Nov 2013

Tan Özaslan defends his PhD thesis entitled "Computational Analysis of Expressivity in Classical Guitar Performances" on November 29th 2013 at 11:00h in room 52.429 of the Communication Campus of the UPF.

Thesis directors: Josep Lluís Arcos and Xavier Serra
Jury members: Ramon Lopez de Mantaras (IIIA-CSIC), Isabel Barbancho (Universidad de Málaga), Rafael Ramirez (UPF)

Abstract: The study of musical expressivity is an active field in sound and music computing. The research interest comes from different motivations: to understand or model musical expressivity; to identify the expressive resources that characterize an instrument, musical genre, or performer; or to build synthesis systems able to play expressively. To tackle this broad problem, researchers focus on specific instruments and/or musical styles. Hence, in this thesis we focus on the analysis of expressivity in classical guitar, and our aim is to model the use of the expressive resources of the instrument. The methods used in this dissertation are based on techniques from the fields of information retrieval, machine learning, and signal processing. We combine several state-of-the-art analysis algorithms in order to model the use of the expressive resources. Classical guitar is an instrument characterized by the diversity of its timbral possibilities. Professional guitarists are able to convey a lot of nuances when playing a musical piece. This specific characteristic of classical guitar makes the expressive analysis laborious. In particular, we divide our analysis into two main sections. The first section provides a tool able to automatically identify expressive resources in the context of real recordings. We build a model in order to analyze and automatically extract the three most used expressive articulations: legato, glissando and vibrato. The second section provides a comprehensive analysis of timing deviations in classical guitar. Timing variations are perhaps the most important ones: they are fundamental for expressive performance and a key ingredient for conferring a human-like quality to machine-based music renditions. However, the nature of such variations is still an open research question, with diverse theories that indicate a multi-dimensional phenomenon. Our system exploits feature extraction and machine learning techniques. Classification accuracies show that timing deviations are accurate predictors of the corresponding piece. To sum up, this dissertation contributes to the field of expressive analysis by providing an automatic expressive articulation model and a musical piece prediction system based on timing deviations. Most importantly, it analyzes the behavior of the proposed models using commercial recordings.

Seminar by Jean-Julien Aucouturier on spectro-temporal receptive fields for MIR
22 Nov 2013

Jean-Julien Aucouturier, from CNRS/IRCAM, gives a seminar on "Spectro-temporal receptive fields (STRFs): a biologically-plausible alternative to MFCCs?" on Friday November 22nd at 15:30h in room 55.410.

Abstract: We describe some recent experiments to adapt a recent computational model of the mammalian auditory cortex to the tasks of Music Information Retrieval. The model, called Spectro-temporal Receptive Fields (STRFs), simulates the responses of auditory cortical neurons as a filterbank of Gabor functions tuned to frequencies, but also to rates (temporal modulations in Hz) and scales (frequency modulations in cycles/octave). Off the shelf, it provides a 30,000-dimensional feature space; when these dimensions are integrated, we can derive novel signal representations/features that (1) perform as well as or better than e.g. Mel-Frequency Cepstrum Coefficients for a task of audio similarity, (2) are somewhat amusing (e.g. dynamic frequency warping instead of DTW), and (3) are more plausible than the usual MIR features from a biological point of view.

