News and Events

PhD fellowship on Audio-Visual Approaches for Music Information Retrieval

The Music Technology Group (MTG) and the Image Processing Group (GPI) of the Department of Information and Communication Technologies, Universitat Pompeu Fabra in Barcelona are opening a joint PhD fellowship on the topic of “Audio-Visual Approaches for Music Content Description”, to start in the Fall of 2015.

Music is a highly multimodal concept, where various types of heterogeneous information are associated with a music piece (audio, musicians’ gestures and facial expressions, lyrics, etc.). This has recently led researchers to approach music through its various facets, giving rise to multimodal music analysis studies (Essid and Richard, 2012).

The fellowship will focus on research into the complementarity of audio and image description technologies, with the goal of improving the accuracy and meaningfulness of state-of-the-art music description methods. These methods are at the core of content-based music information retrieval tasks.

Several standard tasks could benefit from this research:

  • Synchronization of audio / video streams
  • Audio-visual quality assessment
  • Structural analysis and segmentation
  • Discovery of repeated themes & sections
  • Automatic video mashup generation
  • Music similarity computation
  • Genre / style classification
  • Artist identification
  • Emotion (mood) characterization
  • Optical music recognition (OMR)
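As an illustration of the first task listed, audio/video stream synchronization is often approached by cross-correlating a feature stream extracted from each modality (e.g. an audio onset envelope against a video motion-energy curve). A minimal sketch, with synthetic streams standing in for real extracted features:

```python
import numpy as np

def estimate_lag(a, b):
    """Return the offset k maximizing sum_n a[n+k] * b[n].
    A negative k means b is a delayed copy of a."""
    a = a - np.mean(a)
    b = b - np.mean(b)
    corr = np.correlate(a, b, mode="full")
    # Index len(b)-1 of the full correlation corresponds to zero lag.
    return int(np.argmax(corr)) - (len(b) - 1)

# Synthetic streams: the "video" features are the "audio" features
# delayed by 5 frames.
rng = np.random.default_rng(0)
audio = rng.standard_normal(200)
video = np.roll(audio, 5)
lag = estimate_lag(audio, video)  # negative: video lags audio by 5 frames
```

Real systems replace the synthetic streams with per-frame descriptors from each modality; the correlation machinery stays the same.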

Supervisors: Emilia Gómez (MTG) / Gloria Haro (GPI)

Applicants should have experience in audio and image signal processing, and hold an MSc in a related field (e.g. telecommunications, electrical engineering, mathematics, physics or computer science). Experience in scientific programming (Matlab/Python/C++) and excellent English are essential. A musical background and expertise in multimedia information retrieval are also valuable.

The grant involves teaching assistance, so an interest in teaching is also valued.

Grant details:
Provisional starting date: November 2015

Interested candidates should send a motivation letter, a CV (preferably with references), and academic transcripts to Prof. Emilia Gómez (emilia [dot] gomez [at] upf [dot] edu) and Prof. Gloria Haro (gloria [dot] haro [at] upf [dot] edu) before September 10th. Please include [PhD Audio-Visual] in the subject line.

Candidates will also have to apply to the PhD program of the DTIC at UPF.

S. Essid and G. Richard, “Fusion of Multimodal Information in Music Content Analysis”, in M. Müller, M. Goto and M. Schedl (Eds.), Multimodal Music Processing, Dagstuhl Follow-Ups, vol. 3, pp. 37-53, ISBN 978-3-939897-37-8, 2012.
M. Müller, M. Goto and M. Schedl (Eds.), Multimodal Music Processing, Dagstuhl Follow-Ups, vol. 3, ISBN 978-3-939897-37-8, 2012.
A. Schindler and A. Rauber, “A Music Video Information Retrieval Approach to Artist Identification”, CMMR, 2013.
Y. Wang, Z. Liu and J.-C. Huang, “Multimedia Content Analysis Using Both Audio and Visual Clues”, IEEE Signal Processing Magazine, vol. 17, no. 6, November 2000. doi:10.1109/79.888862
Y. Wu, T. Mei, Y.-Q. Xu, N. Yu and S. Li, “MoVieUp: Automatic Mobile Video Mashup”, IEEE Transactions on Circuits and Systems for Video Technology, 2015.

6 Aug 2015 - 13:25
Freesound Labs

We have launched a new Freesound-related site, Freesound Labs.

Freesound Labs is a directory of projects, hacks, apps, research and other initiatives that use content from Freesound or the Freesound API. You can browse projects by tags and categories, and we plan to add search functionality in the future.

Our aim is to include in this directory all relevant Freesound-powered projects. Hence, if you know of projects that should be listed here and are not yet, please let us know by sending an email to freesound [at] freesound [dot] org.

27 Jul 2015 - 17:46
Seminar by Clarence Barlow on his compositional approaches
2 Jul 2015

The composer Clarence Barlow will give a seminar entitled "Visualising Sound – Sonifying the Visual" on Thursday July 2nd at 15:30h in room 55.410 of the Communication Campus of the UPF.

The visualisation of musical sound can be pursued in a number of ways, two of which are rooted in technical considerations. The music could be intended for human performance, resulting in the development of a prescriptive performance score. If electroacoustic components are present in this music, they are often included as a graphic depiction (e.g. of the sound wave, as a sonagramme, or one reflecting the compositional methods, etc.), thus more properly fulfilling a second main function – a descriptive one, mainly used in documentation, lectures and/or study scores (e.g. Ligeti’s Artikulation or Stockhausen’s Study II), but also sometimes prescriptively as part of a sort of (re-)construction kit.
Two other approaches are rooted in the aesthetical (and possibly the synaesthetical): in sound visualisation it could simply be sound-derived images that satisfy; in the converse, image sonification, it could instead be the pleasure of extracting convincing music from optical sources, a comparison of source and result adding to the enjoyment. In multimedia such as film it could be the counterpoint of sound and image that pleases, especially if these are clearly bonded to one another, as when sound visualisation or image sonification is involved.
In the above, the vectors prescription-description and visualisation-sonification can work both ways, i.e. a prescriptive score is also potentially descriptive, one could (re-)imagine a visualised sound aurally, a sonified image visually.
In this presentation I would like to concentrate on these latter (syn-)aesthetic aspects as exemplified in my own work of several decades, having long been fascinated by the links between sound and image. These links mainly involve the concepts of Position and Motion as well as of Colour, all of which are not only important aspects of music but fundamentally spatial, ultimately visual concepts: in musical contexts one speaks of “high” and “low”, of “fast” and “slow” (all of which comprise spatial terms – for instance the tempo indication andante literally means “walking”) as well as of “bright” and “dark” sounds and of “sound-colour”. Starting over thirty years ago, I have especially in recent times been repeatedly drawn to enacting these parallels. The first five examples are of sound visualisation, the last five (plus a footnote) of image sonification.

26 Jun 2015 - 13:53
SMC Master Thesis Defenses
25 Jun 2015 - 30 Jun 2015

The oral presentations of the SMC Master theses will take place this Thursday 25th, Friday 26th and next Tuesday 30th of June, from 9:30h to 12:30h, in room 55.309. The defenses are public and all MTG researchers are encouraged to attend.

Thesis titles (tentative) and supervisors, by defense date:

June 25th
  • Computational modeling of hearing loss in musicians (Enric Guaus)
  • Computational assessment of noise environment in pop/rock band's layout (Enric Guaus)
  • Reducing Bias in Annotation Estimates for Probabilistic Evaluation of Audio Music Similarity (Julian Urbano / Emilia Gutierrez)
  • Electronic music genre classification (Perfecto Herrera)
  • Composer identification (Joan Serrà)

June 26th
  • Measurement and computational model of the maximum stable gain in acoustic feedback scenarios (Nadine Kroher / Enric Giné)
  • Similarity Search for Sound Effects in Freesound: A unified model for Tags and Content (Xavier Serra / Frederic Font)
  • Generating Singing Voice Expression Based On Machine Learning (Martí Umbert / Merlijn Blaauw)
  • Source Localization for Enhancement of Orchestral Music from Multi-sensor Recordings (Julio Carabias / Jordi Janer)
  • Source Separation-based music processing for assistive listening (Jordi Janer / Waldo Nogueira)
  • Lemur Synthesis with context-dependent HMMs (Jordi Bonada / Marco Gamba)

June 30th
  • Automatic drum sound classification (Perfecto Herrera)
  • Content Based Electronic Music Session Reconstruction (Perfecto Herrera)
  • A Platform to creatively explore Frequently Used Samples in Rap Music History (Perfecto Herrera)
  • Hexaphonic Guitar Transcription and Visualization (Rafael Ramirez)
  • Transcription of Percussion Patterns in Hindustani Classical Music (Xavier Serra / Ajay Srinivasamurthy)

22 Jun 2015 - 14:37
Seminar by engineers from SoundCloud on their approach to data products
22 Jun 2015

Several engineers from SoundCloud will give a seminar on "Building Data Products at SoundCloud" on Monday 22nd at 18:30h in room 55.410.

Presenters: Josh Devins, Rany Keddo, Dr. Özgür Demir, Dr. Alexey Rodriguez Yakushev, Dr. Christoph Sawade

Abstract: Serving over 150M listeners every month, SoundCloud is the world’s leading audio platform. We have a vibrant community of creators uploading new and unique content every second of the day. To capitalise on the volume and variety of content on the platform, we must rely on a number of tools, techniques and approaches to building products.

In this talk we will present the general framework of how we approach building data products at SoundCloud. We will review two case studies of recent work as examples of applying our approach to the topics of discovering new content with personalised recommendations and applying structured metadata using genre classification.
16 Jun 2015 - 10:34
Keynote by Emilia Gómez at MCM2015

Emilia Gómez has been invited to give a keynote speech at the Fifth Biennial International Conference on Mathematics and Computation in Music (MCM2015), which will be held on 22-25 June 2015 at Queen Mary University of London. The talk is on "Computational models of symphonic music: challenges and opportunities" and relates to the objectives and results of the European project PHENICX, which she leads.

The MCM2015 conference brings together researchers from around the world who combine mathematics or computation with music theory, music analysis, composition and performance. MCM provides a dedicated platform for the communication and exchange of ideas amongst researchers in mathematics, informatics, music theory, composition, musicology, and related disciplines.

10 Jun 2015 - 21:09
MTG involvement at Sonar+D

As in the past few years, the MTG is involved in a number of activities at the Sónar Festival, which takes place from June 18th to 20th, specifically in its professional area, Sonar+D.

Here is a brief overview of what's happening during the coming week:

June 16th, Museu de la Música: “Barcelona, Capital Electrònica” discussion and “Voices” concert, on the occasion of Phonos’ 40th anniversary.

The round table will bring together John Chowning, researcher at the Center for Computer Research in Music and Acoustics (CCRMA) at Stanford University; Andrés Lewin-Richter, composer and member of the Phonos Foundation; and Enric Palau, co-founder and co-director of Sónar; with Xavier Serra as moderator.

In the concert “Voices”, based on ancient Greek mythological texts, John Chowning will process the voice of soprano Maureen Chowning with a computer.

June 17th, Hangar: pre-event meeting of the Music Hack Day. The pre-event includes workshops on sensing, design and digital audio tools, talks by key figures in the field, and performances by artists exploring the intersection of music and wearable computing. It is sponsored by the European projects #MusicBricks and RAPID-MIX.
June 18th-19th, Sonar+D

This year the Barcelona Music Hack Day will offer a special track on wearable and multimodal technology applied to music creation and performance. This special track will bring together experts on bio and motion sensing, interaction design and wearable interface prototyping.

100 selected hackers will conceptualise and develop new tools for music creation and performance, using tools provided by more than 20 international companies.

June 19th, Sonar+D

A talk on "De-westernalizing music through technology", in which Kalbata (Ariel Tagar), music producer and record label owner, Peter Kirn, founder and editor of Create Digital Music, and Xavier Serra will discuss how music technology can provide a greater understanding and appreciation of musical traditions with non-Western roots.


9 Jun 2015 - 11:06
Seminar by John Chowning on his computer music works
15 Jun 2015
John Chowning, father of FM synthesis and a pioneer of computer music, will give a talk entitled "Composing music from the inside-out" on Monday June 15th at 15:00h in the Auditorium of the Communication Campus of the UPF. He will also take part in a round table on June 16th at 18:30h at the Barcelona Music Museum, where his piece "Voices" will also be performed.
Lecture abstract: A lecture/demonstration showing how the capacity of computer systems 50 years ago limited composers/researchers to only one of the sound-generating processes available today: synthesis. But we learned much about the perception of sound as we wrapped our aural skills around the technology and discovered how to create music from fundamental units. Using sound-synchronous animated slides, I will demonstrate how my earliest work in spatialization led to the discovery of FM synthesis in 1967. The development of these techniques gave rise to perceptual insights that led to the synthesis of the singing voice, on which both Phonē (1981) and Voices (2011) depend. Beginning with Stria (1977), the scale (pitch space) and the inharmonic timbres (spectral space) are rooted in the Golden Ratio. On these underpinnings I composed my music – based on what was known and what we learned about perception – from the inside-out. With the participation of Maureen Chowning, the presentation will conclude with a demonstration of the workings of the MaxMSP patch complex that accompanies the solo soprano in Voices – “hands free.”
Biography: Chowning is professor emeritus at Stanford University and founding director of the Center for Computer Research in Music and Acoustics (CCRMA). One of the great pioneers of computer music, he was among the first to realize the tremendous musical possibilities of the digital computer. His discovery of the FM synthesis algorithm in 1967 was licensed to Yamaha and popularized in the most successful synthesis engine in the history of electronic instruments, enabling precise control over new realms of sonic possibilities. Chowning’s innovations also extend into sound spatialization and musical temperament.
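For readers unfamiliar with the technique: FM synthesis modulates the phase of a carrier sinusoid with a second sinusoid, producing a rich spectrum of sidebands from just two oscillators. A minimal sketch (the parameter values below are arbitrary illustrative choices, not taken from Chowning's works):

```python
import numpy as np

def fm_tone(fc, fm, index, dur, sr=44100):
    """Simple FM: a carrier at fc whose phase is modulated by a
    sinusoid at fm, with modulation index `index`."""
    t = np.arange(int(dur * sr)) / sr
    return np.sin(2 * np.pi * fc * t + index * np.sin(2 * np.pi * fm * t))

# An inharmonic carrier/modulator ratio gives a bell-like spectrum.
tone = fm_tone(fc=200.0, fm=280.0, index=5.0, dur=1.0)
```

The modulation index controls how widely energy spreads into sidebands, which is what makes two oscillators sound like a complex instrument.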
8 Jun 2015 - 09:43
Frederic Font defends his PhD thesis on June 11th
11 Jun 2015
Frederic Font defends his PhD thesis entitled "Tag Recommendation using Folksonomy Information for Online Sound Sharing Platforms" on Thursday June 11th 2015 at 11:00h in room 55.410 of the Communication Campus of the UPF.
The jury of the defense is: Mark Sandler (Queen Mary, London), Sergi Jordà (UPF) and Iván Cantador (UAM, Madrid).
Thesis abstract: Online sharing platforms host a vast amount of multimedia content generated by their own users. Such content is typically not uniformly annotated and cannot be straightforwardly indexed. Therefore, making it accessible to other users poses a real challenge which is not specific to online sharing platforms. In general, content annotation is a common problem in all kinds of information systems. In this thesis, we focus on this problem and propose methods for helping users to annotate the resources they create in a more comprehensive and uniform way. Specifically, we work with tagging systems and propose methods for recommending tags to content creators during the annotation process. To this end, we exploit information gathered from previous resource annotations in the same sharing platform, the so-called folksonomy. Tag recommendation is evaluated using several methodologies, with and without the intervention of users, and in the context of large-scale tagging systems. We focus on the case of tag recommendation for sound sharing platforms and, besides studying the performance of several methods in this scenario, we analyse the impact of one of our proposed methods on the tagging system of a real-world, large-scale sound sharing site. As an outcome of this thesis, one of the proposed tag recommendation methods is now used daily by hundreds of users on this sound sharing site. Furthermore, we explore a new perspective for tag recommendation which, besides taking advantage of information from the folksonomy, employs a sound-specific ontology to guide users during the annotation process. Overall, this thesis contributes to the advancement of the state of the art in tagging systems and folksonomy-based tag recommendation and explores interesting directions for future research.
Even though our research is motivated by the particular challenges of sound sharing platforms and mainly carried out in that context, we believe our methodologies can easily be generalised and thus be of use to other information sharing platforms.
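The thesis's methods are considerably more elaborate, but the core idea of folksonomy-based tag recommendation can be illustrated with a simple tag co-occurrence scheme (the tags and data below are invented for illustration):

```python
from collections import Counter
from itertools import combinations

# Toy folksonomy: each entry is one resource's existing tag set.
folksonomy = [
    {"field-recording", "birds", "nature"},
    {"field-recording", "rain", "nature"},
    {"synth", "bass", "loop"},
    {"birds", "nature", "morning"},
]

# Count how often each ordered pair of tags co-occurs on a resource.
cooc = Counter()
for tags in folksonomy:
    for a, b in combinations(sorted(tags), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

def recommend(input_tags, k=3):
    """Rank candidate tags by total co-occurrence with the input tags."""
    scores = Counter()
    for t in input_tags:
        for (a, b), n in cooc.items():
            if a == t and b not in input_tags:
                scores[b] += n
    return [tag for tag, _ in scores.most_common(k)]

suggestions = recommend({"birds"})  # "nature" ranks first
```

Real systems weight co-occurrence counts (e.g. to discount very frequent tags) and evaluate the ranked lists against held-out annotations, as the thesis does at scale.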
4 Jun 2015 - 15:38
Xavier Serra gives a seminar on his research career
4 Jun 2015

Xavier Serra gives a seminar entitled "Research highlights in my journey within the field of Sound and Music Computing" on June 4th at 11:30h in the Auditorium of the Poblenou Campus of the UPF as part of the DTIC Integrative Research Seminars.


In this presentation I will go over some of the research I have been involved with in my thirty-year career within the field of Sound and Music Computing, emphasizing the goals I aimed for and identifying some of the results obtained.
My personal research, and the research I have directly supervised, has been mainly focused on the analysis, description and synthesis of sound and music signals. My initial efforts were dedicated to analyzing and transforming complex and musically relevant sounds; sounds that were not well captured by the audio processing techniques used at that time. My approach was to use spectral analysis and synthesis techniques to develop a deterministic plus stochastic model with which to obtain sonically and musically meaningful parameterizations and descriptions. That work had practical applications for synthesizing and transforming a wide variety of sounds, including the voice.
As a natural evolution of that initial research I became interested in going from single sounds to collections of sounds, thus being able to describe and model the relationships between sound entities. To tackle that I had to incorporate methodologies coming from disciplines such as machine learning and semantic technologies in order to complement the signal processing approaches I was using. A major bottleneck for carrying out that research was the availability of large and adequate audio collections. To solve that we developed a platform for collecting and sharing sounds and then using the collected sounds we carried out research on the automatic description of sound signals. We now have large publicly available sound collections and we have developed information retrieval tools of relevance for several sound and music applications.
In the last few years, and in the context of the music information retrieval field, it has become evident that to make sense of sound and music data we need to incorporate domain knowledge. In order to improve the distance measures used in exploration and retrieval tasks we need to incorporate the knowledge that exists around sound and music. To that aim, most of my current projects focus on the study of specific sound and music repertoires and on targeting well-defined tasks, trying to formalize and represent user knowledge with which to develop applications of relevance to those users. We are putting together coherent corpora; we are working on audio analysis techniques specific to these corpora and chosen tasks; we are gathering and analyzing user and contextual information to train our data models; and we are developing task-oriented tools to interact with particular sound and music collections. Our results show the benefit of this type of information processing research, in which bottom-up and top-down approaches are combined.
My research has always been motivated by music, by the interest in developing musical tools that can be socially and culturally relevant. In this talk I want to emphasize this aspect while talking about my thirty-year research journey.
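The deterministic plus stochastic model mentioned in the talk represents a sound as a sum of sinusoidal partials plus a noise residual shaped by a spectral envelope. A minimal synthesis-side sketch (the partial frequencies, amplitudes and noise envelope below are arbitrary illustrative choices, not parameters from the actual model):

```python
import numpy as np

sr = 44100
t = np.arange(sr) / sr  # one second of time samples

# Deterministic part: a sum of sinusoidal partials with fixed
# frequencies and amplitudes (an arbitrary harmonic set).
partials = [(220.0, 0.5), (440.0, 0.25), (660.0, 0.125)]
deterministic = sum(a * np.sin(2 * np.pi * f * t) for f, a in partials)

# Stochastic part: white noise scaled by a decaying envelope,
# standing in for a fitted time-varying spectral envelope.
rng = np.random.default_rng(0)
envelope = np.exp(-3.0 * t)
stochastic = 0.05 * envelope * rng.standard_normal(len(t))

signal = deterministic + stochastic
```

In the analysis direction, the partial parameters are tracked from spectral peaks and the residual is what remains after subtracting the resynthesized partials; both halves can then be transformed independently.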


26 May 2015 - 03:09