The research of the MTG is close to the core of the interdisciplinary field of Sound and Music Computing: it combines strengths in basic disciplines such as Signal Processing, Machine Learning and Human-Computer Interaction, while drawing on many other disciplines and technologies to solve specific application-driven problems. For a description of the currently funded research projects, see the Projects section; for a description of the technologies resulting from that research, see the Technologies section.
Our research is organized into the following areas:
Our research in the field of audio signal processing is broad and multidisciplinary, with a strong focus on technology transfer, reflected in dozens of patents and several successful commercial products. Our current interests span singing voice synthesis, voice transformation, source separation and automatic soundscape generation.
Singing Voice Synthesis
For more than a decade we have been developing models and specific approaches for the synthesis of the singing voice based on the concept of performance sampling, with the aim of achieving a natural-sounding singing synthesizer (e.g. Bonada & Serra, 2007; Bonada, 2008). We have collaborated continuously with Yamaha Corp. in this area, a collaboration that resulted in the popular Vocaloid commercial synthesizer. More recent research deals with singing style modeling, learning to imitate the expression of a singer from a set of his or her recordings.
Musical Audio Signal Separation
This line of research addresses the problem of segregating specific signals from a musical mixture. We have focused on the analysis and extraction of the predominant voice from polyphonic music (e.g. Marxer et al., 2012) and of percussion components (e.g. Janer et al., 2012). These algorithms have various applications, including music production (e.g. remixes), entertainment (e.g. karaoke) and cultural heritage (e.g. restoration).
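As a rough illustration of what source separation involves, the sketch below performs a standard harmonic/percussive split using the open-source librosa library; this is a generic baseline, not the MTG's own separation algorithms, and the file name is a placeholder.

```python
# Minimal harmonic/percussive separation sketch using the open-source
# librosa library. This is a generic baseline, not the MTG's algorithms;
# "mix.wav" is a hypothetical input file.
import librosa
import soundfile as sf

y, sr = librosa.load("mix.wav", sr=None, mono=True)

# Median-filtering HPSS: splits the signal into a harmonic layer
# (sustained tones, voice) and a percussive layer (transients, drums).
y_harmonic, y_percussive = librosa.effects.hpss(y)

sf.write("harmonic.wav", y_harmonic, sr)      # e.g. karaoke-style use
sf.write("percussive.wav", y_percussive, sr)  # e.g. drum remixing
```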
Other research topics
Real-time voice processing algorithms (e.g. Mayor et al., 2011) have been integrated into the licensed Kaleivoicecope technology. We have investigated topics such as voice quality transformation, voice impersonation, speech processing, emotional speech synthesis, voice enhancement, and non-stationary sinusoidal analysis.
We have contributed to the voice analysis field with several methods for automatically transcribing melody and expression, as well as for rating a singing performance (e.g. Mayor et al., 2009). We have also adapted voice conversion strategies used in speech to the specificities of the singing voice, making it possible to create singer models when only a limited amount of audio material is available (e.g. Villavicencio & Bonada, 2010).
Processing polyphonic audio has also drawn our interest, including polyphonic time-scaling, tempo detection and rhythm modification (e.g. Janer et al., 2006), tonal analysis and visualization (e.g. Gómez & Bonada, 2005), audio mosaicing (e.g. Coleman et al., 2010), and score following.
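To give a flavor of such analyses, here is a minimal sketch of tempo and key estimation using the MTG's open-source Essentia library; the input file and the generic extractors shown are assumptions for the example, not the specific methods cited above.

```python
# Rough tempo and key analysis sketch using the open-source Essentia
# library; "song.wav" is a hypothetical input and these generic
# extractors are not the specific algorithms cited above.
import essentia.standard as es

audio = es.MonoLoader(filename="song.wav")()  # mono, 44.1 kHz by default

# Beat tracking and tempo estimation.
rhythm = es.RhythmExtractor2013(method="multifeature")
bpm, beats, beats_confidence, _, beats_intervals = rhythm(audio)
print(f"tempo: {bpm:.1f} BPM, {len(beats)} beats found")

# Global key/tonality estimation.
key, scale, strength = es.KeyExtractor()(audio)
print(f"key: {key} {scale} (strength {strength:.2f})")
```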
In recent years, we have extended the concept of performance sampling to the violin, contributing novel approaches for accurately capturing performer gestures with minimally intrusive sensing techniques, and for statistically modeling the temporal contour of those gestures and the timbre they produce (e.g. Maestre et al., 2010). Instrumental gesture modeling has proven to be a natural approach to controlling physical models, thus filling the gap between the high-level controls of a symbolic score and the low-level input of the physical system.
Beyond music and voice signals, we have also applied our algorithms to environmental sounds and bioacoustics: for example, the generation of soundscapes (e.g. Janer et al., 2011) or the analysis and denoising of marine bioacoustic signals.
Within this area of research we aim at automatically generating “descriptors” that capture the sonological and musical features embedded in audio signals. By combining signal processing techniques with machine learning approaches we have obtained good results in analyzing some of the most basic and important musical facets, such as rhythm (Gouyon, 2005; Zapata et al., 2012), timbre (Herrera et al., 2003), tonality (Gómez, 2006; Martorell and Gómez, 2011), melody (Salamon and Gómez, 2012), and structure (Ong, 2007). Building on these descriptions and bringing in other methodological approaches, we have been able to address topics more related to the semantic description of music, such as complexity (Streich, 2007), similarity (Bogdanov et al., 2009), music recommendation (Celma, 2009), genre (Guaus, 2009), mood (Laurier et al., 2009), social tags (Sordo, 2011) and song covers (Serrà et al., 2009). To approach the semantic aspect of sound and music we are also carrying out research on music cognition modeling (Purwins et al., 2008a; Purwins et al., 2008b), and we are interested in the use of MIR technologies in different music traditions, especially flamenco music.
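As a concrete taste of descriptor extraction, the sketch below computes a predominant-melody descriptor with Essentia's implementation of the MELODIA algorithm (Salamon and Gómez, 2012); the file name and the analysis parameters are assumptions for the example.

```python
# Sketch: extracting a predominant-melody descriptor with Essentia's
# implementation of the MELODIA algorithm (Salamon and Gómez, 2012).
# "song.wav" and the frame/hop sizes are assumptions for the example.
import essentia.standard as es

audio = es.MonoLoader(filename="song.wav")()
audio = es.EqualLoudness()(audio)  # recommended preprocessing for Melodia

melodia = es.PredominantPitchMelodia(frameSize=2048, hopSize=128)
pitch_hz, pitch_confidence = melodia(audio)  # one f0 estimate per hop

# Frames where no melody is detected are returned as 0 Hz.
voiced = pitch_hz[pitch_hz > 0]
print(f"{len(voiced)} voiced frames, mean f0 = {voiced.mean():.1f} Hz")
```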
Our team is also very active in the International Society for Music Information Retrieval (ISMIR) community. We have been involved in its scientific committee, we contribute to ethnocomp (an interest group on computational ethnomusicology) and WiMIR (Women in MIR), and we moderate community projects such as Teaching MIR and the Audio Melody Extraction Annotation Initiative.
We have contributed to several of the MTG's technologies.
This line of research started with our interest in studying musical performance and in developing interfaces for the real-time creation and exploration of music. Through the years we have approached this problem from different perspectives, such as collective musical creation (Jordà, 2001) or the development of a framework for the conception and design of new musical instruments (Jordà, 2005). An interesting result of these studies was the development of the Reactable (Jordà et al., 2005), an electronic music instrument that combines a tangible tabletop interface with concepts and techniques such as modular synthesis, visual programming and visual feedback.
To meet the technical needs of the Reactable project, we have also developed technologies such as reacTIVision (Bencina et al., 2005), for tracking tagged objects on tabletop surfaces, and the TUIO protocol (Kaltenbrunner et al., 2005), specifically designed to simplify communication between processes in a tangible user interface environment.
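To give a sense of the protocol, the following sketch listens for TUIO object messages using the third-party python-osc package; TUIO is built on Open Sound Control over UDP (port 3333 by default), and this handler is a minimal illustration rather than an official TUIO client.

```python
# Minimal TUIO 1.1 listener sketch built on the third-party python-osc
# package; TUIO rides on Open Sound Control over UDP (default port 3333).
# This is an illustration, not an official TUIO client library.
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

def on_2dobj(address, *args):
    # "set" messages carry the state of one tagged (fiducial) object:
    # session id, fiducial id, x, y, angle, then velocities/accelerations.
    if args and args[0] == "set":
        session_id, fiducial_id, x, y, angle = args[1:6]
        print(f"object {fiducial_id}: pos=({x:.2f}, {y:.2f}) angle={angle:.2f}")

dispatcher = Dispatcher()
dispatcher.map("/tuio/2Dobj", on_2dobj)  # tagged-object profile

server = BlockingOSCUDPServer(("0.0.0.0", 3333), dispatcher)
server.serve_forever()
```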
Currently, we focus our research on tabletop and tangible interaction (Jordà et al., 2010), studying how these types of interfaces can favor multi-dimensional and continuous real-time interaction, exploration and multi-user collaboration, thus expanding our areas of interest beyond the musical performance domain. Some of the topics we are exploring include: the potential of surface computing in areas such as edutainment and in work with children, older adults and special education (Gallardo et al., 2008); the potential of these interfaces in complex interactive situations and in exploratory and expressive activities (Julià & Jordà, 2009); their potential for enhancing creative collaboration through effective emotional communication; and extending surface computing interaction beyond the surface. For some of these projects we collaborate with specialists from other fields, such as artificial intelligence, cognitive science, neuroscience, sociology and education.
We are interested in the cultural and community aspects of Sound and Music Computing, that is, in developing music information processing techniques that model sounds and music together with the user communities around them. This research is being carried out in the context of CompMusic, Freesound.org and SIGMUS.
In the context of CompMusic we are interested in the development of new music description techniques through the study of the art music traditions of India (Hindustani and Carnatic), Turkey (Ottoman), the Maghreb (Andalusi) and China (Han) (e.g. Serra, 2012; Serra, 2011).
In the context of Freesound we are interested in issues of social computing, studying how we can improve the technologies behind Freesound through community profiling (e.g. Font et al., 2012; Roma et al., 2012).
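As an example of the community data such studies draw on, the sketch below queries the public Freesound APIv2 text-search endpoint for sounds and their user-assigned tags; the API key is a placeholder, and the query and field list are assumptions for the example.

```python
# Sketch: pulling sounds and their user-assigned tags from the public
# Freesound APIv2 text-search endpoint. YOUR_API_KEY is a placeholder;
# the chosen query and fields are assumptions for the example.
import requests

resp = requests.get(
    "https://freesound.org/apiv2/search/text/",
    params={
        "query": "rain",               # free-text query
        "fields": "id,name,tags",      # per-sound metadata to return
        "token": "YOUR_API_KEY",       # personal key from freesound.org/apiv2
    },
    timeout=10,
)
resp.raise_for_status()

for sound in resp.json()["results"]:
    print(sound["id"], sound["name"], sound["tags"][:5])
```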
We are interested in the modeling of musical performance (e.g. Maestre & Ramirez, 2010; Ramirez et al., 2010). Our research is being carried out mainly in the context of the Siempre and Drims projects.