Audio Signal Processing
- Jordi Bonada, co-leader
- Jordi Janer, co-leader
- Ricard Marxer, PhD student
- Merlijn Blaauw, researcher
- Marti Umbert, PhD student
- Graham Coleman, PhD student
- Saso Musevic, PhD student
Our research in the field of audio signal processing is broad and multidisciplinary, with a strong focus on technology transfer, acknowledged by dozens of patents and several highly successful commercial products. Our current interests span singing voice synthesis, voice transformation, source separation, and automatic soundscape generation.
Singing Voice Synthesis
For more than a decade we have been developing models and specific approaches for the synthesis of the singing voice based on the concept of performance sampling, with the aim of achieving a natural-sounding singing synthesizer (e.g. Bonada & Serra, 2007; Bonada, 2008). We have collaborated continuously with Yamaha Corp. in this area, a collaboration that resulted in the popular Vocaloid commercial synthesizer. More recent research deals with singing style modeling, which learns to imitate the expression of a singer from a set of her/his recordings.
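At its core, performance sampling involves selecting recorded units and concatenating them smoothly. As a minimal sketch of the joining step only (not the actual Vocaloid engine; toy sine segments stand in for recorded samples), two units can be joined with a linear crossfade over their overlap region:

```python
import numpy as np

def crossfade_concat(a, b, overlap):
    """Join two sampled units with a linear crossfade over `overlap` samples."""
    fade_out = np.linspace(1.0, 0.0, overlap)
    fade_in = 1.0 - fade_out
    return np.concatenate([
        a[:-overlap],
        a[-overlap:] * fade_out + b[:overlap] * fade_in,
        b[overlap:],
    ])

# two toy "units": 100 ms sine segments at neighbouring pitches
sr = 16000
t = np.arange(0, 0.1, 1 / sr)
unit_a = np.sin(2 * np.pi * 220 * t)
unit_b = np.sin(2 * np.pi * 247 * t)
out = crossfade_concat(unit_a, unit_b, overlap=160)  # 10 ms crossfade
```

A real synthesizer additionally matches pitch, timing and timbre at the joins; the crossfade only hides residual discontinuities.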
Musical Audio Signal Separation
This research line addresses the problem of segregating individual signals from a musical mixture. We have focused on the analysis and extraction of the predominant voice from polyphonic music (e.g. Marxer et al., 2012) and of percussion components (e.g. Janer et al., 2012). These algorithms have various applications, including music production (e.g. remixes), entertainment (e.g. karaoke) and cultural heritage (e.g. restoration).
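A common building block in this kind of separation (a generic illustration, not the specific published algorithms) is time-frequency masking: once the magnitude of the target source has been estimated in each spectrogram bin, a Wiener-style soft mask scales the mixture accordingly:

```python
import numpy as np

def soft_mask_separation(mix_stft, target_mag_est, eps=1e-8):
    """Wiener-style soft mask: weight each time-frequency bin of the
    mixture by the estimated fraction of energy owned by the target."""
    mask = target_mag_est**2 / (np.abs(mix_stft)**2 + eps)
    mask = np.clip(mask, 0.0, 1.0)
    return mask * mix_stft

# toy 2x2 complex "spectrogram"; pretend the target explains all the energy
mix = np.array([[1.0 + 1.0j, 2.0 + 0.0j],
                [0.0 + 0.5j, 3.0 + 0.0j]])
separated = soft_mask_separation(mix, np.abs(mix))
```

The hard part, of course, is estimating `target_mag_est` (e.g. via predominant-pitch tracking for the voice); the masking step itself is simple and preserves the mixture's phase.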
Other research topics
Real-time voice processing algorithms (e.g. Mayor et al., 2011) have been integrated into the licensed Kaleivoicecope technology. We have investigated topics such as voice quality transformation, voice impersonation, speech processing, emotional speech synthesis, voice enhancement, and non-stationary sinusoidal analysis.
We have contributed to the voice analysis field with several methods for automatically transcribing melody and expression, as well as for rating a singing performance (e.g. Mayor et al., 2009). We have also adapted voice conversion strategies from speech to the specificities of the singing voice, allowing singer models to be created when only a limited amount of audio material is available (e.g. Villavicencio & Bonada, 2010).
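Melody transcription systems typically start from frame-wise fundamental-frequency estimation. As a minimal illustration of that first stage (an assumption for exposition, not the method of the cited systems), an autocorrelation-based estimator picks the strongest lag inside the plausible pitch-period range:

```python
import numpy as np

def autocorr_f0(frame, sr, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency of a voiced frame by picking
    the autocorrelation peak within the plausible pitch-period range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

sr = 16000
frame = np.sin(2 * np.pi * 220 * np.arange(2048) / sr)  # a 220 Hz test tone
f0 = autocorr_f0(frame, sr)
```

A full transcriber then segments the f0 curve into notes and quantizes it against a tuning reference, with voicing detection to discard silent frames.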
Processing polyphonic audio has also drawn our interest, including polyphonic time-scaling, tempo detection, rhythm modification (e.g. Janer et al., 2006), tonal analysis and visualization (e.g. Gómez & Bonada, 2005), audio mosaicing (e.g. Coleman et al., 2010), and score following.
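Polyphonic time-scaling is classically approached with a phase vocoder, which changes the hop between analysis and synthesis frames while keeping per-bin phase advances consistent. The following is a textbook-style sketch of the idea (not our published algorithms):

```python
import numpy as np

def phase_vocoder(x, rate, n_fft=1024, hop=256):
    """Time-stretch x by `rate` (rate > 1 shortens, rate < 1 lengthens)
    using a basic phase vocoder with linear magnitude interpolation."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    stft = np.array([np.fft.rfft(win * x[i * hop:i * hop + n_fft])
                     for i in range(n_frames)])
    # expected phase advance per hop for each bin's center frequency
    omega = 2 * np.pi * hop * np.arange(n_fft // 2 + 1) / n_fft
    steps = np.arange(0, n_frames - 1, rate)  # fractional frame positions
    phase = np.angle(stft[0])
    out = np.zeros(int(len(steps) * hop + n_fft))
    for k, step in enumerate(steps):
        i = int(step)
        frac = step - i
        mag = (1 - frac) * np.abs(stft[i]) + frac * np.abs(stft[i + 1])
        frame = np.fft.irfft(mag * np.exp(1j * phase))
        out[k * hop:k * hop + n_fft] += win * frame  # overlap-add
        # accumulate phase: expected advance plus wrapped deviation
        dphi = np.angle(stft[i + 1]) - np.angle(stft[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase += omega + dphi
    return out

x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
stretched = phase_vocoder(x, rate=0.5)  # roughly twice as long
```

Production-quality time-scaling adds refinements such as phase locking across partials and transient preservation, which this sketch omits.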
In recent years, we have extended the concept of performance sampling to the violin, contributing novel approaches for accurately capturing performer gestures with nearly non-intrusive sensing techniques, and for statistically modeling the temporal contour of those gestures and the timbre they produce (e.g. Maestre et al., 2010). Instrumental gesture modeling has proven to be a natural approach to controlling physical models, thus filling the gap between the high-level controls of a symbolic score and the low-level input of the physical system.
Beyond music and voice signals, we have also applied our algorithms to environmental sounds and bioacoustics, for example the generation of soundscapes (e.g. Janer et al., 2011) and the analysis and denoising of marine bioacoustic signals.