Music remixing using source separation to improve cochlear implant users music perception

Title: Music remixing using source separation to improve cochlear implant users' music perception
Publication Type: Master Thesis
Year of Publication: 2015
Authors: Pons, J.
Abstract: Music appreciation remains rather poor for many Cochlear Implant (CI) users due to their poor pitch perception. Simple music structures with a clear rhythm/beat are well perceived by CI users. A previous publication studying the mixing preferences of CI users on vocal Western music shows a significant preference for louder vocals and attenuated background instruments. By re-mixing the music, it is possible to simplify the signal and make it more suitable for implantees. But the multitrack recordings necessary to generate a re-mix are not always accessible; often only mono/stereo pre-mixed audio files are available. To overcome this limitation, we propose to use current state-of-the-art Source Separation (SS) techniques to estimate the multitrack recordings. The perceptual studies conducted focus on how the errors/artifacts produced by an SS algorithm, Non-negative Matrix Factorization (NMF), affect the music mixing preferences. These show that when attenuating the background instruments by 6 dB, the artifacts/errors present in the vocals are not perceived by CI users; SS can therefore be used to estimate the multitrack. To our knowledge, no previous work exists on simplifying classical music for CI users by means of re-mixing. This work shows the influence of the music genre on CI users' mixing preferences. We show that CI users with classical musical training have a significant preference for mixing pre-sets that enforce musicological details that are difficult to encode with CIs (other than beat). However, CI users without classical music training do not show any significant preference, probably due to a lack of music understanding. This work also shows that CI users may not benefit from one-size-fits-all mixing pre-sets. Technologies like SS, which allow individual configurations, seem to be the right approach towards better music appreciation.
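The NMF-plus-remix pipeline described above can be sketched in a few lines. This is a minimal illustration, not the thesis's actual implementation: it factorizes a toy magnitude spectrogram with basic multiplicative updates, soft-masks a hypothetical set of "vocal" components, and attenuates the estimated background by the 6 dB figure mentioned in the abstract. The function names and the choice of which components are vocal are assumptions for illustration only.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0):
    """Basic NMF via Lee & Seung multiplicative updates, minimizing the
    Euclidean distance ||V - WH||. V is a non-negative magnitude spectrogram
    of shape (frequency_bins, time_frames)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + 1e-3   # spectral templates
    H = rng.random((rank, T)) + 1e-3   # temporal activations
    eps = 1e-10
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def remix(V, W, H, vocal_components, background_gain_db=-6.0):
    """Soft-mask the 'vocal' NMF components out of V, then remix with the
    background attenuated (default -6 dB, the setting from the study)."""
    eps = 1e-10
    V_hat = W @ H + eps
    mask = (W[:, vocal_components] @ H[vocal_components, :]) / V_hat
    vocals = mask * V                     # Wiener-style vocal estimate
    background = V - vocals               # remainder of the mix
    g = 10.0 ** (background_gain_db / 20.0)   # -6 dB -> gain of about 0.5
    return vocals + g * background
```

In practice V would come from an STFT of the pre-mixed recording and the remixed magnitude would be resynthesized with the mixture phase; both steps are omitted here for brevity.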
Additionally, we studied a new approach to source signal separation based on deep recurrent neural networks (DRNN). Recently, some researchers successfully used DRNNs for singing voice separation from monaural recordings in a supervised setting. A great advantage of this technique, compared to NMF, is that it achieves similar performance while reducing processing time, which is crucial for CI applications. In this work, we investigated how different theoretically motivated initialization schemes behave when training DRNNs for SS. We conclude that if the initialization keeps the output activations inside the data range, the model is able to find a good local minimum. We also introduce a theoretically motivated interpretation of why music models (which take neighbouring frames as the input vector) do not suffer from the vanishing/exploding gradient problem.
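The initialization observation above can be illustrated numerically. The toy model below is an assumption for illustration, not the thesis's DRNN: a single-layer tanh RNN fed non-negative "spectrogram-like" frames in [0, 1]. A small weight scale keeps the untrained network's outputs near the data range, while a large scale saturates the tanh units and produces outputs far outside it, which is the kind of initialization the abstract argues against.

```python
import numpy as np

def rnn_forward(x, W_in, W_rec, W_out):
    """Forward pass of a one-layer tanh RNN over a sequence x of shape
    (time_steps, input_dim); returns the output at every step."""
    h = np.zeros(W_rec.shape[0])
    outputs = []
    for x_t in x:
        h = np.tanh(x_t @ W_in + h @ W_rec)   # recurrent hidden state
        outputs.append(h @ W_out)             # linear readout
    return np.array(outputs)

rng = np.random.default_rng(0)
d_in, d_h = 8, 32
x = rng.random((100, d_in))  # toy non-negative frames in [0, 1]

def init(scale):
    """Gaussian initialization at a given weight scale (illustrative)."""
    return (scale * rng.normal(size=(d_in, d_h)),
            scale * rng.normal(size=(d_h, d_h)),
            scale * rng.normal(size=(d_h, d_in)))

# Small scale: initial outputs stay near the data range.
# Large scale: tanh saturates and outputs blow up well outside it.
small = rnn_forward(x, *init(0.1))
large = rnn_forward(x, *init(5.0))
```

Comparing `np.abs(small).max()` against `np.abs(large).max()` shows the effect directly: the large-scale initialization starts the optimizer far from the data, in the flat saturated region of tanh where gradients are weak.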
Final publication: https://doi.org/10.5281/zenodo.1164019
Additional material: 
  • Code for the mixing console - a web app multitrack player.
  • Code for the DRNN source separation algorithm.
  • Code for the NMF source separation algorithm.

Associated publication:

Jordi Pons, Jordi Janer, Thilo Rode & Waldo Nogueira (2016, December). Remixing music using source separation algorithms to improve the musical experience of cochlear implant users. Journal of the Acoustical Society of America, vol. 140, no. 6, pp. 4338-4349. [paper]
