Music remixing using source separation to improve cochlear implant users music perception

Title: Music remixing using source separation to improve cochlear implant users' music perception
Publication Type: Master Thesis
Year of Publication: 2015
Authors: Pons, J.
Abstract: Music appreciation remains rather poor for many Cochlear Implant (CI) users due to their poor pitch perception. Simple music structures with a clear rhythm/beat are well perceived by CI users. A previous publication studying the mixing preferences of CI users on vocal Western music shows a significant preference for louder vocals and attenuated background instruments. By re-mixing the music, it is possible to simplify the signal and make it more suitable for implantees. But the multitrack recordings necessary to generate a re-mix are not always accessible; often only mono/stereo pre-mixed audio files are available. To overcome this limitation, we propose to use current state-of-the-art Source Separation (SS) techniques to estimate the multitrack recordings. The perceptual studies conducted focus on how the errors/artifacts produced by an SS algorithm, Non-negative Matrix Factorization (NMF), affect the music mixing preferences. These show that when attenuating the background instruments by 6 dB, the artifacts/errors present in the vocals are not perceived by CI users; SS can therefore be used to estimate the multitrack. To our knowledge, no previous work exists on simplifying classical music for CI users by means of re-mixing. This work shows the influence of the music genre on CI users' mixing preferences. We show that CI users with classical musical training have a significant preference for mixing pre-sets that enforce musicological details that are difficult to encode with CIs (other than beat). However, CI users without classical music training do not show any significant preference, probably due to a lack of music understanding. This work also shows that CI users may not benefit from one-size-fits-all mixing pre-sets. Technologies like SS, which allow individual configurations, seem to be the right approach towards better music appreciation.
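The NMF-plus-remix pipeline described above can be sketched in a few lines. This is a minimal illustration, not the thesis's actual implementation: it factorizes a toy magnitude spectrogram with basic multiplicative updates, soft-masks a hypothetical set of "vocal" components, and attenuates the estimated background by the 6 dB figure mentioned in the abstract. The function names and the choice of which components are vocal are assumptions for illustration only.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0):
    """Basic NMF via Lee & Seung multiplicative updates, minimizing the
    Euclidean distance ||V - WH||. V is a non-negative magnitude spectrogram
    of shape (frequency_bins, time_frames)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + 1e-3   # spectral templates
    H = rng.random((rank, T)) + 1e-3   # temporal activations
    eps = 1e-10
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def remix(V, W, H, vocal_components, background_gain_db=-6.0):
    """Soft-mask the 'vocal' NMF components out of V, then remix with the
    background attenuated (default -6 dB, the setting from the study)."""
    eps = 1e-10
    V_hat = W @ H + eps
    mask = (W[:, vocal_components] @ H[vocal_components, :]) / V_hat
    vocals = mask * V                     # Wiener-style vocal estimate
    background = V - vocals               # remainder of the mix
    g = 10.0 ** (background_gain_db / 20.0)   # -6 dB -> gain of about 0.5
    return vocals + g * background
```

In practice V would come from an STFT of the pre-mixed recording and the remixed magnitude would be resynthesized with the mixture phase; both steps are omitted here for brevity.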
Additionally, we studied a new approach to source signal separation based on deep recurrent neural networks (DRNN). Recently, some researchers successfully used DRNNs for singing voice separation from monaural recordings in a supervised setting. A great advantage of this technique, compared to NMF, is that it achieves similar performance while reducing processing time, which is crucial for CI applications. In this work, we investigated how different theoretically motivated initialization schemes behave when training DRNNs for SS. We conclude that if the initialization keeps the output activations inside the data range, the model is able to find a good local minimum. We also introduce a theoretically motivated interpretation of why music models (which take neighbouring frames as the input vector) do not suffer from the vanishing/exploding gradient problem.
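The initialization observation above can be illustrated numerically. The toy model below is an assumption for illustration, not the thesis's DRNN: a single-layer tanh RNN fed non-negative "spectrogram-like" frames in [0, 1]. A small weight scale keeps the untrained network's outputs near the data range, while a large scale saturates the tanh units and produces outputs far outside it, which is the kind of initialization the abstract argues against.

```python
import numpy as np

def rnn_forward(x, W_in, W_rec, W_out):
    """Forward pass of a one-layer tanh RNN over a sequence x of shape
    (time_steps, input_dim); returns the output at every step."""
    h = np.zeros(W_rec.shape[0])
    outputs = []
    for x_t in x:
        h = np.tanh(x_t @ W_in + h @ W_rec)   # recurrent hidden state
        outputs.append(h @ W_out)             # linear readout
    return np.array(outputs)

rng = np.random.default_rng(0)
d_in, d_h = 8, 32
x = rng.random((100, d_in))  # toy non-negative frames in [0, 1]

def init(scale):
    """Gaussian initialization at a given weight scale (illustrative)."""
    return (scale * rng.normal(size=(d_in, d_h)),
            scale * rng.normal(size=(d_h, d_h)),
            scale * rng.normal(size=(d_h, d_in)))

# Small scale: initial outputs stay near the data range.
# Large scale: tanh saturates and outputs blow up well outside it.
small = rnn_forward(x, *init(0.1))
large = rnn_forward(x, *init(5.0))
```

Comparing `np.abs(small).max()` against `np.abs(large).max()` shows the effect directly: the large-scale initialization starts the optimizer far from the data, in the flat saturated region of tanh where gradients are weak.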
Final publication: https://doi.org/10.5281/zenodo.1164019
Additional material: 
  • Code for the mixing console - a web app multitrack player.
  • Code for the DRNN source separation algorithm.
  • Code for the NMF source separation algorithm.

Associated publication:

Jordi Pons, Jordi Janer, Thilo Rode & Waldo Nogueira (2016, December). Remixing music using source separation algorithms to improve the musical experience of cochlear implant users. Journal of the Acoustical Society of America, vol. 140, no. 6, pp. 4338-4349. [paper]
