Audio Source Separation Using Deep Neural Networks

Title: Audio Source Separation Using Deep Neural Networks
Publication Type: Master Thesis
Year of Publication: 2016
Authors: Chandna, P.
Abstract: This thesis presents a low-latency online source separation algorithm based on convolutional neural networks. Building on ideas from previous research on source separation, we propose an algorithm using a deep neural network with convolutional layers. This type of neural network has resulted in state-of-the-art techniques for several image processing problems. We try to adapt these ideas to the audio domain, focusing on low-latency extraction of 4 tracks (vocals, bass, drums and other instruments) from a single-channel (monaural) musical recording. We try to minimize the algorithm's processing time through data compression, without compromising performance. The Mixing Secrets Dataset 100 (MSD100) and the Demixing Secrets Dataset 100 (DSD100) are used to evaluate the methodology. The results achieved by the algorithm show an 8.4 dB gain in SDR and a 9 dB gain in SAR for vocals over the state-of-the-art deep learning source separation approach using recurrent neural networks. The system’s separation performance is comparable with that of other state-of-the-art algorithms, such as non-negative matrix factorization, while its processing time is significantly lower. This thesis is a stepping stone for further research in this area, particularly for the implementation of source separation algorithms for medical purposes such as speech enhancement for cochlear implants, a task that requires low latency.