A Wavenet for Speech Denoising

Title: A Wavenet for Speech Denoising
Publication Type: Report
Year of Publication: 2017
Authors: Rethage, D., Pons, J., & Serra, X.
Prepared for: arXiv e-print
Abstract: Currently, most speech processing techniques use magnitude spectrograms as front-end and are therefore by default discarding part of the signal: the phase. In order to overcome this limitation, we propose an end-to-end learning method for speech denoising based on Wavenet. The proposed model adaptation retains Wavenet's powerful acoustic modeling capabilities, while significantly reducing its time-complexity by eliminating its autoregressive nature. Specifically, the model makes use of non-causal, dilated convolutions and predicts target fields instead of a single target sample. These modifications make the model highly parallelizable during both training and inference. Furthermore, we propose a novel energy-conserving loss that directly operates on the raw audio level. This loss also considers the quality of the estimated background-noise signal (computed by applying a parameterless operation to the input) during training. This direct link to the input enforces conserving the energy of the signal throughout the pipeline. Both computational and perceptual evaluations indicate that the proposed method is preferred to Wiener filtering, a common method based on processing the magnitude spectrogram.
Preprint/postprint document: https://arxiv.org/abs/1706.07162
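
The abstract describes two architectural changes to Wavenet: non-causal dilated convolutions and prediction of a whole target field per forward pass instead of one autoregressive sample. The PyTorch sketch below illustrates that idea only; the channel width, layer count and dilation pattern are placeholders rather than the paper's configuration, and the paper's residual and gated units are omitted.

import torch
import torch.nn as nn

class NonCausalDilatedStack(nn.Module):
    """Illustrative stack of non-causal dilated 1-D convolutions.

    Unlike the original autoregressive Wavenet, the convolutions are not
    causally masked, so each output sample sees both past and future context,
    and the network maps a block of noisy samples to a whole target field of
    denoised samples in a single forward pass.
    Hypothetical sketch based on the abstract, not the paper's exact model.
    """
    def __init__(self, channels=32, dilations=(1, 2, 4, 8, 16, 32)):
        super().__init__()
        layers = [nn.Conv1d(1, channels, kernel_size=3), nn.ReLU()]
        for d in dilations:
            # valid (unpadded) convolution: kernel size 3 with dilation d trims
            # d samples from each side while widening the receptive field
            layers += [nn.Conv1d(channels, channels, kernel_size=3, dilation=d),
                       nn.ReLU()]
        layers.append(nn.Conv1d(channels, 1, kernel_size=1))  # project back to raw audio
        self.net = nn.Sequential(*layers)

    def forward(self, noisy):       # noisy: (batch, 1, samples)
        return self.net(noisy)      # denoised target field (shorter than the input)

model = NonCausalDilatedStack()
denoised_field = model(torch.randn(1, 1, 2048))   # shape (1, 1, 1920)

With these example dilations the stack trims 2 * (1 + 1 + 2 + 4 + 8 + 16 + 32) = 128 samples in total, so a 2048-sample input yields a 1920-sample target field, every sample of which is conditioned on both past and future context.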
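
The abstract also mentions an energy-conserving loss on raw audio in which the background-noise estimate is obtained by a parameterless operation on the input. A minimal sketch follows, assuming that operation is subtracting the denoised output from the noisy input and that both terms are weighted equally; the paper's exact formulation may differ.

import torch

def energy_conserving_loss(noisy, clean, denoised):
    """L1 loss on the denoised speech plus L1 loss on the implied noise estimate.

    All arguments are time-aligned raw-audio tensors of equal length
    (the predicted target field and the matching slices of the clean and
    noisy signals). Hypothetical sketch based on the abstract.
    """
    noise = noisy - clean              # reference background noise
    noise_hat = noisy - denoised       # parameterless estimate derived from the input
    speech_term = torch.mean(torch.abs(clean - denoised))
    noise_term = torch.mean(torch.abs(noise - noise_hat))
    return speech_term + noise_term

Because the noise estimate is tied directly to the input, penalizing both terms encourages the estimated speech and noise to sum back to the input, i.e. to conserve its energy throughout the pipeline.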