A Wavenet for Speech Denoising
This week I started by reading some research papers. Today's paper was "A Wavenet for Speech Denoising" by Dario Rethage, Jordi Pons and Xavier Serra. Currently, most speech processing techniques use the magnitude spectrogram as a front-end and discard the phase. To overcome this limitation, the paper builds on the WaveNet framework, keeping its acoustic modeling capabilities while dropping its autoregressive nature, which significantly reduces time complexity. The model also uses non-causal, dilated convolutions. Non-causal means that the current output depends on future input samples as well as past ones. Dilated convolutions make the receptive field grow faster by introducing a factor called dilation in the convolution layer, which leaves empty spaces between the kernel taps. Raw audio waveforms (which do not contain any header information) have been used successfully for generative tasks, but usually, these have a t...
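To make the two ideas above concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper): a non-causal dilated 1-D convolution, plus a helper showing how doubling the dilation per layer makes the receptive field grow exponentially with depth instead of linearly.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """Non-causal dilated 1-D convolution with zero padding ("same" length).

    Non-causal: the padding is symmetric, so each output sample depends on
    future input samples as well as past ones. The kernel taps are spaced
    `dilation` samples apart, which widens the span each layer covers.
    """
    k = len(kernel)
    span = dilation * (k - 1)        # total input span the kernel covers
    pad = span // 2                  # symmetric padding -> non-causal
    xp = np.pad(np.asarray(x, dtype=float), (pad, span - pad))
    y = np.zeros(len(x))
    for i in range(len(x)):
        for j in range(k):
            y[i] += kernel[j] * xp[i + j * dilation]
    return y

def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of dilated conv layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Doubling dilations (1, 2, 4, 8) vs. no dilation, kernel size 3:
print(receptive_field(3, [1, 2, 4, 8]))  # 31 samples
print(receptive_field(3, [1, 1, 1, 1]))  # 9 samples
```

With four layers the dilated stack already sees 31 samples versus 9 for plain convolutions, which is why such stacks can cover long audio contexts cheaply.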