Posts

A Wavenet for Speech Denoising

This week I started reading some research papers. Today I read "A Wavenet for Speech Denoising" by Dario Rethage, Jordi Pons and Xavier Serra. Currently, most speech processing techniques use the magnitude spectrogram as a front end and discard the phase. To overcome this limitation, the paper adapts the WaveNet framework, keeping its acoustic modeling capabilities but dropping its autoregressive nature, which significantly reduces the time complexity. The model also uses non-causal, dilated convolutions. Non-causal means that the current output also depends on future input samples. Dilated convolutions make the receptive field grow faster by introducing a factor called dilation in the convolution layer, which spaces the filter taps apart so that the input samples between them are skipped. Raw audio waveforms (which do not contain any header information) have been used successfully for generative tasks, but usually, these have a t...
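To make the two ideas above concrete, here is a minimal pure-Python sketch of a non-causal dilated convolution and of how the receptive field of a stack of such layers grows. The function names, the kernel size of 3, and the dilation values are my own illustrative assumptions, not the configuration or code from the paper.

```python
def receptive_field(kernel_size, dilations):
    """Input samples seen by one output of a stack of dilated conv layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(signal, kernel, dilation):
    """Non-causal, zero-padded dilated 1-D convolution (odd-length kernel).

    Hypothetical helper for illustration only. The kernel is centred on the
    current sample, so it reads both past and future inputs.
    """
    half = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            # Negative offsets look at the past, positive offsets at the future.
            idx = t + (k - half) * dilation
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


# A single impulse at position 4, filtered with dilation 2.
signal = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
smeared = dilated_conv1d(signal, [1.0, 1.0, 1.0], dilation=2)
print(smeared)  # [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]

# Four layers, kernel size 3: doubling dilations vs. no dilation.
print(receptive_field(3, [1, 2, 4, 8]))  # 31
print(receptive_field(3, [1, 1, 1, 1]))  # 9
```

The output at position 2 is non-zero even though the impulse only arrives at position 4, which is exactly the "depends on future values" behaviour. The gaps of one sample between the three non-zero outputs are the "empty spaces" created by the dilation.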

Setting up PyTorch

For my capstone project, I chose the topic Speech Enhancement using a WaveNet or a WaveGAN approach. The main aim of this project is to enhance speech signals through speech synthesis rather than the traditional noise removal approaches. While looking into the existing methods for speech enhancement, I came across a GitHub repository by "santi-pdp". To run this code, I had to install all the prerequisites listed in the README file. Setting up PyTorch: I already had PyTorch installed on my virtual machine but was not able to use the GPU inside the virtual machine. To use the GPU for training, I decided to set up PyTorch on Windows, which would allow me to use my laptop GPU (RTX 2070). Step 1 - INSTALLING ANACONDA: To install PyTorch on Windows, I first had to install Anaconda, which I downloaded from the following link: https://www.anaconda.com/distribution/#download-section After downloading the 64-bit ...
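Since the whole point of moving to Windows was GPU access, a quick check like the one below can confirm whether PyTorch actually sees the GPU once everything is installed. This is a minimal sketch: the helper name `describe_gpu` is my own, and it assumes the GPU sits at device index 0.

```python
import importlib.util


def describe_gpu():
    """Report whether PyTorch is installed and can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"
    import torch

    if not torch.cuda.is_available():
        return "PyTorch is installed, but no CUDA GPU is visible"
    # Assumption: the laptop GPU is device 0.
    return "CUDA GPU visible: " + torch.cuda.get_device_name(0)


print(describe_gpu())
```

If the second message appears on a machine that does have an NVIDIA GPU, the usual suspects are a CPU-only PyTorch build or an outdated GPU driver.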