DeepXi (where the Greek letter 'xi' or ξ is pronounced /zaɪ/) is a residual bidirectional long short-term memory (ResBLSTM) network a priori SNR estimator that was proposed in [1]. Its a priori SNR estimate can be used by minimum mean-square error (MMSE) approaches such as the MMSE short-time spectral amplitude (MMSE-STSA) estimator, the MMSE log-spectral amplitude (MMSE-LSA) estimator, and the Wiener filter (WF) approach. It can also be used to estimate the ideal ratio mask (IRM) and the ideal binary mask (IBM). DeepXi is implemented in TensorFlow and is trained to estimate the a priori SNR for single-channel noisy speech with a sampling frequency of 16 kHz.
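As a rough illustration of how an a priori SNR estimate feeds these approaches, the sketch below computes a Wiener filter gain, a square-root IRM, and an IBM from linear-scale a priori SNR values. It is not taken from deepxi.py: the function names, the 0 dB IBM threshold, and the sample values are assumptions made for this example, and the MMSE-STSA and MMSE-LSA gains are omitted because they also require the a posteriori SNR.

```python
import math

def wiener_gain(xi):
    """Wiener filter gain for a linear-scale a priori SNR xi >= 0."""
    return xi / (1.0 + xi)

def irm(xi):
    """Ideal ratio mask in its common square-root form, written in terms of xi."""
    return math.sqrt(xi / (1.0 + xi))

def ibm(xi, threshold_db=0.0):
    """Ideal binary mask: 1 where the SNR in dB exceeds the threshold.

    The 0 dB default is an assumption for this sketch, not a DeepXi setting.
    """
    if xi <= 0.0:
        return 0
    return 1 if 10.0 * math.log10(xi) > threshold_db else 0

# Illustrative a priori SNR estimates (linear scale), not model outputs.
xi_hat = [0.1, 1.0, 10.0]
for xi in xi_hat:
    print(xi, wiener_gain(xi), irm(xi), ibm(xi))
```

In practice each gain would be applied per time-frequency bin to the noisy speech magnitude spectrum before resynthesis.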
- TensorFlow (installed in a virtual environment)
- Python3
- MATLAB
It is recommended to use a virtual environment.
git clone https://github.com/anicolson/DeepXi.git
cd DeepXi
pip install -r requirements.txt
A trained model can be downloaded from here. Unzip it and place it in the model directory. The model was trained on speech with a sampling rate of 16 kHz.
Run the script (python3 deepxi.py) in the virtual environment in which TensorFlow is installed. The script offers several inference options and can also perform training if required.
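Because the model expects single-channel speech sampled at 16 kHz, it may be worth checking input files before running the script. The following is a minimal sketch using only the Python standard library; `is_supported_wav` and `write_silence` are hypothetical helpers written for this example and are not part of DeepXi.

```python
import os
import tempfile
import wave

def is_supported_wav(path, expected_rate=16000, expected_channels=1):
    """Return True if the .wav file is single channel and sampled at 16 kHz."""
    with wave.open(path, "rb") as wav_file:
        return (wav_file.getframerate() == expected_rate
                and wav_file.getnchannels() == expected_channels)

def write_silence(path, rate, channels, seconds=0.1):
    """Write a short 16-bit PCM file of silence (used only for the demo below)."""
    n_frames = int(rate * seconds)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * n_frames * channels)

# Demo on temporary files; real files would sit in the noisy_speech directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    good = os.path.join(tmp_dir, "mono_16k.wav")
    bad = os.path.join(tmp_dir, "stereo_44k.wav")
    write_silence(good, 16000, 1)
    write_silence(bad, 44100, 2)
    print(is_supported_wav(good))
    print(is_supported_wav(bad))
```

Files that fail the check would need to be resampled or downmixed with an external tool before enhancement.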
Directory | Description |
---|---|
lib | Functions for deepxi.py. |
model | The directory for the model (the model must be downloaded). |
noisy_speech | Noisy speech. Place noisy speech .wav files to be enhanced here. |
output | DeepXi outputs, including the enhanced speech .wav output files. |
stats | Statistics of a sample from the training set. The mean and standard deviation of the a priori SNR for the sample are used to compute the training target. |
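One way the stored mean and standard deviation could be turned into a training target is sketched below. It assumes the target is the a priori SNR in dB passed through a normal cumulative distribution function, which compresses the target into the interval [0, 1]; that mapping, the function names, and the values of `mu` and `sigma` are assumptions for illustration rather than the contents of the stats directory.

```python
from statistics import NormalDist

def map_snr_db(xi_db, mu, sigma):
    """Map an a priori SNR in dB to [0, 1] with a normal CDF (assumed mapping)."""
    return NormalDist(mu, sigma).cdf(xi_db)

def unmap_snr_db(xi_bar, mu, sigma):
    """Invert the mapping; xi_bar must lie strictly between 0 and 1."""
    return NormalDist(mu, sigma).inv_cdf(xi_bar)

# Hypothetical statistics for one frequency bin (dB).
mu, sigma = -5.0, 10.0

target = map_snr_db(3.0, mu, sigma)        # training target for xi = 3 dB
recovered = unmap_snr_db(target, mu, sigma)  # back to dB at inference time
print(target, recovered)
```

At inference time the inverse mapping would recover an a priori SNR estimate in dB from the network output, which can then be converted to linear scale.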
[1] A. Nicolson and K. K. Paliwal, "Deep Learning For Minimum Mean-Square Error Approaches to Speech Enhancement", Submitted to Speech Communication.