yunzqq/DeepXi (forked from anicolson/DeepXi)

DeepXi: Residual Network-based A Priori SNR Estimator for Speech Enhancement


DeepXi: Residual Bidirectional Long Short-Term Memory (ResBLSTM) Network A Priori SNR Estimator

DeepXi (where the Greek letter 'xi', ξ, is pronounced /zaɪ/) is a residual bidirectional long short-term memory (ResBLSTM) network that estimates the a priori SNR; it was proposed in [1]. Its estimate can be used by minimum mean-square error (MMSE) approaches such as the MMSE short-time spectral amplitude (MMSE-STSA) estimator, the MMSE log-spectral amplitude (MMSE-LSA) estimator, and the Wiener filter (WF) approach. It can also be used to estimate the ideal ratio mask (IRM) and the ideal binary mask (IBM). DeepXi is implemented in TensorFlow and is trained to estimate the a priori SNR of single-channel noisy speech sampled at 16 kHz.
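Several of the uses listed above depend on ξ alone. As a minimal sketch, assuming ξ is given per time-frequency bin on a linear (not dB) scale, the WF gain and estimates of the IRM and IBM can be computed as follows. The function names are hypothetical, the IRM is assumed to be the square root of the speech-to-total power ratio, and the IBM uses an assumed 0 dB local criterion. The MMSE-STSA and MMSE-LSA gains additionally need the a posteriori SNR and special functions (modified Bessel functions and the exponential integral), so they are not shown.

```python
import numpy as np

def wiener_gain(xi):
    """Wiener filter (WF) gain from the a priori SNR xi (linear scale)."""
    xi = np.asarray(xi, dtype=float)
    return xi / (1.0 + xi)

def irm_from_xi(xi):
    """IRM estimate, assuming IRM = sqrt(speech power / (speech + noise power))."""
    return np.sqrt(wiener_gain(xi))

def ibm_from_xi(xi, lc_db=0.0):
    """IBM estimate: 1 where the a priori SNR exceeds the local criterion
    lc_db (0 dB assumed here), else 0."""
    xi = np.maximum(np.asarray(xi, dtype=float), 1e-12)  # avoid log10(0)
    return (10.0 * np.log10(xi) > lc_db).astype(float)

# Example: a priori SNRs of -10, 0 and 10 dB in three frequency bins.
xi = 10.0 ** (np.array([-10.0, 0.0, 10.0]) / 10.0)
print(wiener_gain(xi))  # approx. [0.0909 0.5 0.9091]
print(ibm_from_xi(xi))  # [0. 0. 1.]
```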

Prerequisites

Installation

It is recommended to use a virtual environment.

  1. git clone https://github.com/anicolson/DeepXi.git
  2. cd DeepXi
  3. pip install -r requirements.txt

Download the Model

A trained model can be downloaded from here. Unzip it and place it in the model directory. The model was trained on audio sampled at 16 kHz.
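Because the model expects single-channel audio at 16 kHz, it may help to check the files in noisy_speech before running enhancement. The following is a small sketch using Python's standard wave module, which reads PCM .wav files only. `check_wavs` is a hypothetical helper, not part of deepxi.py.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 16000  # the pretrained model was trained on 16 kHz audio

def check_wavs(directory, expected_rate=EXPECTED_RATE):
    """Return (filename, sample_rate, n_channels) for every .wav file in
    `directory` that is not single-channel audio at `expected_rate`."""
    problems = []
    for path in sorted(Path(directory).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate, channels = wav.getframerate(), wav.getnchannels()
        if rate != expected_rate or channels != 1:
            problems.append((path.name, rate, channels))
    return problems

if __name__ == "__main__":
    for name, rate, channels in check_wavs("noisy_speech"):
        print(f"{name}: {rate} Hz, {channels} channel(s); "
              f"expected {EXPECTED_RATE} Hz mono")
```

Files it reports would need resampling or downmixing (with a tool of your choice) before enhancement.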

How to Perform Speech Enhancement

Run python3 deepxi.py from within the virtual environment in which TensorFlow is installed. The script provides several inference options and can also perform training if required.
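The inference code of deepxi.py is not reproduced in this README. The sketch below only illustrates, under stated assumptions, how an a priori SNR estimate is typically applied for enhancement: a short-time Fourier transform (STFT), a gain per time-frequency bin (the WF gain here), the noisy phase, and weighted overlap-add resynthesis. `enhance` and `xi_estimator` are hypothetical names; in DeepXi the trained network would supply ξ. The Hamming window, 512-sample frames and 128-sample hop (32 ms and 8 ms at 16 kHz) are illustrative choices, not DeepXi's settings.

```python
import numpy as np

def enhance(noisy, xi_estimator, frame_len=512, hop=128):
    """Hypothetical sketch: apply a WF gain computed from an a priori SNR
    estimate in the STFT domain. `xi_estimator` maps the noisy magnitude
    spectrogram (frames x bins) to a linear a priori SNR of the same shape."""
    noisy = np.asarray(noisy, dtype=float)
    window = np.hamming(frame_len)  # nonzero everywhere, so normalisation is safe
    # Zero-pad so that the frames cover every input sample.
    n_frames = max(1, int(np.ceil((len(noisy) - frame_len) / hop)) + 1)
    padded = np.zeros((n_frames - 1) * hop + frame_len)
    padded[:len(noisy)] = noisy
    # Analysis: windowed frames -> one-sided spectra (frames x bins).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    spectra = np.fft.rfft(padded[idx] * window, axis=1)
    mag, phase = np.abs(spectra), np.angle(spectra)
    # Gain from the a priori SNR estimate (Wiener filter shown here).
    xi = xi_estimator(mag)
    gain = xi / (1.0 + xi)
    # Synthesis: enhanced magnitude with the noisy phase, windowed again.
    frames = np.fft.irfft(gain * mag * np.exp(1j * phase), n=frame_len, axis=1) * window
    # Weighted overlap-add, normalised by the summed squared window.
    out = np.zeros_like(padded)
    norm = np.zeros_like(padded)
    for i in range(n_frames):
        out[i * hop:i * hop + frame_len] += frames[i]
        norm[i * hop:i * hop + frame_len] += window ** 2
    return out[:len(noisy)] / norm[:len(noisy)]
```

Dividing by the summed squared window makes the analysis-synthesis chain an identity when the gain is 1, so any change to the signal comes from the gain alone.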

Directory Description

Directory      Description
lib            Functions for deepxi.py.
model          The directory for the model (the model must be downloaded).
noisy_speech   Noisy speech. Place the noisy speech .wav files to be enhanced here.
output         DeepXi outputs, including the enhanced speech .wav files.
stats          Statistics of a sample from the training set. The mean and standard deviation of the a priori SNR for the sample are used to compute the training target.
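The README does not spell out how these statistics become the training target. As we understand the approach described in [1], the a priori SNR in dB is assumed to be approximately normally distributed in each frequency bin, and the target is its normal cumulative distribution function (CDF), which maps it into (0, 1); at inference the network output is mapped back through the inverse CDF. The following is a minimal sketch of that mapping using the standard library's statistics.NormalDist. The function names are hypothetical, and `mu` and `sigma` stand for per-bin statistics whose storage format in stats is not described here.

```python
import math
from statistics import NormalDist

def xi_db_to_target(xi_db, mu, sigma):
    """Map an a priori SNR in dB to a target in (0, 1) via the normal CDF
    with mean mu and standard deviation sigma (both in dB)."""
    return 0.5 * (1.0 + math.erf((xi_db - mu) / (sigma * math.sqrt(2.0))))

def target_to_xi_db(target, mu, sigma, eps=1e-7):
    """Inverse mapping: recover the a priori SNR in dB from a network output."""
    # Clamp away from 0 and 1, where the inverse CDF diverges.
    target = min(max(target, eps), 1.0 - eps)
    return NormalDist(mu, sigma).inv_cdf(target)

def xi_db_to_linear(xi_db):
    """Convert an a priori SNR from dB to a linear power ratio."""
    return 10.0 ** (xi_db / 10.0)

# Illustrative statistics for one bin (not taken from the stats directory).
mu, sigma = 5.0, 15.0
t = xi_db_to_target(5.0, mu, sigma)       # the mean maps to the midpoint, 0.5
print(t, target_to_xi_db(t, mu, sigma))   # values close to 0.5 and 5.0
```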

References

[1] A. Nicolson and K. K. Paliwal, "Deep Learning For Minimum Mean-Square Error Approaches to Speech Enhancement", Submitted to Speech Communication.
