Skip to content

This is the official implementation of the SEMamba paper.

Notifications You must be signed in to change notification settings

riversidefm/SEMamba

 
 

Repository files navigation

SEMamba

This is the official implementation of the SEMamba paper.
For more details, please refer to: An Investigation of Incorporating Mamba for Speech Enhancement

Requirement

* Python >= 3.9
* CUDA >= 12.0
* PyTorch == 2.2.2

Model

SEMamba advanced model

ASR Word Error Rate

We have tested the ASR results using OpenAI Whisper on the test set of VoiceBank-DEMAND.

The evaluation code will be released in the future.

VCTKDEMAND WER Results

Additional Notes

  1. Ensure that both the nvidia-smi and nvcc -V commands show CUDA version 12.0 or higher to verify proper installation and compatibility.

  2. Currently, it supports only GPUs from the RTX series and newer models. Older GPU models, such as GTX 1080 Ti or Tesla V100, may not support the execution due to hardware limitations.

Installation

(Suggested:) Step 0 - Create a Python environment with Conda

It is highly recommended to create a separate Python environment to manage dependencies and avoid conflicts.

conda create --name mamba python=3.9
conda activate mamba

Step 1 - Install PyTorch

Install PyTorch 2.2.2 from the official website. Visit PyTorch Previous Versions for specific installation commands based on your system configuration (OS, CUDA version, etc.).

Step 2 - Install Required Packages

After setting up the environment and installing PyTorch, install the required Python packages listed in requirements.txt.

pip install -r requirements.txt

Step 3 - Install the Mamba Package

Navigate to the mamba_install directory and install the package. This step ensures all necessary components are correctly installed.

cd mamba_install
pip install .

Note: Installing from source (provided mamba_install) can help prevent package issues and ensure compatibility between different dependencies. It is recommended to follow these steps carefully to avoid potential conflicts.

Step 4 (By Arno@Riverside) - Libcuda.so

Run the following:
ll /lib/x86_64-linux-gnu|grep libcuda

If you do not see a file libcuda.so, you will need to manually create a simlink to libcuda.so.1. If this file does not exist either, then there might be a missing cuda installation.
Create the simlink with:
cd /lib/x86_64-linux-gnu && sudo ln -s libcuda.so.1 libcuda.so

Then you will need python/C libraries. Run the following:
sudo apt-get install python3-dev
You might need to specify the exact python version you are running.

Training the Model

Step 1: Prepare Dataset JSON

Create the dataset JSON file using the script sh make_dataset.sh. You may need to modify make_dataset.sh and data/make_dataset_json.py.

Alternatively, you can directly modify the data paths in data/train_clean.json, data/train_noisy.json, etc.

Step 2: Run the following script to train the model.

sh run.sh

Note: You can use tensorboard --logdir exp/path_to_your_exp/logs to check your training log

Using the Pretrained Model

Modify the --input_folder and --output_folder parameters in pretrained.sh to point to your desired input and output directories. Then, run the script.

sh pretrained.sh

Implementing the PCS Method in SEMamba

There are two methods to implement the PCS (Perceptual Contrast Stretching) method in SEMamba:

  1. Use PCS as Training Target:
  • Run the sh runPCS.sh with the yaml configuration use_PCS400=True.
  • Use the pretrained model sh pretrained.sh without post-processing --post_processing_PCS False.
  1. Use PCS as Post-Processing:
  • Run the sh run.sh with the yaml configuration use_PCS400=False.
  • Use the pretrained model sh pretrained.sh with post-processing --post_processing_PCS True.

Evaluation

The evaluation metrics is calculated via: CMGAN

The evaluation code will be released in the future.

Perceptual Contrast Stretching

The implementation of Perceptual Contrast Stretching (PCS) as discussed in our paper can be found at PCS400.

References and Acknowledgements

We would like to express our gratitude to the authors of MP-SENet, CMGAN, HiFi-GAN, and NSPP.

Citation:

If you find the paper useful in your research, please cite:

@article{chao2024investigation,
  title={An Investigation of Incorporating Mamba for Speech Enhancement},
  author={Chao, Rong and Cheng, Wen-Huang and La Quatra, Moreno and Siniscalchi, Sabato Marco and Yang, Chao-Han Huck and Fu, Szu-Wei and Tsao, Yu},
  journal={arXiv preprint arXiv:2405.06573},
  year={2024}
}

About

This is the official implementation of the SEMamba paper.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 61.5%
  • Cuda 26.5%
  • C++ 10.8%
  • Other 1.2%