This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
FlowDec (ICLR 2025) is a full-band audio codec for general audio sampled at 48 kHz that combines non-adversarial codec training with a stochastic postfilter based on a novel conditional flow matching method.
See our demo page here.
- 2025/03/03 First version is released
Create a new virtual environment (we recommend Python 3.10) and run
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu126
(replace cu126 in the index URL with the tag matching your local CUDA version, e.g. cu118 for CUDA 11.8).
You can find the checkpoints for FlowDec-75m and FlowDec-25s, as well as the weights for the underlying NDAC codecs NDAC-75 and NDAC-25, here.
Please check out the notebook demo.ipynb for how to run inference using the pretrained checkpoints.
We use Hydra for model configuration and training. For training config files, see the config/ folder.
NOTE: We do not provide training/validation/test datasets here, so the training configurations in config/ all use a dummy datamodule config, config/datamodule/example.yaml. To actually train FlowDec, you should process your own dataset(s) with a pre-trained underlying codec (i.e., encode and decode them), save the results as .wav files, and store the paired paths in a text file. You can, for instance, use our pre-trained NDAC variants; see the "Inference" section for how to run them.
The expected input format for FlowDec datasets is a text file with one comma-separated pair of paths per line (the clean reference file first, then the corresponding codec output), e.g.:
/clean_path/file1.wav,/codec_output_path/file1.wav
/clean_path/file2.wav,/codec_output_path/file2.wav
[...]
where you would then have train.txt, validation.txt, and test.txt, each in this format, and adapt the datamodule config file to use these three .txt files instead of the dummy file.
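Pairing clean files with their codec outputs and writing the three list files can be sketched in Python as follows. This is a hypothetical helper, not part of the FlowDec repository: it assumes the codec outputs mirror the clean directory layout with identical relative file names, uses a seeded random split with made-up default fractions (5% validation, 5% test), and writes absolute paths. Because the format is comma-separated, it rejects any path that itself contains a comma.

```python
import random
from pathlib import Path


def pair_files(clean_dir, codec_dir, pattern="*.wav"):
    """Hypothetical: pair each clean file with the codec output at the same relative path."""
    clean_dir, codec_dir = Path(clean_dir), Path(codec_dir)
    pairs = []
    for clean in sorted(clean_dir.rglob(pattern)):
        coded = codec_dir / clean.relative_to(clean_dir)
        if not coded.is_file():
            raise FileNotFoundError(f"no codec output for {clean} (expected {coded})")
        pairs.append((clean.resolve(), coded.resolve()))
    return pairs


def write_filelists(pairs, out_dir, val_frac=0.05, test_frac=0.05, seed=0):
    """Hypothetical: shuffle pairs, split them, and write train/validation/test .txt files."""
    pairs = list(pairs)
    # A comma inside a path would break the "clean,codec" line format.
    for clean, coded in pairs:
        if "," in str(clean) or "," in str(coded):
            raise ValueError(f"path contains a comma: {clean} / {coded}")
    random.Random(seed).shuffle(pairs)  # deterministic split for a given seed
    n_val = int(len(pairs) * val_frac)
    n_test = int(len(pairs) * test_frac)
    splits = {
        "validation.txt": pairs[:n_val],
        "test.txt": pairs[n_val:n_val + n_test],
        "train.txt": pairs[n_val + n_test:],
    }
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, split in splits.items():
        with open(out_dir / name, "w") as f:
            for clean, coded in split:
                f.write(f"{clean},{coded}\n")
    return {name: len(split) for name, split in splits.items()}


def read_filelist(path):
    """Parse a list file back into (clean_path, codec_path) string tuples."""
    pairs = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            parts = line.split(",")
            if len(parts) != 2:
                raise ValueError(f"{path}:{lineno}: expected 'clean,codec', got {line!r}")
            pairs.append((parts[0], parts[1]))
    return pairs
```

For example, write_filelists(pair_files("/clean_path", "/codec_output_path"), "filelists") would produce the three files, whose paths can then be entered in your datamodule config; read_filelist is a quick way to check a list file before training.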
After modifying the datamodule config, you can then, for example, run:
python train.py --config-name flowdec_75m
To automatically determine the frequency-dependent sigma_y (see Section 3.5 in our paper), you can use the helper script scripts/estimate_flowdec_params.py. This script also implements the heuristic for a global sigma_y discussed in our Appendix A.1.
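The exact estimation procedure is defined in Section 3.5 and Appendix A.1 of the paper and implemented in scripts/estimate_flowdec_params.py; it is not reproduced here. As a rough, hypothetical illustration of what a frequency-dependent sigma_y estimate can look like, the sketch below computes, for each STFT frequency bin, the root-mean-square of the complex residual between the clean and codec-output spectra over a set of paired signals. The function names, the STFT settings (n_fft=1024, hop=256, Hann window, no padding), the use of a plain uncompressed STFT, and the assumption that the codec output is time-aligned with the clean signal are all assumptions; the actual script may use different transforms, amplitude compression, or scaling.

```python
import numpy as np


def stft(x, n_fft=1024, hop=256):
    """Complex STFT of a 1-D signal, shape (frames, n_fft // 2 + 1); Hann window, no padding."""
    if len(x) < n_fft:
        raise ValueError("signal is shorter than one STFT frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)


def estimate_sigma_y(pairs, n_fft=1024, hop=256):
    """Hypothetical per-bin sigma_y: RMS of |X - Y| over all frames of all (clean, coded) pairs."""
    sq_sum = np.zeros(n_fft // 2 + 1)
    n_frames = 0
    for clean, coded in pairs:
        # Assumes time-aligned signals; trim to the shorter one in case lengths differ.
        n = min(len(clean), len(coded))
        X = stft(np.asarray(clean[:n], dtype=np.float64), n_fft, hop)
        Y = stft(np.asarray(coded[:n], dtype=np.float64), n_fft, hop)
        sq_sum += np.sum(np.abs(X - Y) ** 2, axis=0)
        n_frames += X.shape[0]
    if n_frames == 0:
        raise ValueError("no signal pairs given")
    return np.sqrt(sq_sum / n_frames)
```

Accumulating squared residuals across all pairs before taking the square root weights every STFT frame equally, so long files contribute proportionally more than short ones.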
If you use our models, methods, or any derivatives thereof, please cite our paper:
@inproceedings{
welker2025flowdec,
title={{FlowDec}: A flow-based full-band general audio codec with high perceptual quality},
author={Simon Welker and Matthew Le and Ricky T. Q. Chen and Wei-Ning Hsu and Timo Gerkmann and Alexander Richard and Yi-Chiao Wu},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=uxDFlPGRLX}
}
The majority of FlowDec is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: conditional-flow-matching, sgmse, BioinfoMachineLearning, audiotools, and descript-audio-codec are licensed MIT; NCSN++ is licensed Apache 2.0.