A neural full-band audio codec for general audio sampled at 48 kHz with 7.5 kbps or 4.5 kbps.

facebookresearch/FlowDec

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

FlowDec

FlowDec (ICLR 2025) is a full-band audio codec for general audio sampled at 48 kHz that combines non-adversarial codec training with a stochastic postfilter based on a novel conditional flow matching method.

Demo

See our demo page here.

News

  • 2025/03/03 First version is released

Installation

Create a new virtual environment (we recommend Python 3.10) and run

pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu126

(or whatever matches your local CUDA version).
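After installation, a quick sanity check can confirm that PyTorch is importable and that it sees a CUDA device. The snippet below is a minimal sketch, not part of the repository; the helper name `check_torch_install` is hypothetical, and it only uses `torch.__version__` and `torch.cuda.is_available()`.

```python
import importlib.util


def check_torch_install():
    """Report whether PyTorch is importable and whether it can see a CUDA device.

    Hypothetical helper for illustration; not part of the FlowDec code base.
    """
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in the active environment.
        return {"torch_installed": False, "torch_version": None, "cuda_available": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": str(torch.__version__),
        "cuda_available": bool(torch.cuda.is_available()),
    }


if __name__ == "__main__":
    print(check_torch_install())
```

If `cuda_available` is `False` on a machine with a GPU, the wheel index passed via `--extra-index-url` probably does not match the local CUDA version.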

Checkpoints

You can find the checkpoints for FlowDec-75m and FlowDec-25s, as well as the weights for the underlying NDAC codecs NDAC-75 and NDAC-25, here.

Inference

Please check out the notebook demo.ipynb for how to run inference using the pretrained checkpoints.

Training

We use Hydra for model configuration and training. For training config files, see the config/ folder.

Data preparation

NOTE: We do not provide training/validation/test datasets here, so the training configurations in config/ all use a dummy datamodule config, config/datamodule/example.yaml. To actually train FlowDec, you should process your own dataset(s) with a pre-trained underlying codec, save the codec outputs as .wav files, and store the paired paths in a text file. For instance, you can use our pre-trained NDAC variants; see the "Inference" section for how to run them.

The expected input format for FlowDec datasets is a text file with one comma-separated pair of paths per line (the clean file first, the codec output second), e.g.:

/clean_path/file1.wav,/codec_output_path/file1.wav
/clean_path/file2.wav,/codec_output_path/file2.wav
[...]

Create three such files, train.txt, validation.txt and test.txt, and adapt the datamodule config file to use them instead of the dummy file.
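One way to produce such a file list is sketched below. It is an illustration rather than a script from this repository: it assumes that each codec output has the same file name as its clean counterpart and that the two sets live in separate directories. The function names and the directory paths in the usage comment are hypothetical.

```python
from pathlib import Path


def build_pairs(clean_dir, codec_dir):
    """Pair each clean .wav with the codec output that has the same file name."""
    clean_dir, codec_dir = Path(clean_dir), Path(codec_dir)
    pairs = []
    for clean_path in sorted(clean_dir.glob("*.wav")):
        codec_path = codec_dir / clean_path.name
        if codec_path.is_file():  # skip clean files without a codec output
            pairs.append((str(clean_path.resolve()), str(codec_path.resolve())))
    return pairs


def write_pair_list(pairs, list_path):
    """Write one 'clean_path,codec_output_path' line per pair."""
    for clean_path, codec_path in pairs:
        # A comma inside a path would make the line ambiguous to parse.
        if "," in clean_path or "," in codec_path:
            raise ValueError(f"Path contains a comma: {clean_path!r}, {codec_path!r}")
    with open(list_path, "w", encoding="utf-8") as f:
        for clean_path, codec_path in pairs:
            f.write(f"{clean_path},{codec_path}\n")


def read_pair_list(list_path):
    """Read the list back and check that every line holds exactly two paths."""
    pairs = []
    with open(list_path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            parts = line.split(",")
            if len(parts) != 2:
                raise ValueError(f"{list_path}:{line_number}: expected 2 paths, got {len(parts)}")
            pairs.append((parts[0], parts[1]))
    return pairs


# Example usage with hypothetical directories:
# pairs = build_pairs("/clean_path/train", "/codec_output_path/train")
# write_pair_list(pairs, "train.txt")
```

Running this once per split gives train.txt, validation.txt and test.txt. `read_pair_list` can be used to catch malformed lines before starting a training run.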

Running training

After modifying the datamodule, you can then for example run:

python train.py --config-name flowdec_75m

Frequency-dependent sigma_y

For automatically determining the frequency-dependent sigma_y (see Section 3.5 in our paper), you can use the helper script scripts/estimate_flowdec_params.py. This script also implements the heuristic for a global sigma_y discussed in our Appendix A.1.

Citation

If you use our models, methods, or any derivatives thereof, please cite our paper:

@inproceedings{
    welker2025flowdec,
    title={{FlowDec}: A flow-based full-band general audio codec with high perceptual quality},
    author={Simon Welker and Matthew Le and Ricky T. Q. Chen and Wei-Ning Hsu and Timo Gerkmann and Alexander Richard and Yi-Chiao Wu},
    booktitle={The Thirteenth International Conference on Learning Representations},
    year={2025},
    url={https://openreview.net/forum?id=uxDFlPGRLX}
}

License

The majority of FlowDec is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: conditional-flow-matching, sgmse, BioinfoMachineLearning, audiotools, and descript-audio-codec are licensed under MIT; NCSN++ is licensed under Apache 2.0.
