
Reproducibility problems with Librispeech model #13

Open
Vanlogh opened this issue Oct 23, 2024 · 3 comments

Comments

@Vanlogh

Vanlogh commented Oct 23, 2024

I want to thank all the authors for the great work that they have done with this paper.

I am trying to reproduce the LibriSpeech model training to get a better sense of how the model trains, in the hopes of building a 25 Hz version of xcodec in the future.

I downloaded the full 960 h of LibriSpeech training data from here and kept the model config as-is. I only changed the batch size from 8 on 8 GPUs to 16 on 4 GPUs.

The problem I am running into is that the training is not stable. It seems to me that the GAN setting is difficult to train and is the main culprit of this.
[screenshots of training loss curves attached]

I just wanted to ask whether you experienced this during your experiments and how you dealt with it. I am almost tempted to just resume training from an earlier checkpoint. It would be really helpful if you could guide me here.

Thank you and I appreciate the time you've taken to read this!

@ooooolong

Hi, bro. Can I add you on WeChat to discuss a few questions with you?

@zhenye234
Owner

Hi, I have been experimenting a lot with low-bitrate codecs recently. For a 25 Hz codec, you could try a Vocos (iSTFT) decoder [1], since the model then does not need to learn temporal upsampling. In addition, I will release a low-bitrate xcodec next month.

[1] Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
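
For readers unfamiliar with the suggestion, here is a minimal sketch of an iSTFT-style decoder head in PyTorch. It is illustrative only: the layer sizes and names are hypothetical and not taken from xcodec or Vocos, and it assumes 16 kHz audio at a 25 Hz token rate (hop of 640 samples).

```python
# Hypothetical sketch of an iSTFT decoder head (not the Vocos or xcodec implementation).
# The head predicts an STFT magnitude and phase at the token frame rate, and the
# inverse STFT performs the temporal upsampling, so no transposed-convolution
# upsampling stack has to be learned.
import torch
import torch.nn as nn


class ISTFTHead(nn.Module):
    def __init__(self, dim: int = 512, hop_length: int = 640, n_fft: int = 2560):
        # hop_length = sample_rate / frame_rate, e.g. 16000 / 25 Hz = 640 samples.
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length
        n_bins = n_fft // 2 + 1
        # Project decoder features to a magnitude and a phase per STFT bin.
        self.proj = nn.Linear(dim, 2 * n_bins)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim) features at the codec frame rate.
        mag, phase = self.proj(x).chunk(2, dim=-1)
        mag = torch.exp(mag).clamp(max=1e2)                    # positive, bounded magnitudes
        spec = (mag * torch.exp(1j * phase)).transpose(1, 2)   # (batch, n_bins, frames), complex
        window = torch.hann_window(self.n_fft, device=x.device)
        return torch.istft(spec, n_fft=self.n_fft, hop_length=self.hop_length, window=window)
```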

@Vanlogh
Author

Vanlogh commented Nov 20, 2024

@zhenye234 thank you for responding. I noticed that audio reconstruction was already very good when using only 1 of the 8 available RVQ layers. I was wondering what the cause of that might be and whether it is an intended result.

I noticed you mention doing some kind of "dropout" of the quantizer layers (i.e. randomly selecting the number of RVQ layers from [1, 2, 3, 4, 8]). However, it doesn't seem to me that this alone would allow audio reconstruction with a single RVQ layer.
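
For context, the "dropout" referred to here is usually quantizer dropout during training: the number of active RVQ layers is sampled per step, so the first quantizer is forced to carry most of the signal on its own, which is why single-layer reconstruction can already sound decent. A minimal sketch of the idea (hypothetical names, not the xcodec code):

```python
# Hypothetical sketch of quantizer dropout in a residual VQ stack (illustrative only).
import random
import torch
import torch.nn as nn


class ResidualVQ(nn.Module):
    def __init__(self, num_quantizers: int = 8, dim: int = 512, codebook_size: int = 1024):
        super().__init__()
        self.quantizers = nn.ModuleList(
            nn.Embedding(codebook_size, dim) for _ in range(num_quantizers)
        )

    def quantize_one(self, codebook: nn.Embedding, x: torch.Tensor) -> torch.Tensor:
        # Nearest-neighbour lookup with a straight-through gradient estimator.
        dists = torch.cdist(x, codebook.weight)   # (tokens, codebook_size)
        codes = dists.argmin(dim=-1)
        quantized = codebook(codes)
        return x + (quantized - x).detach()

    def forward(self, x: torch.Tensor, n_active=None) -> torch.Tensor:
        # x: (tokens, dim). During training, sample how many layers to keep active.
        if n_active is None:
            n_active = random.choice([1, 2, 3, 4, 8]) if self.training else len(self.quantizers)
        residual, out = x, torch.zeros_like(x)
        for codebook in self.quantizers[:n_active]:
            q = self.quantize_one(codebook, residual)
            out = out + q
            residual = residual - q
        return out
```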
