About the input on ARCH benchmark #14

Open
dzr1026 opened this issue Nov 13, 2024 · 3 comments

Comments

@dzr1026

dzr1026 commented Nov 13, 2024

Thank you for your work; it's a very innovative piece of research.
I have a question about the ARCH benchmark results (Table 5): what is the input for these results? Specifically, what is the "semantic representation"? Is it the latent representation after RVQ (residual vector quantization), or is it the sum of the latents from all eight quantizers?
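For context on the question above, here is a minimal pure-Python sketch of generic residual vector quantization. It is not this repository's implementation. The three toy codebooks and the input vector are made up for illustration, and a real codec would use eight or more learned codebooks over high-dimensional frames. It shows that in a standard RVQ, each stage quantizes the residual left by the previous stages, and the final quantized output is the sum of the codewords selected by every stage. In that setting, "the output after RVQ" and "the sum over all quantizers" refer to the same vector.

```python
from typing import List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rvq_encode(x: Vector, codebooks: List[List[Vector]]):
    """Generic residual vector quantization (toy sketch).

    Each stage picks the codeword nearest to the current residual, then
    subtracts it. Returns the chosen index per stage, the chosen codeword
    per stage, and the quantized vector (running sum of chosen codewords).
    """
    residual = list(x)
    quantized = [0.0] * len(x)
    indices: List[int] = []
    codewords: List[Vector] = []
    for codebook in codebooks:
        # Nearest codeword to what the earlier stages could not explain.
        idx = min(range(len(codebook)), key=lambda i: sq_dist(residual, codebook[i]))
        cw = codebook[idx]
        indices.append(idx)
        codewords.append(cw)
        # Accumulate the quantized output and update the residual.
        quantized = [q + c for q, c in zip(quantized, cw)]
        residual = [r - c for r, c in zip(residual, cw)]
    return indices, codewords, quantized


# Hypothetical toy codebooks: coarse -> fine.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.5, -0.5], [0.25, 0.25]],
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]],
]
x = [1.6, 0.4]
indices, codewords, quantized = rvq_encode(x, codebooks)

# The RVQ output equals the element-wise sum of the per-stage codewords.
summed = [sum(vals) for vals in zip(*codewords)]
print(indices, quantized, summed)
```

Codecs differ in details (for example, some apply per-stage projections before summing), so which tensor a given model exposes as its "semantic representation" still has to be checked in that model's code.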

@zhenye234
Owner

The quantized semantic feature, here:

o_semantic = self.decoder_semantic(quantized_semantic)

@dzr1026
Author

dzr1026 commented Nov 14, 2024

Thank you for your reply!

@ggiggit

ggiggit commented Nov 20, 2024

@zhenye234 Thanks for your previous response! I have a couple more questions about Table 5, if you don't mind:

  1. Could you please clarify the semantic representations for DAC, Encodec, and the Baseline Acoustic Codec in Table 5?
  2. Also, I'm curious why SpeechTokenizer was excluded from the comparison?

Thanks so much for your help!


3 participants