
For models with other architectures, such as the Qwen family, how can the best \alpha, \beta, and \sqrt{1/t} parameters be found? #63

Open
ki-ljl opened this issue Jul 3, 2024 · 0 comments


ki-ljl commented Jul 3, 2024

The paper states that, for the Llama family, good values of \alpha and \beta are 1 and 32, but it does not explain how these two values were obtained. The paper also says that \sqrt{1/t} can be fitted to the lowest perplexity (ppl). Could this part be explained more clearly?

If anyone can answer my question, I would appreciate it!
