
Question about the optimized rotation matrix for Llama3-70B #11

Open
lsjlsj5846 opened this issue Sep 9, 2024 · 6 comments

Comments

lsjlsj5846 commented Sep 9, 2024

Hello,

I tried to reproduce the results of the paper, and got similar results for Llama2-7B, 13B, 70B, and Llama-3 8B.
However, when I tested Llama3-70B using the optimized rotation matrix you provided [link], the result of RTN was as follows:

| Model | Wikitext-2 PPL (paper-reported) | Mine | Diff. |
|---|---|---|---|
| Llama3-70B | 4.1 | 7.5821 | 3.4821 |
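For reference, a Wikitext-2 perplexity figure like the ones above is just the exponential of the mean per-token negative log-likelihood over the evaluation set; a minimal sketch (the NLL values here are made up for illustration, not taken from any model):

```python
import math

# Perplexity = exp(mean per-token negative log-likelihood).
# A real evaluation would collect one NLL per token from the model;
# the values passed in below are illustrative only.
def perplexity(token_nlls):
    return math.exp(sum(token_nlls) / len(token_nlls))

# A uniform per-token NLL of ln 2 corresponds to a perplexity of 2.0.
print(perplexity([math.log(2.0)] * 8))
```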

I also found that the GPTQ results for Llama3-70B differ from what you reported. (I used the W4A4KV4 rotation matrix for RTN and the W16A4KV4 rotation matrix for GPTQ.)
I suspect the provided rotation matrices for Llama3-70B are somehow wrong. Could you check this issue and provide the correct rotation matrices for Llama3-70B if possible?
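(For context on what these rotation matrices do: SpinQuant folds an orthogonal matrix R into the weights, so the float output is mathematically unchanged, roughly W x == (W R)(R^T x), while activation outliers get spread out; only the quantized behavior differs. A toy pure-Python sketch of that equivalence, with illustrative names not from the codebase:)

```python
import math

# Helper: naive matrix multiply for small nested-list matrices.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# A 2x2 Givens rotation is orthogonal: R^T R = I.
theta = math.pi / 4
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]
Rt = [[R[j][i] for j in range(2)] for i in range(2)]  # transpose

W = [[1.0, 2.0], [3.0, 4.0]]   # toy weight matrix
x = [[0.5], [-1.0]]            # toy activation vector

# In float, rotating weights and counter-rotating activations is a no-op.
y_plain = matmul(W, x)
y_rotated = matmul(matmul(W, R), matmul(Rt, x))
```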

Thank you.


ChenMnZ commented Sep 9, 2024

Hi, @lsjlsj5846
Have you successfully reproduced the results when taking GPTQ as the weight quantizer?

I also get results similar to the paper for Llama2-7B, 13B, 70B, and Llama-3 8B when taking RTN as the weight quantizer.

However, the GPTQ results I obtained were even worse than RTN.
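(Aside: RTN here means plain round-to-nearest weight quantization, with no error compensation. A minimal sketch of symmetric 4-bit RTN for one weight group, with illustrative names not taken from the SpinQuant codebase:)

```python
# Symmetric per-group round-to-nearest (RTN) quantization sketch.
# n_bits=4 gives integer levels in [-8, 7]; the scale maps the group's
# max magnitude onto qmax. Illustrative only.
def rtn_quantize(weights, n_bits=4):
    qmax = 2 ** (n_bits - 1) - 1                 # 7 for 4-bit symmetric
    scale = max(abs(w) for w in weights) / qmax or 1.0  # guard all-zero group
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return [v * scale for v in q]                # dequantized weights
```

GPTQ, by contrast, adjusts the remaining unquantized weights to compensate for each rounding error, so it is normally expected to be at least as good as RTN.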

lsjlsj5846 (Author) commented:

Hi, @ChenMnZ
Yes, I got GPTQ results similar to the paper, except for Llama3-70B.
Did you use W16A4KV4 rotation matrices?


ChenMnZ commented Sep 9, 2024

@lsjlsj5846 I used the W4A4KV4 pretrained rotation matrices before (https://drive.google.com/drive/folders/1R2zix4qeXBjcmgnJN1rny93cguJ4rEE8?usp=sharing).

Thanks for the reminder; I will give the W16A4KV4 rotation matrix a try.


ChenMnZ commented Sep 9, 2024

@lsjlsj5846 I met the same problem with RTN on Llama3-70B W4A4KV4.


cokeshao commented Sep 26, 2024

Hi, @ChenMnZ
I also got GPTQ results that were different from the paper.

./scripts/2_eval_ptq.sh meta-llama/Llama-2-7b-hf 4 4 4

I also used the provided W16A4KV4 rotation matrix (Google Drive).

Here's what I reproduced.

| Task | Version | Metric | Value | Stderr | In paper |
|---|---|---|---|---|---|
| arc_easy | 0 | acc | 0.6540 | ±0.0098 | 72.6 |
| | | acc_norm | 0.5198 | ±0.0103 | |
| arc_challenge | 0 | acc | 0.3703 | ±0.0141 | 47.5 |
| | | acc_norm | 0.3891 | ±0.0142 | |

There is a big difference. I think the good results on Wikitext may reflect overfitting to Wikitext 🤔.

Have you encountered the same problem as me? I look forward to discussing it with you.
Thank you.


JingyangXiang commented Nov 19, 2024

> I also got GPTQ results that were different from the paper. … There is a big difference. I think the good results on Wikitext are likely to be overfitting on Wikitext 🤔.

I also agree about the overfitting. Maybe SpinQuant is more like LoRA, which tries to fit downstream tasks.

4 participants