TypeError: make_mixbits_quant_linear() got an unexpected keyword argument 'device' #112

Closed
bg51717 opened this issue Apr 14, 2024 · 9 comments · Fixed by #113 or #114

Comments

@bg51717

bg51717 commented Apr 14, 2024

When I run
python -m qllm --model=/root/models/baichuan-inc/Baichuan2-7B-Base --method=gptq --nsamples=64 --wbits=4 --groupsize=128 --save /root/models/baichuan-inc/Baichuan2-7B-Base_gptq_4b --export_onnx /root/models/baichuan-inc/Baichuan2-7B-Base_gptq_4b_onnx/
it raises an error:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/root/QLLM/qllm/__main__.py", line 6, in <module>
    main()
  File "/root/QLLM/qllm/run.py", line 78, in main
    model_quanter.run(args)
  File "/root/QLLM/qllm/auto_model_quantization.py", line 215, in run
    model = self.pack_model(model, quantizers, args.pack_mode)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/QLLM/qllm/auto_model_quantization.py", line 80, in pack_model
    make_mixbits_quant_linear(model, quantizers, quant_config_by_layer, target_layer=target_layer, device="cpu")
TypeError: make_mixbits_quant_linear() got an unexpected keyword argument 'device'
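
For context, the failure is just a keyword argument that the packing helper does not yet accept. A minimal sketch of the kind of signature change that would resolve it (everything beyond the names shown in the traceback is an assumption, not QLLM's actual code):

# Hypothetical sketch only; the extra parameters and their defaults are assumptions.
def make_mixbits_quant_linear(model, quantizers, quant_config_by_layer,
                              target_layer=None, device="cpu"):
    # Accepting (and defaulting) `device` removes the TypeError above; the body
    # would then build the replacement quantized linear modules on `device`.
    ...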
@wejoncy wejoncy linked a pull request Apr 15, 2024 that will close this issue
@wejoncy
Owner

wejoncy commented Apr 15, 2024

Hi @bg51717, thanks for reporting this. It's been fixed.

@bg51717
Author

bg51717 commented Apr 15, 2024

However, there are still bugs present when saving the model after quantizing it with GPTQ.
My command is

model_name='facebook/opt-350m'
CUDA_VISIBLE_DEVICES=0
python -m qllm \
    --model=/home/binguo/project/models/${model_name} \
    --method=gptq \
    --nsamples=64 \
    --wbits=4 \
    --groupsize=128 \
    --save ./${model_name}_gptq4b \
    --export_onnx ./onnx_model/${model_name}_gptq4b

and the error stack is

2024-04-15 11:33:10,114 - qllm - INFO - Finished quantization and packing weight, time cost:171.49598741531372
Traceback (most recent call last):
  File "/home/binguo/.conda/envs/QLLM/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/binguo/.conda/envs/QLLM/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/binguo/project/QLLM/qllm/__main__.py", line 6, in <module>
    main()
  File "/home/binguo/project/QLLM/qllm/run.py", line 78, in main
    model_quanter.run(args)
  File "/home/binguo/project/QLLM/qllm/auto_model_quantization.py", line 220, in run
    AutoQuantizedModelForCausalLM.save_pretrained(model, self.tokenizer, args.save,
  File "/home/binguo/project/QLLM/qllm/modeling/base.py", line 291, in save_pretrained
    model.config.quantization_config = model.quant_config.quant_config
AttributeError: 'GPTQConfig' object has no attribute 'quant_config'

I'm interested in model quantization and I believe QLLM is a great project. Thanks for your work!
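
A hedged sketch of the kind of defensive fallback I would expect here (just a guess, not necessarily the change in the linked PR):

# Guessed workaround, not QLLM's actual fix: fall back to the config object
# itself when it has no nested `quant_config` attribute.
quant_cfg = getattr(model.quant_config, "quant_config", model.quant_config)
model.config.quantization_config = (
    quant_cfg.to_dict() if hasattr(quant_cfg, "to_dict") else quant_cfg
)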

@wejoncy wejoncy reopened this Apr 15, 2024
@wejoncy wejoncy linked a pull request Apr 15, 2024 that will close this issue
@wejoncy
Owner

wejoncy commented Apr 15, 2024

Hi @bg51717,
Apologies once more for the inconvenience during the quantization process.

I have tested it locally and it should work. Could you give it another shot? Thanks.

@bg51717
Author

bg51717 commented Apr 15, 2024

I have tried the previous commands and hit a new bug.

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from /home/binguo/project/QLLM/onnx_model/facebook/opt-350m_gptq4b/decoder_merged.onnx failed:/onnxruntime_src/onnxruntime/core/graph/model.cc:179 onnxruntime::Model::Model(onnx::ModelProto&&, const onnxruntime::PathString&, const onnxruntime::IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 10, max supported IR version: 9

I have also tried the solution from microsoft/onnxruntime#20252, but I still get the same error.
I know this may not be QLLM's bug. I want to know whether you hit this error as well, and I would like to know your environment. Thanks!

@wejoncy
Owner

wejoncy commented Apr 15, 2024

pip install onnx==1.15 will fix it.
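
A quick way to confirm the mismatch on your side, using the standard onnx/onnxruntime APIs (the model path is the one from your error message):

import onnx
import onnxruntime as ort

# Path taken from the error message above; skip loading external weight data.
m = onnx.load("/home/binguo/project/QLLM/onnx_model/facebook/opt-350m_gptq4b/decoder_merged.onnx",
              load_external_data=False)
print("model IR version:", m.ir_version)        # prints 10 in the failing setup above
print("onnxruntime version:", ort.__version__)
# onnx==1.15 serializes models with IR version 9, which the installed onnxruntime can load.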

@bg51717
Author

bg51717 commented Apr 16, 2024

The command executed successfully, but it seems that the final correctness check did not pass. So, has it failed?

max abs err_prefill: 0.03906 max abs err_decode: 0.01563 correctness check is  not  passed   

Besides, when I try to use another command:

model_name='facebook/opt-350m'
CUDA_VISIBLE_DEVICES=0
python -m qllm \
    --model=/home/binguo/project/models/${model_name} \
    --method=awq \
    --dataset=pileval \
    --nsamples=16 \
    --wbits=4 \
    --groupsize=128 \
    --save ./${model_name}_awq4b \
    --export_onnx ./onnx_model/${model_name}_awq4b

it raises an error:

File "/home/binguo/project/QLLM/qllm/quantization/sequential_layes_awq_config.py", line 629, in auto_detect
_sequential_layers 
       assert model_type in true_sequential_layers_for_model, f"{model_type} is not support"
AssertionError: OPTForCausalLM is not support 

but I found "OptForCausalLM" in true_sequential_layers_for_model rather than 'OPTForCausalLM'.

true_sequential_layers_for_model = dict(
    AquilaForCausalLM=get_aquila_layers,
    BaichuanForCausalLM=get_baichuan_layers,
    BloomForCausalLM=get_bloom_layer,
    FalconForCausalLM=get_falcon_layers,
    GptBigCodeForCausalLM=get_bigcode_layers,
    GPTNeoXForCausalLM=get_neox_layers,
    GPTJForCausalLM=get_gptj_layers,
    LlamaForCausalLM=get_llama_layers,
    LlavaForCausalLM=get_llava_layers,
    MistralForCausalLM=get_mistral_layers,
    MixtralForCausalLM=get_mixtral_layers,
    MptForCausalLM=get_mpt_layers,
    OptForCausalLM=get_opt_layers,
    QwenForCausalLM=get_qwen_layers,
    YiForCausalLM=get_yi_layers,
)

I would also like to know how to write a function like get_baichuan_layers to extend support to other models. Thanks!

@wejoncy
Owner

wejoncy commented Apr 16, 2024

  • 0.03906 is basically tolerable. It should still produce the same output text as the PyTorch model.
  • Thanks for catching this; OptForCausalLM is indeed a typo of OPTForCausalLM.

If you want to support a new model, please read the original AWQ paper for more details.
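
As a purely illustrative sketch (I have not verified QLLM's expected return format, so the signature and structure below are assumptions modeled on common AWQ-style implementations rather than the project's actual API), a getter for OPT-style decoder blocks might group the linear layers that share an input like this:

# Hypothetical sketch; `block` is assumed to be one transformers OPTDecoderLayer
# and `input_feat` a dict mapping sub-module names to captured input activations.
def get_opt_layers_sketch(block, input_feat):
    return [
        dict(  # q/k/v projections share the self_attn_layer_norm output as their input
            prev_op=block.self_attn_layer_norm,
            layers=[block.self_attn.q_proj, block.self_attn.k_proj, block.self_attn.v_proj],
            inp=input_feat["self_attn.q_proj"],
            module2inspect=block.self_attn,
        ),
        dict(  # first MLP projection takes the final_layer_norm output
            prev_op=block.final_layer_norm,
            layers=[block.fc1],
            inp=input_feat["fc1"],
        ),
    ]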

@bg51717
Author

bg51717 commented Apr 20, 2024

Hi @wejoncy, thanks for your great work. I'm studying model quantization through this project. I would like to know whether the project is currently complete, since I noticed some 'TODO' placeholders in the code and some discrepancies between function definitions and their usage. How complete are GPTQ, AWQ, and HQQ?

@wejoncy
Owner

wejoncy commented Apr 22, 2024

Yeah, the quantization functionality is almost done. Some TODOs are for code cleanup/refactoring.
My next plan is to support other quantization algorithms as they become available.
