Request for Qwen 7B support #147

Open
onestepbackk opened this issue Dec 24, 2024 · 12 comments

onestepbackk commented Dec 24, 2024

Hi, I have been experimenting with the Qwen2.5 1.5B model and it works without issue.
But with the Qwen2.5 7B model, I can convert the Hugging Face model to an RKLLM model; when I try to use it, though, it does nothing.
It seems to load the model into memory properly, but when I enter a prompt the NPU load stays at 0% and the answer comes back blank.

I am using rkllm-runtime version: 1.1.4, rknpu driver version: 0.9.8, platform: RK3588 (Orange Pi 5 Pro)

Here is a sample output using llm_demo:
user: Hi how are you?
robot:
user: answer me
robot:
user:

@yuguolong

I also had problems loading the model.

Platform:

rkllm-runtime version: 1.1.4, rknpu driver version: 0.9.7, platform: RK3576

Model:

Qwen2.5-7B, format: w8a8

Error message:

[screenshot of the error message attached]

The model runs in w4a16 format, but cannot be loaded in w8a8.


imkebe commented Dec 25, 2024

RK3576 AFAIK doesn't support w8a8. What do you expect?


c0zaut commented Dec 25, 2024

@yuguolong - all of my models are for RK3588 only. You will need to run a conversion yourself with the rk3576 target platform and w4a16 quantization. You can use my interactive pipeline to walk through it, as long as the original safetensors model is on Hugging Face.

@yuguolong

@imkebe It does support it; I have run Qwen2.5-3B w8a8 on RK3576.

@yuguolong

@c0zaut The models are converted for the RK3576 chip. I tested Qwen2.5-3B, Qwen2.5-7B, and other models. Qwen2.5-3B-w8a8 runs; only Qwen2.5-7B-w8a8 does not.

@onestepbackk

I am using rkllm-runtime version 1.1.4 and rknpu driver version 0.9.8 with RK3588, and I am not able to run either Qwen2.5-3B w8a8 or Qwen2.5-7B w8a8.

@yuguolong

@onestepbackk
Is the model loading OK?


c0zaut commented Dec 27, 2024

@onestepbackk - I got similar behavior when trying to use run_async instead of just run. What is your callback function?
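For reference, a minimal sketch of the kind of streaming callback I mean (Python/ctypes; the RKLLMResult layout, the LLMCallState values, and the wrapper details here are assumptions modeled on the rkllm.h header and the official llm_demo, so check your own binding before copying):

```python
import ctypes

# Assumed minimal result layout: the real RKLLMResult in rkllm.h has more
# fields, but only `text` (assumed to be the first field) is read here.
class RKLLMResult(ctypes.Structure):
    _fields_ = [("text", ctypes.c_char_p)]

# Assumed LLMCallState values; verify them against your rkllm.h.
RKLLM_RUN_NORMAL = 0
RKLLM_RUN_FINISH = 2
RKLLM_RUN_ERROR = 3

def callback(result, userdata, state):
    # Print each streamed chunk as it arrives, end the line when the run
    # finishes, and surface errors instead of silently returning.
    if state == RKLLM_RUN_NORMAL and result and result.contents.text:
        print(result.contents.text.decode("utf-8", errors="ignore"),
              end="", flush=True)
    elif state == RKLLM_RUN_FINISH:
        print()
    elif state == RKLLM_RUN_ERROR:
        print("\nrun error")

# The runtime expects a C function pointer of the form
# (RKLLMResult*, void*, LLMCallState); wrap the Python function accordingly.
CALLBACK_TYPE = ctypes.CFUNCTYPE(None, ctypes.POINTER(RKLLMResult),
                                 ctypes.c_void_p, ctypes.c_int)
c_callback = CALLBACK_TYPE(callback)
```

If the callback drops the text or only handles the finish state, the output can look like the blank robot: responses above even when the model itself loaded fine.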


onestepbackk commented Dec 27, 2024

@yuguolong

@onestepbackk Is the model loading OK?

Yes, the model is loading.

@c0zaut

@onestepbackk - I got similar behavior when trying to use run_async instead of just run. What is your callback function?

The same thing happens with the run function.


c0zaut commented Dec 28, 2024

@onestepbackk

Can you try to load it here? https://github.com/c0zaut/RKLLM-Gradio

Video tutorial: https://youtu.be/sTHNZZP0S3E

Models for RK3588: https://huggingface.co/c01zaut/Qwen2.5-7B-Instruct-RK3588-1.1.4

^ There are a bunch of different versions, group sizes, hybrid ratios, etc. that you can try out.


onestepbackk commented Jan 8, 2025

@c0zaut
Awesome project!
I tried your pre-exported model and it works nicely, even with my code.
So it seems the issue is related to how I export the model.
Can you tell me how you exported it?
I exported to rkllm with:
llm.build(do_quantization=True, optimization_level=1, quantized_dtype='w8a8', quantized_algorithm='normal', target_platform='rk3588', num_npu_core=3, extra_qparams=None)
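For context, my full export script is roughly the following (rkllm-toolkit 1.1.4; the model path and output filename are placeholders, and the load/build/export flow follows the example scripts that ship with the toolkit):

```python
from rkllm.api import RKLLM

modelpath = "Qwen/Qwen2.5-7B-Instruct"  # placeholder: local dir or HF repo id

llm = RKLLM()

# Load the original Hugging Face (safetensors) model.
ret = llm.load_huggingface(model=modelpath)
if ret != 0:
    raise SystemExit(f"load_huggingface failed: {ret}")

# Quantize and build for RK3588 with 3 NPU cores, w8a8, as in the call above.
ret = llm.build(do_quantization=True, optimization_level=1,
                quantized_dtype='w8a8', quantized_algorithm='normal',
                target_platform='rk3588', num_npu_core=3, extra_qparams=None)
if ret != 0:
    raise SystemExit(f"build failed: {ret}")

# Write the .rkllm file that rkllm-runtime / llm_demo loads.
ret = llm.export_rkllm("./qwen2.5-7b-w8a8-rk3588.rkllm")
if ret != 0:
    raise SystemExit(f"export_rkllm failed: {ret}")
```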


c0zaut commented Jan 11, 2025

@onestepbackk - I need to push my most recent changes to GitHub, since this uses the older version of the toolkit, but: https://github.com/c0zaut/ez-er-rkllm-toolkit

Just change out the whl file and update the Dockerfile and it should work just fine! That's what I have been doing, but I haven't been able to get to the device with this repo in a little while. I'll try to push something this weekend.

The non-interactive container lets you adjust a bunch of different settings. I wouldn't recommend setting hybrid_ratio to 1.0, though, since that is the same as doing a plain group-size quant; for a w8a8 model, it is just 100% w8a8_g128 (or whatever group size is compatible).

TL;DR for group-size quantization: w8a8 is faster but less accurate; w8a8_g* is slower but more accurate. If you optimize, I would recommend using a small sample of your own dataset.
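Roughly, the only change on the toolkit side is the quantized_dtype, plus an optional calibration dataset. A sketch, assuming the rkllm-toolkit 1.1.x build() signature and that your version accepts a dataset argument; the paths are placeholders:

```python
from rkllm.api import RKLLM

llm = RKLLM()
llm.load_huggingface(model="Qwen/Qwen2.5-7B-Instruct")  # placeholder path/repo id

# Group-size variant: slower at inference than plain w8a8, but more accurate.
# The calibration dataset is assumed to be a small JSON sample of your own
# data; drop the argument if your toolkit version does not accept it.
llm.build(do_quantization=True, optimization_level=1,
          quantized_dtype='w8a8_g128',             # group size 128
          quantized_algorithm='normal',
          target_platform='rk3588', num_npu_core=3,
          extra_qparams=None,
          dataset='./calibration_sample.json')     # placeholder path

llm.export_rkllm("./qwen2.5-7b-w8a8_g128-rk3588.rkllm")
```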
