Update README.md
tjadamlee authored Apr 12, 2023
1 parent 6eaf43e commit 533385e
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions in chat/README.md

@@ -22,6 +22,7 @@

If you are already logged in to Hugging Face: [download directly](https://huggingface.co/BelleGroup/ChatBELLE-int4/resolve/main/belle-model.bin)

Move and rename the model to the path shown in the app. The default is ~/Library/Containers/com.barius.chatbelle/Data/belle-model.bin.

## Model Quantization
We use [llama.cpp's 4-bit quantization](https://github.com/ggerganov/llama.cpp) to optimize the speed and memory footprint of on-device offline inference. Quantization loses numerical precision and degrades the model's generation quality. 4-bit is a fairly aggressive scheme: current 4-bit models still lag clearly behind fp32 and fp16, so they are offered for experimentation only. As model algorithms and on-device compute advance, we believe the quality of offline inference will improve substantially, and we will keep tracking this.
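To give intuition for the precision loss described above, here is a simplified sketch of blockwise symmetric 4-bit quantization. This is an illustration of the general technique only, not llama.cpp's exact q4_0 on-disk format; the block size of 32 and the rounding scheme are assumptions for the example:

```python
import numpy as np

BLOCK_SIZE = 32  # illustrative; llama.cpp also quantizes weights in small blocks

def quantize_4bit(block: np.ndarray):
    """Map a block of float weights to 4-bit signed integers in [-8, 7]."""
    scale = np.abs(block).max() / 7.0
    if scale == 0.0:
        return np.zeros_like(block, dtype=np.int8), 0.0
    q = np.clip(np.round(block / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct approximate float weights from the 4-bit codes."""
    return q.astype(np.float32) * scale

# Each weight now needs 4 bits instead of 32 (plus one scale per block),
# which is where the memory savings come from -- at the cost of the
# rounding error introduced by quantize_4bit.
weights = np.linspace(-1.0, 1.0, BLOCK_SIZE).astype(np.float32)
q, scale = quantize_4bit(weights)
restored = dequantize_4bit(q, scale)
```

The reconstruction error per weight is bounded by half the block's scale, which is exactly the "loss of computational precision" the paragraph above refers to: aggressive 4-bit scales are coarse, so the rounding error is large relative to fp16.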
