forked from PaddlePaddle/FastDeploy
Commit
[Quantization] Update quantized model deployment examples and update readme. (PaddlePaddle#377)

* Add PaddleOCR Support
* Add PaddleOCR Support
* Add PaddleOCRv3 Support
* Add PaddleOCRv3 Support
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Add PaddleOCRv3 Support
* Add PaddleOCRv3 Supports
* Add PaddleOCRv3 Suport
* Fix Rec diff
* Remove useless functions
* Remove useless comments
* Add PaddleOCRv2 Support
* Add PaddleOCRv3 & PaddleOCRv2 Support
* remove useless parameters
* Add utils of sorting det boxes
* Fix code naming convention
* Fix code naming convention
* Fix code naming convention
* Fix bug in the Classify process
* Imporve OCR Readme
* Fix diff in Cls model
* Update Model Download Link in Readme
* Fix diff in PPOCRv2
* Improve OCR readme
* Imporve OCR readme
* Improve OCR readme
* Improve OCR readme
* Imporve OCR readme
* Improve OCR readme
* Fix conflict
* Add readme for OCRResult
* Improve OCR readme
* Add OCRResult readme
* Improve OCR readme
* Improve OCR readme
* Add Model Quantization Demo
* Fix Model Quantization Readme
* Fix Model Quantization Readme
* Add the function to do PTQ quantization
* Improve quant tools readme
* Improve quant tool readme
* Improve quant tool readme
* Add PaddleInference-GPU for OCR Rec model
* Add QAT method to fastdeploy-quantization tool
* Remove examples/slim for now
* Move configs folder
* Add Quantization Support for Classification Model
* Imporve ways of importing preprocess
* Upload YOLO Benchmark on readme
* Upload YOLO Benchmark on readme
* Upload YOLO Benchmark on readme
* Improve Quantization configs and readme
* Add support for multi-inputs model
* Add backends and params file for YOLOv7
* Add quantized model deployment support for YOLO series
* Fix YOLOv5 quantize readme
* Fix YOLO quantize readme
* Fix YOLO quantize readme
* Improve quantize YOLO readme
* Improve quantize YOLO readme
* Improve quantize YOLO readme
* Improve quantize YOLO readme
* Improve quantize YOLO readme
* Fix bug, change Fronted to ModelFormat
* Change Fronted to ModelFormat
* Add examples to deploy quantized paddleclas models
* Fix readme
* Add quantize Readme
* Add quantize Readme
* Add quantize Readme
* Modify readme of quantization tools
* Modify readme of quantization tools
* Improve quantization tools readme
* Improve quantization readme
* Improve PaddleClas quantized model deployment readme
* Add PPYOLOE-l quantized deployment examples
* Improve quantization tools readme
* Improve Quantize Readme
* Fix conflicts
* Fix conflicts
* improve readme
* Improve quantization tools and readme
* Improve quantization tools and readme
* Add quantized deployment examples for PaddleSeg model
* Fix cpp readme
* Fix memory leak of reader_wrapper function
* Fix model file name in PaddleClas quantization examples
* Update Runtime and E2E benchmark
* Update Runtime and E2E benchmark
* Rename quantization tools to auto compression tools
* Remove PPYOLOE data when deployed on MKLDNN
* Fix readme
* Support PPYOLOE with OR without NMS and update readme
* Update Readme
* Update configs and readme
* Update configs and readme
* Add Paddle-TensorRT backend in quantized model deploy examples
* Support PPYOLOE+ series
Showing 53 changed files with 1,519 additions and 526 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,79 @@ | ||
[English](../en/quantize.md) | 简体中文 | ||
|
||
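# Quantization Acceleration | ||
Quantization is a popular model compression method. A quantized model is smaller and runs inference faster. | ||
FastDeploy integrates a one-click model quantization tool built on PaddleSlim. FastDeploy also supports deploying quantized models, helping users accelerate inference. | ||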
|
||
Briefly introduce the principles of quantization acceleration. | ||
|
||
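Which hardware and backends currently support quantization | ||
## Multiple FastDeploy engines and hardware support quantized model deployment | ||
Currently, several inference backends in FastDeploy support deploying quantized models on different hardware. The support status is as follows: | ||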
|
||
| Hardware / Inference backend | ONNX Runtime | Paddle Inference | TensorRT | | ||
| :-----------| :-------- | :--------------- | :------- | | ||
| CPU | Supported | Supported | | | ||
| GPU | | | Supported | | ||
|
||
|
||
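## Model Quantization | ||
|
||
### Quantization methods | ||
Based on PaddleSlim, FastDeploy currently provides two quantization methods: quantization distillation training and offline quantization. Quantization distillation training obtains the quantized model through model training, while offline quantization quantizes the model without any training. FastDeploy can deploy quantized models produced by either method. | ||
|
||
The main differences between the two methods are shown in the table below: | ||
| Quantization method | Time cost of quantization | Accuracy of quantized model | Model size | Inference speed | | ||
| :-----------| :--------| :-------| :------- | :------- | | ||
| Offline quantization | No training required, short | Slightly lower than quantization distillation training | Same for both | Same for both | | ||
| Quantization distillation training | Training required, somewhat longer | Small loss relative to the unquantized model | Same for both | Same for both | | ||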
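To make the idea of offline (training-free) quantization concrete, here is a minimal sketch of symmetric INT8 quantization with abs-max calibration. It is a generic illustration written for this document, not the algorithm that PaddleSlim or FastDeploy actually implements; the function names `calibrate_scale`, `quantize`, and `dequantize` are hypothetical.

```python
from typing import List, Sequence


def calibrate_scale(calibration_values: Sequence[float], num_bits: int = 8) -> float:
    """Derive a symmetric scale from calibration data using the abs-max rule."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    abs_max = max(abs(v) for v in calibration_values)
    # Guard against an all-zero calibration set.
    return abs_max / qmax if abs_max > 0 else 1.0


def quantize(values: Sequence[float], scale: float, num_bits: int = 8) -> List[int]:
    """Map floats to signed integers by rounding, then clamp to [-qmax, qmax]."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values: Sequence[int], scale: float) -> List[float]:
    """Map integers back to approximate float values."""
    return [q * scale for q in q_values]


if __name__ == "__main__":
    # Hypothetical calibration samples; a real tool would collect these
    # from activations or weights on a small calibration dataset.
    calibration = [-1.27, -0.5, 0.0, 0.3, 1.0]
    scale = calibrate_scale(calibration)
    q = quantize([0.25, -1.27, 2.0], scale)
    print(scale, q, dequantize(q, scale))
```

Values outside the calibrated range (such as `2.0` here) are clamped, which is one source of the accuracy loss that training-based methods try to recover.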
|
||
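### Quantizing a model with the FastDeploy one-click model quantization tool | ||
Built on PaddleSlim, FastDeploy provides a one-click model quantization tool. Refer to the following document to quantize a model. | ||
- [FastDeploy one-click model quantization](../../tools/quantization/) | ||
Once you have the resulting quantized model, you can deploy it with FastDeploy. | ||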
|
||
|
||
## Quantization examples | ||
The models that FastDeploy currently supports for quantization are listed in the tables below: | ||
|
||
### YOLO series | ||
| Model | Inference backend | Deployment hardware | FP32 latency (ms) | INT8 latency (ms) | Speedup | FP32 mAP | INT8 mAP | Quantization method | | ||
| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- | | ||
| [YOLOv5s](../../examples/vision/detection/yolov5/quantize/) | TensorRT | GPU | 8.79 | 5.17 | 1.70 | 37.6 | 36.6 | Quantization distillation training | | ||
| [YOLOv5s](../../examples/vision/detection/yolov5/quantize/) | ONNX Runtime | CPU | 176.34 | 92.95 | 1.90 | 37.6 | 33.1 | Quantization distillation training | | ||
| [YOLOv5s](../../examples/vision/detection/yolov5/quantize/) | Paddle Inference | CPU | 217.05 | 133.31 | 1.63 | 37.6 | 36.8 | Quantization distillation training | | ||
| [YOLOv6s](../../examples/vision/detection/yolov6/quantize/) | TensorRT | GPU | 8.60 | 5.16 | 1.67 | 42.5 | 40.6 | Quantization distillation training | | ||
| [YOLOv6s](../../examples/vision/detection/yolov6/quantize/) | ONNX Runtime | CPU | 338.60 | 128.58 | 2.60 | 42.5 | 36.1 | Quantization distillation training | | ||
| [YOLOv6s](../../examples/vision/detection/yolov6/quantize/) | Paddle Inference | CPU | 356.62 | 125.72 | 2.84 | 42.5 | 41.2 | Quantization distillation training | | ||
| [YOLOv7](../../examples/vision/detection/yolov7/quantize/) | TensorRT | GPU | 24.57 | 9.40 | 2.61 | 51.1 | 50.8 | Quantization distillation training | | ||
| [YOLOv7](../../examples/vision/detection/yolov7/quantize/) | ONNX Runtime | CPU | 976.88 | 462.69 | 2.11 | 51.1 | 42.5 | Quantization distillation training | | ||
| [YOLOv7](../../examples/vision/detection/yolov7/quantize/) | Paddle Inference | CPU | 1022.55 | 490.87 | 2.08 | 51.1 | 46.3 | Quantization distillation training | | ||
|
||
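The data in the table above is the Runtime inference performance of each model deployed with FastDeploy, before and after quantization. | ||
- The test data are images from the COCO2017 validation set. | ||
- Inference latency is the latency measured on each Runtime, in milliseconds. | ||
- The CPU is an Intel(R) Xeon(R) Gold 6271C, the GPU is a Tesla T4, the TensorRT version is 8.4.15, and the number of CPU threads is fixed at 1 in all tests. | ||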
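The speedup column in the table above appears to be the FP32 latency divided by the INT8 latency (for example, 8.79 / 5.17 ≈ 1.70). The sketch below recomputes that ratio and the mAP drop for a few rows copied from the table; the helper names `speedup` and `accuracy_drop` are hypothetical and not part of FastDeploy.

```python
def speedup(fp32_latency_ms: float, int8_latency_ms: float) -> float:
    """Speedup ratio, assumed to be FP32 latency / INT8 latency (both in ms)."""
    if int8_latency_ms <= 0:
        raise ValueError("INT8 latency must be positive")
    return fp32_latency_ms / int8_latency_ms


def accuracy_drop(fp32_metric: float, int8_metric: float) -> float:
    """Absolute accuracy lost after quantization, in the metric's own units."""
    return fp32_metric - int8_metric


if __name__ == "__main__":
    # (model, backend, FP32 latency, INT8 latency, FP32 mAP, INT8 mAP),
    # copied from the YOLO series table.
    rows = [
        ("YOLOv5s", "TensorRT", 8.79, 5.17, 37.6, 36.6),
        ("YOLOv5s", "ONNX Runtime", 176.34, 92.95, 37.6, 33.1),
        ("YOLOv7", "TensorRT", 24.57, 9.40, 51.1, 50.8),
    ]
    for model, backend, fp32_ms, int8_ms, fp32_map, int8_map in rows:
        ratio = speedup(fp32_ms, int8_ms)
        drop = accuracy_drop(fp32_map, int8_map)
        print(f"{model} on {backend}: {ratio:.2f}x faster, mAP drop {drop:.1f}")
```

Reading the two numbers together is useful: on the same model, a backend can offer a larger speedup while also losing more mAP, so the choice depends on the accuracy budget of the deployment.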
|
||
|
||
### PaddleDetection series | ||
| Model | Inference backend | Deployment hardware | FP32 latency (ms) | INT8 latency (ms) | Speedup | FP32 mAP | INT8 mAP | Quantization method | | ||
| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- | | ||
| [ppyoloe_crn_l_300e_coco](../../examples/vision/detection/paddledetection/quantize) | TensorRT | GPU | 24.52 | 11.53 | 2.13 | 51.4 | 50.7 | Quantization distillation training | | ||
| [ppyoloe_crn_l_300e_coco](../../examples/vision/detection/paddledetection/quantize) | ONNX Runtime | CPU | 1085.62 | 457.56 | 2.37 | 51.4 | 50.0 | Quantization distillation training | | ||
|
||
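The data in the table above is the Runtime inference performance of each model deployed with FastDeploy, before and after quantization. | ||
- The test images are from COCO val2017. | ||
- Inference latency is the latency measured on each Runtime, in milliseconds. | ||
- The CPU is an Intel(R) Xeon(R) Gold 6271C, the GPU is a Tesla T4, the TensorRT version is 8.4.15, and the number of CPU threads is fixed at 1 in all tests. | ||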
|
||
|
||
### PaddleClas series | ||
| Model | Inference backend | Deployment hardware | FP32 latency (ms) | INT8 latency (ms) | Speedup | FP32 Top1 | INT8 Top1 | Quantization method | | ||
| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- | | ||
| [ResNet50_vd](../../examples/vision/classification/paddleclas/quantize/) | ONNX Runtime | CPU | 77.20 | 40.08 | 1.93 | 79.12 | 78.87 | Offline quantization | | ||
| [ResNet50_vd](../../examples/vision/classification/paddleclas/quantize/) | TensorRT | GPU | 3.70 | 1.80 | 2.06 | 79.12 | 79.06 | Offline quantization | | ||
| [MobileNetV1_ssld](../../examples/vision/classification/paddleclas/quantize/) | ONNX Runtime | CPU | 30.99 | 10.24 | 3.03 | 77.89 | 75.09 | Offline quantization | | ||
| [MobileNetV1_ssld](../../examples/vision/classification/paddleclas/quantize/) | TensorRT | GPU | 1.80 | 0.58 | 3.10 | 77.89 | 76.86 | Offline quantization | | ||
|
||
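Here is a table showing the list of currently supported quantized models (linking to the corresponding examples), with accuracy and performance | ||
The data in the table above is the Runtime inference performance of each model deployed with FastDeploy, before and after quantization. | ||
- The test data are images from the ImageNet-2012 validation set. | ||
- Inference latency is the latency measured on each Runtime, in milliseconds. | ||
- The CPU is an Intel(R) Xeon(R) Gold 6271C, the GPU is a Tesla T4, the TensorRT version is 8.4.15, and the number of CPU threads is fixed at 1 in all tests. |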
examples/vision/classification/paddleclas/quantize/README.md (53 changes: 38 additions & 15 deletions)
examples/vision/classification/paddleclas/quantize/cpp/ocr.sh (10 changes: 10 additions & 0 deletions)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
rm -rf build | ||
mkdir build | ||
|
||
cd build | ||
|
||
#/xieyunyao/project/FastDeploy | ||
|
||
cmake .. -DFASTDEPLOY_INSTALL_DIR=/xieyunyao/project/FastDeploy | ||
|
||
make -j |