Commit a231c9e

[Quantization] Update quantized model deployment examples and update readme. (PaddlePaddle#377)

* Add PaddleOCR Support

* Add PaddleOCR Support

* Add PaddleOCRv3 Support

* Add PaddleOCRv3 Support

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Add PaddleOCRv3 Support

* Add PaddleOCRv3 Support

* Add PaddleOCRv3 Support

* Fix Rec diff

* Remove useless functions

* Remove useless comments

* Add PaddleOCRv2 Support

* Add PaddleOCRv3 & PaddleOCRv2 Support

* Remove useless parameters

* Add utils of sorting det boxes

* Fix code naming convention

* Fix code naming convention

* Fix code naming convention

* Fix bug in the Classify process

* Improve OCR Readme

* Fix diff in Cls model

* Update Model Download Link in Readme

* Fix diff in PPOCRv2

* Improve OCR readme

* Improve OCR readme

* Improve OCR readme

* Improve OCR readme

* Improve OCR readme

* Improve OCR readme

* Fix conflict

* Add readme for OCRResult

* Improve OCR readme

* Add OCRResult readme

* Improve OCR readme

* Improve OCR readme

* Add Model Quantization Demo

* Fix Model Quantization Readme

* Fix Model Quantization Readme

* Add the function to do PTQ quantization

* Improve quant tools readme

* Improve quant tool readme

* Improve quant tool readme

* Add PaddleInference-GPU for OCR Rec model

* Add QAT method to fastdeploy-quantization tool

* Remove examples/slim for now

* Move configs folder

* Add Quantization Support for Classification Model

* Improve ways of importing preprocess

* Upload YOLO Benchmark on readme

* Upload YOLO Benchmark on readme

* Upload YOLO Benchmark on readme

* Improve Quantization configs and readme

* Add support for multi-inputs model

* Add backends and params file for YOLOv7

* Add quantized model deployment support for YOLO series

* Fix YOLOv5 quantize readme

* Fix YOLO quantize readme

* Fix YOLO quantize readme

* Improve quantize YOLO readme

* Improve quantize YOLO readme

* Improve quantize YOLO readme

* Improve quantize YOLO readme

* Improve quantize YOLO readme

* Fix bug, change Fronted to ModelFormat

* Change Fronted to ModelFormat

* Add examples to deploy quantized paddleclas models

* Fix readme

* Add quantize Readme

* Add quantize Readme

* Add quantize Readme

* Modify readme of quantization tools

* Modify readme of quantization tools

* Improve quantization tools readme

* Improve quantization readme

* Improve PaddleClas quantized model deployment readme

* Add PPYOLOE-l quantized deployment examples

* Improve quantization tools readme

* Improve Quantize Readme

* Fix conflicts

* Fix conflicts

* Improve readme

* Improve quantization tools and readme

* Improve quantization tools and readme

* Add quantized deployment examples for PaddleSeg model

* Fix cpp readme

* Fix memory leak of reader_wrapper function

* Fix model file name in PaddleClas quantization examples

* Update Runtime and E2E benchmark

* Update Runtime and E2E benchmark

* Rename quantization tools to auto compression tools

* Remove PPYOLOE data when deployed on MKLDNN

* Fix readme

* Support PPYOLOE with OR without NMS and update readme

* Update Readme

* Update configs and readme

* Update configs and readme

* Add Paddle-TensorRT backend in quantized model deploy examples

* Support PPYOLOE+ series
yunyaoXYY authored Nov 2, 2022
1 parent 9437dec commit a231c9e
Showing 53 changed files with 1,519 additions and 526 deletions.
149 changes: 110 additions & 39 deletions docs/cn/quantize.md

Large diffs are not rendered by default.

74 changes: 71 additions & 3 deletions docs/en/quantize.md
@@ -1,11 +1,79 @@
[简体中文](../cn/quantize.md) | English

# Quantization Acceleration
Quantization is a popular model compression technique: a quantized model has a smaller footprint and faster inference speed.
Based on PaddleSlim, FastDeploy integrates a one-click model quantization tool, and it also supports deploying the quantized models, helping users accelerate inference.

## FastDeploy Supports Quantized Model Deployment on Multiple Engines and Hardware
Currently, several inference backends in FastDeploy can deploy quantized models on different hardware. The support matrix is as follows:

| Hardware / Inference Backend | ONNX Runtime | Paddle Inference | TensorRT |
| :-----------| :-------- | :--------------- | :------- |
| CPU | Supported | Supported | |
| GPU | | | Supported |
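
To make the matrix concrete, here is a minimal sketch of backend selection with FastDeploy's Python `RuntimeOption`, assuming the `fastdeploy` package and the option methods used in the examples later in this commit:

```python
import fastdeploy as fd

option = fd.RuntimeOption()

# CPU deployment: quantized models run on ONNX Runtime or Paddle Inference.
option.use_cpu()
option.use_ort_backend()  # or: option.use_paddle_backend()

# GPU deployment: quantized models run on TensorRT.
# option.use_gpu()
# option.use_trt_backend()
```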


## Model Quantization

### Quantization Methods
Based on PaddleSlim, FastDeploy currently provides two quantization methods: quantization-aware distillation training and post-training (offline) quantization. Quantization-aware distillation training obtains the quantized model through training, while post-training quantization requires no training at all. FastDeploy can deploy the quantized models produced by either method.

The two methods compare as follows:
| Quantization Method | Quantization Time | Quantized Model Accuracy | Model Size | Inference Speed |
| :-----------| :--------| :-------| :------- | :------- |
| Post-training quantization | No training needed, fast | Slightly lower than distillation training | Same for both | Same for both |
| Quantization-aware distillation training | Requires training, somewhat slower | Small loss relative to the unquantized model | Same for both | Same for both |

### Quantizing Models with FastDeploy's One-Click Model Quantization Tool
Based on PaddleSlim, FastDeploy provides a one-click model quantization tool. Please refer to the following document to quantize your model:
- [FastDeploy One-Click Model Quantization](../../tools/quantization/)
Once the quantized model is produced, it can be deployed with FastDeploy.


## Quantization Benchmarks
The quantized models currently supported by FastDeploy are shown in the tables below:

### YOLO Series
| Model | Backend | Hardware | FP32 Latency (ms) | INT8 Latency (ms) | Speedup | FP32 mAP | INT8 mAP | Method |
| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- |
| [YOLOv5s](../../examples/vision/detection/yolov5/quantize/) | TensorRT | GPU | 8.79 | 5.17 | 1.70 | 37.6 | 36.6 | Quantization-aware distillation |
| [YOLOv5s](../../examples/vision/detection/yolov5/quantize/) | ONNX Runtime | CPU | 176.34 | 92.95 | 1.90 | 37.6 | 33.1 | Quantization-aware distillation |
| [YOLOv5s](../../examples/vision/detection/yolov5/quantize/) | Paddle Inference | CPU | 217.05 | 133.31 | 1.63 | 37.6 | 36.8 | Quantization-aware distillation |
| [YOLOv6s](../../examples/vision/detection/yolov6/quantize/) | TensorRT | GPU | 8.60 | 5.16 | 1.67 | 42.5 | 40.6 | Quantization-aware distillation |
| [YOLOv6s](../../examples/vision/detection/yolov6/quantize/) | ONNX Runtime | CPU | 338.60 | 128.58 | 2.60 | 42.5 | 36.1 | Quantization-aware distillation |
| [YOLOv6s](../../examples/vision/detection/yolov6/quantize/) | Paddle Inference | CPU | 356.62 | 125.72 | 2.84 | 42.5 | 41.2 | Quantization-aware distillation |
| [YOLOv7](../../examples/vision/detection/yolov7/quantize/) | TensorRT | GPU | 24.57 | 9.40 | 2.61 | 51.1 | 50.8 | Quantization-aware distillation |
| [YOLOv7](../../examples/vision/detection/yolov7/quantize/) | ONNX Runtime | CPU | 976.88 | 462.69 | 2.11 | 51.1 | 42.5 | Quantization-aware distillation |
| [YOLOv7](../../examples/vision/detection/yolov7/quantize/) | Paddle Inference | CPU | 1022.55 | 490.87 | 2.08 | 51.1 | 46.3 | Quantization-aware distillation |

The data above are Runtime inference performance measured with FastDeploy before and after quantization.
- Test data are images from the COCO2017 validation set.
- Latency is the inference latency on each Runtime, in milliseconds.
- CPU: Intel(R) Xeon(R) Gold 6271C; GPU: Tesla T4; TensorRT 8.4.15; the CPU thread count is fixed to 1 in all tests.


### PaddleDetection Series
| Model | Backend | Hardware | FP32 Latency (ms) | INT8 Latency (ms) | Speedup | FP32 mAP | INT8 mAP | Method |
| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- |
| [ppyoloe_crn_l_300e_coco](../../examples/vision/detection/paddledetection/quantize) | TensorRT | GPU | 24.52 | 11.53 | 2.13 | 51.4 | 50.7 | Quantization-aware distillation |
| [ppyoloe_crn_l_300e_coco](../../examples/vision/detection/paddledetection/quantize) | ONNX Runtime | CPU | 1085.62 | 457.56 | 2.37 | 51.4 | 50.0 | Quantization-aware distillation |

The data above are Runtime inference performance measured with FastDeploy before and after quantization.
- Test images are from the COCO val2017 set.
- Latency is the inference latency on each Runtime, in milliseconds.
- CPU: Intel(R) Xeon(R) Gold 6271C; GPU: Tesla T4; TensorRT 8.4.15; the CPU thread count is fixed to 1 in all tests.


### PaddleClas Series
| Model | Backend | Hardware | FP32 Latency (ms) | INT8 Latency (ms) | Speedup | FP32 Top1 | INT8 Top1 | Method |
| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- |
| [ResNet50_vd](../../examples/vision/classification/paddleclas/quantize/) | ONNX Runtime | CPU | 77.20 | 40.08 | 1.93 | 79.12 | 78.87 | Post-training quantization |
| [ResNet50_vd](../../examples/vision/classification/paddleclas/quantize/) | TensorRT | GPU | 3.70 | 1.80 | 2.06 | 79.12 | 79.06 | Post-training quantization |
| [MobileNetV1_ssld](../../examples/vision/classification/paddleclas/quantize/) | ONNX Runtime | CPU | 30.99 | 10.24 | 3.03 | 77.89 | 75.09 | Post-training quantization |
| [MobileNetV1_ssld](../../examples/vision/classification/paddleclas/quantize/) | TensorRT | GPU | 1.80 | 0.58 | 3.10 | 77.89 | 76.86 | Post-training quantization |

The data above are Runtime inference performance measured with FastDeploy before and after quantization.
- Test data are images from the ImageNet-2012 validation set.
- Latency is the inference latency on each Runtime, in milliseconds.
- CPU: Intel(R) Xeon(R) Gold 6271C; GPU: Tesla T4; TensorRT 8.4.15; the CPU thread count is fixed to 1 in all tests.
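
Putting the support matrix and the tables above together: deploying one of these quantized classification models follows the same flow as the FP32 examples. Below is a minimal Python sketch, assuming the `fastdeploy` API and the `resnet50_vd_ptq` archive used in the PaddleClas example later in this commit:

```python
import cv2
import fastdeploy as fd

# Runtime configuration: TensorRT on GPU (see the support matrix above).
option = fd.RuntimeOption()
option.use_gpu()
option.use_trt_backend()
option.set_trt_input_shape("inputs", min_shape=[1, 3, 224, 224])

# Quantized model files, following the layout of the resnet50_vd_ptq archive.
model_dir = "resnet50_vd_ptq"
model = fd.vision.classification.PaddleClasModel(
    model_dir + "/model.pdmodel",
    model_dir + "/model.pdiparams",
    model_dir + "/inference_cls.yaml",
    runtime_option=option)

# Run inference on a sample ImageNet image and print the Top-1 result.
im = cv2.imread("ILSVRC2012_val_00000010.jpeg")
result = model.predict(im)
print(result)
```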
53 changes: 38 additions & 15 deletions examples/vision/classification/paddleclas/quantize/README.md
@@ -1,25 +1,48 @@
# Deploying Quantized PaddleClas Models
FastDeploy supports deploying quantized models and provides a one-click model quantization tool.
Users can quantize a model themselves with the one-click quantization tool before deployment, or directly download and deploy the quantized models provided by FastDeploy.
FastDeploy supports deploying quantized models and provides a one-click automatic model compression tool.
Users can quantize a model themselves with the one-click automatic compression tool before deployment, or directly download and deploy the quantized models provided by FastDeploy.

## FastDeploy One-Click Model Quantization Tool
FastDeploy provides a one-click quantization tool that can quantize a model given nothing more than a configuration file.
For a detailed tutorial, see: [One-Click Model Quantization Tool](../../../../../tools/quantization/)
## FastDeploy One-Click Automatic Model Compression Tool
FastDeploy provides a one-click automatic model compression tool that can quantize a model given nothing more than a configuration file.
For a detailed tutorial, see: [One-Click Automatic Model Compression Tool](../../../../../tools/auto_compression/)
Note: inference with a quantized classification model still requires the inference_cls.yaml file from the FP32 model folder. A self-quantized model folder does not contain this yaml file; copy it from the FP32 model folder into the quantized model folder.
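
For illustration, the copy step might look like this in Python; both directory names are hypothetical placeholders:

```python
import shutil

# The self-quantized model folder lacks inference_cls.yaml; copy it over from
# the FP32 model folder. Both paths are hypothetical placeholders.
shutil.copy("ResNet50_vd_infer/inference_cls.yaml",   # FP32 model folder
            "ResNet50_vd_quant/inference_cls.yaml")   # quantized model folder
```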

## Download Quantized PaddleClas Models
Users can also directly download and deploy the quantized models in the table below.
| Model | Backend | Hardware | FP32 Latency (ms) | INT8 Latency (ms) | Speedup | FP32 Top1 | INT8 Top1 | Method |
| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- |
| [ResNet50_vd](https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar) | ONNX Runtime | CPU | 86.87 | 59.32 | 1.46 | 79.12 | 78.87 | Post-training quantization |
| [ResNet50_vd](https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar) | TensorRT | GPU | 7.85 | 5.42 | 1.45 | 79.12 | 79.06 | Post-training quantization |
| [MobileNetV1_ssld](https://bj.bcebos.com/paddlehub/fastdeploy/mobilenetv1_ssld_ptq.tar) | ONNX Runtime | CPU | 40.32 | 16.87 | 2.39 | 77.89 | 75.09 | Post-training quantization |
| [MobileNetV1_ssld](https://bj.bcebos.com/paddlehub/fastdeploy/mobilenetv1_ssld_ptq.tar) | TensorRT | GPU | 5.10 | 3.35 | 1.52 | 77.89 | 76.86 | Post-training quantization |

The data above are end-to-end inference performance measured with FastDeploy before and after quantization.
- Test images are from the ImageNet-2012 validation set.
- Latency is the average end-to-end latency (including pre- and post-processing), in milliseconds.
- CPU: Intel(R) Xeon(R) Gold 6271C; GPU: Tesla T4; TensorRT 8.4.15; the CPU thread count is fixed to 1 in all tests.
Notes on the benchmark tables:
- Runtime latency is the model's inference latency on each Runtime, including the CPU->GPU data copy, GPU inference, and GPU->CPU data copy, but excluding each model's pre- and post-processing.
- End-to-end latency is the model's latency in a realistic inference scenario, including pre- and post-processing.
- All reported latencies are averages over 1000 inference runs, in milliseconds.
- INT8 + FP16 means the Runtime's FP16 inference option is enabled while running the INT8 quantized model.
- INT8 + FP16 + PM means Pinned Memory is enabled in addition to INT8 and FP16, which speeds up GPU->CPU data copies.
- Max speedup is the FP32 latency divided by the fastest INT8 latency.
- When the method is quantization-aware distillation training, the quantized model is trained on a small unlabeled dataset and accuracy is verified on the full validation set, so the INT8 accuracy shown is not necessarily the best achievable.
- CPU: Intel(R) Xeon(R) Gold 6271C with the thread count fixed to 1 in all tests; GPU: Tesla T4 with TensorRT 8.4.15.
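
To make the INT8 + FP16 and Pinned Memory columns concrete, here is a hedged sketch of enabling those Runtime switches from Python; `enable_trt_fp16()` and `enable_pinned_memory()` are assumed method names that do not appear in this commit's diff:

```python
import fastdeploy as fd

option = fd.RuntimeOption()
option.use_gpu()
option.use_trt_backend()

# INT8 + FP16: also allow FP16 kernels while running the INT8 model.
option.enable_trt_fp16()

# + PM: use pinned (page-locked) host memory to speed up GPU->CPU copies.
option.enable_pinned_memory()
```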

### Runtime Benchmark
| Model | Backend | Hardware | FP32 Runtime Latency | INT8 Runtime Latency | INT8 + FP16 Runtime Latency | INT8 + FP16 + PM Runtime Latency | Max Speedup | FP32 Top1 | INT8 Top1 | Method |
| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- |----- |----- |
| [ResNet50_vd](https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar) | TensorRT | GPU | 3.55 | 0.99 | 0.98 | 1.06 | 3.62 | 79.12 | 79.06 | Post-training quantization |
| [ResNet50_vd](https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar) | Paddle-TensorRT | GPU | 3.46 | None | 0.87 | 1.03 | 3.98 | 79.12 | 79.06 | Post-training quantization |
| [ResNet50_vd](https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar) | ONNX Runtime | CPU | 76.14 | 35.43 | None | None | 2.15 | 79.12 | 78.87 | Post-training quantization |
| [ResNet50_vd](https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar) | Paddle Inference | CPU | 76.21 | 24.01 | None | None | 3.17 | 79.12 | 78.55 | Post-training quantization |
| [MobileNetV1_ssld](https://bj.bcebos.com/paddlehub/fastdeploy/mobilenetv1_ssld_ptq.tar) | TensorRT | GPU | 0.91 | 0.43 | 0.49 | 0.54 | 2.12 | 77.89 | 76.86 | Post-training quantization |
| [MobileNetV1_ssld](https://bj.bcebos.com/paddlehub/fastdeploy/mobilenetv1_ssld_ptq.tar) | Paddle-TensorRT | GPU | 0.88 | None | 0.49 | 0.51 | 1.80 | 77.89 | 76.86 | Post-training quantization |
| [MobileNetV1_ssld](https://bj.bcebos.com/paddlehub/fastdeploy/mobilenetv1_ssld_ptq.tar) | ONNX Runtime | CPU | 30.53 | 9.59 | None | None | 3.18 | 77.89 | 75.09 | Post-training quantization |
| [MobileNetV1_ssld](https://bj.bcebos.com/paddlehub/fastdeploy/mobilenetv1_ssld_ptq.tar) | Paddle Inference | CPU | 12.29 | 4.68 | None | None | 2.62 | 77.89 | 71.36 | Post-training quantization |

### End-to-End Benchmark
| Model | Backend | Hardware | FP32 End-to-End Latency | INT8 End-to-End Latency | INT8 + FP16 End-to-End Latency | INT8 + FP16 + PM End-to-End Latency | Max Speedup | FP32 Top1 | INT8 Top1 | Method |
| ------------------- | -----------------|-----------| -------- |-------- |-------- | --------- |-------- |----- |----- |----- |
| [ResNet50_vd](https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar) | TensorRT | GPU | 4.92 | 2.28 | 2.24 | 2.23 | 2.21 | 79.12 | 79.06 | Post-training quantization |
| [ResNet50_vd](https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar) | Paddle-TensorRT | GPU | 4.48 | None | 2.09 | 2.10 | 2.14 | 79.12 | 79.06 | Post-training quantization |
| [ResNet50_vd](https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar) | ONNX Runtime | CPU | 77.43 | 41.90 | None | None | 1.85 | 79.12 | 78.87 | Post-training quantization |
| [ResNet50_vd](https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar) | Paddle Inference | CPU | 80.60 | 27.75 | None | None | 2.90 | 79.12 | 78.55 | Post-training quantization |
| [MobileNetV1_ssld](https://bj.bcebos.com/paddlehub/fastdeploy/mobilenetv1_ssld_ptq.tar) | TensorRT | GPU | 2.19 | 1.48 | 1.57 | 1.57 | 1.48 | 77.89 | 76.86 | Post-training quantization |
| [MobileNetV1_ssld](https://bj.bcebos.com/paddlehub/fastdeploy/mobilenetv1_ssld_ptq.tar) | Paddle-TensorRT | GPU | 2.04 | None | 1.47 | 1.45 | 1.41 | 77.89 | 76.86 | Post-training quantization |
| [MobileNetV1_ssld](https://bj.bcebos.com/paddlehub/fastdeploy/mobilenetv1_ssld_ptq.tar) | ONNX Runtime | CPU | 34.02 | 12.97 | None | None | 2.62 | 77.89 | 75.09 | Post-training quantization |
| [MobileNetV1_ssld](https://bj.bcebos.com/paddlehub/fastdeploy/mobilenetv1_ssld_ptq.tar) | Paddle Inference | CPU | 16.31 | 7.42 | None | None | 2.20 | 77.89 | 71.36 | Post-training quantization |

## Detailed Deployment Documents

@@ -1,4 +1,4 @@
# PaddleClas Quantized Model Python Deployment Example
# PaddleClas Quantized Model C++ Deployment Example
The `infer.cc` provided in this directory helps users quickly deploy PaddleClas quantized models on CPU/GPU with accelerated inference.

## Deployment Preparation
@@ -8,7 +8,7 @@

### Preparing the Quantized Model
- 1. Users can directly deploy the quantized models provided by FastDeploy.
- 2. Users can quantize a model themselves with FastDeploy's [One-Click Model Quantization Tool](../../../../../../tools/quantization/) and deploy the resulting quantized model. (Note: inference with a quantized classification model still requires the inference_cls.yaml file from the FP32 model folder; a self-quantized model folder does not contain this yaml file, so copy it from the FP32 model folder into the quantized model folder.)
- 2. Users can quantize a model themselves with FastDeploy's [One-Click Automatic Model Compression Tool](../../../../../tools/auto_compression/) and deploy the resulting quantized model. (Note: inference with a quantized classification model still requires the inference_cls.yaml file from the FP32 model folder; a self-quantized model folder does not contain this yaml file, so copy it from the FP32 model folder into the quantized model folder.)

## Deployment Example: the Quantized ResNet50_vd Model
Run the following commands in this directory to build the demo and deploy the quantized model.
@@ -26,8 +26,10 @@ tar -xvf resnet50_vd_ptq.tar
wget https://gitee.com/paddlepaddle/PaddleClas/raw/release/2.4/deploy/images/ImageNet/ILSVRC2012_val_00000010.jpeg


# Run the quantized model on CPU with Paddle Inference
# Run the quantized model on CPU with ONNX Runtime
./infer_demo resnet50_vd_ptq ILSVRC2012_val_00000010.jpeg 0
# Run the quantized model on GPU with TensorRT
./infer_demo resnet50_vd_ptq ILSVRC2012_val_00000010.jpeg 1
# Run the quantized model on GPU with Paddle-TensorRT
./infer_demo resnet50_vd_ptq ILSVRC2012_val_00000010.jpeg 2
```
10 changes: 7 additions & 3 deletions examples/vision/classification/paddleclas/quantize/cpp/infer.cc
@@ -21,8 +21,8 @@ const char sep = '/';

void InitAndInfer(const std::string& model_dir, const std::string& image_file,
const fastdeploy::RuntimeOption& option) {
auto model_file = model_dir + sep + "inference.pdmodel";
auto params_file = model_dir + sep + "inference.pdiparams";
auto model_file = model_dir + sep + "model.pdmodel";
auto params_file = model_dir + sep + "model.pdiparams";
auto config_file = model_dir + sep + "inference_cls.yaml";

auto model = fastdeploy::vision::classification::PaddleClasModel(
@@ -67,7 +67,11 @@ int main(int argc, char* argv[]) {
option.UseGpu();
option.UseTrtBackend();
option.SetTrtInputShape("inputs",{1, 3, 224, 224});
}
} else if (flag == 2) {
option.UseGpu();
option.UseTrtBackend();
option.EnablePaddleToTrt();
}

std::string model_dir = argv[1];
std::string test_image = argv[2];
10 changes: 10 additions & 0 deletions examples/vision/classification/paddleclas/quantize/cpp/ocr.sh
@@ -0,0 +1,10 @@
rm -rf build
mkdir build

cd build

#/xieyunyao/project/FastDeploy

cmake .. -DFASTDEPLOY_INSTALL_DIR=/xieyunyao/project/FastDeploy

make -j
@@ -8,7 +8,7 @@

### Preparing the Quantized Model
- 1. Users can directly deploy the quantized models provided by FastDeploy.
- 2. Users can quantize a model themselves with FastDeploy's [One-Click Model Quantization Tool](../../../../../../tools/quantization/) and deploy the resulting quantized model. (Note: inference with a quantized classification model still requires the inference_cls.yaml file from the FP32 model folder; a self-quantized model folder does not contain this yaml file, so copy it from the FP32 model folder into the quantized model folder.)
- 2. Users can quantize a model themselves with FastDeploy's [One-Click Automatic Model Compression Tool](../../tools/auto_compression/) and deploy the resulting quantized model. (Note: inference with a quantized classification model still requires the inference_cls.yaml file from the FP32 model folder; a self-quantized model folder does not contain this yaml file, so copy it from the FP32 model folder into the quantized model folder.)


## Deployment Example: the Quantized ResNet50_vd Model
@@ -22,8 +22,10 @@ wget https://bj.bcebos.com/paddlehub/fastdeploy/resnet50_vd_ptq.tar
tar -xvf resnet50_vd_ptq.tar
wget https://gitee.com/paddlepaddle/PaddleClas/raw/release/2.4/deploy/images/ImageNet/ILSVRC2012_val_00000010.jpeg

# Run the quantized model on CPU with Paddle Inference
# Run the quantized model on CPU with ONNX Runtime
python infer.py --model resnet50_vd_ptq --image ILSVRC2012_val_00000010.jpeg --device cpu --backend ort
# Run the quantized model on GPU with TensorRT
python infer.py --model resnet50_vd_ptq --image ILSVRC2012_val_00000010.jpeg --device gpu --backend trt
# Run the quantized model on GPU with Paddle-TensorRT
python infer.py --model resnet50_vd_ptq --image ILSVRC2012_val_00000010.jpeg --device gpu --backend pptrt
```
@@ -48,6 +48,11 @@ def build_option(args):
) == "gpu", "TensorRT backend require inferences on device GPU."
option.use_trt_backend()
option.set_trt_input_shape("inputs", min_shape=[1, 3, 224, 224])
elif args.backend.lower() == "pptrt":
assert args.device.lower(
) == "gpu", "TensorRT backend require inference on device GPU."
option.use_trt_backend()
option.enable_paddle_to_trt()
elif args.backend.lower() == "ort":
option.use_ort_backend()
elif args.backend.lower() == "paddle":
