From e4b1581593e4368100306abf002fb53480239bed Mon Sep 17 00:00:00 2001
From: huangjianhui <852142024@qq.com>
Date: Thu, 15 Dec 2022 14:53:44 +0800
Subject: [PATCH] [Doc] Update multi_thread docs in tutorials (#886)

* Refactor PaddleSeg with preprocessor && postprocessor

* Fix bugs

* Delete redundancy code

* Modify by comments

* Refactor according to comments

* Add batch evaluation

* Add single test script

* Add ppliteseg single test script && fix eval(raise) error

* fix bug

* Fix evaluation segmentation.py batch predict

* Fix segmentation evaluation bug

* Fix evaluation segmentation bugs

* Update segmentation result docs

* Update old predict api and DisableNormalizeAndPermute

* Update resize segmentation label map with cv::INTER_NEAREST

* Add Model Clone function for PaddleClas && PaddleDet && PaddleSeg

* Add multi thread demo

* Add python model clone function

* Add multi thread python && C++ example

* Fix bug

* Update python && cpp multi_thread examples

* Add cpp && python directory

* Add README.md for examples

* Delete redundant code

* Create README_CN.md

* Rename README_CN.md to README.md

* Update README.md

* Update README.md

Co-authored-by: Jason <jiangjiajun@baidu.com>
---
 tutorials/multi_thread/README.md        | 96 +++++++++++++++++++++++++
 tutorials/multi_thread/cpp/README.md    | 66 ++++-------------
 tutorials/multi_thread/python/README.md | 83 ++++++++-------------
 3 files changed, 136 insertions(+), 109 deletions(-)
 create mode 100644 tutorials/multi_thread/README.md

diff --git a/tutorials/multi_thread/README.md b/tutorials/multi_thread/README.md
new file mode 100644
index 0000000000..00bc7251cc
--- /dev/null
+++ b/tutorials/multi_thread/README.md
@@ -0,0 +1,96 @@
+[English](README.md) | 中文
+
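+# Multi-threaded and multi-process prediction with FastDeploy models
+
+FastDeploy provides the following multi-threaded and multi-process examples for Python and C++ developers:
+
+- [Python multi-threaded and multi-process prediction example](python)
+- [C++ multi-threaded prediction example](cpp)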
+
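+## Cloning a model for multi-threaded prediction
+
+Inference with a vision model consists of three stages:
+- preprocess: the input image is preprocessed into the Tensor that is fed to the model Runtime
+- infer: the model Runtime receives the Tensor, runs inference, and produces the Runtime output Tensor
+- postprocess: the Runtime output Tensor is post-processed into the final structured result, such as DetectionResult or SegmentationResult
+
+FastDeploy abstracts these three stages (preprocess, infer, postprocess) into three corresponding classes: Preprocessor, Runtime, and Postprocessor.
+
+Calling FastDeploy models from multiple threads for parallel inference raises two questions:
+- Can the Preprocessor, Runtime, and Postprocessor classes each be used in parallel?
+- Given that multi-threaded concurrency is supported, how can memory or GPU memory usage be kept to a minimum?
+
+FastDeploy performs multi-threaded inference by copying objects: each thread owns an independent instance of Preprocessor, Runtime, and Postprocessor. To reduce memory usage, a Runtime copy shares the model weights with the original. So although several objects are created, only one copy of the model weights and parameters is held in memory or GPU memory.
+This keeps down the memory cost of holding multiple objects.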
+
+FastDeploy provides the following interfaces for cloning a model (using PaddleClas as an example):
+
+- Python: `PaddleClasModel.clone()`
+- C++: `PaddleClasModel::Clone()`
+
+
+### Python
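+```python
+import cv2
+import fastdeploy as fd
+option = fd.RuntimeOption()
+# model_file, params_file, config_file and image are paths supplied by the caller
+model = fd.vision.classification.PaddleClasModel(
+    model_file, params_file, config_file, runtime_option=option)
+model2 = model.clone()  # shares the model weights with `model`
+im = cv2.imread(image)
+res = model2.predict(im)
+```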
+
+### C++
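+```c++
+// model_file, params_file, config_file, image_file and option are supplied by the caller
+fastdeploy::vision::classification::PaddleClasModel model(
+    model_file, params_file, config_file, option);
+auto model2 = model.Clone();  // clone that shares the model weights with `model`
+auto im = cv::imread(image_file);
+fastdeploy::vision::ClassifyResult res;
+// Clone() returns a pointer to the new model, so Predict is called through `->`
+model2->Predict(im, &res);
+```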
+
+>> **Note**: For the equivalent APIs of other models, see the [official C++ documentation](https://www.paddlepaddle.org.cn/fastdeploy-api-doc/cpp/html/index.html) and the [official Python documentation](https://www.paddlepaddle.org.cn/fastdeploy-api-doc/python/html/index.html)
+
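+## Python multi-threading and multi-processing
+
+Because of a language-level constraint, the GIL, Python threads cannot make full use of the hardware in compute-intensive workloads. FastDeploy therefore provides both a multi-process and a multi-threaded example for Python. They compare as follows:
+
+### Multi-process vs. multi-threaded inference with FastDeploy models
+
+|     | Resource usage | Compute-intensive | I/O-intensive | Inter-process or inter-thread communication |
+|:-------|:------|:----------|:----------|:----------|
+| Multi-process   | High | Fast | Fast | Slow |
+| Multi-threaded   | Low | Slow | Fairly fast | Fast |
+
+>> **Note**: The comparison above is largely theoretical. In practice Python is optimized for certain kinds of computation (NumPy-style operations, for example, can already run in parallel), collecting results across processes incurs inter-process communication, and it is often hard to tell whether a task is compute-intensive or I/O-intensive. Benchmark your own task before choosing.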
+
+
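+## C++ multi-threading
+
+C++ multi-threading combines low resource usage with high speed, which makes it the best choice for multi-threaded inference.
+
+### C++ multi-threading: memory usage with and without Clone
+
+Hardware: Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz  
+Model: ResNet50_vd_infer  
+Backend: OpenVINO inference engine on CPU
+
+Memory usage when initializing multiple models in a single process
+| Number of models | After model.Clone() | After model->predict() on the clone | After model initialization without Clone | After model->predict() without Clone |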
+|:--- |:----- |:----- |:----- |:----- |
+|1|322M |325M |322M|325M|
+|2|322M|325M|559M|560M|
+|3|322M|325M|771M|771M|
+
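+Memory usage during multi-threaded model prediction
+| Number of threads | After model.Clone() | After model->predict() on the clone | After model initialization without Clone | After model->predict() without Clone |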
+|:--- |:----- |:----- |:----- |:----- |
+|1|322M |337M |322M|337M|
+|2|322M|343M|548M|566M|
+|3|322M|347M|752M|784M|
+
diff --git a/tutorials/multi_thread/cpp/README.md b/tutorials/multi_thread/cpp/README.md
index 0663404675..086d71d865 100644
--- a/tutorials/multi_thread/cpp/README.md
+++ b/tutorials/multi_thread/cpp/README.md
@@ -1,11 +1,11 @@
-# PaddleClas C++ deployment example
+# PaddleClas C++ multi-threaded deployment example
 
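-This directory provides `infer.cc`, an example that quickly deploys PaddleClas models on CPU/GPU, and on GPU with TensorRT acceleration.
+This directory provides `multi_thread.cc`, an example that quickly deploys PaddleClas models with multiple threads on CPU/GPU, and on GPU with TensorRT acceleration.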
 
 Before deployment, confirm the following two steps:
 
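-- 1. The software and hardware environment meets the requirements; see [FastDeploy environment requirements](../../../../../docs/cn/build_and_install/download_prebuilt_libraries.md)  
-- 2. Download the prebuilt deployment library and samples code for your development environment; see [FastDeploy prebuilt libraries](../../../../../docs/cn/build_and_install/download_prebuilt_libraries.md)
+- 1. The software and hardware environment meets the requirements; see [FastDeploy environment requirements](../../../docs/cn/build_and_install/download_prebuilt_libraries.md)  
+- 2. Download the prebuilt deployment library and samples code for your development environment; see [FastDeploy prebuilt libraries](../../../docs/cn/build_and_install/download_prebuilt_libraries.md)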
 
 Taking ResNet50_vd inference on Linux as an example, run the following commands in this directory to build and test. This model requires FastDeploy 0.7.0 or later (x.x.x>=0.7.0).
 
@@ -24,56 +24,14 @@ tar -xvf ResNet50_vd_infer.tgz
 wget https://gitee.com/paddlepaddle/PaddleClas/raw/release/2.4/deploy/images/ImageNet/ILSVRC2012_val_00000010.jpeg
 
 
-# CPU inference
-./infer_demo ResNet50_vd_infer ILSVRC2012_val_00000010.jpeg 0
-# GPU inference
-./infer_demo ResNet50_vd_infer ILSVRC2012_val_00000010.jpeg 1
-# TensorRT inference on GPU
-./infer_demo ResNet50_vd_infer ILSVRC2012_val_00000010.jpeg 2
+# CPU multi-threaded inference
+./infer_demo ResNet50_vd_infer ILSVRC2012_val_00000010.jpeg 0 1
+# GPU multi-threaded inference
+./infer_demo ResNet50_vd_infer ILSVRC2012_val_00000010.jpeg 1 1
+# TensorRT multi-threaded inference on GPU
+./infer_demo ResNet50_vd_infer ILSVRC2012_val_00000010.jpeg 2 1
 ```
+>> **Note**: the last number is the thread count
 
 The commands above apply only to Linux or MacOS. For using the SDK on Windows, see:  
-- [How to use the FastDeploy C++ SDK on Windows](../../../../../docs/cn/faq/use_sdk_on_windows.md)
-
-## PaddleClas C++ API
-
-### PaddleClas class
-
-```c++
-fastdeploy::vision::classification::PaddleClasModel(
-        const string& model_file,
-        const string& params_file,
-        const string& config_file,
-        const RuntimeOption& runtime_option = RuntimeOption(),
-        const ModelFormat& model_format = ModelFormat::PADDLE)
-```
-
-Loads and initializes a PaddleClas model. model_file and params_file are the Paddle inference files exported from the trained model; for details, see [Model export](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/zh_CN/inference_deployment/export_model.md#2-%E5%88%86%E7%B1%BB%E6%A8%A1%E5%9E%8B%E5%AF%BC%E5%87%BA)
-
-**Parameters**
-
-> * **model_file**(str): Path to the model file
-> * **params_file**(str): Path to the parameters file
-> * **config_file**(str): Inference deployment configuration file
-> * **runtime_option**(RuntimeOption): Backend inference configuration; defaults to None, which uses the default configuration
-> * **model_format**(ModelFormat): Model format; defaults to Paddle format
-
-#### Predict function
-
-> ```c++
-> PaddleClasModel::Predict(cv::Mat* im, ClassifyResult* result, int topk = 1)
-> ```
->
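-> Model prediction interface: takes an input image and directly outputs the detection result.
->
-> **Parameters**
->
-> > * **im**: Input image; must be in HWC, BGR format
-> > * **result**: Classification result, including label_id and the corresponding confidence; for ClassifyResult, see [Vision model prediction results](../../../../../docs/api/vision_results/)
-> > * **topk**(int): Return the topk classification results with the highest predicted probability; defaults to 1
-
-
-- [Model description](../../)
-- [Python deployment](../python)
-- [Vision model prediction results](../../../../../docs/api/vision_results/)
-- [How to switch the model inference backend](../../../../../docs/cn/faq/how_to_change_backend.md)
+- [How to use the FastDeploy C++ SDK on Windows](../../../docs/cn/faq/use_sdk_on_windows.md)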
\ No newline at end of file
diff --git a/tutorials/multi_thread/python/README.md b/tutorials/multi_thread/python/README.md
index 9d17e6f658..508d5c7e03 100644
--- a/tutorials/multi_thread/python/README.md
+++ b/tutorials/multi_thread/python/README.md
@@ -1,31 +1,45 @@
-# PaddleClas model Python deployment example
+# PaddleClas model Python multi-threaded/multi-process deployment example
 
 Before deployment, confirm the following two steps:
 
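-- 1. The software and hardware environment meets the requirements; see [FastDeploy environment requirements](../../../../../docs/cn/build_and_install/download_prebuilt_libraries.md)  
-- 2. Install the FastDeploy Python whl package; see [FastDeploy Python installation](../../../../../docs/cn/build_and_install/download_prebuilt_libraries.md)
+- 1. The software and hardware environment meets the requirements; see [FastDeploy environment requirements](../../../docs/cn/build_and_install/download_prebuilt_libraries.md)  
+- 2. Install the FastDeploy Python whl package; see [FastDeploy Python installation](../../../docs/cn/build_and_install/download_prebuilt_libraries.md)
+
+This directory provides `multi_thread_process.py`, a multi-threaded/multi-process example that quickly deploys ResNet50_vd on CPU/GPU, and on GPU with TensorRT acceleration. Run the following script to complete the deployment: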
 
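-This directory provides `infer.py`, an example that quickly deploys ResNet50_vd on CPU/GPU, and on GPU with TensorRT acceleration. Run the following script to complete the deployment: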
 
 ```bash
 # Download the deployment example code
 git clone https://github.com/PaddlePaddle/FastDeploy.git
-cd  FastDeploy/examples/vision/classification/paddleclas/python
+cd  FastDeploy/tutorials/multi_thread/python
 
 # Download the ResNet50_vd model files and the test image
 wget https://bj.bcebos.com/paddlehub/fastdeploy/ResNet50_vd_infer.tgz
 tar -xvf ResNet50_vd_infer.tgz
 wget https://gitee.com/paddlepaddle/PaddleClas/raw/release/2.4/deploy/images/ImageNet/ILSVRC2012_val_00000010.jpeg
 
-# CPU inference
-python infer.py --model ResNet50_vd_infer --image ILSVRC2012_val_00000010.jpeg --device cpu --topk 1
-# GPU inference
-python infer.py --model ResNet50_vd_infer --image ILSVRC2012_val_00000010.jpeg --device gpu --topk 1
-# TensorRT inference on GPU (note: the first TensorRT run serializes the model, which takes some time; please be patient)
-python infer.py --model ResNet50_vd_infer --image ILSVRC2012_val_00000010.jpeg --device gpu --use_trt True --topk 1
-# IPU inference (note: the first IPU run serializes the model, which takes some time; please be patient)
-python infer.py --model ResNet50_vd_infer --image ILSVRC2012_val_00000010.jpeg --device ipu --topk 1
+
+# CPU multi-threaded inference
+python multi_thread_process.py --model ResNet50_vd_infer --image_path ILSVRC2012_val_00000010.jpeg --device cpu --topk 1 --thread_num 1
+# CPU multi-process inference
+python multi_thread_process.py --model ResNet50_vd_infer --image_path ILSVRC2012_val_00000010.jpeg --device cpu --topk 1 --use_multi_process True --process_num 1
+
+# GPU multi-threaded inference
+python multi_thread_process.py --model ResNet50_vd_infer --image_path ILSVRC2012_val_00000010.jpeg --device gpu --topk 1 --thread_num 1
+# GPU multi-process inference
+python multi_thread_process.py --model ResNet50_vd_infer --image_path ILSVRC2012_val_00000010.jpeg --device gpu --topk 1 --use_multi_process True --process_num 1
+
+# TensorRT multi-threaded inference on GPU (note: the first TensorRT run serializes the model, which takes some time; please be patient)
+python multi_thread_process.py --model ResNet50_vd_infer --image_path ILSVRC2012_val_00000010.jpeg --device gpu --use_trt True --topk 1 --thread_num 1
+# TensorRT multi-process inference on GPU (note: the first TensorRT run serializes the model, which takes some time; please be patient)
+python multi_thread_process.py --model ResNet50_vd_infer --image_path ILSVRC2012_val_00000010.jpeg --device gpu --use_trt True --topk 1 --use_multi_process True --process_num 1
+
+# IPU multi-threaded inference (note: the first IPU run serializes the model, which takes some time; please be patient)
+python multi_thread_process.py --model ResNet50_vd_infer --image_path ILSVRC2012_val_00000010.jpeg --device ipu --topk 1 --thread_num 1
+# IPU multi-process inference (note: the first IPU run serializes the model, which takes some time; please be patient)
+python multi_thread_process.py --model ResNet50_vd_infer --image_path ILSVRC2012_val_00000010.jpeg --device ipu --topk 1 --use_multi_process True --process_num 1
 ```
+>> **Note**: `--image_path` also accepts the path to a directory of images
 
 After the run completes, the result is returned as follows:
 ```bash
@@ -33,45 +47,4 @@ ClassifyResult(
 label_ids: 153,
 scores: 0.686229,
 )
-```
-
-## PaddleClasModel Python API
-
-```python
-fd.vision.classification.PaddleClasModel(model_file, params_file, config_file, runtime_option=None, model_format=ModelFormat.PADDLE)
-```
-
-Loads and initializes a PaddleClas model. model_file and params_file are the Paddle inference files exported from the trained model; for details, see [Model export](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/zh_CN/inference_deployment/export_model.md#2-%E5%88%86%E7%B1%BB%E6%A8%A1%E5%9E%8B%E5%AF%BC%E5%87%BA)
-
-**Parameters**
-
-> * **model_file**(str): Path to the model file
-> * **params_file**(str): Path to the parameters file
-> * **config_file**(str): Inference deployment configuration file
-> * **runtime_option**(RuntimeOption): Backend inference configuration; defaults to None, which uses the default configuration
-> * **model_format**(ModelFormat): Model format; defaults to Paddle format
-
-### predict function
-
-> ```python
-> PaddleClasModel.predict(input_image, topk=1)
-> ```
->
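-> Model prediction interface: takes an input image and directly outputs the topk classification results.
->
-> **Parameters**
->
-> > * **input_image**(np.ndarray): Input data; must be in HWC, BGR format
-> > * **topk**(int): Return the topk classification results with the highest predicted probability; defaults to 1
-
-> **Returns**
->
-> > Returns a `fastdeploy.vision.ClassifyResult` struct; for its description, see [Vision model prediction results](../../../../../docs/api/vision_results/)
-
-
-## Other documents
-
-- [PaddleClas model description](..)
-- [PaddleClas C++ deployment](../cpp)
-- [Description of model prediction results](../../../../../docs/api/vision_results/)
-- [How to switch the model inference backend](../../../../../docs/cn/faq/how_to_change_backend.md)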
+```
\ No newline at end of file