[Doc] Update YOLOv5 doc for TIMVX NPU (PaddlePaddle#1041)
* update yolov5 doc for TIMVX
* update doc
* update doc
* update doc
1 parent 58d63f3, commit 96ca92c
Showing 12 changed files with 156 additions and 42 deletions.
[English](../../en/faq/heterogeneous_computing_on_timvx_npu.md) | 中文

# Heterogeneous Computing on VeriSilicon Series NPUs
When a fully quantized model is deployed on a VeriSilicon series NPU such as the RV1126 or the A311D, accuracy may drop. In that case the workload has to be split heterogeneously between the NPU and the ARM CPU. Heterogeneous computing in FastDeploy is driven by a subgraph.txt configuration file, so if accuracy drops noticeably after you switch to a new fully quantized model, follow this document to define the heterogeneous configuration file.

Steps for updating the heterogeneous configuration file:
1. Determine the accuracy of the quantized model on the ARM CPU.
- If the accuracy is unacceptable even on the ARM CPU, the quantization itself has failed; consider revising the training set or changing the quantization method.
- Only the demo code needs to change: replace the NPU inference part with ARM CPU int8 inference so that the computation runs on the ARM CPU.
```
// The following interface enables NPU inference
fastdeploy::RuntimeOption option;
option.UseTimVX();                                    // use TIMVX for NPU inference
option.SetLiteSubgraphPartitionPath(subgraph_file);   // load the heterogeneous computing configuration file

// The following interface enables ARM CPU int8 inference
fastdeploy::RuntimeOption option;
option.UseLiteBackend();
option.EnableLiteInt8();
```
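For illustration only, and not taken from the official demo code, the two configurations above could be wrapped behind a single flag while making this check; the helper name and header path below are assumptions:
```
#include <string>

#include "fastdeploy/runtime.h"  // header path is an assumption, adjust to your SDK layout

// Hypothetical helper: build the RuntimeOption either for the ARM CPU int8
// accuracy check (step 1) or for NPU inference via TIMVX.
fastdeploy::RuntimeOption BuildOption(bool use_arm_cpu_int8,
                                      const std::string& subgraph_file = "") {
  fastdeploy::RuntimeOption option;
  if (use_arm_cpu_int8) {
    option.UseLiteBackend();  // Paddle Lite backend on the ARM CPU
    option.EnableLiteInt8();  // run the fully quantized model in int8
  } else {
    option.UseTimVX();        // NPU inference through TIMVX
    if (!subgraph_file.empty()) {
      // only load the heterogeneous config once subgraph.txt has been prepared
      option.SetLiteSubgraphPartitionPath(subgraph_file);
    }
  }
  return option;
}
```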
If the ARM CPU results meet the accuracy requirement, continue with the steps below.
2. Obtain the topology information of the whole network.
- Roll back the change made in step 1 so that the NPU inference API is used again, and keep the loading of the heterogeneous computing configuration file disabled.
- Write all log output to log.txt and search for the keyword "subgraph operators"; the section that follows it is the topology information of the whole model.
- Its format is as follows:
- Each line consists of "operator type:input tensor name list:output tensor name list" (i.e. the operator type and the input and output tensor name lists are separated by colons), and the tensor names within a list are separated by commas;
- Example:
```
op_type0:var_name0,var_name1:var_name2 # forces the node whose operator type is op_type0, whose input tensors are var_name0 and var_name1, and whose output tensor is var_name2 to run on the ARM CPU
```
3. Modify the heterogeneous configuration file
- Write all of the subgraph operators into subgraph.txt and enable the interface that loads the heterogeneous computing configuration file
- Delete entries line by line, in blocks, or by binary search; with some patience you will find the layer(s) that cause the NPU accuracy problem, and those are the ones to keep in subgraph.txt (see the example after this list)
- Every node left in the txt file is a layer that is offloaded to the ARM CPU, so there is no need to worry much about performance: the ARM kernels in Paddle Lite are also very fast
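For illustration, a subgraph.txt that has already been pared down to two suspect layers might look like the lines below; the operator types and tensor names are placeholders, and the real entries must be copied from the "subgraph operators" section of your own log.txt:
```
softmax:var_name3:var_name4
concat:var_name5,var_name6:var_name7
```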
English | [中文](../../cn/faq/heterogeneous_computing_on_timvx_npu.md)

# Heterogeneous Computing on VeriSilicon Series NPUs
When a fully quantized model is deployed on a VeriSilicon series NPU such as the RV1126 or the A311D, accuracy may drop, and the workload then has to be split heterogeneously between the NPU and the ARM CPU. Heterogeneous computing in FastDeploy is driven by a subgraph.txt configuration file. If you find that accuracy has dropped significantly after replacing the quantized model, refer to this document to define the heterogeneous configuration file.

Steps for updating the heterogeneous configuration file:
1. Determine the accuracy of the quantized model on the ARM CPU.
- If the accuracy is unacceptable even on the ARM CPU, the quantization itself has failed; consider revising the training set or changing the quantization method.
- Only a few lines of the demo code need to change: replace the NPU inference part with ARM CPU int8 inference.
```
// The following interface enables NPU inference
fastdeploy::RuntimeOption option;
option.UseTimVX();                                    // use TIMVX for NPU inference
option.SetLiteSubgraphPartitionPath(subgraph_file);   // load the heterogeneous computing configuration file

// The following interface enables ARM CPU int8 inference
fastdeploy::RuntimeOption option;
option.UseLiteBackend();
option.EnableLiteInt8();
```
If the ARM CPU accuracy meets the requirement, continue with the steps below.
2. Obtain the topology information of the entire network.
- Roll back the modification made in step 1 so that the NPU inference API is used again, and keep the loading of the heterogeneous computing configuration file disabled.
- Write all log output to log.txt and search for the keyword "subgraph operators"; the section that follows it is the topology information of the entire model.
- It has the following format:
- Each line consists of "operator type:input tensor name list:output tensor name list" (i.e. the operator type and the input and output tensor name lists are separated by colons), and the tensor names within a list are separated by commas;
- Example:
```
op_type0:var_name0,var_name1:var_name2 # forces the node whose operator type is op_type0, whose input tensors are var_name0 and var_name1, and whose output tensor is var_name2 to run on the ARM CPU
```
3. Modify the heterogeneous configuration file
- Write all of the subgraph operators into subgraph.txt and enable the interface that loads the heterogeneous computing configuration file
- Delete entries line by line, in blocks, or by binary search; with some patience you will find the layer(s) that cause the NPU accuracy problem, and those are the ones to keep in subgraph.txt (see the sketch after this list)
- Every node left in the txt file is a layer that is offloaded to the ARM CPU, so there is no need to worry much about performance: the ARM kernels in Paddle Lite are also very fast
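A minimal sketch of the step 3 configuration, assuming subgraph.txt sits next to the executable; it is not taken from the official demo, and the header path is an assumption:
```
#include "fastdeploy/runtime.h"  // header path is an assumption, adjust to your SDK layout

int main() {
  fastdeploy::RuntimeOption option;
  option.UseTimVX();                                      // NPU inference through TIMVX
  option.SetLiteSubgraphPartitionPath("./subgraph.txt");  // layers listed here fall back to the ARM CPU
  // Create the model with this option, evaluate accuracy, and shrink
  // subgraph.txt (line by line, in blocks, or by bisection) until only the
  // layers that break NPU accuracy remain.
  return 0;
}
```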