Merge remote-tracking branch 'origin/rknn_pybind' into rknn_pybind
Zheng-Bicheng committed Mar 7, 2023
2 parents 7bed7a5 + c16f528 commit b5fb9fa
Showing 7 changed files with 252 additions and 62 deletions.
146 changes: 144 additions & 2 deletions docs/cn/build_and_install/rknpu2.md
@@ -1,11 +1,153 @@
[English](../../en/build_and_install/rknpu2.md) | Simplified Chinese
# FastDeploy RKNPU2 Navigation

RKNPU2 refers to the NPU in Rockchip's RK356X and RK3588 series chips.
FastDeploy currently has preliminary support for deploying models with RKNPU2.
If you encounter problems during use, please report them in Issues and include your runtime environment.

## Overview of the FastDeploy RKNPU2 Environment Setup

To use the RKNPU2 inference engine in FastDeploy, you need to set up the following environments.

| Tool         | Required | Install on | Purpose                                                                      |
|--------------|----------|------------|------------------------------------------------------------------------------|
| Paddle2ONNX  | Required | PC         | Converts Paddle Inference models to ONNX models                               |
| RKNNToolkit2 | Required | PC         | Converts ONNX models to RKNN models                                           |
| RKNPU2       | Optional | Board      | Base RKNPU2 driver; already integrated into FastDeploy, so it can be skipped  |
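
For reference, converting an exported Paddle Inference model to ONNX with Paddle2ONNX typically looks like the sketch below. This is a hedged example: the model directory and file names are placeholders, and the exact flags available depend on your Paddle2ONNX version (check `paddle2onnx -h`).

```bash
# Minimal sketch: convert an exported Paddle Inference model to ONNX.
# `inference_model/`, `model.pdmodel`, and `model.pdiparams` are placeholder names.
pip install paddle2onnx
paddle2onnx --model_dir inference_model \
            --model_filename model.pdmodel \
            --params_filename model.pdiparams \
            --save_file model.onnx
```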

## Setting Up the Model Conversion Environment

Model conversion must be done on Ubuntu. We recommend using conda as your Python environment manager and Python 3.6 for the conversion environment.
For example, you can create a Python 3.6 environment with the following commands:

```bash
conda create -n rknn2 python=3.6
conda activate rknn2
```

### Installing the Required Dependencies

Before installing RKNNToolkit2, we need to install the following required packages:

```bash
sudo apt-get install libxslt1-dev zlib1g zlib1g-dev libglib2.0-0 libsm6 libgl1-mesa-glx libprotobuf-dev gcc g++
```


### Installing RKNNToolkit2

FastDeploy currently uses conversion tool version 1.4.2b3. If you need the latest version of the conversion tool, you can find it on the [Baidu Netdisk (extraction code: rknn)](https://eyun.baidu.com/s/3eTDMk6Y) provided by Rockchip.

```bash
# rknn_toolkit2 has a pinned numpy dependency, so install numpy==1.16.6 first
pip install numpy==1.16.6

# Download and install rknn_toolkit2-1.4.2b3+0bdd72ff-cp36-cp36m-linux_x86_64.whl
wget https://bj.bcebos.com/fastdeploy/third_libs/rknn_toolkit2-1.4.2b3+0bdd72ff-cp36-cp36m-linux_x86_64.whl
pip install rknn_toolkit2-1.4.2b3+0bdd72ff-cp36-cp36m-linux_x86_64.whl
```
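
With the toolkit installed, an ONNX model is converted to RKNN through the `rknn.api` Python interface. The following is a minimal, hedged sketch: the file names are placeholders, and the `config` options your model needs (mean/std values, quantization dataset, etc.) depend on your model and toolkit version.

```bash
# Minimal ONNX -> RKNN conversion sketch; model.onnx / model.rknn are placeholders.
python3 - <<'EOF'
from rknn.api import RKNN

rknn = RKNN()
# Target the SoC you will deploy on, e.g. 'rk3588' or 'rk3566'.
rknn.config(target_platform='rk3588')
rknn.load_onnx(model='model.onnx')
# Quantization is skipped here; pass do_quantization=True plus a dataset to enable it.
rknn.build(do_quantization=False)
rknn.export_rknn('model.rknn')
rknn.release()
EOF
```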

## Installing the FastDeploy C++ SDK

Given the performance difference between the RK356X and the RK3588, we provide two ways to build FastDeploy.

### Building the FastDeploy C++ SDK on the Board

The RK3588 has a fairly strong CPU, so on-board compilation is acceptably fast and we recommend building directly on the board. The following tutorial was completed on RK356X (Debian 10) and RK3588 (Debian 11).

```bash
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy

# If you are using the develop branch, run the following command
git checkout develop

mkdir build && cd build
cmake .. -DENABLE_ORT_BACKEND=ON \
         -DENABLE_RKNPU2_BACKEND=ON \
         -DENABLE_VISION=ON \
         -DRKNN2_TARGET_SOC=RK3588 \
         -DCMAKE_INSTALL_PREFIX=${PWD}/fastdeploy-0.0.0
make -j8
make install
```
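
After `make install`, downstream C++ programs are typically built against the installed SDK by pointing CMake at the install directory. A hedged sketch, following the pattern used by the FastDeploy example projects (the example path and `FASTDEPLOY_INSTALL_DIR` value are assumptions; adjust them to your tree):

```bash
# Sketch: build one of the bundled C++ examples against the SDK installed above.
cd ../examples/vision/ocr/PP-OCRv3/rknpu2/cpp
mkdir build && cd build
cmake .. -DFASTDEPLOY_INSTALL_DIR=/path/to/FastDeploy/build/fastdeploy-0.0.0
make -j8
```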

### Cross-Compiling the FastDeploy C++ SDK

The RK356X has a relatively weak CPU, so we recommend cross-compiling instead. The following tutorial was completed on Ubuntu 22.04.

```bash
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy

# If you are using the develop branch, run the following command
git checkout develop

mkdir build && cd build
cmake .. -DCMAKE_C_COMPILER=/home/zbc/opt/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc \
         -DCMAKE_CXX_COMPILER=/home/zbc/opt/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-g++ \
         -DCMAKE_TOOLCHAIN_FILE=./../cmake/toolchain.cmake \
         -DTARGET_ABI=arm64 \
         -DENABLE_ORT_BACKEND=OFF \
         -DENABLE_RKNPU2_BACKEND=ON \
         -DENABLE_VISION=ON \
         -DRKNN2_TARGET_SOC=RK356X \
         -DCMAKE_INSTALL_PREFIX=${PWD}/fastdeploy-0.0.0
make -j8
make install
```

If you cannot find a suitable build toolchain, you can download the [cross-compilation toolchain](https://bj.bcebos.com/paddle2onnx/libs/gcc-linaro-6.3.1-2017.zip).
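
Before copying the cross-compiled SDK to the board, you can sanity-check that the output actually targets AArch64 (a hedged check; the library path assumes the install prefix used above and the default `libfastdeploy` naming):

```bash
# A correct cross build should report "ELF 64-bit LSB shared object, ARM aarch64".
file fastdeploy-0.0.0/lib/libfastdeploy.so
```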

### Configuring Environment Variables

To make environment configuration easier, FastDeploy provides a script that sets the environment variables in one step. Run the following commands before running your program:

```bash
# Temporary configuration
source PathToFastDeploySDK/fastdeploy_init.sh

# Permanent configuration
source PathToFastDeploySDK/fastdeploy_init.sh
sudo cp PathToFastDeploySDK/fastdeploy_libs.conf /etc/ld.so.conf.d/
sudo ldconfig
```
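
To confirm that the dynamic linker can now resolve the FastDeploy libraries, query the ldconfig cache (assuming the default `libfastdeploy` library naming):

```bash
# After the permanent configuration, the FastDeploy libraries should be listed here.
ldconfig -p | grep fastdeploy
```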

## Building the FastDeploy Python SDK

Besides the NPU, Rockchip chips provide a number of other features.
Most of these features require C/C++ programming, so if you use any of those modules we do not recommend the Python SDK.
Building the Python SDK is currently supported only on the board. The following tutorial was completed on RK3568 (Debian 10) and RK3588 (Debian 11). Python packaging depends on `wheel`, so run `pip install wheel` before building.


```bash
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy

# If you are using the develop branch, run the following command
git checkout develop

cd python
export ENABLE_ORT_BACKEND=ON
export ENABLE_RKNPU2_BACKEND=ON
export ENABLE_VISION=ON

# Choose RK3588 or RK356X according to your board
export RKNN2_TARGET_SOC=RK3588

# If your board has 8 GB of RAM or more, we recommend building with:
python3 setup.py build
# Note: if your board has less than 8 GB of RAM, build with -j1 to limit memory use:
python3 setup.py build -j1

python3 setup.py bdist_wheel
cd dist
pip3 install fastdeploy_python-0.0.0-cp39-cp39-linux_aarch64.whl
```
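
A quick way to confirm the wheel installed correctly is to import the package (a hedged check; `fastdeploy` is the import name of the FastDeploy Python package, and `__version__` is assumed to be exposed):

```bash
# Should print the build's version string without raising an ImportError.
python3 -c "import fastdeploy; print(fastdeploy.__version__)"
```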

## Directory Navigation

* [Setting up the RKNPU2 development environment](../faq/rknpu2/environment.md)
6 changes: 3 additions & 3 deletions examples/vision/ocr/PP-OCRv3/rknpu2/cpp/README.md
@@ -5,8 +5,8 @@ This directory provides examples that `infer.cc` fast finishes the deployment of

Two steps before deployment

- 1. Software and hardware should meet the requirements. Please refer to [FastDeploy Environment Requirements](../../../../../docs/en/build_and_install/download_prebuilt_libraries.md)
- 2. Download the precompiled deployment library and samples code according to your development environment. Refer to [FastDeploy Precompiled Library](../../../../../docs/en/build_and_install/download_prebuilt_libraries.md)
- 1. Software and hardware should meet the requirements. Please refer to [FastDeploy Environment Requirements](../../../../../../docs/en/build_and_install/download_prebuilt_libraries.md)
- 2. Download the precompiled deployment library and samples code according to your development environment. Refer to [FastDeploy Precompiled Library](../../../../../../docs/en/build_and_install/download_prebuilt_libraries.md)

Taking CPU inference on Linux as an example, you can complete the compilation test by running the following commands in this directory. FastDeploy version 0.7.0 or above (x.x.x >= 0.7.0) is required to support this model.

@@ -49,7 +49,7 @@ The visualized result after running is as follows
## Other Documents

- [C++ API Reference](https://baidu-paddle.github.io/fastdeploy-api/cpp/html/)
- [PPOCR Model Description](../../)
- [PPOCR Model Description](../README.md)
- [PPOCRv3 Python Deployment](../python)
- [Model Prediction Results](../../../../../../docs/en/faq/how_to_change_backend.md)
- [How to switch the model inference backend engine](../../../../../../docs/en/faq/how_to_change_backend.md)
8 changes: 4 additions & 4 deletions examples/vision/ocr/PP-OCRv3/rknpu2/python/README.md
@@ -3,8 +3,8 @@ English | [Simplified Chinese](README_CN.md)

Two steps before deployment

- 1. Software and hardware should meet the requirements. Please refer to [FastDeploy Environment Requirements](../../../../../docs/en/build_and_install/download_prebuilt_libraries.md)
- 2. Install FastDeploy Python whl package. Refer to [FastDeploy Python Installation](../../../../../docs/en/build_and_install/download_prebuilt_libraries.md)
- 1. Software and hardware should meet the requirements. Please refer to [FastDeploy Environment Requirements](../../../../../../docs/en/build_and_install/download_prebuilt_libraries.md)
- 2. Install FastDeploy Python whl package. Refer to [FastDeploy Python Installation](../../../../../../docs/en/build_and_install/download_prebuilt_libraries.md)

This directory provides `infer.py` examples that quickly finish deploying PPOCRv3 on CPU/GPU, and on GPU accelerated by TensorRT. The script is as follows

@@ -43,7 +43,7 @@ The visualized result after running is as follows
## Other Documents

- [Python API reference](https://baidu-paddle.github.io/fastdeploy-api/python/html/)
- [PPOCR Model Description](../../)
- [PPOCR Model Description](../README.md)
- [PPOCRv3 C++ Deployment](../cpp)
- [Model Prediction Results](../../../../../../docs/api/vision_results/)
- [Model Prediction Results](../../../../../../docs/api/vision_results/README.md)
- [How to switch the model inference backend engine](../../../../../../docs/en/faq/how_to_change_backend.md)
8 changes: 4 additions & 4 deletions examples/vision/ocr/PP-OCRv3/rknpu2/python/README_CN.md
@@ -3,8 +3,8 @@

Two steps to confirm before deployment

- 1. The software and hardware environment meets the requirements; see [FastDeploy Environment Requirements](../../../../../docs/cn/build_and_install/download_prebuilt_libraries.md)
- 2. The FastDeploy Python whl package is installed; see [FastDeploy Python Installation](../../../../../docs/cn/build_and_install/download_prebuilt_libraries.md)
- 1. The software and hardware environment meets the requirements; see [FastDeploy Environment Requirements](../../../../../../docs/cn/build_and_install/download_prebuilt_libraries.md)
- 2. The FastDeploy Python whl package is installed; see [FastDeploy Python Installation](../../../../../../docs/cn/build_and_install/download_prebuilt_libraries.md)

This directory provides `infer.py` examples that quickly deploy PPOCRv3 on CPU/GPU, and on GPU accelerated by TensorRT. Run the following script to complete deployment

@@ -56,7 +56,7 @@ python3 infer_static_shape.py \
## Other Documents

- [Python API Reference](https://baidu-paddle.github.io/fastdeploy-api/python/html/)
- [PPOCR Series Model Description](../../)
- [PPOCR Series Model Description](../README.md)
- [PPOCRv3 C++ Deployment](../cpp)
- [Model Prediction Results](../../../../../../docs/api/vision_results/)
- [Model Prediction Results](../../../../../../docs/api/vision_results/README_CN.md)
- [How to switch the model inference backend engine](../../../../../../docs/cn/faq/how_to_change_backend.md)
4 changes: 2 additions & 2 deletions fastdeploy/benchmark/option.h
@@ -20,8 +20,8 @@ namespace fastdeploy {
*/
namespace benchmark {

/*! @brief Option object used to control the behavior of the benchmark profiling.
*/
// @brief Option object used to control the behavior of the benchmark profiling.
//
struct BenchmarkOption {
int warmup = 50; ///< Warmup for backend inference.
int repeats = 100; ///< Repeats for backend inference.
114 changes: 81 additions & 33 deletions fastdeploy/core/fd_tensor.h
@@ -25,11 +25,89 @@

namespace fastdeploy {

/*! @brief FDTensor object used to represent a data matrix
*
*/
struct FASTDEPLOY_DECL FDTensor {
// std::vector<int8_t> data;
/** \brief Set the data buffer for an FDTensor, e.g.
* ```
* std::vector<float> buffer(1 * 3 * 224 * 224, 0);
* FDTensor tensor;
* tensor.SetData({1, 3, 224, 224}, FDDataType::FLOAT, buffer.data());
* ```
* \param[in] tensor_shape The shape of tensor
* \param[in] data_type The data type of tensor
* \param[in] data_buffer The pointer of data buffer memory
* \param[in] copy Whether to copy memory from data_buffer to the tensor; if false, this tensor will share memory with data_buffer, and the data is managed by the user
* \param[in] data_device The device of data_buffer, e.g if data_buffer is a pointer to GPU data, the device should be Device::GPU
* \param[in] data_device_id The device id of data_buffer
*/
void SetData(const std::vector<int64_t>& tensor_shape, const FDDataType& data_type, void* data_buffer, bool copy = false, const Device& data_device = Device::CPU, int data_device_id = -1) {
SetExternalData(tensor_shape, data_type, data_buffer, data_device, data_device_id);
if (copy) {
StopSharing();
}
}

/// Get data pointer of tensor
void* GetData() {
return MutableData();
}
/// Get data pointer of tensor
const void* GetData() const {
return Data();
}

/// Expand the shape of the tensor: insert a new axis that will appear at the `axis` position. This does not change the data memory, only the `shape` attribute
void ExpandDim(int64_t axis = 0);

/// Squeeze the shape of the tensor: erase the axis at the `axis` position. This does not change the data memory, only the `shape` attribute
void Squeeze(int64_t axis = 0);

/// Reshape the tensor. This does not change the data memory, only the `shape` attribute
bool Reshape(const std::vector<int64_t>& new_shape);

/// Total size of tensor memory buffer in bytes
int Nbytes() const;

/// Total number of elements in tensor
int Numel() const;

/// Get shape of tensor
std::vector<int64_t> Shape() const { return shape; }

/// Get dtype of tensor
FDDataType Dtype() const { return dtype; }

/** \brief Allocate a CPU data buffer for an FDTensor, e.g.
* ```
* FDTensor tensor;
* tensor.Allocate(FDDataType::FLOAT, {1, 3, 224, 224});
* ```
* \param[in] data_type The data type of tensor
* \param[in] tensor_shape The shape of tensor
*/
void Allocate(const FDDataType& data_type, const std::vector<int64_t>& data_shape) {
Allocate(data_shape, data_type, name);
}

/// Debug function: print the shape, dtype, mean, max, and min of the tensor
void PrintInfo(const std::string& prefix = "Debug TensorInfo: ") const;

/// Name of the tensor; when fed to the runtime, this needs to be defined
std::string name = "";

/// Whether the tensor owns its data buffer or shares a data buffer from outside
bool IsShared() { return external_data_ptr != nullptr; }
/// If the tensor shares its data buffer from outside, `StopSharing` will copy the data into its own buffer; otherwise it does nothing
void StopSharing();


// ******************************************************
// The following members and functions are only used inside FastDeploy and may be removed in a future version

void* buffer_ = nullptr;
std::vector<int64_t> shape = {0};
std::string name = "";
FDDataType dtype = FDDataType::INT8;

// This use to skip memory copy step
@@ -64,10 +64,6 @@

void* Data();

bool IsShared() { return external_data_ptr != nullptr; }

void StopSharing();

const void* Data() const;

// Use this data to get the tensor data to process
@@ -78,22 +78,14 @@
// will copy to cpu store in `temporary_cpu_buffer`
const void* CpuData() const;

// void SetDataBuffer(const std::vector<int64_t>& new_shape, const FDDataType& data_type, void* data_buffer, bool copy = false, const Device& new_device = Device::CPU, int new_device_id = -1);
// Set a user memory buffer for the Tensor; the memory is managed by
// the user itself, but the Tensor will share the memory with the user,
// so take care with the user buffer
void SetExternalData(const std::vector<int64_t>& new_shape,
const FDDataType& data_type, void* data_buffer,
const Device& new_device = Device::CPU,
int new_device_id = -1);

// Expand the shape of a Tensor. Insert a new axis that will appear
// at the `axis` position in the expanded Tensor shape.
void ExpandDim(int64_t axis = 0);

// Squeeze the shape of a Tensor. Erase the axis that will appear
// at the `axis` position in the squeezed Tensor shape.
void Squeeze(int64_t axis = 0);

// Initialize Tensor
// Include setting attribute for tensor
// and allocate cpu memory buffer
@@ -102,18 +102,6 @@
const std::string& tensor_name = "",
const Device& new_device = Device::CPU);

// Total size of tensor memory buffer in bytes
int Nbytes() const;

// Total number of elements in this tensor
int Numel() const;

// Get shape of FDTensor
std::vector<int64_t> Shape() const { return shape; }

// Get dtype of FDTensor
FDDataType Dtype() const { return dtype; }

void Resize(size_t nbytes);

void Resize(const std::vector<int64_t>& new_shape);
@@ -122,12 +122,6 @@
const FDDataType& data_type, const std::string& tensor_name = "",
const Device& new_device = Device::CPU);

bool Reshape(const std::vector<int64_t>& new_shape);
// Debug function
// Use this function to print shape, dtype, mean, max, min
// prefix will also be printed as tag
void PrintInfo(const std::string& prefix = "TensorInfo: ") const;

bool ReallocFn(size_t nbytes);

void FreeFn();