Self-developed Algorithm¶

The NPU on the CAM-CRV1126S2U/CAM-CRV1109S2U runs inference only on RKNN models. Users who want to run a self-developed algorithm model need to be familiar with the RKNN development process and must convert their model to the RKNN format before using it. This chapter explains how to use the RKNN development tools for this purpose.

RKNN Introduction¶

RKNN is the model format used by the Rockchip NPU platform; model files use the .rknn suffix. Rockchip provides a complete Python tool for model conversion, which makes it easy to convert self-developed algorithm models into RKNN models. Rockchip also provides C/C++ and Python APIs.

RKNN-Toolkit¶

Tool Introduction¶

RKNN-Toolkit is a development kit that provides model conversion, inference, and performance evaluation on the PC and on Rockchip NPU platforms. Through the Python interface provided by this tool, users can perform the following functions:

Model Conversion: Supports converting Caffe, TensorFlow, TensorFlow Lite, ONNX, Darknet, PyTorch, and MXNet models into RKNN models. RKNN models can also be imported and exported, and later loaded and used on the Rockchip NPU platform. Multi-input models are supported since version 1.2.0; PyTorch and MXNet are supported since version 1.3.0.
Quantization Function: Supports converting floating-point models to quantized models. The currently supported quantization methods are asymmetric quantization (asymmetric_quantized-u8) and dynamic fixed-point quantization (dynamic_fixed_point-8 and dynamic_fixed_point-16). Hybrid quantization is supported since version 1.0.0.
Model Inference: Can simulate the Rockchip NPU on the PC to run the RKNN model and obtain the inference result; can also distribute the RKNN model to a designated NPU device for inference.
Performance Evaluation: Can simulate the Rockchip NPU on the PC to run the RKNN model and evaluate its performance (including the total time and the time of each layer); can also distribute the RKNN model to a designated NPU device to evaluate its performance on the actual device.
Memory Evaluation: Evaluate the system and NPU memory consumption when the model is running. When using this function, the RKNN model must be distributed to the NPU device to run, and the relevant interface must be called to obtain memory usage information. This function is supported since version 0.9.9.
Model Precompilation: The RKNN model generated by precompilation technology can reduce the loading time on the hardware platform. For some models, the size of the model can also be reduced. But the pre-compiled RKNN model can only be run on NPU devices. Currently, only the x86_64 Ubuntu platform supports directly generating a pre-compiled RKNN model from the original model. RKNN-Toolkit supports model pre-compilation from 0.9.5 version, and has upgraded the pre-compilation method in 1.0.0. Starting from the 1.4.0 version, the normal RKNN model can also be converted into a pre-compiled RKNN model through the NPU device.
Model Segmentation: This function is used in scenarios where multiple models run at the same time. A single model can be divided into multiple segments for execution on the NPU, which adjusts how much NPU execution time each model occupies and prevents one model from taking so much execution time that other models cannot be executed in time. RKNN-Toolkit supports this function since version 1.2.0. It must be used on hardware with a Rockchip NPU, and the NPU driver version must be greater than 0.9.8.
Custom Operator Function: If the model contains an operator that is not supported by RKNN-Toolkit, it will fail in the model conversion stage. At this time, you can use the custom operator function to add unsupported operators, so that the model can be converted and run normally. RKNN-Toolkit supports this function from 1.2.0 version. For the use and development of custom operators, please refer to the "Rockchip_Developer_Guide_RKNN_Toolkit_Custom_OP_EN" document.
Quantization Accuracy Analysis Function: This function reports the Euclidean distance or cosine distance between each layer's inference results before and after quantization, which helps analyze where quantization error arises and suggests ways to improve the accuracy of the quantized model. This feature is supported since version 1.3.0. Version 1.4.0 adds a layer-by-layer quantization accuracy analysis sub-function: the input of each layer is set to the correct floating-point value, which eliminates the accumulation of error across layers and more accurately reflects the effect of quantization on each layer itself.
Visualization Function: This function presents the various functions of RKNN-Toolkit in a graphical interface, simplifying user operation. Users can complete tasks such as model conversion and inference by filling in forms and clicking buttons instead of writing scripts by hand. For details, refer to the "Rockchip_User_Guide_RKNN_Toolkit_Visualization_EN" document. This function is supported since version 1.3.0. Version 1.4.0 improves support for multi-input models and adds support for new Rockchip NPU devices such as RK1806, RV1109, and RV1126.
Model Optimization Level Function: RKNN-Toolkit will optimize the model during model conversion, and the default optimization options may have some impact on the accuracy of the model. By setting the optimization level, you can turn off some or all optimization options. For the specific usage of the optimization level, please refer to the description of the optimization_level parameter in the config interface. This feature is supported from version 1.3.0.
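
To make the quantization and accuracy-analysis items above more concrete, the following is a minimal NumPy sketch of asymmetric 8-bit quantization and of the two distance metrics mentioned. It does not call RKNN-Toolkit. The function names are hypothetical, and the rounding and clipping scheme is the generic affine one, which may differ in detail from what the toolkit does internally.

```python
import numpy as np


def quantize_asymmetric_u8(x):
    """Map float values to uint8 with a scale and zero point (generic affine scheme)."""
    x = np.asarray(x, dtype=np.float32)
    # The representable range is widened to include 0 so that 0.0 maps to an integer.
    lo = min(float(x.min()), 0.0)
    hi = max(float(x.max()), 0.0)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(x / scale + zero_point), 0, 255).astype(np.uint8)
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float values from the quantized integers."""
    return (q.astype(np.float32) - zero_point) * scale


def euclidean_distance(a, b):
    a = np.ravel(a).astype(np.float64)
    b = np.ravel(b).astype(np.float64)
    return float(np.linalg.norm(a - b))


def cosine_distance(a, b):
    a = np.ravel(a).astype(np.float64)
    b = np.ravel(b).astype(np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return 1.0 - float(np.dot(a, b) / denom)


# Stand-in for one layer's floating-point output (random data, not a real model).
rng = np.random.default_rng(0)
layer_output = rng.normal(size=1000).astype(np.float32)

q, scale, zp = quantize_asymmetric_u8(layer_output)
restored = dequantize(q, scale, zp)

print("scale:", scale, "zero point:", zp)
print("euclidean distance:", euclidean_distance(layer_output, restored))
print("cosine distance:", cosine_distance(layer_output, restored))
```

A small distance between the original and the restored values means the layer survives quantization well; a layer with a large distance is a natural candidate for hybrid quantization.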

Environment Dependence¶

System Support: Ubuntu 16.04 x64 or later, Windows 7 x64 or later, Mac OS X 10.13.5 x64 or later, Debian 9.8 x64 or later
Python Version: 3.5/3.6/3.7
Python Dependencies:

'numpy == 1.16.3'
'scipy == 1.3.0'
'Pillow == 5.3.0'
'h5py == 2.8.0'
'lmdb == 0.93'
'networkx == 1.11'
'flatbuffers == 1.10'
'protobuf == 3.6.1'
'onnx == 1.4.1'
'onnx-tf == 1.2.1'
'flask == 1.0.2'
'tensorflow == 1.11.0' or 'tensorflow-gpu'
'dill == 0.2.8.2'
'ruamel.yaml == 0.15.81'
'psutil == 5.6.2'
'ply == 3.11'
'requests == 2.22.0'
'torch == 1.2.0'
'mxnet == 1.5.0'

Notes:

On Windows, only a Python 3.6 installation package is provided.
On macOS, installation packages are provided for Python 3.6 and Python 3.7.
On the ARM64 platform (running Debian 9 or 10), installation packages are provided for Python 3.5 (Debian 9) and Python 3.7 (Debian 10).
On all platforms except macOS, the scipy dependency is >=1.1.0.
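
Before installing the toolkit, it can help to compare the current Python environment against the pinned versions above. The sketch below is one hypothetical way to do that; the helper names are illustrative, only a few of the pins are listed, and the strict equality check is an assumption that would need relaxing for packages such as scipy where a range is allowed.

```python
import sys

SUPPORTED_PYTHONS = {(3, 5), (3, 6), (3, 7)}

# A few of the pins from the list above; extend as needed.
PINNED = [
    "numpy == 1.16.3",
    "Pillow == 5.3.0",
    "h5py == 2.8.0",
    "protobuf == 3.6.1",
]


def parse_pin(pin):
    """Split 'name == version' into (name, version)."""
    name, version = pin.split("==")
    return name.strip(), version.strip()


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        from importlib import metadata  # available from Python 3.8
    except ImportError:
        import pkg_resources  # setuptools fallback for Python 3.5-3.7
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(pins=PINNED, lookup=installed_version,
                      python=sys.version_info[:2]):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(python) not in SUPPORTED_PYTHONS:
        problems.append("Python %d.%d is not one of 3.5/3.6/3.7" % tuple(python))
    for pin in pins:
        name, wanted = parse_pin(pin)
        found = lookup(name)
        if found != wanted:
            problems.append("%s: want %s, found %s" % (name, wanted, found))
    return problems
```

Calling `check_environment()` with no arguments inspects the running interpreter; the `lookup` and `python` parameters exist only so the logic can be exercised without touching the real environment.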

Quick Start¶

The test environment is an Ubuntu 16.04 x86_64 PC host. For other platforms, refer to sdk/external/rknn-toolkit/doc/Rockchip_Quick_Start_RKNN_Toolkit_V1.4.0_XX.pdf.

RKNN-Toolkit installation

# Install python 3.5
sudo apt-get install python3.5
# Install pip3
sudo apt-get install python3-pip
# Obtain the RKNN-Toolkit installation package by copying it out of the SDK, and then perform the following steps
cp -rf sdk/external/rknn-toolkit ./
cd rknn-toolkit/package/
pip3 install tensorflow==1.11.0
pip3 install mxnet==1.5.0
pip3 install torch==1.2.0 torchvision==0.4.0
pip3 install opencv-python
pip3 install gluoncv
# Install RKNN-Toolkit
sudo pip3 install rknn_toolkit-1.4.0-cp35-cp35m-linux_x86_64.whl
# Check whether the installation succeeded by importing the rknn library
rk@rk:~/rknn-toolkit-v1.4.0/package$ python3
>>> from rknn.api import RKNN
>>>

Connect the host and the device with a Type-C cable and run the demo

cd examples/tflite/mobilenet_v1/
daijh@daijh:~/p/sdk/external/rknn-toolkit/examples/tflite/mobilenet_v1$ python3.6 ./test.py 
--> config model
done
--> Loading model
done
--> Building model
W The channel_mean_value filed will not be used in the future!
done
--> Export RKNN model
done
--> Init runtime environment
I NPUTransfer: Starting NPU Transfer Client, Transfer version 2.0.0 (8f9ebbc@2020-04-03T09:12:30)
D RKNNAPI: ==============================================
D RKNNAPI: RKNN VERSION:
D RKNNAPI:   API: 1.4.0 (b4a8096 build: 2020-08-12 10:15:19)
D RKNNAPI:   DRV: 1.5.2 (e67e5cb build: 2020-12-03 15:04:52)
D RKNNAPI: ==============================================
done
--> Running model
mobilenet_v1
-----TOP 5-----
[156]: 0.8603515625
[155]: 0.0833740234375
[205]: 0.0123443603515625
[284]: 0.00726318359375
[260]: 0.002262115478515625

done
--> Begin evaluate model performance
W When performing performance evaluation, inputs can be set to None to use fake inputs.
========================================================================
                               Performance                              
========================================================================
Total Time(us): 5573
FPS: 179.44
========================================================================

done
daijh@daijh:~/p/sdk/external/rknn-toolkit/examples/tflite/mobilenet_v1$
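
The stages in this log correspond to a short sequence of RKNN-Toolkit Python calls. The sketch below shows roughly what such a script does; it is not the shipped test.py. The function names `top_k` and `convert_and_run` are hypothetical, the mean/scale string and channel order are placeholder values, and passing `target_platform` to `config` is an assumption for RV1109/RV1126 that should be checked against the toolkit documentation.

```python
import numpy as np


def top_k(scores, k=5):
    """Return (class_index, score) pairs for the k largest scores, best first."""
    scores = np.asarray(scores).ravel()
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]


def convert_and_run(tflite_path, dataset_path, rknn_path, image, target="rv1126"):
    """Convert a TFLite model to RKNN, run it once, and return the top-5 classes."""
    # Imported here so that top_k() stays usable without RKNN-Toolkit installed.
    from rknn.api import RKNN

    def check(ret, step):
        if ret != 0:
            raise RuntimeError(step + " failed with code " + str(ret))

    rknn = RKNN()
    try:
        # --> config model (placeholder preprocessing values; assumed target_platform)
        rknn.config(channel_mean_value="127.5 127.5 127.5 127.5",
                    reorder_channel="0 1 2",
                    target_platform=[target])
        # --> Loading model
        check(rknn.load_tflite(model=tflite_path), "load_tflite")
        # --> Building model (quantized using the images listed in dataset_path)
        check(rknn.build(do_quantization=True, dataset=dataset_path), "build")
        # --> Export RKNN model
        check(rknn.export_rknn(rknn_path), "export_rknn")
        # --> Init runtime environment on the connected device
        check(rknn.init_runtime(target=target), "init_runtime")
        # --> Running model
        outputs = rknn.inference(inputs=[image])
        # --> Begin evaluate model performance
        rknn.eval_perf(inputs=[image], is_print=True)
        return top_k(outputs[0])
    finally:
        rknn.release()
```

`image` is expected to be a 224x224x3 uint8 array for this model, and all paths are placeholders. In the log above, the reported FPS is simply 1,000,000 divided by the total time in microseconds (1,000,000 / 5573 ≈ 179.44).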

In addition to the Python interface, a C/C++ interface is also provided for model inference. Users can convert the model on the PC and then run inference on the board through the C/C++ interface. The following shows the demo being built and run.

# Before compiling, edit build.sh (for example with vim) and set GCC_COMPILER to the path of the cross compiler
# GCC_COMPILER=/home/daijh/p/sdk/prebuilts/gcc/linux-x86/arm/gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf/bin/arm-linux-gnueabihf
# The path above points to a local 32-bit cross-compilation toolchain. Change it to the path of the cross-compilation toolchain in your SDK.
daijh@daijh:~$ cd sdk/external/rknpu/rknn/rknn_api/examples/rknn_mobilenet_demo
daijh@daijh:sdk/external/rknpu/rknn/rknn_api/examples/rknn_mobilenet_demo$ ./build.sh

# Put the compiled demo into the device
adb push rknn_mobilenet_demo/ /

# Run the demo on the device (open a shell on the device with adb first)
adb shell
cd /rknn_mobilenet_demo
[root@RV1126_RV1109:/rknn_mobilenet_demo]# ./build/rknn_mobilenet_demo ./model/mobilenet_v1_rv1109_rv1126.rknn ./model/dog_224x224.jpg
model input num: 1, output num: 1
input tensors:
index=0 name= n_dims=4 dims=[1 224 224 3] n_elems=150528 size=150528 fmt=0 type=3 qnt_type=2 fl=127 zp=127 scale=0.007843
output tensors:
index=0 name= n_dims=2 dims=[0 0 1 1001] n_elems=1001 size=2002 fmt=0 type=1 qnt_type=0 fl=127 zp=127 scale=0.007843
rknn_run
155 - 0.091736
156 - 0.851074
205 - 0.013588
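
The tensor lines printed by the demo can be read field by field. The sketch below parses such a line and checks that `size` equals `n_elems` multiplied by the element width. The numeric-code tables reflect my reading of the enums in `rknn_api.h` and are an assumption to verify against the header in your SDK; the helper names are hypothetical.

```python
import re

# Assumed from the rknn_api.h enums: code -> (name, bytes per element).
TENSOR_TYPES = {
    0: ("float32", 4),
    1: ("float16", 2),
    2: ("int8", 1),
    3: ("uint8", 1),
    4: ("int16", 2),
}
# Assumed from the rknn_api.h enums: quantization type codes.
QNT_TYPES = {0: "none", 1: "dynamic fixed point", 2: "affine asymmetric"}


def parse_tensor_line(line):
    """Turn one 'index=... dims=[...] ...' line into a dict of numbers."""
    attrs = {}
    dims = re.search(r"dims=\[([^\]]*)\]", line)
    if dims:
        attrs["dims"] = [int(d) for d in dims.group(1).split()]
    # Numeric key=value pairs; 'name=' (empty) and 'dims=[' do not match.
    for key, value in re.findall(r"(\w+)=([-\d.]+)", line):
        attrs[key] = float(value) if "." in value else int(value)
    return attrs


def describe(attrs):
    """Summarize data type, quantization type, and whether size is consistent."""
    type_name, width = TENSOR_TYPES[attrs["type"]]
    return {
        "dtype": type_name,
        "quantization": QNT_TYPES[attrs["qnt_type"]],
        "size_matches": attrs["size"] == attrs["n_elems"] * width,
    }


def dequantize_affine(q, zp, scale):
    """Real value represented by quantized integer q under zero point zp and scale."""
    return (q - zp) * scale


input_line = ("index=0 name= n_dims=4 dims=[1 224 224 3] n_elems=150528 "
              "size=150528 fmt=0 type=3 qnt_type=2 fl=127 zp=127 scale=0.007843")
output_line = ("index=0 name= n_dims=2 dims=[0 0 1 1001] n_elems=1001 "
               "size=2002 fmt=0 type=1 qnt_type=0 fl=127 zp=127 scale=0.007843")

input_info = describe(parse_tensor_line(input_line))
output_info = describe(parse_tensor_line(output_line))
print(input_info)
print(output_info)
```

Under these assumed codes, the input is a quantized 8-bit image tensor (one byte per element) and the output holds 1001 two-byte scores, which matches the printed sizes of 150528 and 2002 bytes.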

Development Document¶

After installing RKNN-Toolkit and verifying the development process with the demos, refer to the detailed RKNN API documents to carry out your own development.

RKNN-Toolkit Document: sdk/external/rknn-toolkit/doc
C/C++ API Document: sdk/external/rknpu/rknn/doc