1. NPU Brief Introduction to Development

1.1. Preface

BOARD OS NPU connection method RKNPU supported version RKNN Toolkit (Lite) Supported Versions
AIO-3399Pro-JD4 Linux USB3.0 1.7.1 1.7.1
AIO-3399ProC Linux PCIE 1.7.1 1.7.1

AIO-3399ProC developers need to pay attention:

The NPU inference stage will communicate with the CPU. The amount of data transmitted in a single transmission is small but the frequency is high. However, compared with USB3.0, PCIE is not suitable for small file transmission. Therefore, using the “PCIE-connected NPU” will be much slower in actual inference speed than using the “USB3.0-connected NPU”. In order to solve this problem, we have made related PCIE-NPU acceleration patches, and the final result is that the inference speed is equal to the “NPU connected by USB3.0”. Please refer to the patch notes for specific usage and precautions.

Note: The PCIE-NPU acceleration patch is not enabled by default. Please refer to the patch notes for specific enabling methods and precautions.

  • PCIE-NPU acceleration patch https://gitlab.com/firefly-linux/kernel/-/commit/892d398b3830c1e1b0560167f1896f38bd7e084a

Confirm whether the PCIE-NUP acceleration patch is enabled Developers can test whether the PCIE-NUP acceleration patch has been activated through the following four test items:

  • The system can start normally

  • NPU driver version not less than 1.7.1

dpkg -l |grep firefly-3399pronpu-driver
  • Check that the status of PCIE_RK1808_OPTIMIZATION is okay

cat /sys/firmware/devicetree/base/PCIE_RK1808_OPTIMIZATION/status
  • No error reported when using NPU

1.2. NPU Characteristics

  • Supports 8 bit/16 bit operation with 3.0 TOPS performance

  • Compared with the large chip scheme of GPU as AI computing unit, the power consumption of GPU is less than 1%.

  • Could load Caffe / Mxnet / TensorFlow model Directly

  • Provide AI development tools: support rapid model transformation, development board end-to-side conversion API, TensorFlow / TF Lite / Caffe / ONNX / Darknet and other models

  • Provide AI application development interface: support Android NN API, RKNN cross-platform API, Linux support TensorFlow development

1.3. Process

The complete NPU development process is shown in the following figure

_images/rknn_development_flow_en.png

1.3.1. 1. model training

In the model training stage, users choose the appropriate framework (such as Caffe, TensorFlow, etc.) according to the needs and actual conditions to train to get the model that meets the needs. The trained model can also be used directly.

1.3.2. 2. Model transformation

In this stage, the model obtained from model training is transformed into the model available to NPU through RKNN Toolkit.

1.3.3. 3. Program Development

The last stage is to implement business logic for Python API development program based on RKNN API or RKNN Tookit

This document mainly introduces model transformation and RKNN-based program development, and does not involve the content of model training.