Large Language Model

1. RKLLM Introduction

The RKLLM SDK helps users quickly deploy large language models onto AIBOX-3576. SDK Download

1.1 RKLLM-Toolkit Functions Introduction

The RKLLM-Toolkit is a development suite designed for users to perform quantization and conversion of large language models on their computers. Through the Python interface provided by this tool, users can conveniently achieve the following functions:

  • Model Conversion: Supports converting large language models in Hugging Face format to RKLLM models. The converted RKLLM models can be loaded and used on the Rockchip NPU platform.

  • Quantization: Supports quantizing floating-point models to fixed-point models. Currently supported quantization types include w4a16 and w8a8.

1.2 RKLLM Runtime Functions Introduction

The RKLLM Runtime is primarily responsible for loading RKLLM models converted with the RKLLM-Toolkit and performing inference on the Rockchip NPU by invoking the NPU driver on the AIBOX-3576 board. During inference, users can customize the inference parameters, define various text generation strategies, and continuously receive inference results through predefined callback functions.
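The callback-driven flow can be pictured with a short sketch. The following Python snippet is purely illustrative and runnable on its own: it mimics the contract of the runtime's streaming callback (one call per generated token, then a final "finish" call). The function names and state constants here are hypothetical, not part of the SDK; the real interface is the C API provided by the RKLLM Runtime.

# Illustrative only: mimics how the RKLLM Runtime streams results to a
# user-registered callback. RUN_NORMAL/RUN_FINISH and run_inference are
# hypothetical names, not the SDK API.
RUN_NORMAL, RUN_FINISH = 0, 1

def run_inference(prompt, callback):
    # A real runtime would generate tokens on the NPU; here we fake a
    # token stream to show the callback contract.
    for token in ["I", " am", " QianWen", "."]:
        callback(token, RUN_NORMAL)   # one callback per generated token
    callback("", RUN_FINISH)          # final call signals completion

pieces = []

def on_result(text, state):
    if state == RUN_NORMAL:
        pieces.append(text)                 # accumulate streamed tokens
    else:
        print("robot: " + "".join(pieces))  # full answer once finished

run_inference("What's your name?", on_result)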

2. RKLLM-Toolkit Installation

The RKLLM-Toolkit is currently only available for Linux PCs; Ubuntu 20.04 (x64) is recommended. Since multiple Python versions may coexist on the system, it is advisable to use miniforge3 to manage Python environments.

# Check whether miniforge3 and conda are already installed; if so, skip this step.
conda -V
# Download the miniforge3 installer package.
wget -c https://mirrors.bfsu.edu.cn/github-release/conda-forge/miniforge/LatestRelease/Miniforge3-Linux-x86_64.sh
# Install miniforge3
chmod +x Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh
# Activate the Conda base environment. The installation directory for miniforge3 will be the default location for Conda.
source ~/miniforge3/bin/activate
# Create a Conda environment named RKLLM-Toolkit with Python 3.8 (recommended version).
conda create -n RKLLM-Toolkit python=3.8
# Enter RKLLM-Toolkit Conda environment
conda activate RKLLM-Toolkit
# Install the RKLLM-Toolkit wheel, e.g. rkllm_toolkit-1.0.1-cp38-cp38-linux_x86_64.whl
pip3 install packages/rkllm_toolkit-1.0.1-cp38-cp38-linux_x86_64.whl

If the following import executes without errors in a Python interpreter, the installation was successful:

python
>>> from rkllm.api import RKLLM

3. Large Language Model Deployment Example

The large language models supported by RKLLM are as follows:

Model            Huggingface Link
TinyLlama-1.1B   LINK
Qwen-1.8B        LINK
Qwen2-0.5B       LINK
Phi-2-2.7B       LINK
Phi-3-3.8B       LINK
ChatGLM3-6B      LINK
Gemma-2B         LINK
InternLM2-1.8B   LINK
MiniCPM-2B       LINK

3.1 Large Language Model Conversion And Execution Demo

Below is an example demonstrating how to convert, quantize, and export a large language model, and then deploy and run it on the board, using rkllm-toolkit/examples/huggingface/test.py and rkllm-runtime/examples/rkllm_api_demo provided by the RKLLM SDK.

3.1.1 Convert The Model On The PC

Using Qwen-1.8B as an example, click the link in the table above to clone the complete model repository. Then set the path to the model you want to convert in test.py.

# Note that you need to set the path to the Qwen-1.8B repository that you cloned.
modelpath = '/rkllm/rkllm_model/Qwen-1_8B-Chat'
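Besides modelpath, test.py essentially performs three steps: loading the Hugging Face model, building (quantizing) it, and exporting the .rkllm file. Below is a minimal sketch of that flow, based on the example script shipped with the SDK; the build() arguments shown, such as quantized_dtype='w8a8' and target_platform='rk3576', are assumptions to adjust for your accuracy requirements and target board.

from rkllm.api import RKLLM

modelpath = '/rkllm/rkllm_model/Qwen-1_8B-Chat'
llm = RKLLM()

# Load the Hugging Face checkpoint
ret = llm.load_huggingface(model=modelpath)
if ret != 0:
    raise SystemExit('load_huggingface failed')

# Quantize and optimize the model ('w8a8' and 'rk3576' are example choices)
ret = llm.build(do_quantization=True, quantized_dtype='w8a8', target_platform='rk3576')
if ret != 0:
    raise SystemExit('build failed')

# Export the converted RKLLM model
ret = llm.export_rkllm('./Qwen1.8.rkllm')
if ret != 0:
    raise SystemExit('export_rkllm failed')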

With the RKLLM-Toolkit properly installed, activate the RKLLM-Toolkit Conda environment and run test.py to produce the converted model.

(RKLLM-Toolkit) root@ea5d57ca8e66:/rkllm/rknn-llm/rkllm-toolkit/examples/huggingface# python3 test.py 
rkllm-toolkit version: 1.0.1
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.84it/s]
Optimizing model: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [02:13<00:00,  5.55s/it]
Converting model: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 195/195 [00:00<00:00, 2363841.85it/s]
Model has been saved to ./Qwen1.8.rkllm!

3.1.2 Deploy And Run The RKLLM Model On AIBOX-3576

3.1.2.1 Kernel Requirements

Before performing model inference with the RKLLM Runtime, first verify that the RKNPU driver on the board is v0.9.6 or later.

root@firefly:/# cat /sys/kernel/debug/rknpu/version
RKNPU driver: v0.9.6

3.1.2.2 Compilation Requirements For RKLLM Runtime

When building applications against the RKLLM Runtime, pay attention to the version of the GCC cross-compilation toolchain. The gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu toolchain is recommended.

3.1.2.3 Running Inference On AIBOX-3576

Refer to the example rkllm-runtime/examples/rkllm_api_demo for compilation. Transfer the compiled llm_demo executable and the runtime library rkllm-runtime/runtime/Linux/librkllm_api/aarch64/librkllmrt.so to the board using scp. Make sure librkllmrt.so can be found by the dynamic loader, for example by copying it to /usr/lib or adding its directory to LD_LIBRARY_PATH.

# Increase the maximum number of file descriptors that can be opened at the same time
root@firefly:/# ulimit -n 102400 
root@firefly:/# ./llm_demo Qwen1.8.rkllm 
rkllm init start
rkllm-runtime version: 1.0.1
rkllm init success

********************** Enter the number of one of the questions below to get an answer, or type your own input ********************

[0] Translate the following modern Chinese passage into classical Chinese: When the spring breeze is gentle and the sunshine is bright, the lake surface is calm, with no turbulent waves; the sky and the lake merge into one vast expanse of green; the gulls on the sandbar sometimes fly and sometimes rest; beautiful fish swim back and forth; and the flowers and plants on the banks and islets are lush and green.
[1] Write me a classical poem titled "Ode to the Plum Blossom", including elements such as plum blossoms and white snow.
[2] First line of a couplet: By the river, accustomed to watching a thousand sails pass
[3] Translate this passage into Chinese: Knowledge can be acquired from many sources. These include books, teachers and practical experience, and each has its own advantages. The knowledge we gain from books and formal education enables us to learn about things that we have no opportunity to experience in daily life. We can also develop our analytical skills and learn how to view and interpret the world around us in different ways. Furthermore, we can learn from the past by reading books. In this way, we won't repeat the mistakes of others and can build on their achievements.
[4] Translate this sentence into English: RK3588 is a new generation of high-end processor featuring high computing power, low power consumption, powerful multimedia capabilities, and rich data interfaces.

*************************************************************************

user: What's your name?
robot: As an AI language model, I don't have a name in the traditional sense, but I am called QianWen. How can I assist you today?

4. Others

For more demos and API usage, refer to the RKLLM SDK examples and documentation.

5. FAQs

Q1: Model conversion fails?

A1: Check the available RAM on your PC. Models with more parameters require more memory for conversion and execution. Consider increasing the swap file size or using a PC with more RAM.

Q2: Loading the model on the board reports: E RKNN: [08:05:50.345] failed to convert handle(1120) to fd, ret: -1, errno: 24, errstr: Too many open files?

A2: Large language models typically open many files at runtime. Raise the open-file limit with the ulimit command shown earlier (ulimit -n 102400).

Q3: An error is reported when use_gpu is set to true?

A3: Create a symbolic link so that the OpenCL library resolves to the Mali GPU library:

ln -s /usr/lib/aarch64-linux-gnu/libmali.so /usr/lib/aarch64-linux-gnu/libOpenCL.so