1. Introduction

To meet users’ real-time requirements, we officially support upgrading Linux to RTLinux based on the kernel in the SDK source code. Our RTLinux support comes in two versions, PREEMPT and Xenomai. The tests below use the PREEMPT version.

1.1. RTLinux system firmware support

The PREEMPT version supports all rk356x and rk3588 series boards, and corresponding firmware is released for them. If you need the source code, please contact our sales team.

1.2. Test real-time effects

Testing real-time performance requires cyclictest, which is part of the rt-tests package and can be installed using apt.

apt update
apt install rt-tests

1.2.1. Test the real-time effect of RTLinux

On the aio-3568j, run the following command to measure the real-time response latency of each core:

sudo ./cyclictest -S -p 99 -m

The fields in the output have the following meanings:

  • T:0 is the thread with index 0.

  • P:99 is the thread priority, here 99.

  • C:9397 is the counter. It is incremented by 1 each time the thread’s interval elapses.

  • I:1000 is the interval, here 1000 microseconds (us).

  • Min is the minimum latency (us), Act is the most recent latency (us), Avg is the average latency (us), and Max is the maximum latency (us).
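
When a test runs for days, it can help to pull only the thread index and Max value out of the per-thread lines. The following is a minimal sketch, assuming each line has the shape `T: 0 ( 1234) P:99 I:1000 C: 9397 Min: 3 Act: 5 Avg: 16 Max: 85`. The function name cyclictest_max is hypothetical, and the PID and Act values in that sample are made up for illustration.

```shell
# cyclictest_max: hypothetical helper, not part of rt-tests.
# Reads cyclictest output on stdin and prints "<thread> <max_us>"
# for every per-thread line; all other lines are ignored.
cyclictest_max() {
    sed -n 's/^T: *\([0-9][0-9]*\).*Max: *\([0-9][0-9]*\).*$/\1 \2/p'
}

# Possible usage: sudo ./cyclictest -S -p 99 -m -q -D 60 | cyclictest_max
```

The -q and -D options in the usage comment (quiet mode and a fixed duration) are standard cyclictest options, but the 60-second duration is only an example.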

To measure the maximum response latency of each core, run several stress-test programs on the aio-3568j at the same time as cyclictest:

# Run three CPU stress workers, three I/O stress workers, and three memory stress workers.
stress --cpu 3 --io 3 --vm 3

Add network load at the same time:

#Simultaneous upstream and downstream testing using iperf.
iperf -c 192.168.1.220 -p 8001 -f m -i100 -d -t 800000

Finally, run a GPU stress test:

#Run indefinitely, looping from the last benchmark back to the first.
glmark2-es2-wayland --run-forever

The following figure shows the results of a three-day latency test run under the stress environment created with the steps above. As the figure shows:

  • T0 real-time thread test has a maximum latency of 85us, an average latency of 16us, and a minimum latency of 3us.

  • T1 real-time thread test has a maximum latency of 61us, an average latency of 12us, and a minimum latency of 3us.

  • T2 real-time thread test has a maximum latency of 52us, an average latency of 11us, and a minimum latency of 3us.

  • T3 real-time thread test has a maximum latency of 18us, an average latency of 4us, and a minimum latency of 3us.

Why does T0 show the worst result? On ARM, all SPIs (shared peripheral interrupts) are handled by CPU0 by default, so its measured latency is worse than that of the other cores. It is best not to pin real-time threads to CPU0.

Why is T3 so much better than the other three cores? By default, CPU3 is removed from the kernel’s SMP load balancing and scheduling, so it is reserved for RT applications.

Figure: _images/rtlinux_test.png

1.2.2. Cyclictest standard test

The threads option (-t) specifies the number of measurement threads that Cyclictest uses when detecting latency. Running only one measurement thread per CPU is the standard test scenario. The CPU on which a thread must execute can be specified with the affinity option (-a).

These options are critical to minimize the impact of running Cyclictest on the observed system. When using Cyclictest, it is important to ensure that only one measurement thread is executing at any given time. If the expected execution times of two or more Cyclictest threads overlap, then Cyclictest’s measurement will be affected by the latency caused by its own measurement thread. The best way to ensure that only one measurement thread executes at a given time is to execute only one measurement thread on a given CPU.

For example, if you want to profile the latency of three specific CPUs, specify that those CPUs should be used (with the -a option), and specify that three measurement threads should be used (with the -t option). In this case, to minimize Cyclictest overhead, make sure that the main Cyclictest thread collecting metrics data is not running on one of the three isolated CPUs. The affinity of the main thread can be set using the taskset program, as described below.

1.2.2.1. Reduce the impact of cyclictest when evaluating latency on an isolated set of CPUs

When measuring latency on a subset of CPUs, make sure the main Cyclictest thread runs on a CPU that is not being evaluated. For example, if a system has two CPUs and latency is being evaluated on CPU 0, the main Cyclictest thread should be pinned to CPU 1. Cyclictest’s main thread is not real-time, but if it executes on a CPU being evaluated, the additional context switches may affect latency. The taskset command can be used to restrict the main thread to a subset of CPUs. For example, to test the latency of CPU1 to CPU3:

# The real-time threads run on CPU1 to CPU3 and the main thread runs on CPU0. (This requires the cyclictest built for the board; otherwise all three real-time threads run only on CPU1 and none run on CPU2 or CPU3.)
taskset -c 0 ./cyclictest -t3 -p99 -a 1-3
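
To keep one measurement thread per evaluated CPU, the -t value can be derived from the CPU list passed to -a instead of being typed by hand. The sketch below only prints the resulting command line and does not run it. The function names count_cpus and cyclictest_cmd are hypothetical, and the list syntax is assumed to be limited to comma-separated numbers and ranges such as 1-3 or 0,2-3.

```shell
# count_cpus: hypothetical helper. Counts the CPUs named in a list
# such as "1-3" or "0,2-3". The body runs in a subshell so that the
# temporary IFS change does not leak into the caller.
count_cpus() (
    total=0
    IFS=,
    for part in $1; do
        case $part in
            *-*) first=${part%-*}
                 last=${part#*-}
                 total=$((total + last - first + 1)) ;;
            *)   total=$((total + 1)) ;;
        esac
    done
    echo "$total"
)

# cyclictest_cmd: hypothetical helper. Prints a command line that pins
# the main thread to $1 and runs one real-time thread per CPU in $2.
cyclictest_cmd() {
    echo "taskset -c $1 ./cyclictest -t$(count_cpus "$2") -p99 -a $2"
}

# Example: cyclictest_cmd 0 1-3
```

The main-thread CPU passed as the first argument should not appear in the evaluated list, for the reasons given above.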

The taskset program can also be used to ensure that other programs running on the system do not affect latency on the isolated CPUs. For example, to start top pinned to CPU 0 and view the threads of a process, use the following command:

taskset --cpu-list 0 top -H -p PID
# After top opens, press f, move the cursor to the P (last used CPU) field, press space to select it, and press q to return. The P column then shows which CPU each real-time thread is running on.

1.3. Strategies for improving real-time performance

1.3.1. Suppress console messages and disable memory overcommitment

# Boot the kernel with the quiet parameter, or suppress console messages after startup as follows:
echo 1 > /proc/sys/kernel/printk

#Disable memory overcommit to avoid latency from Out-of-Memory Killer
echo 2 > /proc/sys/vm/overcommit_memory
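
Values written under /proc/sys with echo are lost on reboot. One possible way to keep them is a sysctl.d fragment, sketched below. The function name write_rt_sysctl_conf and the file name 90-realtime.conf are hypothetical; the keys mirror the two echo commands above.

```shell
# write_rt_sysctl_conf: hypothetical helper that writes the two settings
# to a sysctl.d fragment. The default path is an assumed example name.
write_rt_sysctl_conf() {
    conf=${1:-/etc/sysctl.d/90-realtime.conf}
    cat > "$conf" <<'EOF'
# Suppress console messages (console log level 1)
kernel.printk = 1
# Disable memory overcommit to avoid latency from the OOM killer
vm.overcommit_memory = 2
EOF
}

# Usage (as root):
#   write_rt_sysctl_conf
#   sysctl --system    # reload all sysctl configuration files
```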

1.3.2. Do not use a desktop or use a lightweight window manager

For better real-time performance, we do not recommend using a system with a desktop, because a desktop adds considerable CPU latency. We recommend a minimal Ubuntu system, your own Qt program, or similar. The RT firmware of the rk356x does not use a desktop by default. Instead, it uses the weston window manager with the Wayland display protocol.

1.3.2.1. Switch the X11 environment

If you need an X11 environment, you can switch to X11 manually.

sudo set_display_server x11
reboot
#sudo set_display_server weston  # Switch back to weston, then reboot again.

1.3.2.2. Start using the openbox window manager

After switching to X11, the full desktop is used by default. If you need a lightweight window manager instead, set openbox as the session in the lightdm configuration:

cat /etc/lightdm/lightdm.conf.d/20-autologin.conf 
[Seat:*]
user-session=openbox
autologin-user=firefly

1.3.2.3. Run only your own X11 program

If you do not use the login manager to start the X display service, you can use xinit to start the Xorg display service manually.

When xinit and startx are executed, they look for ~/.xinitrc to run as a shell script to start the client program.

If ~/.xinitrc does not exist, startx runs the default /etc/X11/xinit/xinitrc (the default xinitrc starts an environment with Twm, xorg-xclock, and Xterm).

Start by disabling the lightdm service.
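
The lookup order described above can be sketched as a small function that reports which script would be used. This is a simplification under stated assumptions: the function name pick_xinitrc is hypothetical, and only the two locations mentioned in the text are considered, while the real startx script may honor other settings as well.

```shell
# pick_xinitrc: hypothetical helper that mirrors the lookup described
# above. $1 is the home directory to inspect (defaults to $HOME).
pick_xinitrc() {
    home_dir=${1:-$HOME}
    if [ -f "$home_dir/.xinitrc" ]; then
        echo "$home_dir/.xinitrc"          # per-user script wins
    else
        echo "/etc/X11/xinit/xinitrc"      # system-wide default
    fi
}

# Example: pick_xinitrc /home/firefly
```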

systemctl disable lightdm

Then start your own program using startx.

startx chromium

You can also edit the default xinitrc file, which determines the client that startx launches when none is specified.

vim /etc/X11/xinit/xinitrc
-------------------------------------------------------------
#!/bin/sh
  
# /etc/X11/xinit/xinitrc
#
# global xinitrc file, used by all X sessions started by xinit (startx)

# invoke global X session script
#. /etc/X11/Xsession
#chromium --window-size=1920,1080
chromium --start-maximized  

1.3.3. Core binding

Events with strict real-time requirements can be pinned to a dedicated core, while the system and other events with low real-time requirements are concentrated on another core. For example, specific interrupts and real-time programs can each be serviced by dedicated cores.

1.3.3.1. Binding tasks to cores

RT applications can be handled by a dedicated core. For example, to bind an RT application to CPU3:

taskset -c 3 rt_task
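
One way to confirm that a binding took effect is to read the Cpus_allowed_list field that Linux exposes in /proc/PID/status. A minimal sketch follows; the function name allowed_cpus is hypothetical, and the optional second argument exists only so that the parsing can be tried on a saved copy of a status file.

```shell
# allowed_cpus: hypothetical helper. Prints the CPU list a process is
# allowed to run on. $1 is the PID; $2 optionally overrides the status
# file path (defaults to /proc/<PID>/status).
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ { print $2 }' "${2:-/proc/$1/status}"
}

# Usage, assuming rt_task was started with "taskset -c 3 rt_task":
#   allowed_cpus "$(pidof rt_task)"
```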

1.3.3.2. Binding interrupts to cores

Because ARM routes all peripheral interrupts to CPU0 by default, you can bind important interrupts to other cores after the system starts. For example, to bind the eth0 interrupt to CPU2:

root@firefly:~# cat /proc/interrupts | grep  eth0
 38:   28600296          0          0          0     GICv3  64 Level     eth0
 39:          0          0          0          0     GICv3  61 Level     eth0

root@firefly:~# cat /proc/irq/38/smp_affinity_list 
0-3

root@firefly:~# echo 2 > /proc/irq/38/smp_affinity_list

root@firefly:~# cat /proc/irq/38/smp_affinity_list 
2
root@firefly:~# cat /proc/interrupts | grep  eth0
 38:   29009292          0      52859          0     GICv3  64 Level     eth0
 39:          0          0          0          0     GICv3  61 Level     eth0
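
A device can own more than one interrupt line, as eth0 does here with IRQs 38 and 39. The sketch below lists every IRQ number whose last column in /proc/interrupts matches a device name, so that each can be rebound in a loop. The function name irqs_for is hypothetical, and matching on the last column is an assumption that holds for the output shown above.

```shell
# irqs_for: hypothetical helper. Prints the IRQ numbers whose last
# column matches the device name in $1. $2 optionally overrides the
# input file (defaults to /proc/interrupts).
irqs_for() {
    awk -v dev="$1" '$NF == dev { sub(/:$/, "", $1); print $1 }' \
        "${2:-/proc/interrupts}"
}

# Usage (as root), binding every eth0 interrupt to CPU2:
#   for irq in $(irqs_for eth0); do
#       echo 2 > "/proc/irq/$irq/smp_affinity_list"
#   done
```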

1.3.4. Use the SMP+AMP scheme

For stricter real-time requirements, you can use an AMP solution to achieve better real-time control.

The rk3568 supports AMP (asymmetric multiprocessing), which allows you to dedicate certain cores to running custom systems.

For example, cores 0 to 2 can run the Linux kernel while core 3 runs RT-Thread. Linux (Kernel-4.19, rt-kernel-4.19), bare metal (HAL), and an RTOS (RT-Thread) can be combined in an AMP configuration.

Inter-core communication can be used to exchange information between the systems running on different cores.