4. RTLinux manual

To meet users' real-time requirements, upgrading Linux to RTLinux is officially supported, based on the kernel in the SDK source code.

4.1. Download RTLinux system firmware

RTLinux firmware link:

4.2. Test real-time effects

Testing real-time performance requires cyclictest, which can be installed using apt.

apt update
apt install rt-tests

4.3. Test the real-time effect of RTLinux

Run the following command to test with 40 real-time threads.

# Adding -a improves the real-time results; older cyclictest versions also need the -n option
sudo ./cyclictest -t 40 -p 99 -a -m

The screenshot below shows the result of a test with 40 real-time threads on the ITX-3588J while CPU, I/O, memory, and GPU stress tests were running simultaneously. On a little core (T0), the minimum latency was 5us, the average latency was 11us, and the maximum latency was 44us. On a big core (T4), the minimum latency was 2us, the average latency was 3us, and the maximum latency was 27us.

In this test, each core runs 5 real-time threads of the same priority, whereas the standard test runs one real-time thread per core. Because real-time threads of the same priority interfere with each other, a standard Cyclictest run gives better results than those in the figure below.

The fields in the output have the following meanings:

  • T:0 is the thread with index 0
  • P:99 is the thread priority (99)
  • C:9397 is a counter that is incremented by 1 each time the thread's interval elapses
  • I:1000 is the interval, here 1000 microseconds (us)
  • Min is the minimum latency (us)
  • Act is the most recent latency (us)
  • Avg is the average latency (us)
  • Max is the maximum latency (us)
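To pull the worst case out of a long run, the Max field can be extracted with standard text tools. The following is a minimal sketch, assuming the usual cyclictest per-thread line layout. The helper name worst_max is hypothetical, and the sample lines reuse the T0 and T4 values reported above with made-up PID and Act values.

```shell
# worst_max: read cyclictest output on stdin and print the largest Max value (us).
# Hypothetical helper, not part of rt-tests.
worst_max() {
    sed -n 's/.*Max:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | sort -n | tail -n 1
}

# Illustrative input only: Min/Avg/Max are the T0 and T4 figures quoted above,
# the PID and Act columns are invented for the example.
sample='T: 0 ( 3431) P:99 I:1000 C:   9397 Min:      5 Act:   10 Avg:   11 Max:      44
T: 4 ( 3435) P:99 I:1000 C:   9397 Min:      2 Act:    3 Avg:    3 Max:      27'

printf '%s\n' "$sample" | worst_max
```

On a real run, the output of cyclictest would be piped into worst_max in place of the sample text.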


4.4. Common cyclictest parameters

-t: Number of real-time threads to run; for example, to run 20 real-time threads: -t20
-p: Priority of the real-time threads; for example, to run the threads at priority 80: -p80
-a: CPU affinity of the real-time threads; for example, to run the real-time threads on CPUs 1 to 3: -a 1-3 (with no argument, all cores are used; older versions accept only a single core or all cores)
-i: Base interval of the threads; the default is 1000us
-d: Distance between the intervals of successive threads; the default is 500us
-l: Number of measurement loops; the default is 0 (run until interrupted)
-m: Lock the program's memory to prevent it from being paged out
-q: Print the results only when the program exits
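The options above combine on a single command line. As a minimal sketch, the hypothetical helper cyclictest_cmd below only prints the command built from a thread count, a priority, and a CPU list, so the combination can be checked before it is run on the board. The values 20, 80, and 1-3 are the examples used in the list.

```shell
# cyclictest_cmd THREADS PRIORITY CPULIST
# Print (do not run) a cyclictest command line that uses the options described above.
# Hypothetical helper for illustration only.
cyclictest_cmd() {
    threads=$1
    prio=$2
    cpus=$3
    # -m locks memory, -q prints only the final summary
    printf './cyclictest -t%s -p%s -a %s -m -q\n' "$threads" "$prio" "$cpus"
}

cyclictest_cmd 20 80 1-3
```

Printing the command first is a deliberate choice here: it keeps the example safe to try on any machine, and the printed line can be copied to the target once it looks right.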

4.5. Cyclictest standard test

The threads option (-t) specifies the number of measurement threads that Cyclictest uses when detecting latency. Typically, running only one measurement thread per CPU is the standard test scenario. The CPU on which each thread must execute can be specified with the affinity option (-a).

These options are critical to minimize the impact of running Cyclictest on the observed system. When using Cyclictest, it is important to ensure that only one measurement thread is executing at any given time. If the expected execution times of two or more Cyclictest threads overlap, then Cyclictest’s measurement will be affected by the latency caused by its own measurement thread. The best way to ensure that only one measurement thread executes at a given time is to execute only one measurement thread on a given CPU.

For example, if you want to profile the latency of three specific CPUs, specify that those CPUs should be used (with the -a option), and specify that three measurement threads should be used (with the -t option). In this case, to minimize Cyclictest overhead, make sure that the main Cyclictest thread collecting metrics data is not running on one of the three isolated CPUs. The affinity of the main thread can be set using the taskset program, as described below.
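Keeping the -t value equal to the number of CPUs named in -a is easy to get wrong when the CPU list changes. The sketch below derives the thread count from the list. It assumes the list uses only comma-separated numbers and ranges such as 1-3, and count_cpus is a hypothetical helper name.

```shell
# count_cpus LIST: print how many CPUs a list such as "1-3" or "0,2,4-7" names.
# Hypothetical helper; assumes only comma-separated numbers and N-M ranges.
count_cpus() {
    total=0
    for part in $(echo "$1" | tr ',' ' '); do
        case $part in
            *-*) total=$((total + ${part#*-} - ${part%-*} + 1)) ;;  # range N-M
            *)   total=$((total + 1)) ;;                            # single CPU
        esac
    done
    echo "$total"
}

cpus=1-3
# Print the resulting command: one measurement thread per listed CPU,
# with the main thread pinned to CPU 0.
echo "taskset -c 0 ./cyclictest -t$(count_cpus "$cpus") -p99 -a $cpus"
```

With cpus=1-3 the printed command requests three threads, matching the three-CPU case described above.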

4.5.1. Reduce the impact of cyclictest when evaluating latency on an isolated set of CPUs

When measuring latency on a subset of CPUs, make sure the main Cyclictest thread is running on CPUs that are not being evaluated. For example, if a system has two CPUs and is evaluating latency on CPU 0, the main Cyclictest thread should be pinned on CPU 1. Cyclictest’s main thread is not real-time, but if it executes on the CPU being evaluated, it may have an impact on latency because there will be additional context switches. After starting Cyclictest, the taskset command can be used to restrict the main thread to execute on a certain subset of CPUs. For example, the latency test for CPU1 to CPU3:

# CPU1 to CPU3 run the real-time threads and the main thread runs on CPU0 (this requires a cyclictest built on the board; otherwise all three real-time threads run on CPU1 and none run on CPU2 or CPU3)
taskset -c 0 ./cyclictest -t3 -p99 -a 1-3

The taskset program can also be used to ensure that other programs running on the system do not affect latency on the isolated CPU. For example, to start the program top to see threads pinned to CPU 0, use the following command:

taskset --cpu-list 0 top -H -p PID
# After top opens, press f, move the cursor to the P field, press Space to select it, and press q to return; the P column then shows which CPU each real-time thread is running on.
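For a non-interactive view of the same information, the CPU that last ran each thread can be read from /proc. This is a minimal sketch, assuming a Linux /proc filesystem. The name thread_cpus is a hypothetical helper, and its argument is the process ID that PID stands for in the command above.

```shell
# thread_cpus PID: print "TID CPU" for every thread of PID.
# Field 39 of /proc/PID/task/TID/stat is the CPU the thread last ran on.
# Hypothetical helper for illustration only.
thread_cpus() {
    for stat in /proc/"$1"/task/*/stat; do
        [ -r "$stat" ] || continue
        tid=${stat%/stat}
        tid=${tid##*/}
        fields=$(cat "$stat") || continue
        # Drop the leading "pid (comm) " so the state field comes first
        fields="${fields##*) }"
        # Field N of the stat file is now positional parameter N-2
        set -- $fields
        echo "$tid ${37}"
    done
}

# Example: list the threads of the current shell
thread_cpus $$
```

Passing the cyclictest process ID instead of $$ lists its measurement threads and the CPUs they last ran on.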

4.6. Test worst-case CPU latency

The command below measures latency under worst-case background stress on the RT system.

time ./cyclictest -t50 -p 80 -i 10000 -n -l 100000000000 -d 86400 -a 3

You can also create a matching stress environment for the CPU latency test. For example, use stress to create 100% CPU load on four cores.

#stress -c 4&
# The load can also be pinned to specific CPUs, as follows:
taskset -c 0 stress -c 1&
taskset -c 1 stress -c 1&
taskset -c 2 stress -c 1&
taskset -c 3 stress -c 1&

It is also possible to use ping to create 100% CPU load.

taskset -c 0 /bin/ping -l 100000 -q -s 10 -f localhost &
taskset -c 1 /bin/ping -l 100000 -q -s 10 -f localhost &
taskset -c 2 /bin/ping -l 100000 -q -s 10 -f localhost &
taskset -c 3 /bin/ping -l 100000 -q -s 10 -f localhost &
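The per-core stress and ping commands above follow one pattern, so they can be generated for any number of cores. The sketch below only prints the command lines. The name pinned_cmds is a hypothetical helper, and it assumes the cores are numbered from 0 upward.

```shell
# pinned_cmds NCPUS COMMAND...: print one background command line per CPU,
# each pinned with taskset to CPUs 0..NCPUS-1.
# Hypothetical helper for illustration only.
pinned_cmds() {
    ncpus=$1
    shift
    cpu=0
    while [ "$cpu" -lt "$ncpus" ]; do
        echo "taskset -c $cpu $* &"
        cpu=$((cpu + 1))
    done
}

# Print the four stress commands shown above
pinned_cmds 4 stress -c 1
```

Piping the printed lines into sh starts the load. The same helper covers the ping case by passing /bin/ping -l 100000 -q -s 10 -f localhost as the command.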

I/O load can be generated with tar. For example, download the Linux source code (the commands below expect it under /opt) and archive it in a loop:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-stable

cd /opt
while true; do taskset -c 0 tar cvzf test1.tgz ./linux-stable ; done &
while true; do taskset -c 1 tar cvzf test2.tgz ./linux-stable ; done &
while true; do taskset -c 2 tar cvzf test3.tgz ./linux-stable ; done &
while true; do taskset -c 3 tar cvzf test4.tgz ./linux-stable ; done &

Network load can be generated with netperf or a browser.

# start the server (netserver is the server program shipped with netperf)
netserver
# start netperf connection
/usr/bin/netperf -H <IP_ADDR_OF_SERVER> -t TCP_STREAM -A 16K,16K -l 3600
# use browser
firefox http://www.intomail.net/stream.php

Choose the stress environment that fits your needs, and then run the latency test.

4.7. Some suggestions for using RTLinux

4.7.1. Suppress console messages and disable memory overcommitment

# You can boot the kernel with the quiet parameter, or suppress console messages after startup as follows:
echo 1 > /proc/sys/kernel/printk

#Disable memory overcommit to avoid latency from Out-of-Memory Killer
echo 2 > /proc/sys/vm/overcommit_memory

4.7.2. Do not use a desktop or use a lightweight desktop

For better real-time performance, we do not recommend using a system with a desktop, because a desktop environment significantly increases CPU latency. It is recommended to use minimal Ubuntu, your own Qt program, or a similar lightweight setup.

4.7.3. Specific cores do specific things

Pin events with high real-time requirements to a dedicated core, and concentrate the system and other events with low real-time requirements on one core. Events such as specific interrupts and real-time programs can each be serviced by a dedicated core.

  • You can use the isolcpus kernel boot parameter to remove CPUs from the kernel's SMP balancing and scheduling algorithms and reserve them for RT applications: isolcpus=2,3

  • Because ARM routes all peripheral interrupts to CPU0, important interrupts can be bound to other cores after system startup.

# Find the corresponding interrupt number
cat /proc/interrupts | grep eth0
# Show the cores currently allowed to handle it (replace xxx with the interrupt number)
cat /proc/irq/xxx/smp_affinity_list
# Pin the interrupt to the specified core
echo 2 > /proc/irq/xxx/smp_affinity_list

  • When a core has only one runnable task, you can turn off the timer tick on that CPU and offload its RCU callbacks by adding the kernel boot parameters: nohz_full=2,3 rcu_nocbs=2,3
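The interrupt lookup in the second item can be scripted instead of reading the number by eye. The sketch below extracts the interrupt numbers for a device name from text in the /proc/interrupts layout. The name irqs_for is a hypothetical helper, and the sample lines, including the IRQ numbers 45 and 46, are invented for illustration.

```shell
# irqs_for NAME: read /proc/interrupts-style text on stdin and print the
# numbers of all interrupts whose line mentions NAME.
# Hypothetical helper for illustration only.
irqs_for() {
    awk -v dev="$1" '$1 ~ /^[0-9]+:$/ && index($0, dev) { sub(/:$/, "", $1); print $1 }'
}

# Invented sample in the /proc/interrupts layout
sample=' 45:       1200          0     GICv3  59 Level     eth0
 46:         37          0     GICv3  60 Level     mmc0'

printf '%s\n' "$sample" | irqs_for eth0
```

On the board, the helper would read /proc/interrupts directly, and each number it prints takes the place of xxx in the smp_affinity_list path shown above (writing to that file requires root).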

4.7.4. Using the chip’s MCU

For events with higher real-time requirements, it is recommended to use the on-chip MCU function for better real-time control.