RCU Configuration for Real-Time Systems

In mainline Linux, RCU carries out significant processing in softirq context, during which preemption is disabled. This wiki page describes how to configure RCU to avoid this processing and thus the resulting latency degradation.

RCU Callback Offloading

By default in mainline Linux, RCU callbacks are invoked in softirq context. These callbacks often free memory, and the memory allocators can therefore impose large latencies when they take their slowpaths. Although these latencies cannot be avoided, they can be directed to the CPUs of your choice through use of RCU callback offloading.

To offload callbacks, build your kernel with CONFIG_RCU_NOCB_CPU=y. To enable callback offloading on all CPUs, build with CONFIG_RCU_NOCB_CPU_ALL=y. If you wish to be more selective, specify the list of CPUs to be offloaded with the rcu_nocbs kernel boot parameter. For example, rcu_nocbs=1,3-4 would enable callback offloading on CPUs 1, 3, and 4. Note that offloading can be specified only at boot time and cannot be changed at runtime.
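
As a minimal sketch, the build-time and boot-time pieces for offloading CPUs 1, 3, and 4 (the CPU list is purely illustrative) might look like this:

    In the kernel configuration (.config):
        CONFIG_RCU_NOCB_CPU=y

    On the kernel boot command line:
        rcu_nocbs=1,3-4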

Each CPU with offloaded callbacks will have a group of rcuo kthreads. For example, CPU 1 would have rcuob/1 (for RCU-bh), rcuop/1 (for RCU-preempt), and rcuos/1 (for RCU-sched). These kthreads can be pinned to specific CPUs and given scheduling priorities as desired.
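
As a rough sketch, the standard taskset and chrt utilities can place and prioritize these kthreads from a root shell; the choice of housekeeping CPU 0 and SCHED_FIFO priority 1 below is purely illustrative:

    # Pin CPU 1's rcuo kthreads to housekeeping CPU 0 at FIFO priority 1.
    for name in rcuob/1 rcuop/1 rcuos/1; do
            pid=$(pgrep -x "$name")
            taskset -pc 0 "$pid"     # restrict this kthread to CPU 0
            chrt -f -p 1 "$pid"      # SCHED_FIFO, priority 1
    done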

There is of course no free lunch. Use of RCU callback offloading means that call_rcu() incurs greater overhead due to atomic operations, cache misses, and wakeups. The wakeup overhead alone can result in tens of percent throughput degradation on some workloads, which is why Linux distributions default to no callback offloading. This wakeup overhead can be shifted from the task invoking call_rcu() to the rcuo kthreads by using the rcu_nocb_poll kernel boot parameter, but at the expense of degraded energy efficiency due to the polling wakeups. Note also that care is required when assigning the rcuo kthreads to specific CPUs: for example, placing all of these kthreads on a single CPU might overload that CPU, which could throttle callback invocation and potentially even OOM the system.
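
For example, a boot command line that combines offloading of CPUs 1, 3, and 4 with polling rcuo kthreads might include:

    rcu_nocbs=1,3-4 rcu_nocb_poll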

Note that any nohz_full CPU will also have its RCU callbacks offloaded. This mode of operation also gracefully handles CPU-bound real-time user-space threads.
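
For instance, on a kernel built with CONFIG_NO_HZ_FULL=y, booting with the following parameter enables adaptive ticks on CPUs 1, 3, and 4 and also offloads their RCU callbacks, much as if rcu_nocbs=1,3-4 had been specified:

    nohz_full=1,3-4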

RCU Priority Boosting

One potential downside of preemptible RCU is that a low-priority task might be preempted in the middle of an RCU read-side critical section. If the system's higher-priority tasks consume all available CPU, that low-priority task might never resume, and thus might never leave its critical section. This in turn would prevent RCU grace periods from completing, eventually OOMing the system.

This normally indicates a design or configuration bug: Event-driven real-time applications should leave significant idle time in order to avoid queuing delays in the scheduler, among other things. This idle time would permit the low-priority task to proceed, in turn allowing grace periods to complete, thus avoiding OOM.

However, bugs can happen, including bugs involving infinite loops in high-priority real-time threads. Debugging these problems is more difficult if the system keeps hanging due to OOM. One way to ease debugging is to build with CONFIG_RCU_BOOST=y, which by default boosts any task that has been blocking the current grace period for more than half a second to real-time priority level 1. The CONFIG_RCU_KTHREAD_PRIO and CONFIG_RCU_BOOST_DELAY Kconfig options provide further control over RCU priority boosting. Please see the Kconfig help text for more information.
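
A sketch of a .config fragment spelling out these defaults (boost after 500 milliseconds, to real-time priority 1) might look like the following:

    CONFIG_RCU_BOOST=y
    CONFIG_RCU_KTHREAD_PRIO=1
    CONFIG_RCU_BOOST_DELAY=500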

Expedited RCU Grace Periods

Embedded systems sometimes have severe boot-time requirements, and RCU's grace-period delays can be a problem for these systems. If so, the CONFIG_RCU_EXPEDITE_BOOT Kconfig option will cause RCU to expedite grace periods until init is spawned, thus speeding up the early boot process. In addition, the rcupdate.rcu_expedited and rcupdate.rcu_normal kernel parameters, which are also adjustable at runtime via sysfs, can be used to enable and disable expedited grace periods.
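
As a sketch, the corresponding build-time and boot-time settings might be:

    In the kernel configuration (.config):
        CONFIG_RCU_EXPEDITE_BOOT=y

    On the kernel boot command line (to expedite grace periods beyond early boot as well):
        rcupdate.rcu_expedited=1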

However, it is unwise to use too many expedited grace periods while an event-driven real-time application is running, because expedited grace periods send IPIs to all non-idle CPUs. (RCU does consider nohz_full CPUs to be idle, so CPU-bound real-time threads are not impeded by these IPIs.) Use the rcupdate.rcu_normal parameter to completely disable RCU's expedited grace periods. Note that this does not come for free: some networking configuration operations run much more slowly when rcupdate.rcu_normal is in effect.
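
A minimal runtime sketch, assuming your kernel exposes these knobs as writable files under /sys/kernel (the exact location can vary by kernel version):

    # Force all grace periods onto the normal (non-expedited) path:
    echo 1 > /sys/kernel/rcu_normal

    # Permit expedited grace periods again, for example outside the real-time workload's critical window:
    echo 0 > /sys/kernel/rcu_normal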

Real-Time, RCU, and Softirqs

The -rt patchset contains a patch that causes RCU to substitute kthreads for most of its softirq execution. This patch is not yet in mainline due to large performance degradation for some workloads. It is hoped that mainline will gain this capability sooner rather than later, but it will need to be disabled by default for non-real-time builds/boots.

Preemptible RCU

Although real-time kernel builds typically enable CONFIG_PREEMPT_RCU=y by default, you should double-check this. Failing to enable this Kconfig option can result in excessive latencies due to non-preemptible RCU read-side critical sections.
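
One way to check, assuming your distribution installs the kernel configuration under /boot or provides /proc/config.gz (CONFIG_IKCONFIG_PROC=y):

    grep CONFIG_PREEMPT_RCU /boot/config-$(uname -r)

    # Or, if /proc/config.gz is available:
    zgrep CONFIG_PREEMPT_RCU /proc/config.gz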
