In mainline Linux, RCU carries out significant processing in softirq context, during which preemption is disabled. This wiki page describes how to configure RCU to avoid this processing and thus the resulting latency degradation.
By default in mainline Linux, RCU callbacks are invoked in softirq context. These callbacks often free memory, and the memory allocators can therefore impose large latencies when they take their slowpaths. Although these latencies cannot be avoided, they can be directed to the CPUs of your choice through use of RCU callback offloading.
To offload callbacks, build your kernel with
To enable callback offloading on all CPUs, build with
If you wish to be more selective, specify a list of CPUs to be offloaded with the rcu_nocbs kernel boot parameter. For example, rcu_nocbs=1,3-4 would enable callback offloading on CPUs 1, 3, and 4.
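As an illustration of how such a CPU list expands, here is a minimal Python sketch. The helper name parse_cpu_list is hypothetical, and the sketch handles only comma-separated CPU numbers and inclusive ranges; any other list syntax the kernel may accept is out of scope.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as "1,3-4" into a sorted list of CPU numbers.

    Only plain numbers and inclusive ranges are handled (an assumption of
    this sketch, not a statement about everything the kernel accepts).
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError("descending range: %r" % part)
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(parse_cpu_list("1,3-4"))  # prints [1, 3, 4]
```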
Note that offloading can be specified only at boot time and cannot be changed at runtime.
Each CPU with offloaded callbacks will have a group of rcuo kthreads. For example, CPU 1 would have rcuob/1 (for RCU-bh), rcuop/1 (for RCU-preempt), and rcuos/1 (for RCU-sched).
These kthreads can be assigned to specific CPUs and can be assigned scheduling priorities.
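One possible way to locate these kthreads and then bind and prioritize them from user space is sketched below in Python. It assumes that the kthreads can be recognized by a task name beginning with "rcuo" in /proc/<pid>/comm (a user process with a similar name would also match), and the function names and the housekeeping CPU set are hypothetical. Changing affinity or priority requires root privileges, so that call is left commented out.

```python
import os
from pathlib import Path


def find_rcuo_kthreads(proc_root="/proc"):
    """Return {pid: name} for every task whose name starts with "rcuo"."""
    found = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-task entries such as /proc/self
        try:
            name = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the task exited while we were scanning
        if name.startswith("rcuo"):
            found[int(entry.name)] = name
    return found


def pin_and_prioritize(pid, cpus, fifo_priority):
    """Bind one task to the given CPUs and give it a SCHED_FIFO priority."""
    os.sched_setaffinity(pid, cpus)
    os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(fifo_priority))


if __name__ == "__main__" and Path("/proc").is_dir():
    HOUSEKEEPING_CPUS = {0, 2}  # hypothetical choice for this example
    for pid, name in sorted(find_rcuo_kthreads().items()):
        print(pid, name)
        # pin_and_prioritize(pid, HOUSEKEEPING_CPUS, 1)  # needs root
```

Spreading the kthreads over more than one housekeeping CPU, as in the hypothetical set above, reduces the risk of overloading any single CPU.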
There is of course no free lunch.
Use of RCU callback offloading means that call_rcu() incurs greater overhead due to atomic operations, cache misses, and wakeups.
The wakeup overhead alone can result in tens of percent throughput degradation on some workloads, which is why Linux distributions default to no callback offloading.
This wakeup overhead can be shifted from the task invoking call_rcu() to the rcuo kthreads using the rcu_nocb_poll kernel boot parameter, but at the expense of degraded energy efficiency due to the polling wakeups.
Note that care is required when assigning the rcuo kthreads to specific CPUs. For example, placing all of these kthreads on a single CPU might overload that CPU, which could throttle callback invocation, potentially even OOMing the system.
Note that any nohz_full CPU will also have its RCU callbacks offloaded. This mode of operation also gracefully handles CPU-bound real-time user-space threads.
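To confirm which of the boot parameters mentioned above a kernel was actually started with, the command line can be inspected. The following Python sketch uses a hypothetical helper, rcu_boot_params, and splits on whitespace only, so it assumes that none of the relevant values contain quoted spaces. The sample command line is made up for illustration.

```python
def rcu_boot_params(cmdline):
    """Extract the RCU-related boot parameters named on this page.

    Parameters with a value map to that value as a string; bare flags
    such as rcu_nocb_poll map to True.
    """
    wanted = ("rcu_nocbs", "rcu_nocb_poll", "nohz_full")
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key in wanted:
            params[key] = value if sep else True
    return params


# Illustrative command line; on a running system, read /proc/cmdline instead.
sample = "ro quiet rcu_nocbs=1,3-4 rcu_nocb_poll nohz_full=3-4"
print(rcu_boot_params(sample))
```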
One potential downside of preemptible RCU is that a low-priority task might be preempted in the middle of an RCU read-side critical section. If the system's higher-priority tasks consume all available CPU, that low-priority task might never resume, and thus might never leave its critical section. This in turn would prevent RCU grace periods from completing, eventually OOMing the system.
This normally indicates a design or configuration bug: Event-driven real-time applications should leave significant idle time in order to avoid queuing delays in the scheduler, among other things. This idle time would permit the low-priority task to proceed, in turn allowing grace periods to complete, thus avoiding OOM.
However, bugs can happen, including bugs involving infinite loops in high-priority real-time threads.
Debugging these problems is more difficult if the system keeps hanging due to OOM.
One way to ease debugging is to build with CONFIG_RCU_BOOST=y, which by default will boost tasks blocking the current grace period for more than half a second to real-time priority level 1.
Additional Kconfig options such as CONFIG_RCU_BOOST_DELAY provide further control of RCU priority boosting.
Please see the Kconfig help text for more information.
Embedded systems sometimes have severe boot-time requirements, and RCU's grace-period delays can be a problem for these systems.
If so, using the CONFIG_RCU_EXPEDITE_BOOT Kconfig option will cause RCU to expedite grace periods until init is spawned, thus speeding up the early boot process.
In addition, sysfs parameters such as rcupdate.rcu_normal can be used to enable and disable expedited grace periods at runtime.
However, it is unwise to use too many expedited grace periods while an event-driven real-time application is running, because expedited grace periods send IPIs to all non-idle CPUs. RCU considers nohz_full CPUs to be idle, so CPU-bound real-time threads are not impeded by these IPIs.
Use the rcupdate.rcu_normal sysfs parameter to completely disable RCU's expedited grace periods.
Note that this does not come for free: Some networking configuration operations run much more slowly when rcupdate.rcu_normal is in effect.
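A runtime toggle of this setting might look like the Python sketch below. The file location /sys/kernel/rcu_normal is an assumption and may differ or be absent depending on kernel version and configuration, so the path is a parameter. The function names are hypothetical, and writing the file requires root privileges.

```python
from pathlib import Path

# Assumed location of the runtime knob; verify it on your kernel.
RCU_NORMAL_PATH = "/sys/kernel/rcu_normal"


def expedited_disabled(path=RCU_NORMAL_PATH):
    """Return True if the knob is nonzero, i.e. expedited GPs are disabled."""
    return Path(path).read_text().strip() != "0"


def set_rcu_normal(enabled, path=RCU_NORMAL_PATH):
    """Write 1 to disable expedited grace periods, 0 to allow them again."""
    Path(path).write_text("1\n" if enabled else "0\n")


if __name__ == "__main__" and Path(RCU_NORMAL_PATH).exists():
    print("expedited grace periods disabled:", expedited_disabled())
```

A real-time application launcher could call set_rcu_normal(True) just before starting the application and set_rcu_normal(False) once it exits, which limits the slowdown of networking configuration operations to the time the application is running.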
The -rt patchset contains a patch that causes RCU to substitute kthreads for most of its softirq execution.
This patch is not yet in mainline due to large performance degradation for some workloads.
It is hoped that mainline will gain this capability sooner rather than later, but it will need to be disabled by default for non-real-time builds/boots.
Although real-time kernel builds typically enable CONFIG_PREEMPT_RCU=y by default, you should double-check this.
Failing to enable this Kconfig option can result in excessive latencies due to non-preemptible RCU read-side critical sections.
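One way to double-check is to scan the kernel's configuration for the option. The Python sketch below uses hypothetical helper names and assumes the usual .config text format, with one NAME=value assignment per line and unset options appearing as comments. Reading /proc/config.gz works only on kernels built to expose their configuration there; otherwise, point the parser at the .config file used for the build. The sample text is made up for illustration.

```python
import gzip


def enabled_options(config_text):
    """Return the set of option names set to "y" in kernel .config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # comments include "# CONFIG_FOO is not set"
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return enabled


def load_running_config(path="/proc/config.gz"):
    """Read the running kernel's configuration, if it is exposed."""
    with gzip.open(path, "rt") as f:
        return f.read()


# Illustrative fragment; on a real system use load_running_config().
sample = "CONFIG_PREEMPT_RCU=y\n# CONFIG_RCU_BOOST is not set\n"
options = enabled_options(sample)
print("CONFIG_PREEMPT_RCU enabled:", "CONFIG_PREEMPT_RCU" in options)
```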