realtime:documentation:howto:applications:cpuidle

Revisions: 2018/03/13 01:11 by ramesh.thomas; 2024/06/28 11:32 (current) by costa.shul, “[Reference] sysfs”

A CPU idle state is a hardware feature that saves power while the CPU has nothing to do. Different architectures support different types of CPU idle states, which vary in their degree of power savings, target residency and exit latency. Target residency is the minimum time the CPU needs to stay in an idle state to justify the energy spent entering and exiting it. Exit latency is the time the hardware takes to leave that idle state.
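To make the trade-off concrete, the following C sketch picks the deepest idle state that both pays off for an expected idle period and can be exited within a latency limit. It is a simplified illustration, not the kernel's menu or teo governor. The ''struct idle_state'' type and ''pick_idle_state()'' are hypothetical names; the two numbers per state correspond to the ''latency'' and ''residency'' values (in microseconds) that the kernel reports under ''/sys/devices/system/cpu/cpuN/cpuidle/stateM/''.

<code c>
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one idle state; the two numbers mirror the
 * cpuidle sysfs attributes stateM/latency and stateM/residency (microseconds). */
struct idle_state {
    const char *name;
    unsigned int exit_latency_us;     /* time needed to leave the state */
    unsigned int target_residency_us; /* minimum stay that pays off */
};

/* Return the index of the deepest state that is worth entering for the
 * expected idle time and can be exited within the latency limit.
 * States are ordered from shallowest (index 0) to deepest; index 0 is the
 * fallback. Simplified illustration only, not a real kernel governor. */
int pick_idle_state(const struct idle_state *states, size_t n,
                    unsigned int expected_idle_us,
                    unsigned int latency_limit_us)
{
    assert(states != NULL && n > 0);
    int best = 0;
    for (size_t i = 1; i < n; i++) {
        if (states[i].target_residency_us > expected_idle_us)
            continue; /* too short a stay to save power */
        if (states[i].exit_latency_us > latency_limit_us)
            continue; /* waking up would take too long */
        best = (int)i;
    }
    return best;
}
</code>

With a made-up table of POLL (0/0), C1 (2/2), C6 (85/200) and C10 (890/5000), given as exit latency/target residency, an expected idle time of 10000 µs with a 100 µs limit selects C6, while a 1000 µs limit allows C10.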
  
[[https://www.kernel.org/doc/html/latest/admin-guide/pm/intel_idle.html|CPU idle states in Intel CPUs]] are referred to as C states. Each C state has a name, from C0 up to the deepest state the processor supports. C states are generally per core; however, a package can also enter a package C state once all of its cores have entered a sufficiently deep C state. The CPU is in C0 when it is fully active, and the kernel puts it into one of the other C states when it becomes idle.
  
C states with higher numbers are referred to as “deeper C states.” These states save more power but also have higher exit latencies. Typically, the deeper the idle state, the more components are turned off or have their voltage reduced, and turning them back on when the CPU wakes up takes time. The delay also depends on platform components, kernel configuration, the devices that are running, kernel activity around the wake-up, and the state of the caches and TLBs. In addition, the kernel must keep interrupts disabled while it synchronizes turning components and clocks back on and updates the scheduler state. As a result, exit latencies can vary considerably.

Source: [[https://elixir.bootlin.com/linux/latest/source/drivers/idle/intel_idle.c|intel_idle.c]]
  
The following sections discuss how to tune the system so that power saving is limited just enough to keep these variable latencies (jitter) within the tolerance of the real-time application design.
===== Configurations to guard critical cores from interference =====
  
It helps to understand some basic configurations used in real-time environments to reduce interference on the cores that run real-time applications. These configurations are made through kernel boot parameters. Real-time applications can run in “mixed mode,” where some cores, referred to as “critical cores,” run real-time applications while other cores run regular tasks. When not running in mixed mode, all cores run real-time applications and some of the configurations discussed below may not be necessary. See [[realtime:documentation:howto:tools:cpu-partitioning:start|CPU partitioning]] and [[https://docs.kernel.org/admin-guide/kernel-parameters.html#cpu-lists|cpu lists in The kernel's command-line parameters]] for details.
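The parameters below take CPU lists. As a hedged illustration of the two most common forms (single CPU numbers and ''first-last'' ranges, separated by commas), here is a small C parser that turns such a list into a bit mask. ''parse_cpu_list()'' is a hypothetical helper limited to CPUs 0 to 63; the kernel's cpu-list syntax linked above supports additional forms that this sketch rejects.

<code c>
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a simple CPU list such as "1,3-5" into a bit mask (bit N = CPU N).
 * Only single CPUs and "first-last" ranges for CPUs 0..63 are supported.
 * Returns 0 on success, -1 on malformed input. */
int parse_cpu_list(const char *s, uint64_t *mask)
{
    assert(s != NULL && mask != NULL);
    *mask = 0;
    while (*s != '\0') {
        char *end;
        if (!isdigit((unsigned char)*s))
            return -1;                 /* each entry starts with a CPU number */
        unsigned long first = strtoul(s, &end, 10);
        unsigned long last = first;
        if (*end == '-') {             /* "first-last" range */
            if (!isdigit((unsigned char)end[1]))
                return -1;
            last = strtoul(end + 1, &end, 10);
        }
        if (first > last || last > 63)
            return -1;                 /* reversed range or CPU >= 64 */
        for (unsigned long cpu = first; cpu <= last; cpu++)
            *mask |= UINT64_C(1) << cpu;
        if (*end == ',') {
            end++;
            if (*end == '\0')
                return -1;             /* trailing comma */
        } else if (*end != '\0') {
            return -1;                 /* unexpected character */
        }
        s = end;
    }
    return 0;
}
</code>

For example, ''0,2-4'' yields the mask ''0x1D'' (CPUs 0, 2, 3 and 4).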
  
isolcpus=//list of critical cores// – isolate the critical cores so that the kernel scheduler does not migrate tasks from other cores onto them.
  
irqaffinity=//list of non-critical cores// – set the default IRQ affinity to the non-critical cores, protecting the critical cores from IRQs.
  
rcu_nocbs=//list of critical cores// – offload RCU callbacks so that they are not invoked on the critical cores.
  
nohz=off – The kernel's “dynamic ticks” mode of managing scheduling-clock ticks is known to affect latencies when exiting CPU idle states. This option turns that mode off. Refer to [[https://docs.kernel.org/timers/no_hz.html|NO_HZ: Reducing Scheduling-Clock Ticks]] for more information about this setting.
  
nohz_full=//list of critical cores// – activates the adaptive-ticks variant of [[realtime:documentation:howto:tools:ticklesskernel|dynamic ticks]]. The listed cores do not receive scheduling-clock ticks while they are idle or running only a single task. The kernel must be built with the [[https://elixir.bootlin.com/linux/latest/A/ident/CONFIG_NO_HZ_FULL|CONFIG_NO_HZ_FULL]] option enabled.
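Because the critical and non-critical lists must be complementary, it can help to derive them from one definition. The C sketch below builds the four list-valued parameters above from a bit mask of critical cores. ''build_isolation_params()'' and ''format_cpu_list()'' are hypothetical helpers, the 64-CPU limit is an assumption of the sketch, and nohz=off (which takes no list) is left for you to add if you use it.

<code c>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Write the CPUs set in mask as a compact list such as "0-1,4-7".
 * Returns 0 on success, -1 if buf is too small. */
static int format_cpu_list(uint64_t mask, char *buf, size_t len)
{
    size_t pos = 0;
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (int cpu = 0; cpu < 64; cpu++) {
        if (!(mask & (UINT64_C(1) << cpu)))
            continue;
        int last = cpu;                /* extend the run of set bits */
        while (last < 63 && (mask & (UINT64_C(1) << (last + 1))))
            last++;
        const char *sep = pos ? "," : "";
        int n = (last == cpu)
                    ? snprintf(buf + pos, len - pos, "%s%d", sep, cpu)
                    : snprintf(buf + pos, len - pos, "%s%d-%d", sep, cpu, last);
        if (n < 0 || (size_t)n >= len - pos)
            return -1;
        pos += (size_t)n;
        cpu = last;                    /* continue after this run */
    }
    return 0;
}

/* Build the CPU-list boot parameters for a mixed-mode system with ncpus
 * CPUs, where `critical` marks the critical cores. Both the critical and
 * the non-critical sets must be non-empty. Returns 0 or -1. */
int build_isolation_params(uint64_t critical, int ncpus, char *out, size_t len)
{
    assert(ncpus > 0 && ncpus <= 64);
    uint64_t all = (ncpus == 64) ? ~UINT64_C(0) : (UINT64_C(1) << ncpus) - 1;
    uint64_t crit = critical & all, rest = all & ~critical;
    char crit_list[256], rest_list[256];
    if (crit == 0 || rest == 0)
        return -1;
    if (format_cpu_list(crit, crit_list, sizeof crit_list) != 0 ||
        format_cpu_list(rest, rest_list, sizeof rest_list) != 0)
        return -1;
    int n = snprintf(out, len,
                     "isolcpus=%s irqaffinity=%s rcu_nocbs=%s nohz_full=%s",
                     crit_list, rest_list, crit_list, crit_list);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
</code>

On a 4-CPU system whose only critical core is CPU 3 (a mask with only bit 3 set), this produces ''isolcpus=3 irqaffinity=0-2 rcu_nocbs=3 nohz_full=3''.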
  
===== Power Management Quality of Service (PM QoS) =====

PM QoS is a kernel infrastructure that can be used to fine-tune CPU idle governance so that only idle states whose exit latency is below a latency tolerance threshold are selected. It has both a user-level and a kernel-level interface, and it can limit C states system-wide or per core. The following sections explain the user-level interface. Details: [[https://docs.kernel.org/power/pm_qos_interface.html|PM Quality Of Service Interface]].
==== Specifying system-wide latency tolerance ====
You can specify a system-wide latency tolerance by writing a value in microseconds, as a 32-bit integer, to /dev/cpu_dma_latency. A value of 0 disables C states completely. The limit stays in effect only while the file is held open, so an application can apply a limit during critical operations and restore the default by closing its file handle.
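A minimal C sketch of that pattern, assuming the binary 32-bit interface described in the PM QoS documentation: open the file, write the limit, keep the descriptor open for the duration of the critical operation, and close it afterwards. The helper names are hypothetical, the path is a parameter only so the code can be tried on an ordinary file, and opening ''/dev/cpu_dma_latency'' itself usually requires root privileges.

<code c>
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Register a CPU latency limit in microseconds through a PM QoS device file
 * (normally "/dev/cpu_dma_latency"). The request remains active only while
 * the returned descriptor stays open. Returns the descriptor or -1. */
int latency_request_open(const char *path, int32_t max_latency_us)
{
    assert(path != NULL && max_latency_us >= 0);
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    /* Binary 32-bit signed value, as described for the user-mode interface. */
    if (write(fd, &max_latency_us, sizeof max_latency_us) !=
        (ssize_t)sizeof max_latency_us) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Drop the request; closing the file restores the default behaviour. */
void latency_request_close(int fd)
{
    if (fd >= 0)
        close(fd);
}

/* Hypothetical usage around a critical operation:
 *   int fd = latency_request_open("/dev/cpu_dma_latency", 0);
 *   ... critical operation ...
 *   latency_request_close(fd);
 */
</code>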
  
</code>
  
**Note:** The per-core user interface changed in kernel version 4.16. At the time of writing, the RT kernel was at 4.14; to use this interface there, pull in commits 704d2ce, 0759e80 and c523c68 from 4.16.
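On kernels that have the updated interface, the per-core limit is, as far as we know, exposed as ''/sys/devices/system/cpu/cpuN/power/pm_qos_resume_latency_us''. The C sketch below writes a limit for one CPU. The helper names are hypothetical, and treating the special value ''n/a'' as "no constraint" is an assumption based on the kernel's sysfs ABI documentation, so check it against your kernel version.

<code c>
#include <assert.h>
#include <stdio.h>

/* Assumed location of the per-CPU resume latency limit (kernel 4.16+).
 * Returns 0 on success, -1 if buf is too small. */
int cpu_latency_path(char *buf, size_t len, unsigned int cpu)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len,
                     "/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us",
                     cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Set a limit in microseconds for one CPU; a negative value writes "n/a",
 * assumed to remove the constraint. Usually requires root. */
int set_cpu_latency(unsigned int cpu, int latency_us)
{
    char path[128];
    if (cpu_latency_path(path, sizeof path, cpu) != 0)
        return -1;
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = (latency_us < 0) ? fputs("n/a", f) >= 0
                              : fprintf(f, "%d", latency_us) > 0;
    /* The buffered write reaches sysfs on flush, so check fclose too. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
</code>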
  
===== Tools used to measure latencies =====
  
[[realtime:documentation:howto:tools:cyclictest:start|Cyclictest]] is used to measure latencies, while [[https://manpages.debian.org/testing/linux-cpupower/turbostat.8.en.html|turbostat]] is used to identify which C states are selected and their residencies. See the [[https://man.archlinux.org/man/cyclictest.8.en|cyclictest manpage]] for all options.
  
Some parameters:
  
-a – Set the affinity of the measurement threads to the CPU running the real-time workload
  
-h or -H – Generate a histogram. Takes a parameter giving the maximum latency, in microseconds, to be tracked.
CPU 3 is the critical core running real-time workloads. It is isolated and protected as described above.
  
At each step, you can use [[https://manpages.debian.org/testing/linux-cpupower/turbostat.8.en.html|turbostat]] to check which C states a CPU uses:
<code>
$ turbostat --debug
Generate a graph from the histogram data using any graphing tool, for example, gnuplot.
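As an illustration of where such histogram data comes from and how it can be handed to gnuplot, the following C sketch keeps one bucket per microsecond, counts samples beyond the range as overflows (similar in spirit to cyclictest's ''-h'' option, though not its actual code or output format), and writes two-column ''latency count'' lines. The struct and function names are hypothetical.

<code c>
#include <assert.h>
#include <stdio.h>

/* Per-microsecond latency histogram: samples at or above limit_us are
 * counted as overflows instead of being stored in a bucket. */
struct latency_hist {
    unsigned long *buckets;   /* buckets[i] = samples of i microseconds */
    size_t limit_us;          /* number of buckets */
    unsigned long overflows;
    long max_us;              /* largest sample seen, including overflows */
};

void hist_add(struct latency_hist *h, long sample_us)
{
    assert(h != NULL && sample_us >= 0);
    if (sample_us > h->max_us)
        h->max_us = sample_us;
    if ((size_t)sample_us < h->limit_us)
        h->buckets[sample_us]++;
    else
        h->overflows++;
}

/* Write one "latency_us count" line per bucket, whitespace separated,
 * which gnuplot can read directly. Returns 0 or -1 on write error. */
int hist_write_plot_data(const struct latency_hist *h, FILE *out)
{
    assert(h != NULL && out != NULL);
    for (size_t i = 0; i < h->limit_us; i++)
        if (fprintf(out, "%zu %lu\n", i, h->buckets[i]) < 0)
            return -1;
    return 0;
}
</code>

If the output is saved as ''hist.dat'', gnuplot can draw it with ''plot "hist.dat" using 1:2 with steps''. The overflow count and maximum are kept separately because a histogram with a fixed range cannot show them.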
  
The following example graph shows very high jitter:

{{:realtime:documentation:howto:applications:no_res.png?300|}}

//**<<Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Performance varies depending on system configuration.>>**//
The following graph was generated from the histogram:

{{:realtime:documentation:howto:applications:safe_latency_constraint.png?300|}}

//**<<Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Performance varies depending on system configuration.>>**//
The following graph was generated from the histogram:

{{:realtime:documentation:howto:applications:safe_idle_interval.png?300|}}

//**<<Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Performance varies depending on system configuration.>>**//
CPU topology plays an important role in how the processor uses the power saving capabilities of the different C states. Processors have multiple cores, and the operating system groups logical CPUs within each core. Each of these groupings has shared resources that can be turned off only when all the processing units in the group reach a certain C state. If one logical CPU in a core could enter a deep C state but other logical CPUs in that core are still running or in a shallower C state, the first CPU is held in a shallower state: if the shared resources were turned off, the CPUs that are still running could not run. The same applies to package C states: a package can enter a deep C state only when all the cores in that package have entered a certain deep C state, at which point package-level components can be turned off.
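The rule described above can be expressed as a minimum: a group of CPUs that shares resources can go only as deep as its shallowest member. The C sketch below models this with hypothetical depth numbers (0 for C0, larger for deeper states) and applies it first per core and then per package. Real hardware maps core states to package states in more complex, model-specific ways.

<code c>
#include <assert.h>
#include <stddef.h>

/* Effective depth of a group (logical CPUs of a core, or cores of a
 * package): the group can only be as deep as its shallowest member.
 * Depths are illustrative numbers: 0 = C0, higher = deeper. */
int group_effective_state(const int *member_states, size_t n)
{
    assert(member_states != NULL && n > 0);
    int depth = member_states[0];
    for (size_t i = 1; i < n; i++)
        if (member_states[i] < depth)
            depth = member_states[i];
    return depth;
}
</code>

With one thread of core 0 held at depth 1 while every other thread could reach depth 6, core 0, and therefore the whole package, stays at depth 1.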
  
When designing a multi-core real-time application, assign tasks to a cluster of cores that can go idle at the same time. This may require some static configuration and knowledge of the processor topology. Tools like [[https://manpages.debian.org/testing/linux-cpupower/turbostat.8.en.html|turbostat]] can be used to get an idea of the groupings.
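One way to keep a real-time task on a chosen cluster is to set its CPU affinity. Below is a minimal sketch for Linux with glibc, assuming the cluster has already been identified, for example from ''/sys/devices/system/cpu/cpuN/topology/thread_siblings_list'' or turbostat output. ''pin_to_cpus()'' is a hypothetical helper built on ''sched_setaffinity()''.

<code c>
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Restrict the calling thread to the given cluster of CPUs.
 * Returns 0 on success, -1 on invalid input or if the kernel refuses. */
int pin_to_cpus(const int *cpus, int n)
{
    assert(cpus != NULL && n > 0);
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int i = 0; i < n; i++) {
        if (cpus[i] < 0 || cpus[i] >= CPU_SETSIZE)
            return -1;
        CPU_SET(cpus[i], &set);
    }
    /* pid 0 means the calling thread */
    return sched_setaffinity(0, sizeof set, &set);
}
</code>

Starting the task under the same mask from the shell, e.g. ''taskset -c 2,3 ./app'', is an alternative when the application cannot be changed.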
  
Another area to consider is cache optimization. Deeper C states can cause caches and TLBs to be flushed. On resume, the caches must be reloaded for optimal performance, and this reloading can cause latencies at points where none were expected based on earlier calibrations. This can be avoided by extending the methods described above so that critical memory regions are also pulled back into the cache: when the application wakes up from a deeper C state ahead of an approaching critical phase, it accesses the memory regions it will reference in that phase, forcing them to be reloaded into the cache. This cache repopulation technique can be incorporated into any general cache optimization scheme the real-time application may already use, and it applies not only to C states but to any situation where the cache must be repopulated.
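A minimal C sketch of the wake-early-and-touch idea, under stated assumptions: the application knows the absolute start time of its next critical phase on ''CLOCK_MONOTONIC'', the wake-ahead margin (a parameter here) has been chosen from measurements such as the cyclictest runs above, and cache lines are 64 bytes. The function names are hypothetical.

<code c>
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

#define CACHE_LINE_SIZE 64 /* assumption: 64-byte cache lines */

/* Return t minus us microseconds (t must be normalized, us >= 0). */
struct timespec ts_sub_us(struct timespec t, long us)
{
    assert(us >= 0);
    t.tv_sec -= us / 1000000;
    t.tv_nsec -= (us % 1000000) * 1000;
    if (t.tv_nsec < 0) {
        t.tv_nsec += 1000000000L;
        t.tv_sec -= 1;
    }
    return t;
}

/* Read one byte per cache line so the region (and its TLB entries) are
 * loaded again; volatile forces every read, and the sum is returned. */
unsigned long prewarm(const volatile unsigned char *buf, size_t len)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < len; i += CACHE_LINE_SIZE)
        sum += buf[i];
    return sum;
}

/* Sleep until wake_ahead_us before critical_start, then touch the data the
 * critical phase will use. Returns 0 or an error number from clock_nanosleep. */
int wake_early_and_prewarm(struct timespec critical_start, long wake_ahead_us,
                           const volatile unsigned char *data, size_t len,
                           unsigned long *checksum)
{
    struct timespec wake = ts_sub_us(critical_start, wake_ahead_us);
    int err;
    do { /* absolute deadline: retrying after a signal does not drift */
        err = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &wake, NULL);
    } while (err == EINTR);
    if (err != 0)
        return err;
    *checksum = prewarm(data, len);
    return 0;
}
</code>

Using an absolute deadline keeps the wake-up time fixed even if the sleep is interrupted and restarted, so the margin before the critical phase does not shrink.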
===== Reference =====
  
[[https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html#:~:text=cpuidle|Kernel parameters]]

[[https://www.kernel.org/doc/html/latest/timers/no_hz.html|NO_HZ: Reducing Scheduling-Clock Ticks]]

[[https://www.kernel.org/doc/html/latest/power/pm_qos_interface.html|PM Quality Of Service Interface]]

[[realtime:documentation:howto:tools:cyclictest:start|Cyclictest]]

[[https://www.kernel.org/doc/html/latest/admin-guide/kernel-per-CPU-kthreads.html|Reducing OS jitter due to per-cpu kthreads]]

[[https://books.google.com/books?id=DFAnCgAAQBAJ&pg=PA177&lpg=PA177&dq=c+state+latency+MSR&source=bl&ots=NLTLrtN4JJ&sig=1ReyBgj1Ej0_m6r6O8wShEtK4FU&hl=en&sa=X&ved=0ahUKEwifn4yI08vZAhUFwVQKHW1nDgIQ6AEIZzAH#v=onepage&q=c%20state%20latency%20MSR&f=false|Good reference for C states]]

[[https://manpages.debian.org/testing/linux-cpupower/cpupower-idle-info.1.en.html|cpupower idle-info]] – Utility to retrieve CPU idle kernel information

Sysfs: ''/sys/devices/system/cpu/cpu*/cpuidle/''

Source: [[https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/cpuidle.h|include/linux/cpuidle.h]], [[https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/cpuidle|drivers/cpuidle]]
  
realtime/documentation/howto/applications/cpuidle.1520903461.txt.gz · Last modified: 2018/03/13 01:11 by ramesh.thomas