User Tools

Site Tools


realtime:documentation:hr_timers

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Next revision
Previous revision
realtime:documentation:hr_timers [2016/12/28 00:27]
ayates1 created
— (current)
Line 1: Line 1:
-Further information can be found in the paper of the OLS 2006 talk "​hrtimers 
-and beyond"​. The paper is part of the OLS 2006 Proceedings Volume 1, which can 
-be found on the OLS website: 
-[[https://​www.kernel.org/​doc/​ols/​2006/​ols2006v1-pages-333-346.pdf]] 
  
-The slides to this talk are available from: 
-official http://​tglx.de/​projects/​hrtimers/​ols2006-hrtimers.pdf 
-[[http://​www.cs.columbia.edu/​~nahum/​w6998/​papers/​ols2006-hrtimers-slides.pdf]] ​ 
- 
-The slides contain five figures (pages 2, 15, 18, 20, 22), which illustrate the 
-changes in the time(r) related Linux subsystems. Figure #1 (p. 2) shows the 
-design of the Linux time(r) system before hrtimers and other building blocks 
-got merged into mainline. 
- 
-Note: the paper and the slides are talking about "clock event source",​ while we 
-switched to the name "clock event devices"​ in meantime. 
- 
-======The hrtimer base infrastructure====== 
- 
-The hrtimer base infrastructure was merged into the 2.6.16 kernel. Details of 
-the base implementation are covered in Documentation/​timers/​hrtimers.txt. See 
-also figure #2 (OLS slides p. 15) 
- 
-The main differences to the timer wheel, which holds the armed timer_list type 
-timers are: 
-       - time ordered enqueueing into a rb-tree 
-       - independent of ticks (the processing is based on nanoseconds) 
- 
- 
-======timeofday and clock source management====== 
- 
-John Stultz'​s Generic Time Of Day (GTOD) framework moves a large portion of 
-code out of the architecture-specific areas into a generic management 
-framework, as illustrated in figure #3 (OLS slides p. 18). The architecture 
-specific portion is reduced to the low level hardware details of the clock 
-sources, which are registered in the framework and selected on a quality based 
-decision. The low level code provides hardware setup and readout routines and 
-initializes data structures, which are used by the generic time keeping code to 
-convert the clock ticks to nanosecond based time values. All other time keeping 
-related functionality is moved into the generic code. The GTOD base patch got 
-merged into the 2.6.18 kernel. 
- 
-Further information about the Generic Time Of Day framework is available in the 
-OLS 2005 Proceedings Volume 1: 
-https://​www.kernel.org/​doc/​ols/​2005/​ols2005v1-pages-227-240.pdf 
- 
-The paper "We Are Not Getting Any Younger: A New Approach to Time and 
-Timers"​ was written by J. Stultz, D.V. Hart, & N. Aravamudan. 
- 
-Figure #3 (OLS slides p.18) illustrates the transformation. 
- 
- 
-======clock event management====== 
-While clock sources provide read access to the monotonically increasing time 
-value, clock event devices are used to schedule the next event 
-interrupt(s). The next event is currently defined to be periodic, with its 
-period defined at compile time. The setup and selection of the event device 
-for various event driven functionalities is hardwired into the architecture 
-dependent code. This results in duplicated code across all architectures and 
-makes it extremely difficult to change the configuration of the system to use 
-event interrupt devices other than those already built into the 
-architecture. Another implication of the current design is that it is necessary 
-to touch all the architecture-specific implementations in order to provide new 
-functionality like high resolution timers or dynamic ticks. 
- 
-The clock events subsystem tries to address this problem by providing a generic 
-solution to manage clock event devices and their usage for the various clock 
-event driven kernel functionalities. The goal of the clock event subsystem is 
-to minimize the clock event related architecture dependent code to the pure 
-hardware related handling and to allow easy addition and utilization of new 
-clock event devices. It also minimizes the duplicated code across the 
-architectures as it provides generic functionality down to the interrupt 
-service handler, which is almost inherently hardware dependent. 
- 
-Clock event devices are registered either by the architecture dependent boot 
-code or at module insertion time. Each clock event device fills a data 
-structure with clock-specific property parameters and callback functions. The 
-clock event management decides, by using the specified property parameters, the 
-set of system functions a clock event device will be used to support. This 
-includes the distinction of per-CPU and per-system global event devices. 
- 
-System-level global event devices are used for the Linux periodic tick. Per-CPU 
-event devices are used to provide local CPU functionality such as process 
-accounting, profiling, and high resolution timers. 
- 
-The management layer assigns one or more of the following functions to a clock 
-event device: 
-      - system global periodic tick (jiffies update) 
-      - cpu local update_process_times 
-      - cpu local profiling 
-      - cpu local next event interrupt (non periodic mode) 
- 
-The clock event device delegates the selection of those timer interrupt related 
-functions completely to the management layer. The clock management layer stores 
-a function pointer in the device description structure, which has to be called 
-from the hardware level handler. This removes a lot of duplicated code from the 
-architecture specific timer interrupt handlers and hands the control over the 
-clock event devices and the assignment of timer interrupt related functionality 
-to the core code. 
- 
-The clock event layer API is rather small. Aside from the clock event device 
-registration interface it provides functions to schedule the next event 
-interrupt, clock event device notification service and support for suspend and 
-resume. 
- 
-The framework adds about 700 lines of code which results in a 2KB increase of 
-the kernel binary size. The conversion of i386 removes about 100 lines of 
-code. The binary size decrease is in the range of 400 byte. We believe that the 
-increase of flexibility and the avoidance of duplicated code across 
-architectures justifies the slight increase of the binary size. 
- 
-The conversion of an architecture has no functional impact, but allows to 
-utilize the high resolution and dynamic tick functionalities without any change 
-to the clock event device and timer interrupt code. After the conversion the 
-enabling of high resolution timers and dynamic ticks is simply provided by 
-adding the kernel/​time/​Kconfig file to the architecture specific Kconfig and 
-adding the dynamic tick specific calls to the idle routine (a total of 3 lines 
-added to the idle function and the Kconfig file) 
- 
-Figure #4 (OLS slides p.20) illustrates the transformation. 
- 
- 
-======high resolution timer functionality====== 
- 
-During system boot it is not possible to use the high resolution timer 
-functionality,​ while making it possible would be difficult and would serve no 
-useful function. The initialization of the clock event device framework, the 
-clock source framework (GTOD) and hrtimers itself has to be done and 
-appropriate clock sources and clock event devices have to be registered before 
-the high resolution functionality can work. Up to the point where hrtimers are 
-initialized,​ the system works in the usual low resolution periodic mode. The 
-clock source and the clock event device layers provide notification functions 
-which inform hrtimers about availability of new hardware. hrtimers validates 
-the usability of the registered clock sources and clock event devices before 
-switching to high resolution mode. This ensures also that a kernel which is 
-configured for high resolution timers can run on a system which lacks the 
-necessary hardware support. 
- 
-The high resolution timer code does not support SMP machines which have only 
-global clock event devices. The support of such hardware would involve IPI 
-calls when an interrupt happens. The overhead would be much larger than the 
-benefit. This is the reason why we currently disable high resolution and 
-dynamic ticks on i386 SMP systems which stop the local APIC in C3 power 
-state. A workaround is available as an idea, but the problem has not been 
-tackled yet. 
- 
-The time ordered insertion of timers provides all the infrastructure to decide 
-whether the event device has to be reprogrammed when a timer is added. The 
-decision is made per timer base and synchronized across per-cpu timer bases in 
-a support function. The design allows the system to utilize separate per-CPU 
-clock event devices for the per-CPU timer bases, but currently only one 
-reprogrammable clock event device per-CPU is utilized. 
- 
-When the timer interrupt happens, the next event interrupt handler is called 
-from the clock event distribution code and moves expired timers from the 
-red-black tree to a separate double linked list and invokes the softirq 
-handler. An additional mode field in the hrtimer structure allows the system to 
-execute callback functions directly from the next event interrupt handler. This 
-is restricted to code which can safely be executed in the hard interrupt 
-context. This applies, for example, to the common case of a wakeup function as 
-used by nanosleep. The advantage of executing the handler in the interrupt 
-context is the avoidance of up to two context switches - from the interrupted 
-context to the softirq and to the task which is woken up by the expired 
-timer. 
- 
-Once a system has switched to high resolution mode, the periodic tick is 
-switched off. This disables the per system global periodic clock event device - 
-e.g. the PIT on i386 SMP systems. 
- 
-The periodic tick functionality is provided by an per-cpu hrtimer. The callback 
-function is executed in the next event interrupt context and updates jiffies 
-and calls update_process_times and profiling. The implementation of the hrtimer 
-based periodic tick is designed to be extended with dynamic tick functionality. 
-This allows to use a single clock event device to schedule high resolution 
-timer and periodic events (jiffies tick, profiling, process accounting) on UP 
-systems. This has been proved to work with the PIT on i386 and the Incrementer 
-on PPC. 
- 
-The softirq for running the hrtimer queues and executing the callbacks has been 
-separated from the tick bound timer softirq to allow accurate delivery of high 
-resolution timer signals which are used by itimer and POSIX interval 
-timers. The execution of this softirq can still be delayed by other softirqs, 
-but the overall latencies have been significantly improved by this separation. 
- 
-Figure #5 (OLS slides p.22) illustrates the transformation. 
- 
- 
-======dynamic ticks====== 
- 
-Dynamic ticks are the logical consequence of the hrtimer based periodic tick 
-replacement (sched_tick). The functionality of the sched_tick hrtimer is 
-extended by three functions: 
- 
-- hrtimer_stop_sched_tick 
-- hrtimer_restart_sched_tick 
-- hrtimer_update_jiffies 
- 
-hrtimer_stop_sched_tick() is called when a CPU goes into idle state. The code 
-evaluates the next scheduled timer event (from both hrtimers and the timer 
-wheel) and in case that the next event is further away than the next tick it 
-reprograms the sched_tick to this future event, to allow longer idle sleeps 
-without worthless interruption by the periodic tick. The function is also 
-called when an interrupt happens during the idle period, which does not cause a 
-reschedule. The call is necessary as the interrupt handler might have armed a 
-new timer whose expiry time is before the time which was identified as the 
-nearest event in the previous call to hrtimer_stop_sched_tick. 
- 
-hrtimer_restart_sched_tick() is called when the CPU leaves the idle state before 
-it calls schedule(). hrtimer_restart_sched_tick() resumes the periodic tick, 
-which is kept active until the next call to hrtimer_stop_sched_tick(). 
- 
-hrtimer_update_jiffies() is called from irq_enter() when an interrupt happens 
-in the idle period to make sure that jiffies are up to date and the interrupt 
-handler has not to deal with an eventually stale jiffy value. 
- 
-The dynamic tick feature provides statistical values which are exported to 
-userspace via /proc/stats and can be made available for enhanced power 
-management control. 
- 
-The implementation leaves room for further development like full tickless 
-systems, where the time slice is controlled by the scheduler, variable 
-frequency profiling, and a complete removal of jiffies in the future. 
- 
- 
-Aside the current initial submission of i386 support, the patchset has been 
-extended to x86_64 and ARM already. Initial (work in progress) support is also 
-available for MIPS and PowerPC. 
- 
-Thomas, Ingo 
realtime/documentation/hr_timers.1482884834.txt.gz ยท Last modified: 2016/12/28 00:27 by ayates1