The Linux Foundation

Linux Weather Forecast/core kernel


This subsection of the Linux Platform Weather Forecast focuses on changes to the core part of the Linux kernel.


The Completely Fair Scheduler

CFS is a wholesale scheduler replacement written by Ingo Molnar. Like Con Kolivas's staircase deadline (SD) scheduler, it is focused on replacing complex interactivity heuristics with a much simpler, completely fair mechanism. CFS includes a framework for pluggable low-level scheduling modules and the beginnings of a "CPU time economy" scheme, which allows a process to hand some of the CPU time to which it is entitled to other processes performing work on its behalf.
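
To make the idea of a "completely fair" mechanism more concrete, here is a toy user-space simulation. It is not kernel code. It assumes a simplified model in which each task accumulates weighted run time and the scheduler always runs the task that has received the least so far. The names Task, simulate, vruntime and weight are chosen for illustration only and are not taken from this page.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    # Weighted run time received so far; the heap orders tasks by this value.
    vruntime: float
    name: str = field(compare=False)
    weight: int = field(compare=False, default=1)  # hypothetical share weight
    ran: int = field(compare=False, default=0)     # time slices actually received


def simulate(tasks, ticks):
    """Run `ticks` time slices, always picking the task with the least vruntime."""
    heap = list(tasks)
    heapq.heapify(heap)
    for _ in range(ticks):
        task = heapq.heappop(heap)
        task.ran += 1
        # A heavier task is charged less per slice, so it is picked more often.
        task.vruntime += 1.0 / task.weight
        heapq.heappush(heap, task)
    return {t.name: t.ran for t in tasks}


if __name__ == "__main__":
    demo = [Task(0.0, "a"), Task(0.0, "b"), Task(0.0, "c", weight=2)]
    print(simulate(demo, 400))
```

With weights of 1, 1 and 2, the 400 slices in this run divide 100, 100 and 200. No interactivity heuristics are involved; the only rule is "run whoever is furthest behind."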

Forecast: CFS was merged in 2.6.23, and group scheduling was added for 2.6.24. The 2.6.25 series saw the addition of the realtime group scheduling feature which, among other things, lets administrators permit unprivileged processes to use the realtime scheduling classes without risking total loss of the machine. In 2.6.26 the realtime group scheduling work is being completed, bringing this capability to full readiness.
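
For readers unfamiliar with the realtime scheduling classes, the following sketch shows how a process can ask for one from user space with Python's standard os module on Linux. The helper name try_realtime and the priority value 10 are arbitrary choices for illustration. On a system where the caller lacks the necessary privileges or configuration, the request is expected to fail, and the sketch reports that instead of raising an error.

```python
import os


def try_realtime(priority=10):
    """Ask for SCHED_FIFO for the calling process; return True on success."""
    if not hasattr(os, "sched_setscheduler"):
        return False  # scheduling API not available on this platform
    try:
        # A pid of 0 means "the calling process".
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        # PermissionError (a subclass of OSError) is the usual result for an
        # unprivileged process; any other OS-level refusal is treated the same.
        return False
    return True


if __name__ == "__main__":
    granted = try_realtime()
    print("SCHED_FIFO granted" if granted else "SCHED_FIFO denied")
    if granted:
        # Drop back to the normal time-sharing class.
        os.sched_setscheduler(0, os.SCHED_OTHER, os.sched_param(0))
```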

For more information:

Real-time preemption

The real-time preemption patch set seeks to provide deterministic response times with a stock Linux kernel. It works by making everything preemptible, including code (spinlock-protected critical sections, interrupt handlers) which cannot be preempted in current kernels.

Forecast: much of the real-time preemption patch set has already found its way into the mainline. The most controversial changes remain out of tree, though; in particular, preemptible spinlocks remain outside of the mainline. The real-time developers have plans to merge much of the remaining work over the course of the next year. It looked like threaded interrupt handlers might be accepted for 2.6.28, but that didn't happen; look for another attempt to get that work in when the 2.6.29 merge window opens.

For more information: there are a number of articles covering the evolution of this patch set, including:


vringfd()

vringfd() is another attempt to create a ring-buffer API suitable for fast transfer of data between kernel and user space. This code comes from the virtualization arena, where it was first used to communicate between virtual network devices.
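
The general idea of a ring buffer shared between a producer and a consumer can be sketched in a few lines. This is a generic single-producer, single-consumer ring written in user-space Python. It is not the vringfd() interface, whose details are not described here, and the class and method names are made up for illustration.

```python
class Ring:
    """Fixed-size FIFO: the producer advances head, the consumer advances tail."""

    def __init__(self, size):
        self.slots = [None] * size
        self.size = size
        self.head = 0  # total number of items ever produced
        self.tail = 0  # total number of items ever consumed

    def put(self, item):
        """Producer side: store an item, or return False if the ring is full."""
        if self.head - self.tail == self.size:
            return False
        self.slots[self.head % self.size] = item
        self.head += 1
        return True

    def get(self):
        """Consumer side: return the oldest item, or None if the ring is empty."""
        if self.head == self.tail:
            return None
        item = self.slots[self.tail % self.size]
        self.tail += 1
        return item


if __name__ == "__main__":
    ring = Ring(2)
    print(ring.put("a"), ring.put("b"), ring.put("c"))
    print(ring.get(), ring.get(), ring.get())
```

In this sketch each index is written by only one side: put() alone changes head, and get() alone changes tail. Rings of this shape are a common choice when data must move between two parties without copying it through an intermediate queue.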

Forecast: this code is in an early state, and it is not clear whether it will be proposed for mainline inclusion or not. If it does go forward, the earliest we would see it would be 2.6.27.

For more information: vringfd() (April, 2008)

Memory fragmentation avoidance

The Linux virtual memory system tends to fragment physical memory over time. This fragmentation is not normally a problem, but it can get in the way when large, physically-contiguous chunks of memory are required. On highly-fragmented systems, multi-page allocations can fail, leading to degraded system functionality.

There are a few developments which address the fragmentation problem. The most prominent are:

  • Lumpy reclaim, which emphasizes reclaiming physically-contiguous pages.
  • Grouping of memory allocations so that those which can be moved are kept separate from those which cannot be moved.

Both techniques can make it easier for the kernel to satisfy multi-page contiguous allocations. The grouping patches are also useful for memory hotplugging, a feature which virtualization solutions can use in turn.
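
The following toy model illustrates why fragmentation matters for multi-page allocations. It treats memory as a flat list of page frames, which is a deliberate simplification. The function largest_free_run and both layouts are invented for this example and have no counterpart in the kernel.

```python
def largest_free_run(pages):
    """Return the longest run of consecutive free page frames.

    `pages` is a list of booleans: True means the frame is in use.
    """
    best = run = 0
    for used in pages:
        run = 0 if used else run + 1
        best = max(best, run)
    return best


# Sixteen frames, eight of them free in both layouts.
fragmented = [True, False] * 8       # used and free frames alternate
grouped = [True] * 8 + [False] * 8   # used frames are kept together

if __name__ == "__main__":
    print(largest_free_run(fragmented), largest_free_run(grouped))
```

Both layouts have the same amount of free memory, but only the grouped one could satisfy a request for four contiguous frames. The fragmented layout cannot supply even two.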

Forecast: Basic fragmentation avoidance patches (the ZONE_MOVABLE memory zone) and lumpy reclaim were merged for 2.6.23. Further work on active fragmentation avoidance may be merged in future kernels.

For more information:

Syslets and threadlets

"Syslets" are a means for running small programs within the kernel; they are a way to run system calls asynchronously and without exiting to user space in between. "Threadlets" are a similar mechanism for running asynchronous code in user space. In either case, the code in question will run synchronously as long as it does not block. If it must wait for something, the kernel creates a new thread (or reuses an existing, spare thread) and continues user-space execution in that thread.

The initial motivation for this patch was to enable a complete asynchronous I/O implementation without the heavy maintenance overhead of the current AIO approach. Since syslets and threadlets allow any system call to be run asynchronously, however, they have a wider application than that.

Forecast: This code was developed by Ingo Molnar, who has a long history of getting major changes into the kernel. So syslets and threadlets will likely make it in. My prediction is that the process will not be too fast, however - 2.6.27 at the earliest, and later than that would not be surprising.

For more information:

Big Kernel Lock

The Big Kernel Lock (BKL) was first introduced in the 2.0 kernel as a way to minimize concurrency and make basic SMP functionality work. Ever since then, the kernel developers have been working to squeeze the BKL out of the kernel, as it poses an ongoing scalability problem. Recent difficulties have added a new urgency to this effort, with the result that more time is going into the BKL-removal task.

Forecast: A significant set of BKL-removal patches is queued for merging into the 2.6.27 kernel. The problem is far from solved, though, and the BKL is likely to remain with us for at least another year, though always in a smaller way.

For more information:


This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
