Table of Contents

Linux in Safety-Critical Systems

The SIL2LinuxMP project is an collaborative research project to provide procedures and methods to qualify Linux on a multi-core embedded platform at safety integrity level 2 (SIL2) according to IEC 61508 Ed 2.

Some more information on the SIL2LinuxMP project is available here:

http://www.osadl.org/SIL2LinuxMP.sil2-linux-project.0.html

http://events17.linuxfoundation.org/sites/events/files/slides/2017-10-24_ELCE-2017_Bulwahn_Safety-Critical-Linux_v1.2-presentation.pdf

https://www.youtube.com/watch?v=1eyJ6dAqMmg

Primary mentoring contact: Lukas Bulwahn, lukas.bulwahn at gmail.com, Nicholas Mc Guire, der.herr at hofr.at

Student Project Proposals

Tailoring clang compiler warnings with coccinelle scripts

Recently, some efforts by Google's kernel developers have made it possible to compile large parts of the Linux kernel with the clang/LLVM compiler. The clang compiler is known to include a number of static analysis methods that identify typical bug patterns and bad code smells, which the compiler indicates to the user by compiler warnings. However, when one compiles the current Linux kernel with clang and switches on to warn on certain warning classes, the compiler shows a tremendous large number of warnings.

The final goal is to find a proper setup that reduces the number of warnings with clang compiler with a suitable post-processing and filtering, such that Linux kernel developers can practically improve their submitted patches during the development with the reported findings of potential bugs, bug patterns and code smells.

A first analysis of specific cases of the clang warnings showed that many of the found warnings fall into a comparatively small set of classes that follow a very specific and characteristic syntactic pattern and hence, many warnings can be assessed once.

As a first step, one needs to determine the relevance of the type of clang warnings for the Linux kernel and its contribution to the avoidance of certain typical bug patterns. If the clang warning type in general considered relevant, the next step is to identify the typical set of subclasses of this warning. Then, we can write the syntactic patterns in coccinelle scripts to automatically group all warnings of the identified subclasses. As a last step, the subclasses can then be assessed to determine if they are generally false reports (so, an irrelevant subclass within the relevant class of warnings), or if this subclass points to code where a bug can be uncovered with a comparatively high probability. Once the different classes have been identified and assessed, further warnings can be automatically marked and these warnings can be presented to the developers weighted by the confidence and relevance of the warnings.

This way, the developers should have useful static analysis report for their patches and the kernel community can gradually improve the different parts of the Linux kernel during the continuous evolution of the overall source code. If time within the project permits, this approach can further be applied to finding of other static analysis tools, e.g., clang tidy or sparse.

Required Knowledge:

  1. required: C and coccinelle, skill to READ AND UNDERSTAND source code in the Linux kernel in independent work, good analytical skills to understand why warnings are reported in certain source code parts, solid documentation skills, good English writing skills with clear precise style
  2. desired: knowledge of compiler construction and static analysis, python, bash

Benefit: The student project provides more evidence that kernel development follows a defined process and discussion on identified patterns shows that a controlled and informed process is followed to decide which code patterns should be generally followed and which are enforced by the different tools and their setup.

To work on this project, you should show that you are able to use all the needed tools, especially coccinelle, and that you can properly analyse Linux kernel source code and write a precise and well-written report on your finding. Details for what is required for a successful application is provided on request.

Symbolic kernel config build evaluator

Goal of this project is to develop a tool that describes the conditions when a source file is included in the build of the Linux kernel with a symbolic term, i.e., a boolean formula over the kernel configuration flags and further kernel build variables.

For example, the `kernel/Makefile` includes the following build instructions:

obj-$(CONFIG_FUTEX) += futex.o
ifeq ($(CONFIG_COMPAT),y)
obj-$(CONFIG_FUTEX) += futex_compat.o
endif

Here, the file `kernel/futex.c` is included in the build if and only if CONFIG_FUTEX is set. Similarly, the `kernel/futex_compat.c` is include if and only if CONFIG_FUTEX and CONFIG_COMPAT is set.

The developed tool should be able to analyse this and provide for this example an output that would look like this:

kernel/futex.c <- CONFIG_FUTEX
kernel/futex_compat.c <- CONFIG_COMPAT && CONFIG_FUTEX

To achieve this in perfection, it would require a Herculean task: As the kernel build system is make, which is already quite complex by itself and can further invoke arbitrary shell scripts, it would require to write the complete make build system and shell evaluation as symbolic evaluator. This would just much too complex and laborious considering the value of the information that shall be collected.

However, we are convinced that the Makefiles in the kernel tree have only a rather low number of relevant patterns that are relevant to obtain the information that we are interested in. Hence, the symbolic expression can be quite simply be determined by writing a rather simple symbolic evaluator that understands the most prominent patterns and ignores the parts in the Makefile that are not relevant for the symbolic expression.

This symbolic evaluator provides a first good approximation of that information to answer and address further questions. With an increasing number of patterns covered, the approximation of the tool becomes increasingly better. At any point of the development, the quality of the symbolic evaluation can be evaluated to its ground truth, defined by comparing the tool's symbolic prediction against actual kernel builds with a large number of concrete build configurations.

Furthermore, these boolean expressions can be combined with the constraints of the kconfig tool to find potential contradictions in the overall kernel build configuration setup.

Once, this tool works properly on the granularity, we could consider to extend that even further on precise information for each line within a file. Again, we would follow a similarly pragmatic approach parsing a small set of patterns to determine if certain lines are only included under certain build configuration conditions.

The result of this symbolic evaluation is useful in multiple ways:

  1. One can find out which files are actually never included in any kernel build. With over 25,000 files, a complex build configuration setup and its continuous rate of change, it is actually quite likely that the kernel repository contains such files due to mistakes in the build configuration.
  2. Kernel developers can quickly determine how they must configure the kernel config that make sure that the files they modified are included in their local test builds.
  3. Knowing the constraints on the kernel config, kernel maintainers can decide how much review is needed and from which developer reviews and tests especially relevant for this patch.
  4. For system maintainers, we already developed a tool that can determine which patches are relevant for a given kernel configuration. With the described tool here, we can extend this information to link which patch is relevant due to which selected kernel configuration in the given kernel configuration. This supports system maintainers to judge which impact an update of the Linux kernel could potentially have on the overall system, and which patches in an update have significant more potential to impact the system than others.

Required Knowledge:

  1. Required: Very good understanding of compiler construction and processing in compilers, e.g., parsing, syntax analysis, code transformations and static analysis.
  2. Required: Good knowledge of make and python
  3. Desired: Basic knowledge of the kernel build system
  4. Desired: Knowledge and experience with some functional programming language

To work on this project, you should show that you are able to use all the needed tools, and that you can implement a simple interpreter, e.g., you can write a parser, interpreter and static analyser for a small toy programming language. Details for what is required for a successful application is provided on request.

Apply Facebook Infer to Linux kernel source code

Task: Run Facebook Infer on the Linux kernel source code, write models in Facebook Infer to improve the analysis.

This is a quite challenging project; in the application, we expect that you have proved that you can run Facebook Infer on the complete Linux kernel source code and you obtained a first analysis result.

If you cannot run Facebook Infer on the complete Linux kernel source, you should prove that you understand why Facebook Infer fails on certain parts, suggest different alternative work arounds and solutions, and already applied the work arounds, so that you can run Facebook Infer on the kernel source code with a fully known, understood and limited number of work arounds.

The project proposal should include first technical steps that show how you write models in Infer.

Required Knowledge:

  1. Required: Very good understanding of static analysis
  2. Required: Very good knowledge of C, skill to READ AND UNDERSTAND source code in the Linux kernel in independent work
  3. Required: Very good knowledge of make and python
  4. Required: Good knowledge and practical experience with OCaml
  5. Required: good analytical skills to understand why static analysis reports findings in certain source code parts, solid documentation skills, good English writing skills with clear precise style
  6. Desired: Basic knowledge of the kernel build system

Develop Methods for Tracking Tool Analysis Findings over Time

We use a number of tools, checkpatch.pl, coccinelle scripts, sparse, etc. and these tools report certain findings. While the valid ones are addressed by the kernel developers, the invalid tool findings are manually assessed and not acted upon. Over time with addressing the valid findings, the proportion of invalid findings increase compared to newly appearing valid findings, as invalid findings of those tools are not marked and tracked over the various versions.

In this GSoC project, the student should work out methods and tools to track the tool findings and make these tools useful in the Linux kernel community.

Required Knowledge:

  1. Required: Very good knowledge of C, skill to READ AND UNDERSTAND source code in the Linux kernel in independent work
  2. Required: Very good knowledge of python
  3. Required: Good understanding of git
  4. Recommended: Some understanding of static analysis tools
  5. Recommended: Some understanding of coccinelle

Make Linux kernel community aware of tool findings

The Linux kernel community has a number of tools to ensure the quality of the continuous kernel development. Among these tools are coccinelle, sparse, checkpatch.pl, lock dependency validator, KASAN, syzkaller and many more.

In the GSoC project, the student should find suitable ways to make the Linux developers aware of the tools' findings. There are various way in which this could be implemented, e.g.:

  1. Setting up an infrastructure that runs those tools on patches provided on the mailing list and reports the findings back to the patch authors
  2. Including the tool findings in the elixir development service
  3. Providing means to tag and comment on tool findings in the distributed Linux kernel development

This project idea is quite wide and we expect the student to provide a more specific description of the task to tackle with some evidence that he/she will be able to implement the proposal.

Required Knowledge:

  1. Required: Very good knowledge of a suitable programming language, e.g., python
  2. Required: Good understanding of git
  3. Required: Good knowledge of C
  4. Recommended: Basic understanding how the kernel community works
  5. Recommended: Basic understanding of the kernel tools

Assess, Aggregate and Provide Detailed Statistics on Kernel Bug Fixes

Goal of this project is to create and gather detailed information on bugs in the Linux kernel.To obtain this information, the student will have to analyse commits of the Linux kernel stabilisation process in a large number by manual assessment.

The assessment should allow us to answer these kind of questions:

  1. Which type of problems are identified in the early stage of the stabilisation process?
  2. Which type of problems are identified in the later stages of the stabilisation process?
  3. How are problems identified, by review, by testing, by tools, by production use?

This project will require to create an ontology of bugs; this will require some work reading general literature and quite some effort assessing and understanding the bug fix commits. You will not mainly program new features, but you will learn a lot about the kernel code by looking and assessing bug fixes and trying to derive general statements from these observations.

Required Knowledge:

  1. Required: Very good knowledge of C, skill to READ AND UNDERSTAND source code in the Linux kernel in independent work
  2. Required: Very good understanding of software development processes and software development methods
  3. Required: Good understanding of git