2013 GSoC LSB projects

Go back to the main GSoC Linux Foundation page

Mailing list: lsb-discuss at lists dot linux-foundation dot org

IRC: #lsb on irc.freenode.net

Code License: GPL/BSD, specs: GNU FDL

Mentors: Alexey Khoroshilov (alexey dot khoroshilov at gmail dot com), Jeff Licquia, Vladimir Rubanov, Denis Silakov (dsilakov at gmail dot com).

Improve multi-version support in automatically generated LSB test suites

In addition to the specification text, LSB provides a set of tools and tests to help developing LSB-compliant products. At any moment of time, the LSB workgorup provides support several versions of the specification; this means, that for every supported version a corresponding set of tools should be provided.

For most of LSB software products we have separate branches in our source code repository corresponding to supported versions of the specification. However, support of several branches takes additional resources. It turned out that for some tools it is easier to add support of multiple LSB versions - that is, teach the tools to take target LSB version as a parameter and behave with respect to the given value.

This approach was found especially useful for those tools which are partially generated on the basis of LSB Database. Examples include LSB SDK (which allows to build applications meeting requirements of a certain LSB version), appchk (that checks if application uses libraries or binary symbols not included in certain LSB version) and a set of other tests.

The aim of this project is to add multiple version support to those database-driven test suites where this support is currently absent.

The obligatory set of test suites to process is like the following:

libchk (a test for distribution that ensures that all necessary libraries and symbols are provided)
dynchk (runtime test for applications that ensures that LSB functions are used correctly)
appchk-sh (static checker for application shell scripts verified shell constructs and external commands used)

More items can be added if student succeeds with the obligatory ones before the deadline.

'Multi-version' support for these tests means that they should know which libraries/symbols/modules/etc. are included in particular version of LSB. Such information is stored in the central LSB database, so this project will involve work with the databases.

Desired knowledge: Perl, Python, C, Shell, SQL

Mentor: Denis Silakov

Formalization and Checking of Correct Usage of Kernel Core API

By the moment, the framework contains around 20 rules that has already helped to find and fix more than 100 issues in device drivers of vanilla Linux kernel.

This project (or several ones) is aimed at extending number of rules supported by the framework.

We have a collection of several dozens rules written in informal way, but other rules can be used as well. Formalization process usually includes the following steps:

identification of all elements of kernel core API relevant to the rule;
writing a finite state machine like formal model that consists of
- model state (one or more C variables);
- transitions between states linked to appropriate events in device driver (usually calls of kernel core API);
- a set of error states (that represents misuse of kernel core API);
writing unit tests covering all the interfaces;
checking all drivers against the rule, analyzing results, reporting bugs and fixing the rule.

A notation used for formal representation of the rules is C code and AspectJ-like notation to merge the model code with original source code of device driver in appropriate places.

More than one application can be approved. Formalization of ~5 rules is expected from each student.

Desired knowledge: C, aspect-oriented programming, Linux kernel space API

Mentor: Alexey Khoroshilov, Vadim Mutilin

Generation of Environment Model for Verification of Multi-module Device Drivers

Linux Driver Verification (LDV) program is devoted to developing an integrated framework for device drivers verification and adopting leading edge research results to enhance quality of verification tools. Currently LDV tools are used to check Linux device drivers against several safety rules, each of them formally defines what is correct usage of a particular part of kernel core API. Another way of looking at the safety rules is that a rule represents one or more typical misuse of kernel core API.

By the moment, the framework contains around 20 rules that has already helped to find and fix more than 100 issues in device drivers of vanilla Linux kernel.

Before verification of device driver source code starts LDV tools generate an environment model for this source code. This model reflects actual interaction between a driver and kernel core that happens when they work in real environment.

At the moment the environment model is generated for that device driver source code that is compiled into a separate kernel module. The model is built on the basis of this source code and a configuration that describes several types of driver structures and provides patterns for other ones. This allows to generate rather accurate and complete environment models for all device drivers automatically in comparison with other verification systems that require either annotations for all callbacks of drivers or manual developing of environment models.

Although the most of device drivers can be verified in such the way, more thorough analysis requires to take into account those modules that are bound with an analyzed one. There are two reasons for that. First is that a device driver can consists of several kernel modules. Second is that often some common functionality of some device driver groups is extracted into separate generic modules. In these cases currently generated one-module environment models may be incomplete and incorrect that can lead to both false positives and false negatives during further verification.

This project aims to develop a new approach to generate environment models for multi-module device drivers. We suggest following works:

analysis of a device driver source code coverage for a given environment model;
discovering of Linux kernel modules interrelations;
extending the environment generator configuration to support multi-module device drivers;
implementing of the approach in the current environment generator;
verification of all device drivers against several (at least 10) rules with help of a new environment generator and analysis of results.

Desired knowledge: C, Linux kernel modules, device driver structures, Perl

Mentor: Alexey Khoroshilov, Vadim Mutilin

Automatic Construction of Kernel Core Model

LDV tools lacks information on the kernel core since the framework verifies just device drivers source code against formalized rules. This reduces an analysis quality (both false positives and false negatives can happen) and hinders some safety rules to be formalized.

This project is intended to develop tools that can construct the kernel core model in the automatic way. Following activities are assumed:

building a kernel call graph on the basis of its source and binary code and exploring it;
using lightweight static analysis tools (e.g., Coccinelle, Sparse, Smatch, etc.) to obtain annotations of kernel core interfaces;
developing a format to keep information on the kernel core model that can be easily reused for generation of different rules;
integration of developed tools with LDV tools;
developing several rules relying on kernel core model and analysis of verification results for the given rules as well as for existing rules.

Desired knowledge: C, Linux kernel internals, parsing techniques, static analysis, Perl

Mentor: Alexey Khoroshilov, Vadim Mutilin

Extend and Update LSB Core Test Suite

LSB Core tests [1,2] is the main part of LSB certification tests. Originally they were developed for POSIX (IEEE Std 1003.1-1990) and LSB 3.1. LSB specification was expanded since that time (http://linuxbase.org/navigator/browse/status.php). The goal of the project is to fix open issues in the tests, i.e. to fix issues appearing on modern Linux distributions and to accomodate all relevant changes in text of specifications, as well as to expand coverage of the test suite to newly added LSB Core interfaces.

VSX-PCTS was implemented using TET/VSXgen test harness. The OLVER test suite was developed using model-based testing technology UniTESK. The technology follows well-known Design by Contract approach to represent formal specifications. Program contracts consist of preconditions and postconditions of interface operations and data type invariants. The main advantage of program contracts is that they enable automatic construction of test oracles that check the conformance of the target system behavior to the specifications.

Traceability of formal specifications to text of original LSB/POSIX specifications is supported using the following ideas:

each elementary requirement in original specification obtains a unique identifier;
each of the requirements is explicitly linked to a fragment of text in the original specification;
all checks in preconditions and postconditions are marked by identifiers of corresponding requirements.

[1] http://bzr.linuxfoundation.org/loggerhead/lsb/devel/olver-core-tests/files
[2] http://bzr.linuxfoundation.org/loggerhead/lsb/devel/runtime-test/files/

Desired knowledge: C, SEC, model based testing

Mentor: Alexey Khoroshilov, Roman Zybin

Improving Performance of KernelStrider Tools

KernelStrider tools [1] collect data about the operation of Linux kernel modules (device drivers, file system modules, etc.) in runtime. Among other things, the tools gather information about the memory accesses and function calls that the target kernel modules make. The information is passed to the user space for further processing. For example, an offline data race detector (like ThreadSanitizer offline [2]) can be used to analyze these data and detect data races in the kernel modules.

KernelStrider is a part of KEDR Framework [3].

The amount of data being collected and passed to the user space by KernelStrider can be rather large. For “e1000” network driver, for example, from tens to hundreds of megabytes of data need to be handled per minute when the network activity is high. It can be even larger for the wireless networking stack. This may result in a high CPU load and overall performance degradation of the system. Sometimes, part of the data may even be lost if the user-space tools fail to process it in time.

The idea is to compress the data in the kernel before the output. This way, the data could be deflated by a factor of 5-15 at least. For example, when applied to a saved data file, gzip with the default settings compressed that file approx. by a factor of 10.

The Linux kernel already contains implementations of compression algorithms (e.g., lzo), they could be used here.

The user-space components of KernelStrider should also be enhanced to be able to process the compressed data.

If the data compression is implemented in the KernelStrider's output subsystem, this could significantly reduce the strain on system resources put by it as well as improve reliability of these tools (less data would be lost during the output).

[1] http://code.google.com/p/kernel-strider

[2] http://code.google.com/p/data-race-test/wiki/ThreadSanitizerOffline

[3] http://code.google.com/p/kedr

Desired knowledge: C, experience with Linux kernel modules

Mentors: Eugene Shatokhin, Vladimir Rubanov