
Improving Kernel Workflows

Collect ideas for GSoC student projects on Improving Kernel Workflows here.

MAINTAINERS and correct integration tree information

In previous work on MAINTAINERS and process conformance, Pia Eichinger [1] has investigated: are patches integrated by the maintainers defined by the responsibilities in MAINTAINERS?

In this project, we are interested in a related (possibly simpler) question: Are the commits integrated into the appropriate integration trees referenced in MAINTAINERS?

The mentor believes that the main difference between considering maintainers and considering integration trees is that the integration tree information in MAINTAINERS is more error-prone: unlike the personal maintainer information (name and email), which is widely used through ./scripts/get_maintainer.pl, the tree information is not used as prominently. Errors in the integration tree entries are therefore expected to be more prevalent, but also simpler to correct, than errors in the personal maintainer information.

The answer to the question above can then be used to identify which integration tree entries should be added to specific sections in MAINTAINERS so that they best match the actual integration observed in git.

Determining the factors and metrics that define what is best is, of course, the challenging part: the task is to identify a suitable heuristic that is:

  1. good enough to be used to create a change to MAINTAINERS that is accepted by the community, and
  2. simple enough to be implemented with reasonable effort.

Background:

Each MAINTAINERS section includes references, through its T: entries, to the location of a source configuration management (SCM) tree together with its type, e.g., git, quilt, hg. For each commit, the kernel git history carries the commit's integration tree path, i.e., the information through which SCM trees a commit was integrated until it was finally integrated into Linus Torvalds' tree.
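
To make the T: entries concrete, the following is a minimal sketch of how they could be collected per section. It assumes the usual layout of MAINTAINERS (a title line followed by single-letter tagged lines, with blank lines between sections) and that the descriptive preamble of the file has already been stripped. The function names parse_sections and tree_entries are hypothetical and are not taken from gitdm or pasta.

```python
import re
from collections import defaultdict

# A tagged line looks like "T:<whitespace>git git://example.org/tree.git".
ENTRY = re.compile(r"^([A-Z]):\s*(.*)$")


def parse_sections(text):
    """Map each section title to {tag: [values]} (hypothetical helper)."""
    sections = {}
    title = None
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match and title is not None:
            sections[title][match.group(1)].append(match.group(2).strip())
        elif line.strip() and not match:
            # Any other non-blank line is treated as a section title.
            title = line.strip()
            sections[title] = defaultdict(list)
        elif not line.strip():
            # A blank line ends the current section.
            title = None
    return sections


def tree_entries(text):
    """Map each section title to a list of (scm_type, location) pairs."""
    result = {}
    for title, tags in parse_sections(text).items():
        trees = []
        for value in tags.get("T", []):
            parts = value.split()
            if len(parts) >= 2:
                # Anything after the location (e.g. a branch) is ignored here.
                trees.append((parts[0], parts[1]))
        result[title] = trees
    return result
```

Sections without any T: entry map to an empty list, which is exactly the case this project would try to fill in.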

Ideally the references in the MAINTAINERS sections are:

  • complete, i.e., all integration trees used for recent kernel releases are mentioned in MAINTAINERS.
  • sound, i.e., the majority of the commits are integrated through the trees referenced in the MAINTAINERS sections a patch belongs to.
  • precise, i.e., for each MAINTAINERS section, the majority of the commits that belong to a section are integrated through the tree referenced in that section.
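
One possible way to turn the three properties into numbers is sketched below. This is only an assumed reading of the definitions above: each property is expressed as a fraction, so "the majority" corresponds to a value above 0.5. The Commit type and the input shape (a mapping from section name to the set of trees listed in its T: entries) are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Commit(NamedTuple):
    sections: frozenset  # MAINTAINERS sections the commit belongs to
    tree: str            # tree through which the commit was integrated


def completeness(commits, documented):
    """Fraction of observed trees that appear in any MAINTAINERS section."""
    observed = {c.tree for c in commits}
    listed = set().union(*documented.values())
    return len(observed & listed) / len(observed) if observed else 1.0


def soundness(commits, documented):
    """Fraction of commits integrated through a tree of one of their sections."""
    if not commits:
        return 1.0
    hits = sum(
        any(c.tree in documented.get(s, ()) for s in c.sections)
        for c in commits
    )
    return hits / len(commits)


def precision(commits, documented):
    """Per section: fraction of its commits integrated through its own trees."""
    total, hits = Counter(), Counter()
    for c in commits:
        for s in c.sections:
            total[s] += 1
            hits[s] += c.tree in documented.get(s, ())
    return {s: hits[s] / total[s] for s in total}
```

Soundness is a single global number over all commits, while precision is reported per section, which makes it easier to see where an added T: entry would help most.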

Goal:

We define and measure the three properties above: completeness, soundness and precision.

Then, we use that information to determine which integration tree entries should be added to which specific sections to maximally increase the three properties.

To evaluate the adequacy of this method, we can obtain feedback from the responsible kernel maintainers by proposing patches that modify the MAINTAINERS file for the additions we identified as most relevant, i.e., those that maximally increase the properties. Two thresholds limit what is sent out: a reasonable threshold on the number of patch proposals (to not swamp maintainers initially) and a threshold on relevance (to not send out minor changes that are largely irrelevant to the community).
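
The selection of additions under the two thresholds could look roughly as follows. This is a sketch under stated assumptions, not a proposed final heuristic: relevance is approximated by the share of a section's commits that arrived through an undocumented tree, and the names propose_additions, max_proposals and min_share are hypothetical, as are their default values.

```python
from collections import Counter


def propose_additions(commits, documented, max_proposals=5, min_share=0.2):
    """Rank undocumented (section, tree) pairs by how many commits they cover.

    commits:    iterable of (sections, tree) pairs
    documented: mapping from section name to the set of trees it lists
    """
    per_section = Counter()
    missing = Counter()
    for sections, tree in commits:
        for section in sections:
            per_section[section] += 1
            if tree not in documented.get(section, set()):
                missing[(section, tree)] += 1

    candidates = [
        (section, tree, count, count / per_section[section])
        for (section, tree), count in missing.items()
        # Relevance threshold: skip trees that carry only a minor share.
        if count / per_section[section] >= min_share
    ]
    # Most commits first; names break ties so the order is deterministic.
    candidates.sort(key=lambda c: (-c[2], c[0], c[1]))
    # Volume threshold: cap the number of patches proposed.
    return candidates[:max_proposals]
```

Each returned tuple is (section, tree, commit count, share of the section's commits), which is roughly the evidence a patch description for the MAINTAINERS change would need to cite.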

In this project, we can make use of:

  • gitdm at git://git.lwn.net/gitdm.git: gitdm includes some scripts to parse MAINTAINERS and obtain the integration tree path of a commit.

and/or

  • pasta: Similarly to gitdm, pasta provides functionality to parse MAINTAINERS and to extract information on commits.
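
As a rough illustration of what obtaining an integration tree path involves, the sketch below extracts the tree location from merge commit subjects of the common form "Merge tag '...' of <location>". It is not the code used by gitdm or pasta. It assumes the subjects of the merge commits that carried a commit towards mainline have already been collected with git tooling, ordered from the first merge to the last, and the function names are hypothetical.

```python
import re

# Matches e.g. "Merge tag 'clk-for-linus' of git://example.org/clk/linux".
MERGE_SUBJECT = re.compile(
    r"^Merge (?:tag|branch|branches) .+? of (?P<tree>\S+)"
)


def tree_from_subject(subject):
    """Return the tree location named in a merge subject, or None."""
    match = MERGE_SUBJECT.match(subject)
    return match.group("tree") if match else None


def integration_path(merge_subjects):
    """Trees a commit passed through, in the order of the given merges."""
    path = []
    for subject in merge_subjects:
        tree = tree_from_subject(subject)
        if tree is not None:
            # Merges of local branches name no tree and are skipped.
            path.append(tree)
    return path
```

Merge subjects that do not follow this convention are silently ignored here; a real analysis would need to account for them, since they affect how complete the recovered path is.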

Potential project phases:

  1. In the first phase (PoC phase), we could probably just create a setup that combines or extends the functionality in gitdm and/or in pasta.
  2. In the second phase (MAINTAINERS patch creation phase), we send out some patches and collect feedback from maintainers.
  3. In a third phase, with a better understanding of the individual pieces in gitdm and/or in pasta, we could then create a cleaner design that also refactors gitdm and pasta to share the same implementation when essentially the same basic functionality is used within the various analyses.

Mentor contact: Lukas Bulwahn; lukas.bulwahn-at-gmail.com

References:

[1] https://lists.elisa.tech/g/devel/message/1269

Bidirectionally sync Patchwork patch status with Gmail labels

Many Linux kernel developers and maintainers use both Patchwork for patch tracking and Gmail or G Suite as their email provider. Patchwork is a working solution for storing patch review state, but some developers may prefer to view and modify patch state from their chosen mailer.

The Clk tree (https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/) does this today by syncing Patchwork patch state to notmuch tags with a local cron job, and then using a popular tool (https://github.com/gauteh/lieer) to sync notmuch tags with Gmail labels. This three-part solution allows the Clk team to stay synchronized on patch state without ever leaving the comfort of their chosen MUA. The downside of this solution is that the cron job is ugly and runs locally on machines with questionable uptime.

A proposal for GSoC is to build a better mechanism to sync Patchwork patch status with Gmail labels. This would grow the potential user base beyond the current set of Patchwork+Gmail+Notmuch users into the larger set of Patchwork+Gmail users. Such a solution might be cloud-based, using tools such as Google Apps Script and the Patchwork REST API.
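
The core of one sync direction (Patchwork to Gmail) can be sketched as a pure planning step that compares patch state with the labels currently on each message. This is an illustration only and is written in Python rather than Google Apps Script. It assumes each patch record carries "msgid" and "state" fields, as the Patchwork REST API is expected to return (to be verified against the deployed API version), and the "patchwork/" label prefix and function names are hypothetical. Fetching patches and applying labels through the Gmail API are deliberately left out.

```python
def label_for(state, prefix="patchwork/"):
    """Gmail label name that represents a Patchwork state."""
    return prefix + state


def plan_label_updates(patches, current_labels, prefix="patchwork/"):
    """Compute label changes needed so Gmail reflects Patchwork state.

    patches:        iterable of dicts with "msgid" and "state" keys
    current_labels: mapping from message id to the set of labels on it
    Returns a mapping from message id to (labels_to_add, labels_to_remove).
    """
    plan = {}
    for patch in patches:
        msgid = patch["msgid"]
        wanted = label_for(patch["state"], prefix)
        # Only labels under our prefix are managed; others stay untouched.
        have = {
            label
            for label in current_labels.get(msgid, set())
            if label.startswith(prefix)
        }
        add = {wanted} - have
        remove = have - {wanted}
        if add or remove:
            plan[msgid] = (add, remove)
    return plan
```

The reverse direction (a label changed in Gmail updating the state in Patchwork) would need the same comparison with the roles swapped, plus a rule for resolving conflicts when both sides changed.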

Dependencies: Gmail/G Suite-based email and Patchwork for patch state tracking

Submitted by: Michael Turquette mturquette@baylibre.com (I'm happy to provide pointers to the current scripts used by the Clk team, hosted on github)

gsoc/2021-gsoc-kernel-workflows.txt · Last modified: 2021/01/27 06:41 by lukas.bulwahn