The Linux Foundation

 
Packaging/Wiki

Introduction

In Linux, there is no widely agreed-upon methodology for distributing third-party software packages. The most common way to distribute binary packages is to pick several "base" distributions to build against and provide packages for them. Several efforts exist for packaging third-party software, but none has significant traction with all stakeholders. An approach that attempts to address the needs of all stakeholders is much more likely to gain widespread adoption.

Stakeholders

For the purpose of this writing, we identify the following as the key stakeholders that need pleasing: Users, Developers, Distros, and IT administrators. Of course, in many cases there is no IT administrator. Each stakeholder has needs that are not met by current implementations as well as some precepts of their own domain that must not be violated.

For Users, there is an unmet need to install/remove/upgrade/configure software easily and consistently. While each distro generally provides a plethora of software to choose from as well as a methodology for doing the above, once a user goes beyond what the distro offers, they are on their own.

Distros, in particular commercial distros, have historically felt that they are the solution, and that if everybody would just package their software for their distro, the problem would be solved. However, distros have become more aware of the realities of the world and are generally supportive of efforts to unify certain aspects of distros, such as LSB. From a distro perspective, it is critical that third-party software packages and packaging systems do not interfere with the functionality of the underlying distro.

IT administrators need tools to manage the software installed on their systems. Distros and third-party tools provide many ways to query and manipulate packages on systems. However, most third-party packaging systems do not interact with these systems, or worse, cause them to report incorrect information or behave incorrectly.

Developers need a consistent way to make software available for Linux systems. Making binaries available for every type of Linux system is intractable. Furthermore, without a consistent way to provide updates or configuration, it is increasingly common for developers to have to build these facilities directly into their applications. This presents a user nightmare that is already apparent in Windows, where applications routinely produce pop-ups telling the user that the application has been updated. Many third-party applications include user registration and EULA click-through when the application is first installed. A properly done packaging system should provide these facilities for applications to utilize.

Current Solutions

While there is no current solution in the market, it is helpful to examine the current practices for insight into possible solutions.

Distro Solution

One approach to the multiplicity of distros is to create a software package for each popular distro. While this allows for the same uniformity as with the distro's set of packages, it has several shortcomings for the user. Because the Linux user-base is so fragmented by distro, no software provider can provide packages for every distro. While some distros make it easy to install a package by clicking on a link from a web browser, none allow you to easily add a repository, which would enable updates to be processed. Adding repositories in most distros is a non-trivial exercise that is not for the weak or infirm. Of course, a software maker multiplies their development and QA efforts for each distro supported.

LSB Solution

LSB defines RPM with several restrictions as the preferred way of distributing packages. There are restrictions on naming, features, virtual packages, and dependencies. Because RPM is not the package format on many distributions, a transformer called alien is used to convert to other formats. Alien operates at the lowest common denominator of package transformation, and thus indirectly imposes its own limits. For example, the compression format must be gzip, as bzip2 is not part of Debian's base. Of course, since there is no de facto upstream rpm for the distros, each rpm implementation is slightly different. Even details such as whether stdin and stdout are left open vary across distros.

The specification doesn't provide a standard way to install/remove/upgrade rpm packages; the distro methods are used. rpm is usually a low-level package tool, with a higher-level tool such as rug, yum, or yast built on top of it. Thus, the methodology for install/remove/upgrade is distro-specific. Furthermore, each higher-level packaging system usually has features on top of rpm such as update checking and some degree of configuration. Because these are at a level above, they cannot be specified as standard.

LSB has restrictions on dependencies and virtual packages. As specified, a package cannot depend on a package that is not part of LSB or another vendor. Likewise, a package cannot provide something that is not registered to it. While it is a gap not to be able to do these things, each distro does naming differently, so it is unlikely that static names for dependencies and virtual packages would be sufficient. Imagine the case of a sendmail replacement. On Debian, it should provide the mail-transport-agent virtual package, while on Fedora it should provide the smtp-server virtual package. Similar situations exist for dependencies.
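The naming problem in the sendmail example can be made concrete with a small lookup table. This is only a sketch: the capability key "mail-server", the table layout, and the function name are assumptions for illustration; only the two virtual package names come from the text.

```python
# Hypothetical table: generic capability -> distro-specific virtual package name.
# Only the two mail entries come from the text; the layout is an assumption.
VIRTUAL_PACKAGE_NAMES = {
    "mail-server": {
        "debian": "mail-transport-agent",
        "fedora": "smtp-server",
    },
}


def provides_for(capability, distro):
    """Return the virtual package name a package should provide on `distro`.

    Raises KeyError when the capability or distro is unknown, because a
    guessed name would produce a package that satisfies nothing.
    """
    return VIRTUAL_PACKAGE_NAMES[capability][distro]


print(provides_for("mail-server", "debian"))
print(provides_for("mail-server", "fedora"))
```

A single static "Provides" field cannot express this table, which is why per-distro knowledge has to live somewhere outside the package's fixed metadata.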

As each distro's implementation of certain LSB facilities is different, the package cannot be sure what files are generated on its behalf. For example, there is an LSB facility for creating init scripts on the package's behalf. However, the files that are generated are undefined. It is possible and likely that the generated files will vary across distros. Because RPM does not have a flexible manifest (i.e., one cannot programmatically add and remove files from the manifest), these generated files are orphaned. Of course, on package removal, these scripts can be programmatically removed. However, it takes a careful programmer not to create orphaned files in such a way. This is a simple example, but there are much more complex possibilities.

Further Reading

Autopackage

Autopackage is a packaging system for third-party software that is intended to be distribution neutral. Autopackage provides a consistent user experience to install/remove software. Autopackage allows for the installation of packages as a non-root user in one's home directory, but this is not guaranteed to always be available for every package. There are roughly 62 packages that are available in the autopackage format.

Basic Architecture

Autopackage provides a framework for creating/installing/removing packages rather than a complete packaging system. This framework is accessed as a library of shell script calls. The autopackage format itself is, in fact, a shell script that is generated by the tool-chain. The spec-file sections are also shell scripts, so things like dependency checking can be arbitrarily complicated. However, the standard library of shell utilities is generally used for dependency checking. Dependencies are generally expressed by file name or soname. There is no concept of conflicting packages, although that is envisioned for the future. One notable point of the autopackage philosophy is the eschewing of any sort of repository system. Autopackages are files and all meta-data about them is encapsulated in the package. If, for example, a package has a dependency and knows how to resolve it by fetching another package, the second package's URL is part of the first package. Thus, if the second package moves, the first package is broken.

Community

There are nominally six contributors to the autopackage core, but it seems that only four are currently active. The list of packages at the main page has 65 packages, of which 3 are part of autopackage and at least two others are defunct.

The autopackage community has had trouble with the distro community for a variety of reasons. There are technical differences of opinion between the two groups about things such as FHS compliance, the role of packaging in general, and the suitability of certain languages for production applications. Because autopackage can, and by default will, put things in the /usr hierarchy, it can interfere with the distro's packaging system. Some people fear that this will result in broken behavior for the user. The autopackage development team says that they do this because nearly all distros provide broken support for /usr/local.

Roadmap

The next major release of autopackage will be the 1.2 series. Apart from better C++ support, the major new feature will be "integration" with the host packaging system. In short, this integration consists of autopackage being able to remove a host package if there is a perceived conflict (since autopackage doesn't have meta-data for conflicts, this is highly theoretical). Sometime after that (perhaps the 1.4 series), the autopackage team wants to provide some sort of update notification system. There is already a schema in the meta-data to provide a URL to check.

Benefits/Deficiencies

Autopackage attempts to address the need to install and remove third-party packages simply and consistently. Almost all autopackage packages are binary relocatable, which allows them to be installed in a user's home directory instead of requiring root to install all binary packages. Additionally, the autopackage team makes sure that the UI presents the user with the easiest possible interface. For example, most distros use the package name when referring to it for installation and removal. This name could be something like gaim. Autopackage makes sure to always show the short description of the package for all transactions. It does not rely on the user to remember what gaim is when answering questions about it. However, there are several core problems with the autopackage approach and community.

Firstly, autopackage and the autopackage community have an openly hostile attitude towards the distros. The lack of good will towards a project that in any sense is doing something controversial from a distro perspective makes the likelihood of a cozier relationship slim, especially if autopackage were to further encroach on distro package management.

Autopackage packages are shell scripts, and are thus executable. This is a major fault, because directly executable packages can never be trusted. Although a signing methodology is in place, defeating it is a trivial exercise. Most security-conscious ISVs will not distribute their software in such an obviously vulnerable format.

It is unclear how well their dependency resolution method can scale, because the number of packages is so small. At the moment, very few packages have dependencies, so resolving them takes very little time. All of the logic for doing so is written in shell script, so one can imagine that a large number of packages will make dependency solving very slow. Autopackage also has no concept of virtual packages or "provides." Thus, a package cannot depend on a generic capability, as is possible in most distros for things like an MTA or MUA.

Further Reading

klik

klik is a system for running applications inside a compressed filesystem image, thus avoiding installation altogether. Once the klik support is enabled, a user can just click on a klik URL in their browser to download and run an application on their computer. Root access is not required and it never touches the local system. This makes application removal as easy as deleting a file.

Basic Architecture

klik "packages" are really cramfs images that are loopback mounted. These images are compressed images with a standard entry-point for executing the program. When the program exits, the loopback filesystem is unmounted. These packages are created by following a recipe that takes Debian Sarge packages as input and processes them into klik packages. This recipe is generated by a web-service on the klik web page, but run on the client, and the client does the transformation. It is possible to redistribute the klik package independent of the web site.

Community

klik is developed actively by four developers, mostly in Europe. There are approximately 4,000 packages available for download (most klik recipes are based on Debian Sarge's 17,500 packages, only some of which need to be hand-tuned afterwards, so the number of packages is not an indicator of effort spent). There are packages for some proprietary software. While all of the software distributed is either free to use or a free trial, most have restrictions on redistribution, indicating at least some degree of support from the ISV (in fact klik downloads from the ISV's own servers and hence does not "redistribute"). Many of the popular free-as-in-beer applications for Linux are available as klik packages, such as Acroread, Opera, Picasa, Skype, and RealPlayer.

Roadmap

The klik developers indicate that they are interested in moving the project in much of the same way that it is going now, without unveiling large differences in functionality. They are interested in moving to FUSE instead of loopback mounting, adding cryptographic signatures, and having a more robust ABI compatible base. They would like to see more cooperation with the distros in having their infrastructure built in to distros when they are shipped.

Due to frequent requests by users, they are brainstorming about an automatic updater at the moment.

Benefits/Deficiencies

The klik developers have stuck to a core concept of the click and run image that does not interfere with the system. This has several benefits. Users can try out software without worrying about polluting their system or needing root privileges. Software can be run directly from USB stick (thumbdrive) or even CD or DVD.

klik is currently designed from the end user's perspective, as for most desktop/notebook installations there is no system administrator (or if there is one, s/he doesn't necessarily install every single piece of software that the end user might want). Hence, one of klik's design goals is to give the end user the possibility to download and run their own private applications. On a shared machine, this means that the benefits of shared resources are lost and the memory requirements on such a machine would be much larger. (However, the system administrator can place klik image files in /opt/klik, for example, and hence provide them to the entire system.)

In the current system of using loopback devices, there is a limitation of having only eight loopback devices available. However, there is a kernel parameter to increase that number. Even when klik switches to FUSE, it pays a speed penalty for this isolation. The theoretical slowdown cannot be seen in practice, however. Consider very large applications, such as OpenOffice.org (one of the largest applications used on the desktop today). Using klik, it doesn't launch or run noticeably slower than otherwise (on the contrary, since the klik image is compressed and hence less data needs to be read from the disk, it might actually launch faster).

Further Reading

loki

The loki installer is a set of utilities and components developed by the now defunct Linux game company, Loki. While Loki was in business, it was used to install many Linux games on a multitude of distros.

Basic Architecture

The system is more of an installer than a package manager, but does include many of the facilities of a package manager. The installer has an XML configuration that works much like a spec file in RPM. It has a system not unlike debconf for asking the user configuration questions prior to installation. Additionally, it has a EULA display and acceptance function that is separate from configuration. The installer is fairly flexible, as it has a system for plugins to add additional functionality. The installer has a database that it uses for keeping track of what has been installed, upgraded, and removed. The loki tools also include a tool for generating and installing binary patches for deployed software. This is something that is mostly absent in other packaging systems because it is assumed that the packages themselves are fairly small. Game packages can easily have gigabytes of data in an installed package.

Community

While Loki no longer exists, the development resides at icculus.org, where it is being actively developed. A number of commercial games still use it, as do Google, nVidia, Codeweavers, and others. Most of the development seems to be centered around integrating native Mac OS X support. The development hosting is very basic, with only a mailing list and CVS access. There is no formal bug tracking system or home page.

Roadmap

There are no regular releases of the tools, but there are features planned in the future such as xdg-menu support. For the most part, it appears that the development is in maintenance mode, with no large changes to architecture planned.

Benefits/Deficiencies

The loki tools do what they are designed to do, which is deploy a binary application onto a user's system. Since these applications typically do not involve manipulating the system much, it usually works. The facilities for update, configuration, and binary patching are really quite good. There is no provision for interaction with the distro, however. For some applications, this is not really a problem because they don't need any interaction with distros. However, certain types of software must interact with the distro in an intimate way (such as an authentication or manageability solution). With such a system, they are essentially on their own. Because it is really an installer, there is no commonality that is shared between applications that use it. This precludes such features as cryptographic signing, as each application ships its own engine for installation. Thus a compromised binary could also compromise the installer.

Zero Install

Zero Install is an installation system that is decentralised (there is no central repository; all packages are identified by URLs), loosely coupled (if different programs require different versions of a library then both versions are installed in parallel, without conflicts), and focused on security (all package descriptions are GPG-signed, and contain cryptographic hashes of the contents of each version). Each version of each program is stored in its own sub-directory within the Zero Install cache (nothing is installed to directories outside of the cache, such as /usr/bin) and no code from the package is run during install or uninstall. The system can automatically check for updates when software is run.

Basic Architecture

Because the system is decentralised, programs and libraries are named using URIs rather than simple names, and the packaging system uses these URIs. The user does not explicitly ask for software to be installed; rather, they ask for the program to be run. The system will prompt them to download the program first if required, and it can also check for updates on a regular basis.

To find an implementation (binary) of a package, the system downloads information about available versions from a number of feed files. By default, each package has a single feed whose URL is the URI of the package. Hence, to run the program http://site/someprog, the system will default to downloading a single feed from http://site/someprog.

Each feed is a digitally signed XML file listing available versions of the package and any dependencies they have. The dependencies are also given as URIs. Each user has a list of trusted GPG keys (initially empty) and will be prompted to confirm new keys before the contents of a feed will be used. To aid the user in their decision, a built-in database of known keys is provided.

Having collected a list of available versions, the system runs a simple constraint solver to pick a version of the program and compatible versions of any libraries. The user may change this set if required (for example, by marking a particular version as 'buggy' so that another will be chosen instead). The system then downloads the chosen versions, if required, unpacks them into the Zero Install cache, and runs the program.
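The selection step described above might be sketched as follows. This is a minimal illustration, not Zero Install's actual solver: the in-memory feed structure, the names `solve` and `in_range`, and the http://site/somelib URI are assumptions made for the example. It tries the newest acceptable version first and falls back to older ones, without revisiting choices already made for other packages.

```python
def in_range(version, low, high):
    """True if version lies within the inclusive range; None means unbounded."""
    return (low is None or version >= low) and (high is None or version <= high)


def solve(uri, feeds, buggy=frozenset(), low=None, high=None, chosen=None):
    """Pick one version per package, newest first.

    feeds:  {uri: [(version_tuple, {dep_uri: (min_version, max_version)}), ...]}
    buggy:  set of (uri, version_tuple) pairs the user has ruled out
    Returns {uri: version_tuple}, or None if no acceptable selection is found.
    """
    chosen = chosen or {}
    if uri in chosen:
        # Already selected for another dependent: accept only if compatible.
        return chosen if in_range(chosen[uri], low, high) else None
    candidates = sorted(feeds.get(uri, []), key=lambda impl: impl[0], reverse=True)
    for version, requires in candidates:
        if (uri, version) in buggy or not in_range(version, low, high):
            continue
        trial = {**chosen, uri: version}
        for dep_uri, (dep_low, dep_high) in requires.items():
            trial = solve(dep_uri, feeds, buggy, dep_low, dep_high, trial)
            if trial is None:
                break  # this version cannot be satisfied; try an older one
        else:
            return trial
    return None


# Hypothetical feed data; only http://site/someprog appears in the text.
FEEDS = {
    "http://site/someprog": [
        ((1, 0), {"http://site/somelib": ((1, 0), (1, 9))}),
        ((2, 0), {"http://site/somelib": ((2, 0), None)}),
    ],
    "http://site/somelib": [((1, 5), {}), ((2, 1), {})],
}

print(solve("http://site/someprog", FEEDS))
print(solve("http://site/someprog", FEEDS,
            buggy={("http://site/someprog", (2, 0))}))
```

Marking version 2.0 of the program as buggy makes the sketch fall back to 1.0, which in turn pulls in the older library, mirroring the user override described above.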

Each unpacked archive has its own directory, so file conflicts are impossible. Environment variables are used to allow each process to find the chosen libraries. In particular, this means that different programs can use different versions of a single library, which is a key feature of a decentralised system since it avoids the need to synchronise updates of packages. As with Autopackage, Zero Install binaries must be relocatable.
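As a rough illustration of the environment-variable mechanism, the sketch below builds a per-process environment that points at the selected library directories. The cache layout (one directory per selection with a "lib" sub-directory), the use of LD_LIBRARY_PATH specifically, and the function name are assumptions for the example; the text does not say which variables are set.

```python
import os


def build_environment(cache_dir, selections, base_env=None):
    """Return a copy of the environment with the selected library
    directories placed ahead of any existing LD_LIBRARY_PATH entries.

    cache_dir:  root of the (hypothetical) cache layout
    selections: directory names of the chosen library versions
    """
    env = dict(os.environ if base_env is None else base_env)
    lib_dirs = [os.path.join(cache_dir, name, "lib") for name in selections]
    existing = env.get("LD_LIBRARY_PATH")
    if existing:
        lib_dirs.append(existing)
    env["LD_LIBRARY_PATH"] = os.pathsep.join(lib_dirs)
    return env


env = build_environment("/cache", ["somelib-1.5"],
                        base_env={"LD_LIBRARY_PATH": "/usr/lib"})
print(env["LD_LIBRARY_PATH"])
```

Because the environment is built per process, two programs started with different selections can each see a different version of the same library, which is the loose coupling the paragraph describes.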

User interface

Users typically install by dragging a link (whose target is the URI of the program) from a web page to their launcher. For example, a user of the Xfce desktop would drag to the panel's launcher configuration dialog, while a user of ROX would drag to the AddApp application. A shell user would paste the URI into the 0alias command to create a shell alias.

Installation is not started by clicking, as in some systems, because this is considered to be a security risk: arbitrary web pages could trigger the installation process at any time. Although this should be safe (since confirmation of GPG keys and versions is required), it is safer not to start the process at all without explicit action from the user. Click-based systems will likely need to adopt Firefox's XPI count-down timer and similar tricks if they become popular.

Community

Zero Install is actively developed by a single core developer. The software is released under the LGPL. A number of people provide XML feeds for their own software and for other programs (e.g. Firefox). See Further Reading for a list. Zero Install is widely used within the ROX desktop community.

There are around 160 packages currently distributed this way, though not all are actively maintained.

Roadmap

Planned features include better support for mirrors and for optional dependencies.

Benefits/Deficiencies

The principal benefits of Zero Install are that it is truly distributed (no central server as with klik), has good security throughout, allows users to install software without root access, and provides a loose coupling between programs, avoiding version conflicts.

Interaction with RPM and Debian-based distributions is supported, allowing a native package to provide a dependency. If the native package is missing or too old, a more suitable version will be downloaded instead. Like all Zero Install packages, this will be stored in its own directory and will not conflict with the distribution's version of the library or affect the behaviour of other distribution-provided packages.

Because the XML feed files simply give the location of existing binary packages, there is no special format for these. Zero Install can unpack tarballs, zip archives, RPMs, Debs and even autopackages, although it does not, of course, run any pre- or post-install scripts when doing this.

Further Reading


Requirements

From the previous discussion of stakeholder needs as well as offerings of the current solutions, we should be able to infer requirements for our solution.

Unified/Simple UI

Any solution must provide a consistent and easy-to-use user interface across multiple distros and desktop environments. This is not to say that one desktop environment must be picked over another, but rather that the user interaction should look and perform almost identically across all variations. This provides a way for the software maker to concisely describe how their software can be installed and removed.

Update Facilities

Third party applications should have a common way to deploy security updates to their users. This provides a methodology for update that can update the application for all users, rather than just the user that is running the application at the moment, as with built-in updaters.

Distro Interaction

Any solution should work with the distro it runs on in a way that cannot break the system. This precludes the third-party packaging system from creating/removing/modifying files without some sort of proper interaction with the distro packaging system. A solution that fails to do this will produce both broken systems as well as ill will within the distro community.

Catalog

IT managers as well as users require that a packaging system provide mechanisms for querying meta-data such as dependencies, reverse dependencies, file ownership, license, etc. This enables the IT manager or user to audit license compliance, quickly find out what packages are installed, and know the implications of removing or replacing packages.

Solutions

There are essentially two directions that we can take that can fulfill these requirements: Transformation and Secondary Management.

Transformation

By transformation, we mean transforming some third-party package to the native package format of the distro. The current example of a transformation approach is alien, which can translate from one package format to another. To meet our needs, however, alien is inadequate. The crux of transformation is that the knowledge is embedded in the transformation engine. This obviates the need for reimplementing all the machinery that the distro already provides.

Abstracting Common Features

Some aspects of this approach are straightforward. Some facilities that are needed are the same across all distros. In fact, there are several cases where facilities are specified to be equal across distros. LSB specifies utilities and commands such as install_initd, useradd, and groupadd. Projects such as xdg-utils specify commands for installing menu items and icons. As needs are identified, more utilities can be made that abstract away differences between distros.

However, it will never be the case that every operation that every package needs will be abstracted in such a way. In these special cases, it will be up to the package to have the knowledge of how to proceed on different distros. The way that differences in distros must be handled boils down to a set of actions mapped to a set of distros. When an unknown distro is encountered, the package can't install. Because the handling of these cases is intrinsic to the package, we can envision scenarios in which the package must be updated simply to update this information:

  • Package foo-1.0 works on distros bar-1.0 and baz-1.0.
    qux-1.0 is released and the vendor would like to support it as well. This
    requires a package update so the package can gain the additional support for
    qux-1.0.
  • Package foo-1.0 works on distros bar-1.0 and baz-1.0.
    baz releases version 1.1 of their distro. This requires a package update
    because foo-1.0 doesn't recognize baz-1.1.
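The "set of actions mapped to a set of distros" can be pictured as a lookup table carried inside the package. The sketch below is an illustration only: the action names, the `UnsupportedDistro` exception, and the table layout are assumptions; the distro names and versions are the placeholders used in the scenarios above.

```python
class UnsupportedDistro(Exception):
    """Raised when the package has no action for the running distro."""


# Hypothetical table shipped inside package foo-1.0.
FOO_1_0_ACTIONS = {
    ("bar", "1.0"): "configure-bar-style",
    ("baz", "1.0"): "configure-baz-style",
}


def action_for(distro, version, actions):
    """Look up the distro-specific action, refusing to guess on a miss."""
    try:
        return actions[(distro, version)]
    except KeyError:
        raise UnsupportedDistro(f"{distro}-{version}") from None


print(action_for("baz", "1.0", FOO_1_0_ACTIONS))

# baz-1.1 is unknown to foo-1.0, so installation must stop.
try:
    action_for("baz", "1.1", FOO_1_0_ACTIONS)
except UnsupportedDistro as exc:
    print("cannot install on", exc)

# Only an updated package, carrying an extended table, can support it.
FOO_1_0_UPDATED = {**FOO_1_0_ACTIONS, ("baz", "1.1"): "configure-baz-style"}
print(action_for("baz", "1.1", FOO_1_0_UPDATED))
```

Nothing in the application itself changes between the two tables, which is exactly why such updates are a maintenance burden: the package is reissued only to refresh its knowledge of distros.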

Of course, we can provide parts of the transformation engine that can provide certain facilities that are not provided by all distros but are similarly abstract for packages to use. For instance, no distro provides a good way to do EULA display and acceptance. However, the transformation engine can provide such functionality and embed it into generated packages to be called in the preinst phase.

The transformation becomes extremely difficult when there are major differences between packaging formats. For example, Debian has a configuration stage, while rpm-based distros do not. Alien requires that a package only use a restricted subset of the available features to ensure that transformation is possible. If configuration, for example, were taken to be a required step in the package installation process, then the transformation engine would need to construct rpms with a combination of triggers, pre-inst, and post-inst methods for doing this. This can easily become a non-trivial problem.

Looking at the Requirements

It is helpful to look at the requirements of a third-party packaging solution and see how these influence the design of such an approach.

Unified/Simple UI

This requirement can be met by providing a graphical utility that handles transformation and installation on the target system. A front-end/back-end approach allows for the most shared code between distros.

Unfortunately, we can't escape replicating some functionality of distro packaging systems in order to meet the needs of update/installation/removal. A separate dependency resolver must be implemented for these third-party packages, to avoid trying to install uninstallable combinations of packages.

Additionally, the distro-specific parts of the code-base become fairly large, as the handling of the individual low-level package interfaces can be quite specific. For example, dpkg can ask the user questions interactively. The utility must have hooks for this and be able to deal with these questions. Differences in how packages are installed and rolled-back in case of error or interruption require distro-specific wrappers for the low-level package tools, even when they are all rpm.

Update Facilities

In a typical distro, there is some update notification service that runs and regularly checks certain repositories for updated packages. Each distro has a different methodology for keeping repositories and their associated meta-data. There are two possible directions at this point.

In one scenario, we choose to use the distro's built-in update functionality. The transformer would need to be able to transform some common form of repository into a distro-specific repository on the local machine. When packages are updated in the common form repository, the transformer would create updated native packages in the local repository. The distro's update facility will then be aware of updated packages and prompt the user to install them or, if configured to do so, install them automatically. This situation breaks the idea of a common UI for install/remove/upgrade, though. Each distro presents their update functionality slightly differently.

In the other situation, we implement an update notification service that alerts the user when updated 3rd-party packages are available. When the user chooses to install this updated software, the previously mentioned utility is invoked to perform the action. This choice comes at the expense of usability, though. The user will be presented with two update notifications: one for the distro and one for third-party packages. This is hardly ideal.

Distro Interaction

With transformed packages, it is the distro's tools that actually do all the manipulating of the system. In cases where there is a clean separation between third-party packages and distro-provided packages as per FHS guidelines, this is a simple problem. However, there are cases where this separation breaks down. This is especially apparent for configuration files, because they may need to be dynamically edited by the package. On some distros, it is possible to encode that one package has changed the configuration file that originated from another package (thus both packages "have" that config file). On most distros, this functionality does not exist. What is to be done when third-party package foo needs to edit distro package bar's /etc/bar.conf in order for foo to function? If the distro is unaware of this, foo may break when bar is upgraded. Without native support for such things, we must rely on facilities such as triggers to handle this case. Programmatically generating these triggers is non-trivial. Such cases make the transformation engine very difficult to implement.

Catalog

This requirement is inherently met because the 3rd-party packages are transformed into the native package format. Thus, the native package manager's database is used and can be queried for packages that are installed. Unless we transform repositories as mentioned above, we cannot rely on the distro catalog to do repository queries: "find all packages named *foo*", "describe implications for installing foo", etc.

Architecture

We obviously want to move towards a design with the maximum amount of shared code across distros. Thus, we can have backends that create packages and have appropriate interfaces for using the facilities of that particular distro. We have common facilities with their own set of interfaces, and we have the distro-neutral package format engine's interfaces. To map from one to the other, we have a broker that can manipulate as needed.

For example, we have a backend that can produce RPMs for FC5. It should be noted that these RPMs can only be used on FC5 and must be generated so that they depend on something that is only present in FC5. Our transformation engine provides a utility that can display a EULA and not allow the user to install if the user does not accept. The broker matches the need for a facility to display a EULA (as FC5 lacks one) with the transformation engine's built-in facility for doing so. So, the transformed packages will depend on the transformation package itself. In the generated RPM, we can use the pre-install phase for displaying the EULA. Note that this system can be bypassed by using the --noscripts option to rpm. This example is simpler than most, because we do not need anything that relates to the phases of installation.
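The broker's matching step in this example might look something like the sketch below. The facility names, the two provider labels, and both function names are assumptions for illustration only; the text does not specify the broker's interface.

```python
def resolve_facilities(required, distro_provides, engine_provides):
    """Map each required facility to whoever will supply it.

    The distro's native facility is preferred; the transformation
    engine's built-in one is the fallback.
    """
    plan = {}
    for facility in required:
        if facility in distro_provides:
            plan[facility] = "distro"
        elif facility in engine_provides:
            plan[facility] = "engine"
        else:
            raise LookupError(f"no provider for facility: {facility}")
    return plan


def needs_engine_dependency(plan):
    """The transformed package must depend on the transformation
    package if any facility is supplied by the engine."""
    return "engine" in plan.values()


# FC5 is assumed here to offer no EULA facility, as in the example above.
plan = resolve_facilities(["eula"], distro_provides=set(),
                          engine_provides={"eula", "config-merge"})
```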

If instead we wanted to do configuration as part of the package, we would need to map a configuration facility to post-install and triggers in order to provide this functionality. In order to track what files have been generated by such stages, we need to provide a database facility that can be queried to find such files and return them to their natural state. In these post-install and trigger-invoked calls, we need to provide a facility for merging and un-merging files into the system as well as providing the appropriate cleanup should the installation fail.
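One possible shape for the database facility described here is sketched below using Python's standard sqlite3 module. The table layout and function names are hypothetical. The sketch records the original content of each file the first time a package touches it, so that the file can later be returned to its natural state.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the tracking database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS changes ("
        " package TEXT, path TEXT, original BLOB,"
        " PRIMARY KEY (package, path))"
    )
    return conn


def record_change(conn, package, path, original):
    """Remember the pre-change content of path for package.

    INSERT OR IGNORE keeps the first recorded original, so repeated
    edits by the same package do not overwrite the natural state.
    """
    conn.execute("INSERT OR IGNORE INTO changes VALUES (?, ?, ?)",
                 (package, path, original))
    conn.commit()


def changes_for(conn, package):
    """List (path, original content) pairs to restore on removal."""
    return conn.execute(
        "SELECT path, original FROM changes WHERE package = ? ORDER BY path",
        (package,),
    ).fetchall()
```

A post-install or trigger-invoked call would record each file before merging into it, and a failed installation would walk changes_for() to undo the merge.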

In effect, we are recreating the distro-machinery from the inside out. Because of the limited flexibility and variability of the underlying distro, we almost cannot avoid making duplicate machinery that works out of band.

Secondary Management

The antithesis of the transformation approach is the secondary manager approach. In this approach, we become freed from the restrictions of the native package management phases and structure. Thus, we can construct a manager that has the most commonality across distros and has the most flexibility. We only need to insert hooks into the native package manager in order to initiate actions where there is a mutual interest.

For example, if we need to edit /etc/alternatives/sendmail to install a third-party MTA, we insert a hook into the native package manager's database that notifies us if, for example, the user were to install another MTA from the distro.

Looking at the Requirements

Once again, we look at each requirement and examine how it relates to this design direction.

Unified/Simple UI

This requirement can also be fulfilled with an easy-to-use graphical front end to the secondary package manager. The front-end/back-end split employed in almost all distros is a good model to adopt.

Update Facilities

Since we have control over the packaging process and the package manager, we also have control over update notification. Once again, we can borrow from the distro design pattern of having an application with a desktop-notification area icon that periodically checks for updated versions of installed packages from the configured list of repositories.

Distro Interaction

The key part of having a working secondary is to have hooks that can invoke action from the primary package manager. In rpm, this can be accomplished with triggers. When a third-party package has a mutual interest in a file or state of a package, then the secondary package manager inserts a trigger into the rpm database that invokes some functionality in the secondary manager.

For example, we have a third-party package that installs a new shell, foosh. When this package is installed, it updates /etc/shells. /etc/shells belongs to aaa_base (on SuSE). So, the secondary package manager inserts a trigger on aaa_base's state changes. If aaa_base were to be upgraded, the secondary manager would be called after the fact to update /etc/shells appropriately. If this were not to happen, then /etc/shells might be overwritten in the upgrade process, resulting in users with foosh as their login shell not being able to log in.
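The action the secondary manager takes when the trigger fires could be as small as the following sketch. It works on the file's text so that it is easy to test; the install path of foosh is an assumption, since the text does not say where the shell lives.

```python
def ensure_shell_listed(shells_text, shell_path):
    """Return the contents of /etc/shells with shell_path present.

    The function is idempotent: if the shell is already listed, the
    text is returned unchanged.
    """
    listed = (line.strip() for line in shells_text.splitlines())
    if shell_path in listed:
        return shells_text
    # Make sure the existing text ends with a newline before appending.
    if shells_text and not shells_text.endswith("\n"):
        shells_text += "\n"
    return shells_text + shell_path + "\n"


# Simulates /etc/shells right after an aaa_base upgrade replaced it.
after_upgrade = "/bin/sh\n/bin/bash\n"
repaired = ensure_shell_listed(after_upgrade, "/usr/bin/foosh")
```

Because the function is idempotent, it is safe to call on every state change of aaa_base, whether or not the upgrade actually replaced the file.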

Sometimes we may have a mutual interest in a file where there are several native packages that may also have an interest. If we were to have a third-party package that was a new MTA, it would have an interest in /usr/sbin/sendmail. On SuSE Linux systems, multiple packages provide the sendmail binary (sendmail, exim, and postfix), and thus conflict. Similarly, our new MTA must conflict with these packages.

Catalog

Once again we can borrow proven techniques for making our own meta-data database for our own information. We, of course, will have the additional burden of storing meta-data that is pertinent to our interaction with the underlying distro's packaging system, but this represents a small set of data.

Architecture

The architecture of such an approach is similar to the one employed in most package managers, with a high-level and low-level set of tools. The high-level tool manipulates repositories (query, add, refresh) and their meta-data, and contains the dependency solver. The low-level tool manipulates packages and their meta-data at an individual level.
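As a small illustration of one job the high-level tool performs, the sketch below orders packages so that dependencies are handed to the low-level tool first. It uses Python's standard graphlib module (Python 3.9 or later). The package names are made up, and a real dependency solver must also handle versions, conflicts, and alternatives, none of which is attempted here.

```python
from graphlib import TopologicalSorter


def install_order(depends):
    """Return package names ordered so dependencies come first.

    depends maps each package to the set of packages it requires.
    A dependency cycle raises graphlib.CycleError.
    """
    return list(TopologicalSorter(depends).static_order())


order = install_order({
    "foo": {"libbar"},
    "libbar": {"libbase"},
    "libbase": set(),
})
```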

While the basic approach of the high-level tool is well understood, implementing it involves many minutiae of design decisions, such as repository mechanics. This is one area where distros and ISVs may not see eye to eye. Some ISVs may not want universal access to their software, and thus may require credentials for access to their repository. This access control may also limit the number of connections that is permissible. Thus, a design that allows for highly flexible repositories must be crafted. Such repositories could even be offline, with out-of-band update communication. Such a design should provide not only a convenient implementation for conventional repositories and related tooling (proxy, mirror, etc.), but also an API set for the programmatic emulation of a repository for ISVs to use as they see fit.
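A repository API that ISVs could emulate programmatically might be sketched as below. The interface name, its single query method, and the token-based access check are all assumptions; the text only requires that such an API exist and that it accommodate credentials.

```python
from fnmatch import fnmatchcase
from typing import Optional, Protocol


class Repository(Protocol):
    """Hypothetical minimal interface the high-level tool relies on."""

    def query(self, pattern: str, token: Optional[str] = None) -> list:
        ...


class InMemoryRepository:
    """An emulated repository, as an ISV might implement one."""

    def __init__(self, packages, token=None):
        self._packages = set(packages)
        self._token = token  # None means the repository is public

    def query(self, pattern, token=None):
        # Reject callers without the right credential, if one is required.
        if self._token is not None and token != self._token:
            raise PermissionError("repository requires valid credentials")
        return sorted(p for p in self._packages if fnmatchcase(p, pattern))


repo = InMemoryRepository(["foo", "libfoo", "bar"], token="secret")
matches = repo.query("*foo*", token="secret")
```

Any object with a compatible query method satisfies the interface, so an offline or rate-limited repository could be substituted without changes to the high-level tool.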

The low-level tool also presents a twist in design. The basic approach is straightforward, though many design decisions must still be made with regard to package format, meta-data information and format, cryptographic signing facilities, etc. However, there are two parts that are unconventional.

The first is the interaction with the host package manager. This means either providing a wrapper for the primary package management tools (the read-only approach) or a way to make dummy packages in the primary database that invoke activity in the secondary package management tools. Each distro has different tooling and capabilities, so wrapping the tooling or manipulating the package database both require a distro abstraction layer that must be created. This layer must be able to transform generic facilities into specifics for each packaging system. For example, on RPM a virtual package with a trigger may be an appropriate hook for invoking the secondary. However, on Debian, it may be required to have a generic 3rd-party dummy package with many virtual packages provided by it. Installing a new 3rd-party package would require the generic 3rd-party dummy package to be updated with new provides information.
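The distro abstraction layer described here could take a form like the following sketch. The class names, the register_hook method, and the dictionaries it returns are hypothetical; they only describe what each backend would ask the primary package database to hold, and no real rpm or dpkg calls are made.

```python
from abc import ABC, abstractmethod


class HookBackend(ABC):
    """Generic facility: hook a third-party package into the primary."""

    @abstractmethod
    def register_hook(self, third_party_pkg, watched_pkg):
        ...


class RpmHooks(HookBackend):
    def register_hook(self, third_party_pkg, watched_pkg):
        # One virtual package per hook, carrying a trigger on the
        # watched distro package.
        return {
            "kind": "virtual-package",
            "name": f"secondary-hook-{third_party_pkg}",
            "trigger_on": watched_pkg,
        }


class DebianHooks(HookBackend):
    def __init__(self):
        self._provides = set()

    def register_hook(self, third_party_pkg, watched_pkg):
        # A single generic dummy package is regenerated with an extended
        # provides list. How the watched package is observed on Debian
        # is left open by the text, so watched_pkg is unused here.
        self._provides.add(third_party_pkg)
        return {
            "kind": "dummy-package",
            "name": "third-party-dummy",
            "provides": sorted(self._provides),
        }
```

The secondary manager would call register_hook through the abstract interface and never need to know which packaging system is underneath.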

The second twist is providing some facilities that are not generally in the low-level package tool. This may be as simple as a mechanism for packages to call that does EULA display or as complicated as configuration storage and merging. An extensible framework of plugins is required that is generic enough to be flexible and specific enough to provide a reasonable degree of simplicity for the caller. Constructing such a framework will be somewhat of a challenge, but there are tool sets in distros or in development that can be looked to for guidance.
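A very small version of such a plugin framework is sketched below: facilities register themselves under a name, and packages call them by that name. The registry, the decorator, and the EULA plugin's signature are all illustrative assumptions. The prompt function is passed in so that a graphical front end or a test can replace the console prompt.

```python
_FACILITIES = {}


def facility(name):
    """Decorator that registers a function as a named facility."""
    def decorator(func):
        _FACILITIES[name] = func
        return func
    return decorator


def call_facility(name, **kwargs):
    """Entry point that packages use to invoke a facility by name."""
    if name not in _FACILITIES:
        raise LookupError(f"unknown facility: {name}")
    return _FACILITIES[name](**kwargs)


@facility("eula")
def show_eula(text, ask=input):
    """Display a EULA and return True only if the user accepts."""
    answer = ask(text + "\nAccept? [y/N] ")
    return answer.strip().lower() in ("y", "yes")


# A non-interactive caller supplies its own answer function.
accepted = call_facility("eula", text="License terms...",
                         ask=lambda prompt: "y")
```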

Despite these two twists, the basic architecture is well demonstrated in existing managers, and should be straightforward to implement.

Comparison

Basic Construction

In the abstract, it looks as though some of the work involved in each approach is the same. For either transformation or a secondary manager, we need to implement the following things:

  • dependency solver
  • GUI front end
  • repository mechanics
  • format facilities (pack, unpack, parse, etc)
  • cryptographic signing and web of trust mechanics
  • catalog

For transformation, we need the following additional items:

  • transformation mappings
  • per-distro "bridges" -- mechanisms to implement facilities the distro lacks

For the secondary manager we need the following additional items:

  • merge/unmerge mechanics
  • primary packaging system entry-point hooks and manipulations

In the case of the secondary manager, the interfaces between all of the components are, for the most part, a straightforward design problem. All of the various components can borrow from techniques that have been proven in the past and we can cherry-pick the best methodologies for our implementation. Alternatively, if we can find an existing implementation that is flexible enough, we can use it for the secondary and simply work on the glue to the primary. The only difficult component is the entry-points and hooks into the primary package manager. If the hooks are made one-way (i.e., the secondary can only observe actions of the primary), then the task is very manageable. If the hooks are made two-way (the secondary can observe and command the primary), then there exists the possibility of creating cycles, which must be carefully avoided.
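To illustrate the cycle problem with two-way hooks, the sketch below models hook invocations as a directed graph and searches it for a loop with a depth-first traversal. The node labels and the graph representation are assumptions made for the example; the text does not prescribe how cycles should be detected.

```python
def find_cycle(edges):
    """Return one cycle as a list of nodes, or None if there is none.

    edges maps an action to the list of actions it causes, for example
    a primary upgrade that invokes a secondary hook.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}
    stack = []

    def visit(node):
        color[node] = GREY          # node is on the current path
        stack.append(node)
        for nxt in edges.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:       # reached a node already on the path
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK         # fully explored, no cycle through it
        return None

    for node in list(edges):
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


two_way = {
    "primary: upgrade bar": ["secondary: reconfigure foo"],
    "secondary: reconfigure foo": ["primary: upgrade bar"],
}
one_way = {"primary: upgrade bar": ["secondary: observe"]}
```

With one-way hooks the secondary never commands the primary, so no edge can lead back and the check trivially passes.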

In the case of the transformation approach, almost all of the components must be interfaced in a way that is not prevalent in current practices. It presents a unique challenge to craft interfaces between the different components, not to mention that some components present non-trivial challenges, especially the mapping engine. It seems difficult to imagine such a system as being easily extensible or maintainable. Thus it is seen as somewhat inelegant.

Non-Technical Issues

There are some non-technical issues that are worth noting. The current approach of a very restricted transformation is palatable to distros because it involves little work and doesn't provide a fundamentally different answer for how packages are manipulated.

However, it should not be automatically assumed that a more robust version of transformation would be likewise palatable. Firstly, the transformed package may not be as benign as an alien'd package would be because of the mapping of various facilities not present in the distro. This may include writing files that are not part of the package's manifest or writing to files that are present in other packages.

Because the transformation mapping is difficult and has many details that are distro specific, we may become reliant on the distros to provide such information for the mapping. Getting distros willing to do so may not be easy. Additionally, once we have a transformation engine present on the distro, we run into the possibility of being captive to the distro's release cycle or wants. If one distro decides that they are unhappy with feature-x, they effectively control the format. Thus all features, fixes, and capabilities may be subject to any distro's veto.

On the other hand, the secondary package management system presents a situation that the distros may find very unpalatable. With this approach, we can limit our intrusion on the primary package manager's operation by encapsulating the primary package management tools in a simple wrapper that lets us observe and pause operation. However, it is much more desirable to be able to enter nodes into the primary package manager's database for a more cohesive solution. This can be done by either having the distros provide tools to do so, or providing our own tools. The distros may not particularly be happy with either. However, it is worth noting that distros are under increasing pressure to provide some flexibility in their package management system to allow for the enforcement/activation of custom security policies. This same flexibility can be leveraged by a secondary manager as an entry-hook in a way that is palatable to the distro.

Conclusions

From a purely technical standpoint, it seems that the secondary manager approach is the more elegant solution. The non-technical issues with the secondary manager are, at least in the opinion of this author, likely to be manageable.

Next Steps

While there is considerable uncertainty in the nature of implementations, there is less ambiguity about how to proceed with finding a palatable solution and interested parties.

A discussion with the distros will be required in order to get the required buy-in and clarify what interfaces to packaging systems can be relied upon to be stable and possibly standardized. For RPM-based distros this may involve some uncertainty, because of the current situation of RPM maintainership. It would be beneficial if there could be some agreement on an upstream home for RPM so that common development could proceed.

For Debian-based distros, there is a substantial opportunity in the near term. In the post-Etch release timeframe, work will begin on defining dpkg2, which will be a substantial departure from dpkg. There are essentially two parts to this opportunity. The first is that because a new architecture is forming, we will potentially have the ability to recommend interfaces to facilitate the manipulation of the database in such a way as to make it easier for the secondary to function. The second opportunity is that with a sufficiently flexible architecture, the secondary manager can use most of the same code as the primary, essentially leveraging the development. The preliminary descriptions of dpkg2 spell out an extremely flexible architecture that can likely be used to create a secondary manager.

Outside of the mechanics of package management and facilitating discussions, it is still desirable to create common interfaces present on all distros to do common things. LSB addresses some of this with utilities like useradd, install_initd, etc. Even where interfaces are defined, the implementation may be too simplistic for serious use. One such example is useradd in situations with authentication over LDAP. There are several other facilities which need specifying and implementing, such as setting MANPATH, INFOPATH, PKG_CONFIG_PATH, perl's @INC, etc. Some of these are simple, such as setting PATH variables, but some are not. Tasks like manipulating PAM modules can be non-trivial to make a generic interface to.

With this in mind, it is highly desirable to start a discussion either shortly before or immediately after the release of Etch. This discussion should include more than just the dpkg community but also interested parties from the various RPM distros as well as other fellow travelers such as LSB. This can likely be grafted on to some existing forum such as the Desktop Architects Meeting or LinuxConf Australia.

An interesting perspective is the package management of GoboLinux (http://www.gobolinux.org/), not necessarily their file system but their philosophy on packaging. Could aspects possibly be borrowed from Compile to standardize packaging? If so, not only could distros utilize their own package managers, but these standardized packages should also be installable in and of themselves if Compile could be integrated with the respective package managers.

