Comparing Linux/UNIX Binary Package Formats

This is a comparison of the deb, rpm, tgz, slp, and pkg package formats, as used in the Debian, Red Hat, Slackware, and Stampede linux distributions respectively (pkg is the SVr4 package format, used in Solaris). I've had some experience with each of the package formats, both building packages, and later in my work on the Alien package conversion program.

I've tried to keep this comparison unbiased; however, for the record, I'm a fan of the deb format and a Debian developer. If you discover any bias or inaccuracy in this comparison, or any important features of a package format I have left out, please mail me so I can correct it. Several people have already done so. I'm also looking for data to fill in the places marked by `?'.

This comparison deals only with the package formats, not with the various tools (dpkg, rpm, etc.) that are used to deal with and install the packages. It also does not deal with source packages, only binary packages.
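
To make the format-versus-tool distinction concrete, here is a minimal Python sketch that looks at package files without any package manager. It relies on three well-known signatures (the ar global header, the first four bytes of the rpm lead, and the gzip magic) and on the fact that a deb is an ar archive whose first member is named debian-binary. The function names (build_ar, list_ar_members, guess_format) are made up for this illustration, and the toy archive stands in for a real deb; treat it as a sketch under those assumptions, not as how dpkg, rpm, or file actually work.

```python
AR_MAGIC = b"!<arch>\n"          # global header of every ar archive
RPM_MAGIC = b"\xed\xab\xee\xdb"  # first four bytes of the rpm lead
GZIP_MAGIC = b"\x1f\x8b"         # gzip stream, e.g. a compressed tarball
AR_HEADER_LEN = 60               # fixed-size header before each ar member


def build_ar(members):
    """Build a toy ar archive from (name, payload) pairs (demo helper only)."""
    out = bytearray(AR_MAGIC)
    for name, payload in members:
        # Fields: name(16) mtime(12) uid(6) gid(6) mode(8) size(10) end(2)
        header = f"{name:<16}{0:<12}{0:<6}{0:<6}{'100644':<8}{len(payload):<10}`\n"
        out += header.encode("ascii") + payload
        if len(payload) % 2:
            out += b"\n"  # member data is padded to an even length
    return bytes(out)


def list_ar_members(data):
    """Return [(member name, size in bytes)] for an ar archive held in memory."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    pos = len(AR_MAGIC)
    while pos + AR_HEADER_LEN <= len(data):
        header = data[pos:pos + AR_HEADER_LEN]
        if header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        # GNU ar appends "/" to names; strip it along with the space padding.
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii"))
        members.append((name, size))
        pos += AR_HEADER_LEN + size + (size % 2)
    return members


def guess_format(data):
    """Guess the container format from the leading bytes of a file."""
    if data.startswith(RPM_MAGIC):
        return "rpm"
    if data.startswith(GZIP_MAGIC):
        return "gzip"  # could be a tgz package or any other gzip file
    if data.startswith(AR_MAGIC):
        members = list_ar_members(data)
        if members and members[0][0] == "debian-binary":
            return "deb"
        return "ar"
    return "unknown"


if __name__ == "__main__":
    # Placeholder payloads; in a real deb the last two members are tarballs.
    toy = build_ar([
        ("debian-binary", b"2.0\n"),
        ("control.tar.gz", b"abc"),
        ("data.tar.gz", b"xy"),
    ])
    print(guess_format(toy))
    print(list_ar_members(toy))
```

Note that a gzip signature alone cannot tell a tgz package from any other compressed tarball, so the sketch reports only "gzip" for such files.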

This section deals with ensuring that you know who created the package, and that you can check the package installed on your system to see if the files in it have been modified since you installed it.

Does the package format contain internal support for a GPG or PGP signature that can be used to verify who created it?
    deb: yes. Not yet widely used though.
    rpm: yes
    tgz: no
    slp: no
    pkg: no

Are checksums available for all the files in the package?
    deb: yes. md5sums file available in control data, but not explicitly part of packaging format; some packages omit it.
    rpm: yes
    tgz: no
    slp: no
    pkg: yes

Is information on the files in the package, their proper permissions, sizes, owners, groups, major and minor number (for devices), etc, available?
    deb: yes
    rpm: yes
    tgz: yes
    slp: yes
    pkg: yes

Recognising that it's important sometimes to be able to peer inside packages without using their package managers, this section compares how the various packages can be processed with tools available on any linux system. (Why standard linux tools, not unix tools in general? It's been pointed out that, e.g., gzip is not at all standard on all the unix systems out there.)

Is the package format able to be recognized by file?
    deb: yes
    rpm: yes
    tgz: no
    slp: no
    pkg: yes

Can an experienced user, when presented with a package in this format, extract its payload using only tools that will be on any linux system? They can remember a few facts to help them deal with the format, but remembering file offsets and stuff like that is too hard.
    deb: yes. The admin would only have to remember that a deb is an ar archive, containing some tarballs.
    rpm: no. rpm2cpio can do it, but it's not a standard tool, except on rpm-based systems. Some fairly short programs can do it, but none of them are something you'd want to memorize.
    tgz: yes
    slp: yes. Assuming that bunzip2 is a standard linux tool, or that the package uses gzip compression instead. You need only remember that the package starts with its payload; the metadata is tacked on the end and will be ignored.
    pkg: no. Most repositories use a specific "datastream" format, while some others simply use tarballs. In the case of tarballs, yes. For the datastream format, a pkgtrans program is available on systems using the pkg format, but not quite standard enough for the purposes of this question.

If the package has some sort of metadata (ie, package name, description, version) contained in it, can this data be accessed by standard tools, without too much difficulty?
    deb: yes
    rpm: no
    tgz: N/A
    slp: no
    pkg: no. Most repositories use a specific "datastream" format, while some others simply use tarballs. In the case of tarballs, yes.

Can a package be created using standard tools, without too much difficulty?
    deb: yes. Although apt currently has a bug (#222701) with debs created with ar.
    rpm: no
    tgz: yes
    slp: no
    pkg: no

Metadata is my term for the information about a package contained in the package. This includes things like the package name, description, and version number.

Does the package have a name in the metadata?
    deb: yes
    rpm: yes
    tgz: yes
    slp: no
    pkg: yes

Does the package have a version number in the metadata?
    deb: yes
    rpm: yes
    tgz: yes
    slp: no
    pkg: yes

Is there a place in the metadata for a description of the package?
    deb: yes
    rpm: yes
    tgz: yes
    slp: yes
    pkg: yes
    (There's an install/description file for this information in at least some Slackware tgz files.)

A dependency says a package needs another package to be installed for the first package to work properly.
    deb: yes
    rpm: yes
    tgz: yes
    slp: no
    pkg: yes

A recommendation says a package will almost always need to have another package installed.
    deb: yes
    rpm: no
    tgz: no
    slp: no
    pkg: no

A suggestion says a package may sometimes work better if another package is installed. The user can just be informed of this as a FYI.
    deb: yes
    rpm: no
    tgz: no
    slp: no
    pkg: no

A conflict is a package that cannot be installed when this package is installed. One common reason is if the two packages both contain the same files.
    deb: yes
    rpm: yes
    tgz: yes
    slp: no
    pkg: yes

This means that there are so called "virtual packages", such as a web browser, or a mail delivery system, and packages can say they provide those virtual packages, while other packages can depend on the virtual packages.
    deb: yes
    rpm: yes
    tgz: no
    slp: ??
    pkg: no

A package can depend on or conflict with (or recommend, etc.), a specific version of a package, or all versions > or < a given version.
    deb: yes
    rpm: yes
    tgz: no
    slp: ??
    pkg: yes

This means that a package can depend, conflict, etc on a package AND (another package OR a third package). Any boolean expression must be representable, no matter how complex. Though you might have to do some factoring.
    deb: yes
    rpm: no. An rpm may depend on a list of packages, but boolean OR is not supported. You can often get the same effect using virtual packages and provides. This isn't quite the same, since it does require more coordination between packagers, and the following relationship cannot be expressed with provides: foo (<< 1.1) | foo (>> 2.0)
    tgz: no
    slp: no
    pkg: no

This means a package can require that some other package - any other package - be installed that contains a given file (like /bin/sh). (Some people consider file dependencies a gross misfeature.)
    deb: no
    rpm: yes
    tgz: no
    slp: no
    pkg: no

The package's metadata contains basic copyright information. This is useful for automatic copyright sorting, etc.
    deb: no. Copyright info is included in debian packages, but not in an easily extractable format.
    rpm: yes
    tgz: no
    slp: yes
    pkg: yes

The package can be assigned to a group (ie, web browsers, libraries), which might be used to group the packages when viewing a list of available packages, etc. This makes it easier to deal with large groups of packages.
    deb: yes
    rpm: yes
    tgz: no
    slp: no
    pkg: yes

The package can be assigned a priority, which says how important this package is to the system. For example, packages with high priority should be looked at carefully when you are setting up a system, but you can skip installing all the packages with low priority and still know you'll get a functional unix system.
    deb: yes
    rpm: no
    tgz: no
    slp: yes
    pkg: no

The ability to categorize files depending on what they are used for, so they can be dealt with in special ways.

Are config files supported? These are files that the user will typically want to edit, so when a new version of a package is installed, the package manager should be able to know to leave them alone, or do something smart like prompt the user for what to do if they have modified the files, or at least make backups of the user's changes before overwriting them. (Maybe I need more granularity here?)
    deb: yes
    rpm: yes
    tgz: no
    slp: yes
    pkg: yes

Can documentation files be specially marked? This could be useful to help a user find documentation.
    deb: no
    rpm: yes
    tgz: no
    slp: no
    pkg: yes. Fields exist, but there is no standard way to use them.

Ghost files are files that are not actually present in the package, but are listed as being a part of it once the package is installed. This is useful for log files.
    deb: no
    rpm: yes
    tgz: no
    slp: no
    pkg: no

These are programs that are contained in the package, to be run by the package manager when the package is installed, or uninstalled, or at other times.

Must these programs be scripts, or can compiled binaries be used as well?
    deb: yes
    rpm: no
    tgz: ??
    slp: yes
    pkg: no

A program to be run by the package manager before the package is installed on the system.
    deb: yes
    rpm: yes
    tgz: no. Supported by a version of this package format used at one time by SuSE Linux.
    slp: no
    pkg: yes

A program to be run by the package manager after the package is installed on the system.
    deb: yes
    rpm: yes
    tgz: yes
    slp: yes
    pkg: yes

A program to be run by the package manager before the package is removed.
    deb: yes
    rpm: yes
    tgz: no. Supported by a version of this package format used at one time by SuSE Linux.
    slp: no
    pkg: yes

A program to be run by the package manager after the package is removed.
    deb: yes
    rpm: yes
    tgz: yes. Supported by a version of this package format used at one time by SuSE Linux.
    slp: no
    pkg: yes

A program to be run by the package manager when the state of the installed package is being verified.
    deb: no
    rpm: yes
    tgz: no
    slp: no
    pkg: no

This is a whole set of programs, that are run not when this package changes state, but when another package changes state. Design and capabilities vary widely.
    deb: yes
    rpm: yes
    tgz: no
    slp: no
    pkg: no

How well the package format is able to grow to meet future needs. This is of great importance. Many of the comparisons above have little value in the face of this section, because new package programs, new metadata fields, etc can all be added to a scalable package format with little difficulty.

Are there no limits hard-coded into the package format, that might prevent it from expanding to meet future needs? For example, are package names or versions of unlimited size?
    deb: yes
    rpm: yes. Technically, the rpm "lead" contains hard-coded limits on the package name, but the lead is no longer really used by anything except file.
    tgz: yes
    slp: no
    pkg: no. Most repositories use a specific "datastream" format, while some others simply use tarballs. In the case of tarballs, yes.

Can new information (text, binary data, whatever) be added to the metadata easily, without changing the package format?
    deb: yes
    rpm: yes. To be useful, you need to get a tag number assigned to your new piece of metadata, which implies modifying the rpm program.
    tgz: N/A
    slp: no
    pkg: no. Most repositories use a specific "datastream" format, while some others simply use tarballs. In the case of tarballs, yes.

Can whole new sections be added to the packages, without changing the package format? For example, could the package format be expanded to have a pgp signature attached at the end, or to have a second set of data files, compiled for a different architecture or with different optimizations, attached at the end? This is the ultimate test of how flexible the format is. I'm basically asking, was it designed to cope with unforeseen new requirements?
    deb: yes
    rpm: no
    tgz: no
    slp: no
    pkg: no. Most repositories use a specific "datastream" format, while some others simply use tarballs. In the case of tarballs, yes.

Is there some way to look at a package and tell which version of the package format it is using? In extreme cases, this means the whole package format can be thrown out and redesigned, but old tools will still be able to read enough of the packages to know they can't deal with them.
    deb: yes
    rpm: yes
    tgz: no
    slp: yes
    pkg: no. Most repositories use a specific "datastream" format, while some others simply use tarballs. In the case of tarballs, yes.

relocatable packages

support for arch name in metadata, arch indep packages

multiple versions of the same package can be installed simultaneously (is this really a package format issue?)

info available to package programs -- The programs may find various information useful to make decisions while they are running. Of course, all of them can look at what's currently on the filesystem, run other programs and look at the output, etc. This lists other information that may be useful. (old package version, etc)

Copyright 1998-2003 by Joey Hess under the terms of the GNU GPL, either version 2 or at your option, any later version.