When you type docker.io run -it debian sh, docker goes off, fetches an image called "debian", and runs it. But what is in this "debian" image? How was it built?

The docker hub does not really say. All it tells us is this is a "(Semi) Official Debian base image" and that its sources.list uses http.debian.net for geolocation.

There's a link to https://github.com/dotcloud/stackbrew/blob/master/library/debian which in turn points at a very strange git repository, owned by Debian maintainer Tianon Gravi, that contains compressed tarballs of Debian: http://github.com/tianon/docker-brew-debian. As that repository itself puts it: "Git is not a fan of what we're doing here."

The "source", such as it is, that is used to build this image consists of:

FROM scratch
ADD rootfs.tar.xz /
CMD ["/bin/bash"]

and

mkimage.sh -t tianon/debian:wheezy -d . debootstrap --variant=minbase --components=main --include=inetutils-ping,iproute wheezy http://http.debian.net/debian

I don't know where mkimage.sh is. [Update: Probably /usr/share/docker.io/contrib/mkimage-debootstrap.sh or a modified version] And anyway, I have no reason to trust that this image is built the way it claims to be built. So, the question remains: What is in this image?

To find out, I did a debootstrap --variant=minbase stable and diffed the entire docker debian image against it. The diff was 6738 lines, from which I found the following interesting differences.
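
A minimal sketch of that kind of comparison, assuming both root filesystems have already been unpacked into ordinary directories. The function name compare_rootfs is hypothetical, and this is not necessarily how the diff above was produced; it only reports which files differ, not how.

```shell
#!/bin/sh
# Sketch: list files that differ between two unpacked root filesystems.
# compare_rootfs is a hypothetical helper name, not part of any tool.

# compare_rootfs PURE_DIR DOCKER_DIR
# Prints one line per file that exists in only one tree or whose contents
# differ. Standard diff options used:
#   -r  recurse into subdirectories
#   -q  report only that files differ, not the differences themselves
# Returns 2 on bad arguments, 0 otherwise (even when differences exist).
compare_rootfs() {
    pure=$1
    docker=$2
    if [ ! -d "$pure" ] || [ ! -d "$docker" ]; then
        echo "usage: compare_rootfs PURE_DIR DOCKER_DIR" >&2
        return 2
    fi
    # diff exits non-zero when it finds differences; that is expected here.
    diff -rq "$pure" "$docker" || true
}
```

With trees named as in the trustdb.gpg line further down, the call would be compare_rootfs pure-debootstrap from-docker. Getting the image onto disk (for example by piping docker export of a container into tar) is left out of the sketch.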

added packages

The image has iputils-ping and netbase and iproute added. These are not in a minbase debootstrap, but are in a regular debootstrap. It's rather weird that the docker image is based on a minbase debootstrap, since this means they have to add back important stuff like this on an ad-hoc basis.

If the expectation is that an experienced Unix person who found it missing would say "What on earth is going on, where is 'foo'?", it must be an 'important' package. -- Debian Policy

apt hooks

DPkg::Post-Invoke { "rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true"; };
APT::Update::Post-Invoke { "rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true"; };

Dir::Cache::pkgcache "";
Dir::Cache::srcpkgcache "";

Acquire::Languages "none";

These are some strange modifications to apt's config. The intent is clearly to avoid wasting disk space, even at the expense of making apt slower (by disabling caches) and losing translations.

I am curious whether apt might ever invoke the DPkg::Post-Invoke hook twice during an upgrade in which it runs dpkg twice. I'm also curious whether deleting /var/cache/apt/archives/lock could cause a problem.

unsafe-io

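dpkg is configured to use unsafe-io (the force-unsafe-io option), which skips dpkg's usual safe I/O operations when unpacking packages. That is faster, but risks corrupted files if the system goes down mid-install.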
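
One way to check an unpacked image for this setting is sketched below. It assumes the option is set in dpkg's standard configuration locations (etc/dpkg/dpkg.cfg or a file under etc/dpkg/dpkg.cfg.d); the function name find_unsafe_io is hypothetical.

```shell
#!/bin/sh
# Sketch: find where force-unsafe-io is enabled inside a root filesystem.
# find_unsafe_io is a hypothetical helper name.

# find_unsafe_io [ROOT]
# Prints the config files under ROOT (default /) that enable the option.
# Commented-out lines are ignored. Exits 0 if at least one file matches.
#   -r  recurse into dpkg.cfg.d
#   -l  print only the names of matching files
#   -s  stay quiet about paths that do not exist
#   -E  extended regular expressions
find_unsafe_io() {
    root=${1:-/}
    grep -rlsE '^[[:space:]]*force-unsafe-io' \
        "$root/etc/dpkg/dpkg.cfg" "$root/etc/dpkg/dpkg.cfg.d"
}
```

For example, find_unsafe_io from-docker would list the offending files in an image unpacked into a directory called from-docker.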

motd

Linux viper 3.12.20-gentoo #1 SMP Sun May 18 12:36:24 MDT 2014 x86_64

Yes, that's "gentoo". Presumably this tells us something about the build host.

policy-rc.d

/usr/sbin/policy-rc.d contains "exit 101", which prevents daemons from being automatically started after they are installed. This may or may not be desirable, depending on what you're doing with docker.

It notably also prevents restarting daemons that are running in this container when they're upgraded for, e.g., a security fix. It would almost certainly be better if this script allowed restarting running daemons.
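
A rough sketch of what such a policy could look like, written as shell functions. It is not the script shipped in the image. The helper names is_running and decide and the INITD_DIR variable are hypothetical, and the sketch assumes each init script implements a status action that exits 0 when its daemon is running, which not all init scripts do. It also only helps packages whose maintainer scripts ask for a restart-type action; a package that stops the daemon and later asks for start would still be refused.

```shell
#!/bin/sh
# Sketch of a policy-rc.d decision that forbids starting daemons but lets
# an already-running daemon be restarted or reloaded.
# is_running, decide and INITD_DIR are hypothetical names for illustration.

# is_running ID
# Assumption: the init script supports "status" and exits 0 when running.
is_running() {
    "${INITD_DIR:-/etc/init.d}/$1" status >/dev/null 2>&1
}

# decide ID ACTION
# Returns 0 (action allowed) or 101 (action forbidden by policy), the exit
# codes invoke-rc.d understands from policy-rc.d.
decide() {
    id=$1
    action=$2
    case $action in
        restart|try-restart|reload|force-reload)
            # Only touch daemons that are already up.
            if is_running "$id"; then
                return 0
            fi
            ;;
    esac
    return 101
}
```

An installed /usr/sbin/policy-rc.d built on this would end by calling decide "$1" "$2" and exiting with its status.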

diversions

/sbin/initctl is diverted and replaced with /bin/true. This is a workaround for a bug in sysvinit; when upgraded inside a docker container it hangs while trying to run initctl.

missing devices

Some versions of the debian image are missing things in /dev. See this bug.

(I had listed some device files that I thought were missing, but I was wrong.)

some gpg thing is different

Binary files pure-debootstrap/etc/apt/trustdb.gpg and from-docker/etc/apt/trustdb.gpg differ

Oh well, that can't be important. Or can it? I did not check.

conclusions

I would hardly consider this to be a "(Semi) Official Debian image". Some of the changes are quite dubious. The build environment is not Debian. There is no guarantee you'll get the same image I examined. Diffing thousands of lines of filesystem changes is not a particularly fun or reliable way to spot accidental or malicious changes.

I'd recommend only trusting docker images you build yourself. I have some docker images published somewhere that are built with 100% straight debootstrap with no modifications (and even an armel image that can be used on an x86 system thanks to qemu). But I'm not going to link to them, because again, you should only trust docker images you built yourself. To help increase your mistrust of me, I present this IRC snippet:

<joeyh> I'll bet I could publish an image that just did a killall5 as root on startup and get plenty of people to nuke their container hosts

Here are some ideas for things Debian could do to improve this:

  • Make a package that can build docker images of Debian, in a fully reproducible fashion. That is, same versions of debs in, same byte stream out.
  • If it makes sense for the docker image to not contain all the packages in a standard debootstrap (maybe leaving out init systems), or to contain other packages, write down the rationale for this, and make a --variant=docker.
  • Make a package that provides appropriate tweaks for Debian in a container. This might include a policy-rc.d that allows restarting daemons on upgrade if they're already running in the container, and otherwise prevents running daemons.
  • Make a low-disk-space package that, e.g., prevents apt from caching debs.
  • Provide some way to verify, through gpg signatures, that docker has pulled an actual trusted image and not some https-MITMed thing. (See also #746394)
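
The "same byte stream out" test in the first idea can be sketched as a digest comparison of two independently built rootfs tarballs. This assumes sha256sum from GNU coreutils is available; the function names rootfs_digest and same_build are hypothetical, and nothing here is an existing Debian or docker tool.

```shell
#!/bin/sh
# Sketch: check whether two builds produced byte-identical tarballs.
# rootfs_digest and same_build are hypothetical helper names.

# rootfs_digest FILE
# Prints the SHA-256 digest of FILE as hex, or returns 2 if unreadable.
rootfs_digest() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 2
    fi
    sha256sum < "$1" | cut -d' ' -f1
}

# same_build FILE1 FILE2
# Succeeds only if both files are readable and hash to the same digest.
same_build() {
    a=$(rootfs_digest "$1") || return 2
    b=$(rootfs_digest "$2") || return 2
    [ "$a" = "$b" ]
}
```

A digest like the one rootfs_digest prints is also the sort of value that could be published and signed, so that anyone rebuilding from the same debs can compare their result against it.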

PS, if this wasn't enough fun, just consider the tweaks made to the "Debian" images on all the VPS hosts out there.