using a debian package as the remote for a local config repo

Today I did something interesting with the Debian packaging for propellor, which seems like it could be a useful technique for other Debian packages as well.

Propellor is configured by a directory, which is maintained as a local git repository. In propellor's case, it's ~/.propellor/. This contains a lot of haskell files, in fact the entire source code of propellor! That's really unusual, but I think this can be generalized to any package whose configuration is maintained in its own git repository on the user's system. For now on, I'll refer to this as the config repo.

The config repo is set up the first time a user runs propellor. But, until now, I didn't provide an easy way to update the config repo when the propellor package was updated. Nothing would break, but the old version would be used until the user updated it themselves somehow (probably by pulling from a git repository over the network, bypassing apt's signature validation).

So, what I wanted was a way to update the config repo, merging in any changes from the new version of the Debian package, while preserving the user's local modifications. Ideally, the user could just run git merge upstream/master, where the upstream repo was included in the Debian package.

But, that can't work! The Debian package can't reasonably include the full git repository of propellor with all its history. So, any git repository included in the Debian binary package would need to be a synthetic one, that only contains probably one commit that is not connected to anything else. Which means that if the config repo was cloned from that repo in version 1.0, then when version 1.1 came around, git would see no common parent when merging 1.1 into the config repo, and the merge would fail horribly.

To solve this, let's assume that the config repo's master branch has a parent commit that can be identified, somehow, as coming from a past version of the Debian package. It doesn't matter which version, although the last one merged with will be best. (The easy way to do this is to set refs/heads/upstream/master to point to it when creating the config repo.)

Once we have that parent commit, we have three things:

  1. The current content of the config repo.
  2. The content from some old version of the Debian package.
  3. The new content of the Debian package.

Now git can be used to merge #3 onto #2, with -Xtheirs, so the result is a git commit with parents of #3 and #2, and content of #3. (This can be done using a temporary clone of the config repo to avoid touching its contents.)

Such a git commit can be merged into the config repo, without any conflicts other than those the user might have caused with their own edits.

So, propellor will tell the user when updates are available, and they can simply run git merge upstream/master to get them. The resulting history looks like this:

* Merge remote-tracking branch 'upstream/master'
|\  
| * merging upstream version
| |\  
| | * upstream version
* | user change
|/  
* upstream version

So, generalizing this, if a package has a lot of config files, and creates a git repository containing them when the user uses it (or automatically when it's installed), this method can be used to provide an easily mergable branch that tracks the files as distributed with the package.

It would perhaps not be hard to get from here to a full git-backed version of ucf. Note that the Debian binary package doesn't have to ship a git repisitory, it can just as easily ship the current version of the config files somewhere in /usr, and check them into a new empty repository as part of the generation of the upstream/master branch.

Posted
abram's 2014

pics from trip to Abram's Falls

The trail to Abram's Falls seems more trecherous as we get older, but the sights and magic of the place are unchanged in our first visit in years.

Posted
laptop death

So I was at Ocracoke island, camping with family, and I brought my laptop along as I've done probably half a dozen times before. An enormous thuderstorm came up. It rained for 8 hours and thundered for 3 of those. Some lightning cracks quite close by as we crouched in the food tent, our feet up off the increasingly wet ground "just in case". The campground flooded. Luckily we were camped in the dunes and tents mostly avoided being flooded with 2-3 inches of water. (That was just the warmup; a hurricane hit a week after we left.)

My laptop was in my tent when this started, and I got soaked to the skin just running over there and throwing it up on the thermarest to keep it out of any flooding and away from any drips. It seemed ok, so best not to try to move it to the car in that downpour.

Next time I checked, it turned out the top vent of the tent was slightly open and dripping. The laptop bag was damp. But inside it seemed ok. Rain had slackened to just heavy, so I ran it down to the car. Laptop appeared barely damp, but it was hard to tell as I had quite forgotten what "dry" was. Turned it on for 10 seconds to check the time. It was 7:30 and we still had to cook dinner in this mess. Transferred it to a dry bag.

(By the way, in some situations, discovering you have a single dry towel you didn't know you had is the best gift in the world!)

Next morning, the laptop was dead. When powered on, the fan came on full, the screen stayed black, and after a few seconds it turned itself back off.

I need this for work, so it was a crash priority to get it fixed or a replacement. Before I even got home, I had logged onto Lenovo's website to check warantee status and found 2 things:

  1. They needed some number from a sticker on the bottom of my laptop. Which was no longer there.
  2. The process required some stange login on an entirely different IBM website.

At this point, I had a premonition of how the beuracracy would go. Reading Sesse's Blehnovo, I see I was right. I didn't even try. I ordered a replacement with priority shipping.

When I got home, I pulled the laptop apart to try to debug it. I still don't know what's wrong with it. The SSD may be damaged; it seems to cause anything I put it into to fail to work.

New laptop arrived in 2 days. Since this model is now a year old, it was a few hundred dollars cheaper this time around. And now I have an extra power supply, and a replacment keyboard, and a replacement fan etc. And I've escaped the dead USB port and broken rocker switch of the old laptop too.

The only weird thing is that, while my old laptop had no problem with my Toshiba passport USB drive, this new one refuses to recognize it unless I plug it into a USB 1.0 hub. Oh well..


Update: Ocracode for this trip:

OBX1.1 P6 L7 SA3d+++b+c++ U2(rearended,laptop death) T4f2b1 R2T Bb+m++++n++
F+++u++ SC+s-g6 H+f0i3 V+++s++m0 E++r+

Posted
what does docker.io run -it debian sh run?

When you type docker.io run -it debian sh, it goes off and gets "debian" and runs it. But what is in this "debian" image? How was it built?

The docker hub does not really say. All it tells us is this is a "(Semi) Official Debian base image" and that its sources.list uses http.debian.net for geolocation.

There's a link to https://github.com/dotcloud/stackbrew/blob/master/library/debian which in turn uses a very strange git repository, owned by Debian maintainer Tianon Gravi, that contains compressed tarballs of Debian: http://github.com/tianon/docker-brew-debian "Git is not a fan of what we're doing here."

The "source", such as it is, that is used to build this image consists of:

FROM scratch
ADD rootfs.tar.xz /
CMD ["/bin/bash"]

and

mkimage.sh -t tianon/debian:wheezy -d . debootstrap --variant=minbase --components=main --include=inetutils-ping,iproute wheezy http://http.debian.net/debian

I don't know where mkimage.sh is. [Update: Probably /usr/share/docker.io/contrib/mkimage-debootstrap.sh or a modified version] And anyway, I have no reason to trust that this image is built the way it claims to be built. So, the question remains: What is in this image?

To find out, I did a debootstap --variant=minbase stable and diffed the entire docker debian image against it. The diff was 6738 lines, from which I found the following interesting differences.

added packages

The image has iputils-ping and netbase and iproute added. These are not in a minbase debootstrap, but are in a regular debootstrap. It's rather weird that the docker image is based on a minbase debootstrap, since this means they have to add back important stuff like this on an ad-hoc basis.

If the expectation is that an experienced Unix person who found it missing would say "What on earth is going on, where is 'foo'?", it must be an 'important' package. -- Debian Policy

apt hooks

DPkg::Post-Invoke { "rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true"; };
APT::Update::Post-Invoke { "rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true"; };

Dir::Cache::pkgcache "";
Dir::Cache::srcpkgcache "";

Acquire::Languages "none";

These are some strange modifications to apt's config. The intent is clearly to avoid wasting disk space, even at the expense of making apt slower (by disabling caches) and losing translations.

I am curious if apt might ever invoke the DPkg::Post-Invoke twice in an upgrade in which it runs dpkg twice. I'm also curious whether deleting /var/cache/apt/archives/lock could cause a problem.

unsafe-io

dpkg is configured to use unsafe-io.

motd

Linux viper 3.12.20-gentoo #1 SMP Sun May 18 12:36:24 MDT 2014 x86_64

Yes, that's "gentoo". Presumably this tells us something about the build host.

policy-rc.d

/usr/sbin/policy-rc.d contains "exit 101", which prevents daemons from being automatically started after they are installed. This may or may not be desirable, depending on what you're doing with docker.

It notably also prevents restarting running daemons in this container if they're upgraded for eg, a security fix. It would almost certianly be better if this script allowed restarting running daemons.

diversions

/sbin/initctl is diverted and replaced with /bin/true. This is a workaround for a bug in sysvinit; when upgraded inside a docker container it hangs while trying to run initctl.

missing devices

Some versions of the debian image are missing things in /dev. See this bug.

(I had listed some device files that I thought were missing, but I was wrong.)

some gpg thing is different

Binary files pure-debootstrap/etc/apt/trustdb.gpg and from-docker/etc/apt/trustdb.gpg differ

Oh well, that can't be important.. Or can it? I did not check.

conclusions

I would hardly consider this to be an "(Semi) Official Debian image". Some of the changes are quite dubious. The build environment is not Debian. There is no guarantee you'll get the same image I examined. Diffing thousands of lines of filesystem changes is not particularly fun or reliable way to spot accidental or malicious changes.

I'd recommend only trusting docker images you build yourself. I have some docker images published somewhere that are built with 100% straight debootstrap with no modifications (and even an armel image that can be used on an x86 system thanks to qemu). But I'm not going to link to them, because again, you should only trust docker images you built yourself. To help increase your mistrust of me, I present this IRC snippet:

<joeyh> I'll bet I could publish an image that just did a killall5 as root on startup and get plenty of people to nuke their container hosts

Here are some ideas for things Debian could do to improve this:

  • Make a package that can build docker images of Debian, in a fully reproducible fashion. Ie, same versions of debs in, same byte stream out.
  • If it makes sense for the docker image to not contain all the packages in a standard debootstrap (maybe leaving out init systems), or to contain other packages, write down the rationalle for this, and make a --variant=docker.
  • Make a package that provides appropriate tweaks for Debian in a container. This might include a policy-rc.d that allows restarting daemons on upgrade if they're already running in the container, and otherwise prevents running daemons.
  • Make a low-disk-space package that eg, prevents apt from caching debs.
  • Provide some way to verify, through gpg signatures, that docker has pulled an actual trusted image and not some https-MITMed thing. (See also #746394

PS, if this wasn't enough fun, just consider the tweaks made to the "Debian" images on all the VPS hosts out there.

how I wrote init by accident

I wrote my own init. I didn't mean to, and in the end, it took 2 lines of code. Here's how.

Propellor has the nice feature of supporting provisioning of Docker containers. Since Docker normally runs just one command inside the container, I made the command that docker runs be propellor, which runs inside the container and takes care of provisioning it according to its configuration.

For example, here's a real live configuration of a container:

        -- Exhibit: kite's 90's website.
        , standardContainer "ancient-kitenet" Stable "amd64"
                & Docker.publish "1994:80"
                & Apt.serviceInstalledRunning "apache2"
                & Git.cloned "root" "git://kitenet-net.branchable.com/" "/var/www"
                        (Just "remotes/origin/old-kitenet.net")

When propellor is run inside this container, it takes care of installing apache, and since the property states apache should be running, it also starts the daemon if necessary.

At boot, docker remembers the command that was used to start the container last time, and runs it again. This time, apache is already installed, so propellor simply starts the daemon.

This was surprising, but it was just what I wanted too! The only missing bit to make this otherwise entirely free implementation of init work properly was two lines of code:

                void $ async $ job reapzombies
  where
        reapzombies = void $ getAnyProcessStatus True False

Propellor-as-init also starts up a simple equalivilant of rsh on a named pipe (for communication between the propellor inside and outside the container), and also runs a root login shell (so the user can attach to the container and administer it). Also, running a compiled program from the host system inside a container, which might use a different distribution or architecture was an interesting challenge (solved using the method described in completely linux distribution-independent packaging). So it wasn't entirely trivial, but as far as init goes, it's probably one of the simpler implementations out there.

I know that there are various other solutions on the space of an init for Docker -- personally I'd rather the host's systemd integrated with it so I could see the status of the container's daemons in systemctl status. If that does happen, perhaps I'll eventually be able to remove 2 lines of code from propellor.

Posted
who needs whiteboards when you have strange seed pods from the jungle

git-annex-routing.jpg

Discussing git-annex routing with Vince and Fernao. Might not look like much, but we seem to be close to cracking the most interesting problem with git-annex routing. I need to translate and read Vince's thesis and build some simulations..

(Seed pod, cup, camera = fixed node; mini brick = usb drive; leaves = data.)

Posted
radio brazil

Live radio broadcast going on in the tent while I teach Mill to use ikiwiki.

Posted
the real Brazil
(click to enlarge)

"This is the real Brazil" -- Fernao

After 3 days of travel, including 22 hours driving from Brasilia to the coast of Bahia, I can't say much more. A few things not pictured above..

The two hours of car-swallowing potholes in an arrow-straight highway in a flat plain, playing chicken against oncoming trucks.

Night swim in the river in Correntina, followed by a whole grilled fish and ice cold guarana. Bliss.

Push starting the car in a small cattle-farming village when it wouldn't start.

The first sight of ocean waves and the drum circle last night.

Posted
propellor-driven DNS and backups

Took a while to get here, but Propellor 0.4.0 can deploy DNS servers and I just had it deploy mine. Including generating DNS zone files.

Configuration is dead simple, as far as DNS goes:

     & alias "ns1.example.com"
        & Dns.secondary hosts "joeyh.name"
                & Dns.primary hosts "example.com"
                        (Dns.mkSOA "ns1.example.com" 100)
                        [ (RootDomain, NS $ AbsDomain "ns1.example.com")
            , (RootDomain, NS $ AbsDomain "ns2.example.com")
                        ]

The awesome thing is that propellor fills in all the other information in the zone file by looking at the properties of the hosts it knows about.

 , host "blue.example.com"
        & ipv4 "192.168.1.1"
        & ipv6 "fe80::26fd:52ff:feea:2294"

        & alias "example.com"
        & alias "www.example.com"
        & alias "example.museum"
        & Docker.docked hosts "webserver"
            `requres` backedup "/var/www"
        
        & alias "ns2.example.com"
        & Dns.secondary hosts "example.com"

When it sees this host, Propellor adds its IP addresses to the example.com DNS zone file, for both its main hostname ("blue.example.com"), and also its relevant aliases. (The .museum alias would go into a different zone file.)

Multiple hosts can define the same alias, and then you automaticlly get round-robin DNS.

The web server part of of the blue.example.com config can be cut and pasted to another host in order to move its web server to the other host, including updating the DNS. That's really all there is to is, just cut, paste, and commit!

I'm quite happy with how that worked out. And curious if Puppet etc have anything similar.


One tricky part of this was how to ensure that the serial number automtically updates when changes are made. The way this is handled is Propellor starts with a base serial number (100 in the example above), and then it adds to it the number of commits in its git repository. The zone file is only updated when something in it besides the serial number needs to change.

The result is nice small serial numbers that don't risk overflowing the (so 90's) 32 bit limit, and will be consistent even if the configuration had Propellor setting up multiple independent master DNS servers for the same domain.


Another recent feature in Propellor is that it can use Obnam to back up a directory. With the awesome feature that if the backed up directory is empty/missing, Propellor will automcatically restore it from the backup.

Here's how the backedup property used in the example above might be implemented:

backedup :: FilePath -> Property
backedup dir = Obnam.backup dir daily
    [ "--repository=sftp://rsync.example.com/~/webserver.obnam"
    ] Obnam.OnlyClient
    `requires` Ssh.keyImported SshRsa "root"
    `requires` Ssh.knownHost hosts "rsync.example.com" "root"
    `requires` Gpg.keyImported "1B169BE1" "root"

Notice that the Ssh.knownHost makes root trust the ssh host key belonging to rsync.example.com. So Propellor needs to be told what that host key is, like so:

 , host "rsync.example.com"
        & ipv4 "192.168.1.4"
        & sshPubKey "ssh-rsa blahblahblah"

Which of course ties back into the DNS and gets this hostname set in it. But also, the ssh public key is available for this host and visible to the DNS zone file generator, and that could also be set in the DNS, in a SSHFP record. I haven't gotten around to implementing that, but hope at some point to make Propellor support DNSSEC, and then this will all combine even more nicely.


By the way, Propellor is now up to 3 thousand lines of code (not including Utility library). In 20 days, as a 10% time side project.

Posted
propellor introspection for DNS

In just released Propellor 0.3.0, I've improved improved Propellor's config file DSL significantly. Now properties can set attributes of a host, that can be looked up by its other properties, using a Reader monad.

This saves needing to repeat yourself:

hosts = [ host "orca.kitenet.net"
        & stdSourcesList Unstable
        & Hostname.sane -- uses hostname from above

And it simplifies docker setup, with no longer a need to differentiate between properties that configure docker vs properties of the container:

 -- A generic webserver in a Docker container.
    , Docker.container "webserver" "joeyh/debian-unstable"
        & Docker.publish "80:80"
        & Docker.volume "/var/www:/var/www"
        & Apt.serviceInstalledRunning "apache2"

But the really useful thing is, it allows automating DNS zone file creation, using attributes of hosts that are set and used alongside their other properties:

hosts =
    [ host "clam.kitenet.net"
        & ipv4 "10.1.1.1"

        & cname "openid.kitenet.net"
        & Docker.docked hosts "openid-provider"

        & cname "ancient.kitenet.net"
        & Docker.docked hosts "ancient-kitenet"
    , host "diatom.kitenet.net"
        & Dns.primary "kitenet.net" hosts
    ]

Notice that hosts is passed into Dns.primary, inside the definition of hosts! Tying the knot like this is a fun haskell laziness trick. :)

Now I just need to write a little function to look over the hosts and generate a zone file from their hostname, cname, and address attributes:

extractZoneFile :: Domain -> [Host] -> ZoneFile
extractZoneFile = gen . map hostAttr
  where gen = -- TODO

The eventual plan is that the cname property won't be defined as a property of the host, but of the container running inside it. Then I'll be able to cut-n-paste move docker containers between hosts, or duplicate the same container onto several hosts to deal with load, and propellor will provision them, and update the zone file appropriately.


Also, Chris Webber had suggested that Propellor be able to separate values from properties, so that eg, a web wizard could configure the values easily. I think this gets it much of the way there. All that's left to do is two easy functions:

overrideAttrsFromJSON :: Host -> JSON -> Host

exportJSONAttrs :: Host -> JSON

With these, propellor's configuration could be adjusted at run time using JSON from a file or other source. For example, here's a containerized webserver that publishes a directory from the external host, as configured by JSON that it exports:

demo :: Host
demo = Docker.container "webserver" "joeyh/debian-unstable"
    & Docker.publish "80:80"
    & dir_to_publish "/home/mywebsite" -- dummy default
    & Docker.volume (getAttr dir_to_publish ++":/var/www")
    & Apt.serviceInstalledRunning "apache2"

main = do
    json <- readJSON "my.json"
    let demo' = overrideAttrsFromJSON demo
    writeJSON "my.json" (exportJSONAttrs demo')
    defaultMain [demo']
Posted