Joey blogs about his work here on a semi-daily basis. For lower post frequency and wider-interest topics, see the main blog.

debug-me client-server working

Got debug-me fully working over the network today. It's allllive!

Hardest thing today was handling the case where a developer connects and the server needs to send them the backlog of the session before they start seeing current activity. Potentially full of races. My implementation avoids race conditions, but might cause other connected developers to see a stall in activity at that point. A stall-free version is certainly doable, but this is good enough for now.
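
A minimal sketch of one way to do that handoff, assuming an STM broadcast channel and a lock that is held while broadcasting (the names and types here are illustrative, not debug-me's actual code). Holding the lock while the backlog is sent is exactly what makes other connected developers see a brief stall:

import Control.Concurrent.MVar
import Control.Concurrent.STM
import Control.Monad (forever)

type Message = String  -- stand-in for debug-me's real message type

data Server = Server
        { broadcastLock :: MVar ()       -- held while broadcasting a message
        , backlog :: TVar [Message]      -- session so far, oldest first
        , broadcastChan :: TChan Message -- live activity feed
        }

connectDeveloper :: Server -> (Message -> IO ()) -> IO ()
connectDeveloper srv send = do
        chan <- withMVar (broadcastLock srv) $ \() -> do
                -- while the lock is held nothing can be broadcast, so the
                -- backlog plus the duplicated channel covers every message
                (msgs, chan) <- atomically $ (,)
                        <$> readTVar (backlog srv)
                        <*> dupTChan (broadcastChan srv)
                mapM_ send msgs
                return chan
        forever $ atomically (readTChan chan) >>= send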

There are quite a few bugs to fix, including a security hole in the proof chain design, which I realized it had when thinking about what happens when multiple people connected to a debug-me session are all typing at once.

(There have actually been 3 security holes spotted over the past day: the one above, lacking sanitization of session IDs, and a bug in the server that let a developer truncate logs.)

So I need to spend several more days bugfixing, and also make it only allow connections signed by trusted gpg keys. Still, an initial release does not seem far off now.

Posted
debug-me websockets

Worked today on making debug-me run as a client/server, communicating using websockets.

I decided to use the "binary" library to get an efficient serialization of debug-me's messages to send over the websockets, rather than using JSON. A typical JSON message was 341 bytes, and this only uses 165 bytes, which is fairly close to the actual data size of ~129 bytes. I may later use protocol buffers to make it less of a haskell-specific wire format.
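
The approach looks roughly like this (a sketch with made-up field types, not debug-me's actual data types); the binary package can derive a compact encoding from a GHC Generics instance:

{-# LANGUAGE DeriveGeneric #-}
import Data.Binary (Binary, encode, decode)
import GHC.Generics (Generic)
import qualified Data.ByteString.Lazy as L

data Activity = Activity
        { seenData :: L.ByteString
        , enteredData :: L.ByteString
        } deriving (Show, Eq, Generic)

instance Binary Activity  -- the generic default gives a compact encoding

roundTrips :: Activity -> Bool
roundTrips a = decode (encode a) == a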

Currently, the client and server basically work; the client can negotiate a protocol version with the server and send messages to it, which the server logs.


Also, added two additional modes to debug-me. debug-me --download url will download a debug-me log file. If that session is still running, it keeps downloading until it's gotten the whole session. debug-me --watch url connects to a debug-me session, and displays it in non-interactive mode. These were really easy to implement, reusing existing code.

Posted
debug-me signatures

Added signatures to the debug-me protocol today. All messages are signed using a ed25519 session key, and the protocol negotiates these keys.

Here's a dump of a debug-me session, including session key exchange:

{"ControlMessage":{"control":{"SessionKey":[{"b64":"it8RIgswI8IZGjjQ+/INPjGYPAcGCwN9WmGZNlMFoX0="},null]},"controlSignature":{"Ed25519Signature":{"b64":"v80m5vQbgw87o88+oApg0syUk/vg88t14nIfXzahwAqEes/mqY4WWFIbMR46WcsEKP2fwfXQEN5/nc6UOagBCQ=="}}}}
{"ActivityMessage":{"prevActivity":null,"activitySignature":{"Ed25519Signature":{"b64":"HNPk/8QF7iVtsI+hHuO1+J9CFnIgsSrqr1ITQ2eQ4VM7rRPG7i07eKKpv/iUwPP4OdloSmoHLWZeMXZNvqnCBQ=="}},"activity":{"seenData":{"v":">>> debug-me session starting\r\n"}}}}
{"ActivityMessage":{"prevActivity":{"hashValue":{"v":"63d31b25ca262d7e9fc5169d137f61ecef20fb65c23c493b1910443d7a5514e4"},"hashMethod":"SHA256"},"activitySignature":{"Ed25519Signature":{"b64":"+E0N7j9MwWgFp+LwdzNyByA5W6UELh6JFxVCU7+ByuhcerVO/SC2ZJJJMq8xqEXSc9rMNKVaAT3Z6JmidF+XAw=="}},"activity":{"seenData":{"v":"$ "}}}}
{"ControlMessage":{"control":{"SessionKey":[{"b64":"dlaIEkybI5j3M/WU97RjcAAr0XsOQQ89ffZULVR82pw="},null]},"controlSignature":{"Ed25519Signature":{"b64":"hlyf7SZ5ZyDrELuTD3ZfPCWCBcFcfG9LP7Zuy+roXwlkFAv2VtpYrFAAcnWSvhloTmYIfqo5LWakITPI0ITtAQ=="}}}}
{"ControlMessage":{"control":{"SessionKeyAccepted":[{"b64":"dlaIEkybI5j3M/WU97RjcAAr0XsOQQ89ffZULVR82pw="},null]},"controlSignature":{"Ed25519Signature":{"b64":"kJ7AdhBgoiYfsXOM//4mcMcU5sL0oyqulKQHUPFo2aYYPJnu5rKUHlfNsfQbGDDrdsl9SocZaacUpm+FoiDCCg=="}}}}
{"ActivityMessage":{"prevActivity":{"hashValue":{"v":"2250d8b902683053b3faf5bdbe3cfa27517d4ede220e4a24c8887ef42ab506e0"},"hashMethod":"SHA256"},"activitySignature":{"Ed25519Signature":{"b64":"hlF7oFhFealsf8+9R0Wj+vzfb3rBJyQjUyy7V0+n3zRLl5EY88XKQzTuhYb/li+WoH/QNjugcRLEBjfSXCKJBQ=="}},"activity":{"echoData":{"v":""},"enteredData":{"v":"l"}}}}

Ed25519 signatures add 64 bytes overhead to each message, on top of the 64 bytes for the hash pointer to the previous message. But, last night I thought of a cunning plan to remove that hash pointer from the wire protocol, while still generating a provable hash chain. Just leave it out of the serialized message, but include it in the data that's signed. debug-me will then just need to try the hashes of recent messages until it finds one for which the signature verifies, and then it will know what the hash pointer is supposed to point to, without it ever having been sent over the wire! Will implement this trick eventually.
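
A sketch of the verification side of that trick, with the signature check abstracted out (the types and helper names here are placeholders, not debug-me's API):

import Data.Maybe (listToMaybe)
import Data.Monoid ((<>))
import qualified Data.ByteString as B

type Hash = B.ByteString

-- Given a way to check "does the signature verify over this signed blob?",
-- find which recent message hash the sender must have used as the
-- (unsent) prev pointer.
findPrevHash
        :: (B.ByteString -> Bool)  -- signature verification over signed data
        -> B.ByteString            -- message body as sent over the wire
        -> [Hash]                  -- hashes of recent messages to try
        -> Maybe Hash
findPrevHash verifies msgbody recent =
        listToMaybe [ h | h <- recent, verifies (msgbody <> h) ]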

Next though, I need to make debug-me communicate over the network.

Posted
debug-me first stage complete

Solved that bug I was stuck on yesterday. I had been looking in the code for the developer side for a bug, but that side was fine; the bug was excessive backlog trimming on the user side.

Now I'm fairly happy with how debug-me's activity chains look, and the first stage of developing debug-me is complete. It still doesn't do anything more than the script command, but all the groundwork for the actual networked debug-me is done now. I only have to add signing, verification of gpg key trust, and http client-server to finish debug-me.

(Also, I made debug-me --replay debug-me.log replay the log with realistic delays, like scriptreplay or ttyplay. Only took a page of code to add that feature.)


I'm only "fairly happy" with the activity chains because there is a weird edge case.. At high latency, when typing "qwertyuiop", this happens:

debug-me.log.png

That looks weird, and is somewhat hard to follow in graph form, but it's "correct" as far as debug-me's rules for activity chains go. Due to the lag, the chain forks:

  • It sends "wer" before the "q" echoes back
  • It replies to the "q" echo with "tyuio" before the "w" echoes back.
  • It replies to the "w" echo with "p"
  • Finally, all the delayed echoes come in, and it sends a carriage return, resulting in the command being run.

I'd be happier if the forked chain explicitly merged back together, but to do that and add any provable information, the developer would have to wait for all the echoes to arrive before sending the carriage return, or something like that, which would make type-ahead worse. So I think I'll leave it like this. Most of the time, latency is not so high, and so this kind of forking doesn't happen much, and is much simpler to understand when it does happen.

Posted
debug-me chain issues

Working on getting the debug-me proof chain to be the right shape, and be checked at all points for validity. This graph of a session shows today's progress, but also a bug.

debug-me.log.png

At the top, everything is synchronous while "ls" is entered and echoed back. Then, things go asynchronous when " -la" is entered, and the expected echoes (in brackets) match up with what really gets echoed, so that input is also accepted.

Finally, the bit in red where "|" is entered is a bug on the developer side, and it gets (correctly) rejected on the user side due to having forked the proof chain. Currently stuck on this bug.

The code for this, especially on the developer side, is rather hairy; I wonder if I am missing a way to simplify it.

Posted
debug-me half days

Two days only partially spent on debug-me..

Yesterday brought a few small improvements, but mostly I discovered the posix-pty library, and converted debug-me to use it rather than wrangling ptys itself. That was nice because it let me fix resizing. However, the library had a bug in how it initializes the terminal, and investigating and working around that bug used up too much time. Oh well, probably still worth it.
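
For reference, the core of what the posix-pty library provides looks something like this (a simplified sketch, assuming its spawnWithPty/readPty/writePty interface; real use also needs raw-mode terminal setup and resize handling):

import System.Posix.Pty (spawnWithPty, readPty, writePty)
import System.Process (waitForProcess)
import System.IO (stdin)
import Control.Concurrent (forkIO)
import Control.Monad (forever, void)
import qualified Data.ByteString as B

main :: IO ()
main = do
        -- spawn a shell attached to a new pseudo-terminal, 80x25
        (pty, ph) <- spawnWithPty Nothing True "sh" [] (80, 25)
        -- copy the shell's output to our stdout
        void $ forkIO $ forever $ readPty pty >>= B.putStr
        -- copy our stdin to the shell
        void $ forkIO $ forever $ B.hGetSome stdin 1024 >>= writePty pty
        void $ waitForProcess ph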


Today, made debug-me serialize to and from JSON.

{"signature":{"v":""},"prevActivity":null,"activity":{"seenData":{"v":">>> debug-me session starting\r\n"}}}
{"signature":{"v":""},"prevActivity":{"hashValue":{"v":"fb4401a717f86958747d34f98c079eaa811d8af7d22e977d733f1b9e091073a6"},"hashMethod":"SHA256"},"activity":{"seenData":{"v":"$ "}}}
{"signature":{"v":""},"prevActivity":{"hashValue":{"v":"cfc629125d93f55d2a376ecb9e119c89fe2cc47a63e6bc79588d6e7145cb50d2"},"hashMethod":"SHA256"},"activity":{"echoData":{"v":""},"enteredData":{"v":"l"}}}
{"signature":{"v":""},"prevActivity":{"hashValue":{"v":"cfc629125d93f55d2a376ecb9e119c89fe2cc47a63e6bc79588d6e7145cb50d2"},"hashMethod":"SHA256"},"activity":{"seenData":{"v":"l"}}}
{"signature":{"v":""},"prevActivity":{"hashValue":{"v":"3a0530c7739418e22f20696bb3798f8c3b2caf7763080f78bfeecc618fc5862e"},"hashMethod":"SHA256"},"activity":{"echoData":{"v":""},"enteredData":{"v":"s"}}}
{"signature":{"v":""},"prevActivity":{"hashValue":{"v":"3a0530c7739418e22f20696bb3798f8c3b2caf7763080f78bfeecc618fc5862e"},"hashMethod":"SHA256"},"activity":{"seenData":{"v":"s"}}}
{"signature":{"v":""},"prevActivity":{"hashValue":{"v":"91ac86c7dc2445c18e9a0cfa265585b55e01807e377d5f083c90ef307124d8ab"},"hashMethod":"SHA256"},"activity":{"echoData":{"v":""},"enteredData":{"v":"\r"}}}
{"signature":{"v":""},"prevActivity":{"hashValue":{"v":"91ac86c7dc2445c18e9a0cfa265585b55e01807e377d5f083c90ef307124d8ab"},"hashMethod":"SHA256"},"activity":{"seenData":{"v":"\r\n"}}}
{"signature":{"v":""},"prevActivity":{"hashValue":{"v":"cc97177983767a5ab490d63593011161e2bd4ac2fe00195692f965810e6cf3bf"},"hashMethod":"SHA256"},"activity":{"seenData":{"v":"AGPL\t    Pty.hs    Types.hs\t  debug-me.cabal  dist\r\nCmdLine.hs  Setup.hs  Val.hs\t  debug-me.hs\t  stack.yaml\r\n"}}}

That's a pretty verbose way of saying: I typed "ls" and saw the list of files. But it compresses well. Each packet for a single keystroke will take only 37 bytes to transmit as part of a compressed stream of JSON, and 32 of those bytes are needed for the SHA256 hash. So, this is probably good enough to use as debug-me's wire format.

(Some more bytes will be needed once the signature field is not empty..)

It's also a good logging format, and can be easily analyzed to eg, prove when a person used debug-me to do something bad.

Wrote a quick visualizer for debug-me logs using graphviz. This will be super useful for debug-me development if nothing else.

debug-me.log.png

Posted
debug-me day 2

Proceeding as planned, I wrote 170 lines of code to make debug-me have separate threads for the user and developer sides, which send one another updates to the activity chain, and check them for validity. This was fun to implement! And it's lacking only signing to be a full implementation of the debug-me proof chain.

Then I added a network latency simulation to it and tried different latencies, up to the latency I measure on my satellite internet link (800 ms or so).

That helped me find two bugs where it was not handling echo simulation correctly. Something is still not handled quite right, because when I put a network latency delay before sending output from the user side to the developer side, it causes some developer input to get rejected. So for now I'm only inserting latency when the developer is sending input to the user side. Good enough for a proof of concept.
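
The simulation itself is just a relay thread that sleeps before delivering each message; something along these lines (a sketch, not debug-me's actual code):

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
import Control.Monad (forever, void)

-- Relay messages from one channel to another, adding a fixed delay.
latencyRelay :: Int -> TChan a -> TChan a -> IO ()
latencyRelay ms from to = void $ forkIO $ forever $ do
        msg <- atomically $ readTChan from
        threadDelay (ms * 1000)  -- milliseconds to microseconds
        atomically $ writeTChan to msg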

Result is that, even with a high latency, it feels "natural" to type commands into debug-me. The echo emulation works, so it accepts typeahead.

Using backspace to delete several letters in a row feels "wrong"; the synchronousness requirements prevent that from working when latency is high. Same problem for moving around with the arrow keys. Down around 200 ms latency, these problems are not apparent, unless you mash down the backspace or arrow key.

How about using an editor? It seemed reasonably non-annoying at 200 ms latency, although here I do tend to mash down arrow keys and then it moves too fast for debug-me to keep up, and so the cursor movement stalls.

At higher latencies, using an editor was pretty annoying. Where I might normally press the down arrow key N distinct times to get to the line I wanted, that doesn't work in debug-me at 800 ms latency. Of course, over such a slow connection, using an editor is the last thing you want to do anyway, and vi key combos like 9j start to become necessary (and work in debug-me).

Based on these experiments, the synchronousness requirements are not as utterly annoying as I'd feared, especially at typical latencies.

And, it seems worth making debug-me detect when several keys are pressed close together, and send a single packet over the network combining those. That should make it behave better when mashing down a key.
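
That could look something like this (a hypothetical sketch, not something debug-me does yet): after a keypress, keep grabbing any keys that arrive within a few milliseconds and send them as a single packet.

import System.IO (Handle, hGetChar, hWaitForInput)

-- Read one key, then batch up any keys that follow within 5 ms.
-- (Assumes the handle is unbuffered and the terminal is in raw mode.)
readKeyBatch :: Handle -> IO String
readKeyBatch h = do
        c <- hGetChar h
        go [c]
  where
        go acc = do
                more <- hWaitForInput h 5  -- wait up to 5 ms for another key
                if more
                        then hGetChar h >>= \c -> go (acc ++ [c])
                        else return acc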


Today's work was sponsored by Jake Vosloo on Patreon.

Posted
debug-me day 1

Started some exploratory programming on the debug-me idea.

First, wrote down some data types for debug-me's proof of developer activity.

Then, some terminal wrangling, to get debug-me to allocate a pseudo-terminal, run an interactive shell in it, and pass stdin and stdout back and forth to the terminal it was started in. At this point, debug-me is very similar to script, except it doesn't log the data it intercepts to a typescript file.

Terminals are complicated, so this took a while, and it's still not perfect, but good enough for now. Needs to have resize handling added, and for some reason when the program exits, the terminal is left in a raw state, despite the program apparently resetting its attributes.

Next goal is to check how annoying debug-me's insistence on a synchronous activity proof chain will be when using debug-me across a network link with some latency. If that's too annoying, the design will need to be changed, or perhaps won't work.

To do that, I plan to make debug-me simulate a network between the user and developer's processes, using threads inside a single process for now. The user thread will build up an activity chain, and only accept inputs from the developer thread when they meet the synchronicity requirements. Ran out of time to finish that today, so next time.

debug-me's git repository is available from https://git.joeyh.name/index.cgi/debug-me.git/


Today's work was sponsored by andrea rota.

Posted
propellor self bootstrap property

Worked for a while today on http://propellor.branchable.com/todo/property_to_install_propellor/, with the goal of making propellor build a disk image that itself contains propellor.

The hard part of that turned out to be that inside the chroot it's building, /usr/local/propellor is bind mounted to the one outside the chroot. But this new property needs to populate that directory in the chroot. Simply unmounting the bind mount would break later properties, so some way to temporarily expose the underlying directory was called for.

At first, I thought unshare -m could be used to do this, but for some reason that does not work in a chroot. Pity. Ended up going with a complicated dance, where the bind mount is bind mounted to a temp dir, then unmounted to expose the underlying directory, and once it's set up, the temp dir is re-bind-mounted back over it. Ugh.
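
The dance looks roughly like this (a sketch with hypothetical names, ignoring error handling; the real property is more careful):

import System.Process (callProcess)

-- Temporarily expose the directory underneath a bind mount, run an
-- action to populate it, then restore the bind mount on top of it.
withUnderlyingDir :: FilePath -> FilePath -> IO a -> IO a
withUnderlyingDir bindmount tmpdir populate = do
        callProcess "mount" ["--bind", bindmount, tmpdir]
        callProcess "umount" [bindmount]
        r <- populate
        callProcess "mount" ["--bind", tmpdir, bindmount]
        callProcess "umount" [tmpdir]
        return r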

I was able to reuse Propellor.Bootstrap to bootstrap propellor inside the chroot, which was nice.

Also nice that I'm able to work on this kind of thing at home despite it involving building chroots -- yay for satellite internet!


Today's work was sponsored by Riku Voipio.

Posted
git-annex devblog
day 456 digging in

Digging in to some of the meatier backlog today. Backlog down to 225.

A lot of fixes around using git annex enableremote to add new gpg keys to a gcrypt special remote.

Had to make git-annex's use of GIT_SSH/GIT_SSH_COMMAND contingent on GIT_ANNEX_USE_GIT_SSH=1 being set. Unfortunate, but the difference from git's behavior broke at least one existing use of that environment variable, and so it will need to be whitelisted in places where git-annex should use it.

Added support for git annex add --update


Today's work was sponsored by Trenton Cronholm on Patreon.

Posted
git-annex devblog
day 455 semi-synchronized

Catching up on some recent backlog after a trip and post-trip flu.

Anarcat wrote up an analysis of semi-synchronized remotes, and based on that I implemented remote.<name>.annex-push and remote.<name>.annex-pull.

Also fixed the Windows build.

Today's work was sponsored by Thomas Hochstein on Patreon.

Posted
git-annex devblog
day 454 multicast

Earlier this week I had the opportunity to sit in on a workshop at MIT where students were taught how to use git-annex as part of a stack of tools for reproducible scientific data research. That was great!

One thing we noticed there is, it can be hard to distribute files to such a class; downloading them individually wastes network bandwidth. Today, I added git annex multicast which uses uftp to multicast files to other clones of a repository on a LAN. An "easy" 500 lines of code and 7 hour job.

There is encryption and authentication, but the key management for this turned out to be simple, since the public key fingerprints can be stored on the git-annex branch, and easily synced around that way. So, I expect this should not be hard to use in a classroom setting such as the one I was in earlier this week.

Posted
git-annex devblog
day 453 release prep

Preparing for a release tomorrow. Yury fixed the Windows autobuilder over the weekend. The OSX autobuilder was broken by my changes Friday, which turned out to have a simple bug that took quite a long time to chase down.

Also added git annex sync --content-of=path to sync the contents of files in a path, rather than in the whole work tree as --content does. I would have rather made this be --content=path but optparse-applicative does not support options that can be either boolean or have a string value. Really, I'd rather git annex sync path do it, but that would be ambiguous with the remote name parameter.

Today's work was sponsored by Jake Vosloo on Patreon.

Posted
git-annex devblog
day 452 GIT SSH

Found a bug in git-annex-shell where verbose messages would sometimes make it output things git-annex didn't expect.

While fixing that, I wanted to add a test case, but the test suite actually does not test git-annex-shell at all. It would need to ssh, which test suites should not do. So, I took a detour..

Support for GIT_SSH and GIT_SSH_COMMAND has been requested before for various reasons. So I implemented that, which took 4 hours. (With one little possible compatibility caveat, since git-annex needs to pass the -n parameter to ssh sometimes, and git's interface doesn't allow for such a parameter.)

Now the test suite can use those environment variables to make mock ssh remotes be accessed using local sh instead of ssh.

Today's work was sponsored by Trenton Cronholm on Patreon.

Posted
git-annex devblog
day 451 annex.securehashesonly

The new annex.securehashesonly config setting prevents annexed content that does not use a cryptographically secure hash from being downloaded or otherwise added to a repository.

Using that and signed commits prevents SHA1 collisions from causing problems with annexed files. See using signed git commits for details about how to use it, and why I believe it makes git-annex safe despite git's vulnerability to SHA1 collisions in general.

If you are using git-annex to publish binary files in a repository, you should follow the instructions in using signed git commits.

If you're using git to publish binary files, you can improve the security of your repository by switching to git-annex and signed commits.

Today's work was sponsored by Riku Voipio.

Posted
git-annex devblog
day 450 hardening against SHA attacks

Yesterday I said that a git-annex repository using signed commits and SHA2 backend would be secure from SHA1 collision attacks. Then I noticed that there were two ways to embed the necessary collision generation data inside git-annex key names. I've fixed both of them today, and cannot find any other ways to embed collision generation data in between a signed commit and the annexed files.

I also have a design for a way to configure git-annex to expect to see only keys using secure hash backends, which will make it easier to work with repositories that want to use signed commits and SHA2. Planning to implement that tomorrow.

sha1 collision embedding in git-annex keys has the details.

Posted
git-annex devblog
day 449 SHA1 break day

The first SHA1 collision was announced today, produced by an identical-prefix collision attack.

After looking into it all day, it does not appear to impact git's security immediately, except for targeted attacks against specific projects by very wealthy attackers. But we're well past the time when it seemed ok that git uses SHA1. If this gets improved into a chosen-prefix collision attack, git will start to be rather insecure.

Projects that store binary files in git, that might be worth $100k for an attacker to backdoor should be concerned by the SHA1 collisions. A good example of such a project is <git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git>.

Using git-annex (with a suitable backend like SHA256) and signed commits together is a good way to secure such repositories.

Update 12:25 am: However, there are some ways to embed SHA1-colliding data in the names of git-annex keys. That makes git-annex with signed commits be no more secure than git with signed commits. I am working to fix git-annex to not use keys that have such problems.

Posted
git-annex devblog
day 448 git push to update remote

Today was all about writing up how to make a remote repo update when changes are pushed to it.

That's a fairly simple page, because I added workarounds for all the complexity of making it work in direct mode repos, adjusted branches, and repos on filesystems not supporting executable git hooks. Basically, the user should be able to set the standard receive.denyCurrentBranch=updateInstead configuration on a remote, and then git push or git annex sync should update that remote's working tree.
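
In other words, the setup boils down to something like this (a sketch of the configuration described above; the remote name is hypothetical):

# on the remote repository, allow pushes to update its checked-out branch
git config receive.denyCurrentBranch updateInstead

# then, from a clone, a normal push (or git annex sync) updates the
# remote's working tree
git push origin master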

There are a couple of unhandled cases; git push to a remote on a filesystem like FAT won't update it, and git annex sync will only update it if it's local, not accessed over ssh. Also, the emulation of git's updateInstead behavior is not perfect for direct mode repos and adjusted branches.

Still, it's good enough that most users should find it meets their needs, I hope. How to set this kind of thing up is a fairly common FAQ, and this makes it much simpler.

(Oh yeah, the first ancient kernel arm build is still running. May finish before tomorrow.)

Today's work was sponsored by Jake Vosloo on Patreon.

Posted
git-annex devblog
day 447 bug class

When you see a command like "ssh somehost rm -f file", you probably don't think that consumes stdin. After all, the rm -f doesn't. But, ssh can pass stdin over the network even if it's not being consumed, and it turns out git-annex was bitten by this.

That bug made git-annex-checkpresentkey --batch with remote accessed over ssh not see all the batch-mode input that was passed into it, because ssh sometimes consumed some of it.

Shell scripts using git-annex could also be impacted by the bug, for example:

#!/bin/sh
find . -type l -atime +100 | \
    while read file; do
        echo "gonna drop $file that has not been used in a while"
        git annex drop "$file"
    done

Depending on what remotes git annex drop talks to, it might consume parts of the output of find.

I've fixed this in git-annex now (using ssh -n when running commands that are not fed some stdin of their own), but this seems like a class of bug that could impact lots of programs that run ssh.


I've been thinking about simpler setup for remote worktree update on push.

One nice way to make a remote update its worktree on push is available in recent-ish gits, receive.denyCurrentBranch=updateInstead. That could already be used with git annex sync, but it hid any error messages when pushing the master branch to the remote (since that push fails with a large error message in default configurations). Found a way to make the error message be displayed when the remote's receive.denyCurrentBranch does not have the default configuration.

The remaining problem is that direct mode and adjusted branch remotes won't get their work trees updated even when configured that way. I am thinking about adding a post-update hook to support those.


Also continuing to bring up the ancient kernel arm autobuilder. It's running its first build now.

Today's work was sponsored by Riku Voipio.

Posted
git-annex devblog
day 446 quiet progress

Last week I only had energy to work most of each day on git-annex, or to blog about it. I chose quiet work. The changelog did grow a good amount.

Today, fixed some autobuilder problems, and I am gearing up to add another autobuild, targeting arm boxes with older linux kernels, since I got a chance to upgrade the arm autobuilder's disk this weekend.

Also, some work on the S3 special remote, and worked around a bug in sqlite's handling of umask.

Backlog is down to 243 messages.

Today's work was sponsored by Trenton Cronholm on Patreon.

Posted
git-annex devblog
day 445 configs

Finished the repository-clone-global configuration settings I started adding on Monday. Came up with a nice type-driven way to make sure that configuration is loaded when needed, and only loaded once. Then it was easy to make annex.autocommit be configurable by git-annex config. Also added a new annex.synccontent configuration, which can also be set by git-annex config.

Also resolved a tricky situation with providing an appid to magic wormhole. It will happen on a flag day in 2021. I've marked my calendar..

Today's work was sponsored by Thomas Hochstein on Patreon.

Posted
git-annex devblog
day 444 memory leak with a cold

Spent rather too long today tracking down a memory leak in git annex unused. Actually, it was three memory leaks; one of them was a reversion introduced while otherwise improving a function to not be partial. Another only happened in very rare circumstances. The third, which took several more hours staring at the code, turned out to simply be an unnecessary use of an accumulating list. Feel like I should have seen that one sooner, but then I am under the weather and was running profiles in a daze for several hours.. In the end, git-annex unused went from needing 1 gb of memory to 150 mb in my big repo.

One advantage to all the profiling though, was I noticed that the split function was allocating a lot of memory, and seemed generally inefficient. This has to do with it splitting on a string; splitting on a single character can run twice as fast and churn the GC quite a bit less, so I wrote up a specialized version of that, and it's used extensively in git-annex now, so it may run up to 50% faster in some cases. Seems like haskell libraries with a split function should perhaps use the more optimal version when splitting on a single character, and I'm going to file bugs to that effect.
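
The specialized version is essentially this (a sketch; the name and exact code here are illustrative, not necessarily git-annex's):

-- Split a string on a single character, without the overhead of
-- general substring matching.
splitc :: Char -> String -> [String]
splitc c s = case break (== c) s of
        (before, _:rest) -> before : splitc c rest
        (before, [])     -> [before]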

Today's work was sponsored by Jake Vosloo on Patreon.

Posted
git-annex devblog
day 443 yes it has been a while

First day working on git-annex in over a month. I've been away preparing for and giving two talks at Linux Conf Australia and then recovering from conference flu, but am now raring to dive back into git-annex development!

The backlog stood at over 300 messages this morning, and is down to 274 now. So still lots of catching up to do. But nothing seems to have blown up badly in my absence. The antipatterns page was a nice development while I was away, listing some ways people sometimes find to shoot their feet. Read and responded to lots of questions, including one user who mentioned a scientific use case: "We are exploring use of git-annex to manage the large boundary conditions used within our weather model."

The main bit of coding today was adding a new git annex config command. This is fairly similar to git config, but it stores the settings in the git-annex branch, so they're visible in all clones of the repo (aka "global"). Not every setting will be configurable this way (that would be too expensive, and too foot-shooty), but starting with annex.autocommit I plan to enable selected settings that make sense to be able to set globally. If you've wanted to be able to configure some part of git-annex in all clones of a repository, suggestions are welcome in the todo item about this.

git annex vicfg can also be used to edit the global settings, and I also made it able to edit the global git annex numcopies setting which was omitted before. There's no real reason to have a separate git annex numcopies command now, since git annex config could configure global annex.numcopies.. but it's probably not worth changing that.

Today's work was sponsored by Trenton Cronholm on Patreon.

Posted
git-annex devblog
day 442 xmpp removal

The webapp's wormhole pairing almost worked perfectly on the first test. Turned out the remotedaemon was not noticing that the tor hidden service got enabled. After fixing that, it worked perfectly!

So, I've merged that feature, and removed XMPP support from the assistant at the same time. If all goes well, the autobuilds will be updated soon, and it'll be released in time for new year's.

Anyone who's been using XMPP to keep repositories in sync will need to either switch to Tor, or could add a remote on a ssh server to sync by instead. See http://git-annex.branchable.com/assistant/share_with_a_friend_walkthrough/ for the pointy-clicky way to do it, and http://git-annex.branchable.com/tips/peer_to_peer_network_with_tor/ for the command-line way.

Posted
git-annex devblog
day 441 webapp wormhole pairing

Added the Magic Wormhole UI to the webapp for pairing Tor remotes. This replaces the XMPP pairing UI when using "Share with a friend" and "Share with your other devices" in the webapp.

I have not been able to fully test it yet, and it's part of the no-xmpp branch until I can.

It's been a while since I worked on the webapp. It was not as hard as I remembered to deal with Yesod. The inversion of control involved in coding for the web is as annoying as I remembered.


Today's work was sponsored by Riku Voipio.

Posted
git-annex devblog
day 440 holidaze

Have been working on some improvements to git annex enable-tor. Made it su to root, using any su-like program that's available. And made it test the hidden service it sets up, and wait until it's propagated to the Tor directory authorities. The webapp will need these features, so I thought I might as well add them at the command-line level.

Also some messing about with locale and encoding issues. About most of which the less said the better. One significant thing is that I've made the filesystem encoding be used for all IO by git-annex, rather than needing to explicitly enable it for each file and process. So, there should be much less bother with encoding problems going forward.

Posted
git-annex devblog
day 439 wormhole pairing

git annex p2p --pair implemented, using Magic Wormhole codes that have to be exchanged between the repositories being paired.

It looks like this, with the same thing being done at the same time in the other repository.

joey@elephant:~/tmp/bench3/a>git annex p2p --pair
p2p pair peer1 (using Magic Wormhole) 

This repository's pairing code is: 1-select-bluebird

Enter the other repository's pairing code: (here I entered 8-fascinate-sawdust) 
Exchanging pairing data...
Successfully exchanged pairing data. Connecting to peer1...
ok

And just that simply, the two repositories find one another, Tor onion addresses and authentication data are exchanged, and a git remote is set up connecting via Tor.

joey@elephant:~/tmp/bench3/a>git annex sync peer1
commit  
ok
pull peer1 
warning: no common commits
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 5 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (5/5), done.
From tor-annex::5vkpoyz723otbmzo.onion:61900
 * [new branch]      git-annex  -> peer1/git-annex

Very pleased with this, and also the whole thing worked on the very first try!

It might be slightly annoying to have to exchange two codes during pairing. It would be possible to make this work with only one code. I decided to go with two codes, even though it's only marginally more secure than one, mostly for UI reasons. The pairing interface and instructions for using it are simplified by being symmetric.

(I also decided to revert the work I did on Friday to make p2p --link set up a bidirectional link. Better to keep --link the simplest possible primitive, and pairing makes bidirectional links more easily.)

Next: Some more testing of this and the Tor hidden services, a webapp UI for P2P peering, and then finally removing XMPP support. I hope to finish that by New Years.

Today's work was sponsored by Jake Vosloo on Patreon.

Posted
git-annex devblog
day 438 bi-directional p2p links

Improved git annex p2p --link to create a bi-directional link automatically. Bi-directional links are desirable more often than not, so it's the default behavior.

Also continued thinking about using magic wormhole for communicating p2p addresses for pairing. And filed some more bugs on magic wormhole.

Posted
git-annex devblog
day 437 catching up

Quite a backlog developed in the couple of weeks I was concentrating on tor support. I've taken a first pass through it and fixed the most pressing issues now.

Most important was an ugly memory corruption problem in the GHC runtime system that may have led to data corruption when using git-annex with Linux kernels older than 4.5. All the Linux standalone builds of git-annex have been updated to fix that issue.

Today dealt with several more things, including fixing a buggy timestamp issue with metadata --batch, reverting the ssh ServerAliveInterval setting (broke on too many systems with old ssh or complicated ssh configurations), making batch input not be rejected when it can't be decoded as UTF-8, and more.

Also, spent some time learning a little bit about Magic Wormhole and SPAKE, as a way to exchange tor remote addresses. Using Magic Wormhole for that seems like a reasonable plan. I did file a couple bugs on it which will need to get fixed, and then using it is mostly a question of whether it's easy enough to install that git-annex can rely on it.

Posted
git-annex devblog
day 435-436 post tor merge

More improvements to tor support. Yesterday, debugged a reversion that broke push/pull over tor, and made actual useful error messages be displayed when there were problems. Also fixed a memory leak, although I fixed it by reorganizing code and could not figure out quite why it happened, other than that the ghc runtime was not managing to be as lazy as I would expect.

Today, added git ref change notification to the P2P protocol, and made the remotedaemon automatically fetch changes from tor remotes. So, it should work to use the assistant to keep repositories in sync over tor. I have not tried it yet, and linking over tor still needs to be done at the command line, so it's not really ready for webapp users yet.

Also fixed a denial of service attack in git-annex-shell and git-annex when talking to a remote git-annex-shell. It was possible to feed either of them a large amount of data when they tried to read a line of data, and summon the OOM killer. Next release will be expedited some because of that.

Today's work was sponsored by Thomas Hochstein on Patreon.

Posted
git-annex devblog
day 434 it works

Git annex transfers over Tor worked correctly the first time I tried them today. I had been expecting protocol implementation bugs, so this was a nice surprise!

Of course there were some bugs to fix. I had forgotten to add UUID discovery to git annex p2p --link. And, resuming interrupted transfers was buggy.

Spent some time adding progress updates to the Tor remote. I was curious to see what speed transfers would run. Speed will of course vary depending on the Tor relays being used, but this example with a 100 mb file is not bad:

copy big4 (to peer1...) 
62%          1.5MB/s 24s

There are still a couple of known bugs, but I've merged the tor branch into master already.


Alpernebbi has built a GUI for editing git-annex metadata. Something I always wanted!
Read about it here


Today's work was sponsored by Ethan Aubin.

Posted
git-annex devblog
day 432-433 almost there

Friday and today were spent implementing both sides of the P2P protocol for git-annex content transfers.

There were some tricky cases to deal with. For example, when a file is being sent from a direct mode repository, or v6 annex.thin repository, the content of the file can change as it's being transferred. Including being appended to or truncated. Had to find a way to deal with that, to avoid breaking the protocol by not sending the indicated number of bytes of data.
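
One simple way to keep the protocol intact in that case (a sketch of the general idea, not necessarily git-annex's exact behavior) is to always send exactly the promised number of bytes, truncating any extra data and zero-padding if the file came up short:

import Data.Int (Int64)
import Data.Monoid ((<>))
import qualified Data.ByteString.Lazy as L

-- Clamp the content to exactly the length that was announced in the
-- protocol, padding with zero bytes if the file shrank mid-transfer.
sendExactly :: Int64 -> L.ByteString -> L.ByteString
sendExactly len content = L.take len (content <> L.repeat 0)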

It all seems to be done now, but it's not been tested at all, and there are probably some bugs to find. (And progress info is not wired up yet.)

Today's work was sponsored by Trenton Cronholm on Patreon.

Posted
git-annex devblog
day 431 p2p linking

Today I finished the second-to-last big missing piece for tor hidden service remotes. Networks of these remotes are P2P networks, and there needs to be a way for peers to find one another, and to authenticate with one another. The git annex p2p command sets up links between peers in such a network.

So far it has only a basic interface that sets up a one way link between two peers. In the first repository, run git annex p2p --gen-address. That outputs a long address. In the second repository, run git annex p2p --link peer1, and paste the address into it. That sets up a git remote named "peer1" that connects back to the first repository over tor.

That is a one-directional link, while a bi-directional link would be much more convenient to have between peers. Worse, the address can be reused by anyone who sees it, to link into the repository. And, the address is far too long to communicate in any way except for pasting it.

So I want to improve that later. What I'd really like to have is an interface that displays a one-time-use phrase of five to ten words, that can be read over the phone or across the room. Exchange phrases with a friend, and get your repositories securely linked together with tor.

But, git annex p2p is good enough for now. I can move on to the final keystone of the tor support, which is file transfer over tor. That should, fingers crossed, be relatively easy, and the tor branch is close to mergeable now.

Today's work was sponsored by Riku Voipio.

Posted
git-annex devblog
day 430 tor socket problem

Debian's tor daemon is very locked down in the directories it can read from, and so I've had a hard time finding a place to put the unix socket file for git-annex's tor hidden service. Painful details in http://bugs.debian.org/846275. At least for now, I'm putting it under /etc/tor/, which is probably a FHS violation, but seems to be the only option that doesn't involve a lot of added complexity.


The Windows autobuilder is moving, since NEST is shutting down the server it has been using. Yury Zaytsev has set up a new Windows autobuilder, hosted at Dartmouth College this time.

Posted
git-annex devblog
day 428-429 git push to hidden service

The tor branch is coming along nicely.

This weekend, I continued working on the P2P protocol, implementing it for network sockets, and extending it to support connecting up git-send-pack/git-receive-pack.

There was a bit of a detour when I split the Free monad into two separate ones, one for Net operations and the other for Local filesystem operations.

This weekend's work was sponsored by Thomas Hochstein on Patreon.


Today, implemented a git-remote-tor-annex command that git will use for tor-annex:: urls, and made git annex remotedaemon serve the tor hidden service.

Now I have git push/pull working to the hidden service, for example:

git pull tor-annex::eeaytkuhaupbarfi.onion:47651

That works very well, but does not yet check that the user is authorized to use the repo, beyond knowing the onion address. And currently it only works in git-annex repos; with some tweaks it should also work in plain git repos.

Next, I need to teach git-annex how to access tor-annex remotes. And after that, an interface in the webapp for setting them up and connecting them together.

Today's work was sponsored by Josh Taylor on Patreon.

Posted
git-annex devblog
day 427 free p2p

For a Haskell programmer, a day where a big thing is implemented without the least scrap of code that touches the IO monad is a good day. And this was a good day for me!

Implemented the p2p protocol for tor hidden services. Its needs are somewhat similar to the external special remote protocol, but the two protocols are not fully overlapping with one another. Rather than try to unify them, and so complicate both cases, I prefer to reuse as much code as possible between separate protocol implementations. The generating and parsing of messages is largely shared between them. I let the new p2p protocol otherwise develop in its own direction.

But, I do want to make this p2p protocol reusable for other types of p2p networks than tor hidden services. This was an opportunity to use the Free monad, which I'd never used before. It worked out great, letting me write monadic code to handle requests and responses in the protocol, that reads the content of files and resumes transfers and so on, all independent of any concrete implementation.

The whole implementation of the protocol only needed 74 lines of monadic code. It helped that I was able to factor out functions like this one, which is used both for handling a download, and by the remote when an upload is sent to it:

receiveContent :: Key -> Offset -> Len -> Proto Bool
receiveContent key offset len = do
        content <- receiveBytes len
        ok <- writeKeyFile key offset content
        sendMessage $ if ok then SUCCESS else FAILURE
        return ok

To get transcripts of the protocol in action, the Free monad can be evaluated purely, providing the other side of the conversation:

ghci> putStrLn $ protoDump $ runPure (put (fromJust $ file2key "WORM--foo")) [PUT_FROM (Offset 10), SUCCESS]
> PUT WORM--foo
< PUT-FROM 10
> DATA 90
> bytes
< SUCCESS
result: True

ghci> putStrLn $ protoDump $ runPure (serve (toUUID "myuuid")) [GET (Offset 0) (fromJust $ file2key "WORM--foo")]
< GET 0 WORM--foo
> PROTO-ERROR must AUTH first
result: ()

Am very happy with all this pure code and that I'm finally using Free monads. Next I need to get down to the dirty business of wiring this up to actual IO actions, and an actual network connection.

Today's work was sponsored by Jake Vosloo on Patreon.

Posted
git-annex devblog
day 426 grab bag

Fixed one howler of a bug today. Turns out that git annex fsck --all --from remote didn't actually check the content of the remote, but checked the local repository. Only --all was buggy; git annex fsck --from remote was ok. Don't think this is crash priority enough to make a release for, since only --all is affected.

Somewhat uncomfortably made git annex sync pass --allow-unrelated-histories to git merge. While I do think that git's recent refusal to merge unrelated histories is good in general, the problem is that initializing a direct mode repository involves making an empty commit. So merging from a remote into such a direct mode repository means merging unrelated histories, while an indirect mode repository doesn't. Seems best to avoid such inconsistencies, and the only way I could see to do it is to always use --allow-unrelated-histories. May revisit this once direct mode is finally removed.

Using the git-annex arm standalone bundle on some WD NAS boxes used to work, and then it seems they changed their kernel to use a nonstandard page size, and broke it. This actually seems to be a bug in the gold linker, which defaults to an unnecessarily small page size on arm. The git-annex arm bundle is being adjusted to try to deal with this.

ghc 8 made error include some backtrace information. While it's really nice to have backtraces for unexpected exceptions in Haskell, it turns out that git-annex used error a lot with the intent of showing an error message to the user, and a backtrace clutters up such messages. So, bit the bullet and checked through every error in git-annex and made such ones not include a backtrace.

Also, I've been considering what protocol to use between git-annex nodes when communicating over tor. One way would be to make it very similar to git-annex-shell, using rsync etc, and possibly reusing code from git-annex-shell. However, it can take a while to make a connection across the tor network, and that method seems to need a new connection for each file transferred etc. Also thought about using a http based protocol. The servant library is great for that, you get both http client and server implementations almost for free. Resuming interrupted transfers might complicate it, and the hidden service side would need to listen on a unix socket, instead of the regular http port. It might be worth it to use http for tor, if it could be reused for git-annex http servers not on the tor network. But, then I'd have to make the http server support git pull and push over http in a way that's compatible with how git uses http, including authentication. Which is a whole nother ball of complexity. So, I'm leaning instead to using a simple custom protocol something like:

    > AUTH $localuuid $token
    < AUTH-SUCCESS $remoteuuid
    > SENDPACK $length
    > $gitdata
    < RECVPACK $length
    < $gitdata
    > GET $pos $key
    < DATA $length
    < $bytes
    > SUCCESS
    > PUT $key
    < PUT-FROM $pos
    > DATA $length
    > $bytes
    < SUCCESS

Today's work was sponsored by Riku Voipio.

Posted
git-annex devblog
day 425 tor

Have waited too long for some next-generation encrypted P2P network, like telehash, to emerge. Time to stop waiting; tor hidden services are not as cutting edge, but should work. Updated the design and started implementation in the tor branch.

Unfortunately, Tor's default configuration does not enable the ControlPort. And, changing that in the configuration could be problematic. This makes it harder than it ought to be to register a tor hidden service. So, I implemented a git annex enable-tor command, which can be run as root to set it up. The webapp will probably use su-to-root or gksu to run it. There are some Linux-specific parts in there, and it uses a socket for communication between tor and the hidden service, which may cause problems for Windows porting later.

Next step will be to get git annex remotedaemon to run as a tor hidden service.

Also made a no-xmpp branch which removes xmpp support from the assistant. That will remove 3000 lines of code when it's merged. Will probably wait until after tor hidden services are working.

Today's work was sponsored by Jake Vosloo on Patreon.

Posted
git-annex devblog
the dog that didn't bark

Worked on several bug reports today, fixing some easy ones, and following up on others. And then there are the hard bugs.. Very pleased that I was able to eventually reproduce a bug based entirely on the information that git-annex's output did not include a filename. Didn't quite get that bug fixed though.

At the end of the day, got a bug report that git annex add of filenames containing spaces has broken. This is a recent reversion and I'm pushing out a release with a fix ASAP.

Posted
git-annex devblog
day 423 ssh fun

Made a significant change today: Enabled automatic retrying of transfers that fail. It's only done if the previous try managed to advance the progress by some amount. The assistant has already had that retrying for many years, but now it will also be done when using git-annex at the command line.

One good reason for a transfer to fail and need a retry is when the network connection stalls. You'd think that TCP keepalives would detect this kind of thing and kill the connection, but I've had enough complaints that I suppose that doesn't always work or gets disabled. Ssh has a ServerAliveInterval setting that detects such stalls nicely for the kind of batch transfers git-annex uses ssh for, but it's not enabled by default. So I found a way to make git-annex enable it, while still letting ~/.ssh/config settings override that.
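
For reference, this is the standard ssh option; setting it by hand in ~/.ssh/config looks something like this (a sketch; the 60-second interval is just an example, and per the above, git-annex will not override a value set here):

# probe the server every 60 seconds; ssh drops the connection after
# several unanswered probes, so a stalled transfer fails and can be retried
Host *
        ServerAliveInterval 60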

Also got back to analyzing an old bug report about proliferating ".nfs*.lock" files when using git-annex on nfs; this was caused by the wacky NFS behavior of renaming deleted files, and I found a change to the ssh connection caching cleanup code that should avoid the problem.

Posted
git-annex devblog
day 422 bugfixes for v6 mode

Several bug fixes involving v6 unlocked files today. Several related bugs were caused by relying on the inode cache information, without a fallback to handle the case where the inode cache had not gotten updated. While the inode cache is generally kept up-to-date well by the smudge/clean filtering, it is just a cache and can be out of date. Did some auditing for such problems and hopefully I've managed to find them all.

Also, there was a tricky upgrade case where a v5 repository contained a v6 unlocked file, and the annexed content got copied into it. This triggered the above-described bugs, and in this case the worktree needs to be updated on upgrade, to replace the pointer file with the content.

As I caught up with recent activity, it was nice to see some contributions from others. James MacMahon sent in a patch to improve the filenames generated by importfeed. And, xloem is writing workflow documentation for git-annex in Workflow guide.

Posted
git-annex devblog
day 421 lost in the trees

Finished up where I left off yesterday, writing test cases and fixing bugs with syncing in adjusted branches. While adjusted branches need v6 mode, and v6 mode is still considered experimental, this is still a rather nasty bug, since it can make files go missing (though still available in git history of course). So, planning to release a new version with these fixes as soon as the autobuilders build it.

Posted
git-annex devblog
day 420 delayed debugging

Over a month ago, I had some reports that syncing into adjusted branches was losing some files that had been committed. I couldn't reproduce it, but IIRC both felix and tbm reported problems in this area. And, felix kindly sent me enough of his git repo to hopefully reproduce the problem.

Finally got back to that today. Luckily, I was able to reproduce the bug using felix's repo. The bug only occurs when there's a change deep in a tree of an adjusted branch, and not always then. After staring at it for a couple of hours, I finally found the problem; a modification flag was not getting propagated in this case, and some changes made deep in the tree were not getting included into parent trees.

So, I think I've fixed it, but need to look at it some more to be sure, and develop a test case. And fixing that exposed another bug in the same code. Gotta run unfortunately, so will finish this tomorrow..

Today's work was sponsored by Riku Voipio.

Posted
git-annex devblog
day 419 catching up

Several bug fixes today and got caught up on most recent messages. Backlog is 157.

The most significant one prevents git-annex from reading in the whole content of a large git object when it wants to check if it's an annex symlink. In several situations where large files were committed to git, or staged, git-annex could do a lot of work, and use a lot of memory and maybe crash. Fixed by checking the size of an object before asking git cat-file for its content.

Also a couple of improvements around versions and upgrading. IIRC git-annex used to only support one repository version at a time, but this was changed to support V6 as an optional upgrade from V5, and so the supported versions became a list. Since V3 repositories are identical to V5 other than the version, I added it to the supported version list, and any V3 repos out there can be used without upgrading. Particularly useful if they're on read-only media.

And, there was a bug in the automatic upgrading of a remote that caused it to be upgraded all the way to V6. Now it will only be upgraded to V5.

Today's work was sponsored by Jake Vosloo on Patreon.

Posted
git-annex devblog
day 418 concurrent externals

Realized recently that despite all the nice concurrency support in git-annex, external special remotes were limited to handling one request at a time.

While the external special remote protocol could almost support concurrent requests, that would complicate implementing them, and would probably need a version flag to avoid breaking existing ones.

Instead, made git-annex start up multiple external special remote processes as needed to handle concurrency.

Today's work was sponsored by Josh Taylor on Patreon.

Posted
git-annex devblog
day 417 cut once

Did most of the optimisations that recent profiling suggested. This sped up a git annex find from 3.53 seconds to 1.73 seconds. And, git annex find --not --in remote from 12.41 seconds to 5.24 seconds. One of the optimisations sped up git-annex branch querying by up to 50%, which should also speed up use of some preferred content expressions. All in all, a very nice little optimisation pass.

Posted
git-annex devblog
day 416 measure twice

Only had a couple hours today, which were spent doing some profiling of git-annex in situations where it has to look through a large working tree in order to find files to act on. The top five hot spots this found are responsible for between 50% and 80% of git-annex's total CPU use in these situations.

The first optimisation sped up git annex find by around 18%. More tomorrow..

Posted
git-annex devblog
day 415 catching up

Catching up on backlog today. I hope to be back to a regular work schedule now. Unanswered messages down to 156. A lot of time today spent answering questions.

There were several problems involving git branches with slashes in their name, such as "foo/bar" (but not "origin/master" or "refs/heads/foo"). Some branch names based on such a branch would take only the "bar" part. In git annex sync, this led to perhaps merging "foo/bar" into "other/bar" or "bar". And the adjusted branch code was entirely broken for such branches. I've fixed it now.

Also made git annex addurl behave better when the file it wants to add is gitignored.

Thinking about implementing git annex copy --from A --to B. It does not seem too hard to do that, at least with a temp file used inbetween. See transitive transfers.

Today's work was sponsored by Thomas Hochstein on Patreon.

Posted
git-annex devblog
day 414 improved parallel get

Turned out to not be very hard at all to make git annex get -JN assign different threads to different remotes that have the same cost. Something like that was requested back in 2011, but it didn't really make sense until parallel get was implemented last year.

(Also spent too much time fixing up broken builds.)

Posted
git-annex devblog
day 413 back

Back after taking most of August off and working on other projects.

Got the unanswered messages backlog down from 222 to 170. Still scary high.

Numerous little improvements today. Notable ones:

  • Windows: Handle shebang in external special remote program. This is needed for git-annex-remote-rclone to work on Windows. Nice to see that external special remote is getting ported and apparently getting lots of use.
  • Make --json and --quiet suppress automatic init messages, and any other messages that might be output before a command starts. This was a reversion introduced in the optparse-applicative changes over a year ago.

Also I'm developing a plan to improve parallel downloading when multiple remotes have the same cost. See get round robin.

Today's work was sponsored by Jake Vosloo on Patreon.

Posted
git-annex devblog
if at first you don't succeed..

A user suggested adding --failed to retry failed transfers. That was a great idea and I landed a patch for it 3 hours later. Love it when a user suggests something so clearly right and I am able to quickly make it happen!


Unfortunately, my funding from the DataLad project to work on git-annex is running out. It's been a very good two years funded that way, with an enormous amount of improvements and support and bug fixes, but all good things must end. I'll continue to get some funding from them for the next year, but only for half as much time as the past two years.

I need to decide if it makes sense to keep working on git-annex to the extent I have been. There are definitely a few (hundred) things I still want to do on git-annex, starting with getting the git patches landed to make v6 mode really shine. Past that, it's mostly up to the users. If they keep suggesting great ideas and finding git-annex useful, I'll want to work on it more.

What to do about funding? Maybe some git-annex users can contribute a small amount each month to fund development. I've set up a Patreon page for this, https://www.patreon.com/joeyh


Anyhoo... Back to today's (unfunded) work.

--failed can be used with get, move, copy, and mirror. Of course those commands can all simply be re-run if some of the transfers fail, and will pick up where they left off. But using --failed is faster because it does not need to scan all files to find out which still need to be transferred. And accumulated failures from multiple commands can be retried with a single use of --failed.

It's even possible to do things like git annex get --from foo; git annex get --failed --from bar, which first downloads everything it can from the foo remote and falls back to using the bar remote for the rest. Although setting remote costs is probably a better approach most of the time.

Turns out that I had earlier disabled writing failure log files, except by the assistant, because only the assistant was using them. So, that had to be undone. There's some potential for failure log files to accumulate annoyingly, so perhaps some expiry mechanism will be needed. This is why --failed is documented as retrying "recent" transfers. Anyway, the failure log files are cleaned up after successful transfers.

Posted
git-annex devblog
day 411 metadata --batch

With yesterday's JSON groundwork in place, I quickly implemented git annex metadata --batch today in only 45 LoC. The interface is nicely elegant; the same JSON format that git-annex metadata outputs can be fed into it to get, set, delete, and modify metadata.
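To illustrate, here's a hypothetical exchange, assuming the batch input mirrors the --json output format from yesterday's post (and that a line without "fields" just reads the current metadata; that detail is my guess, not spelled out here):

git annex metadata --json --batch
{"file":"foo","fields":{"author":["bar"]}}
{"file":"foo"}

Each input line should get back a JSON object in the usual metadata output format.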

Posted
git-annex devblog
day 410 better JSON for metadata

I've had to change the output of git annex metadata --json. The old output looked like this:

{"command":"metadata","file":"foo","key":"...","author":["bar"],...,"note":"...","success":true}

That was not good, because it didn't separate the metadata fields from the rest of the JSON object. What if a metadata field is named "note" or "success"? It would collide with the other "note" and "success" in the JSON.

So, changed this to a new format, which moves the metadata fields into a "fields" object:

{"command":"metadata","file":"foo","key":"...","fields":{"author":["bar"],...},"note":"...","success":true}

I don't like breaking backwards compatibility of JSON output, but in this case I could see no real alternative. I don't know if anyone is using metadata --json anyway. If you are and this will cause a problem, get in touch.


While making that change, I also improved the JSON output layer, so it can use Aeson. Update: And switched everything over to using Aeson, so git-annex no longer depends on two different JSON libraries.

This let me use Aeson to generate the "fields" object for metadata --json. And it was also easy enough to use Aeson to parse the output of that command (and some simplified forms of it).

So, I've laid the groundwork for git annex metadata --batch today.

Posted
git-annex devblog
day 409 --branch

A common complaint is that git annex fsck in a bare repository complains about missing content of deleted files. That's because in a bare repository, git-annex operates on all versions of all files. Today I added a --branch option, so if you only want to check, say, the master branch, you can: git annex fsck --branch master

The new option has other uses too. Want to get all the files in the v1.0 tag? git annex get --branch v1.0

It might be worth revisiting the implicit --all behavior for bare repositories. It could instead default to --branch HEAD or something like that. But I'd only want to change that if there was a strong consensus in favor.

Over 3/4th of the time spent implementing --branch was spent in adjusting the output of commands, to show "branch:file" is being operated on. How annoying.

Posted
git-annex devblog
day 408 release day

First release in over a month. Before making this release, a few last minute fixes, including a partial workaround for the problem that Sqlite databases don't work on Lustre filesystems.

Backlog is now down to 140 messages, and only 3 of those are from this month. Still higher than I like.

Posted
git-annex devblog
day 407 lazy sunday

Noticed that in one of my git-annex repositories, git-annex was spending a full second at startup checking all the git-annex branches from remotes to see if they contained changes that needed to be merged in. So, I added a cache of recently merged branches to avoid that. I remember considering this optimisation years ago; don't know why I didn't do it then. Not every day that I can speed up git-annex so much!

Also, made git annex log --all show location log changes for all keys. This was tricky to get right and fast.

Posted
git-annex devblog
day 405 more git development

Revisited my enhanced smudge/clean patch set for git, updating it for code review and to deal with changes in git since I've been away. This took several hours unfortunately.

Posted
git-annex devblog
day 404 low hanging fruit

Back from vacation, with a message backlog of 181. I'm concentrating first on low-hanging fruit of easily implemented todos, and well reproducible bugs, to get started again.

Implemented --batch mode for git annex get and git annex drop, and also enabled --json for those.

Investigated git-annex startup time; see http://git-annex.branchable.com/todo/could_standalone___39__fixed__39___git-annex_binaries_be_prelinked__63__/. Turns out that cabal has a bug that causes many thousands of unnecessary syscalls when linking in the shared libraries. Working around it halved git-annex's startup time.

Fixed a bug that caused git annex testremote to crash when testing a freshly made external special remote.

Posted
git-annex devblog
day 403 update and away

Continued working on the enhanced smudge/clean interface in git today. Sent in a third version of the patch set, which is now quite complete.

I'll be away for the next week and a half, on vacation.

Posted
git-annex devblog
day 402 enhanced smudge clean interface

Continued working on the enhanced smudge/clean interface in git, incorporating feedback from the git developers.

In a spare half an hour, I made an improved-smudge-filters branch that teaches git-annex smudge to use the new interface.

Doing a quick benchmark, git checkout of a deleted 1 gb file took:

  • 19 seconds before
  • 11 seconds with the new interface
  • 0.1 seconds with the new interface and annex.thin set
    (while also saving 1 gb of disk space!)

So, this new interface is very much worthwhile.

Posted
git-annex devblog
day 400-401 git development

Working on git, not git-annex the past two days, I have implemented the smudge-to-file/clean-from-file extension to the smudge/clean filter interface. Patches have been sent to the git developers, and hopefully they'll like it and include it. This will make git-annex v6 work a lot faster and better.

Amazing how much harder it is to code on git than on git-annex! While I'm certainly not as familiar with the git code base, this is mostly because C requires so much more care about innumerable details and so much verbosity to do anything. I probably could have implemented this interface in git-annex in 2 hours, not 2 days.

Posted
git-annex devblog
day 399 weird git merge bug

There was one more test suite failure when run on FAT, which I've investigated today. It turns out that a bug report was filed about the same problem, and at root it seems to be a bug in git merge. Luckily, it was not hard to work around the strange merge behavior.

It's been very worthwhile running the test suite on FAT; it's pointed me at several problems with adjusted branches over the past weeks. It would be good to add another test suite pass to test adjusted branches explicitly, but when I tried adding that, there were a lot of failures where the test suite is confused by adjusted branch behavior and would need to be taught about it.

I've released git-annex 6.20160613. If you're using v6 repositories and especially adjusted branches, you should upgrade since it has many fixes.

Posted
git-annex devblog
day 398 fresh eyes

Today I was indeed able to get to the bottom of and fix the bug that had stumped me the other day.

Rest of the day was taken up by catching up on some bug reports and suggestions for v6 mode. Like making unlock and lock work for files that are not locally present. And, improving the behavior of the clean filter so it remembers what backend was used for a file before and continues using that same backend.

About ready to make a release, but IIRC there's one remaining test suite failure on FAT.

Posted
git-annex devblog
day 397 befuddled

Been having a difficult time fixing the two remaining test suite failures when run on a FAT filesystem.

On Friday, I got quite lost trying to understand the first failure. At first I thought it had something to do with queued git staging commands not being run in the right git environment when git-annex is using a different index file or work tree. I did find and fix a potential bug in that area. It might be that some reports long ago of git-annex branch files getting written to the master branch were caused by that. But, fixing it did not help with the test suite failure at hand.

Today, I quickly found the actual cause of the first failure. Of course, it had nothing to do with queued git commands at all, and was a simple fix in the end.

But, I've been staring at the second failure for hours and am not much wiser. All I know is, an invalid tree object gets generated by the adjusted branch code that contains some files more than once. (git gets very confused when a repository contains such tree objects; if you wanted to break a git repository, getting such trees into it might be a good way. cough) This invalid tree object seems to be caused by the basis ref for the adjusted branch diverging somehow from the adjusted branch itself. I have not been able to determine why or how the basis ref can diverge like that.

Also, this failure is somewhat indeterminate; it doesn't always occur, and reordering the tests in the test suite can hide it. Weird.

Well, hopefully looking at it again later with fresh eyes will help.

Posted
git-annex devblog
day 396 misc fixes

A productive day of small fixes. Including a change to deal with an incompatibility in git 2.9's commit.gpgsign, and a couple of fixes involving gcrypt repositories.

Also several improvements to cloning from repositories where an adjusted branch is checked out. The clone automatically ends up with the adjusted branch checked out too.

The test suite has 3 failures when run on a FAT repository, all involving adjusted branches. Managed to fix one of them today, hope to get to the others soon.

Posted
git-annex devblog
day 395 leaky abstractions

Today's release includes a last-minute fix to parsing lines from the git-annex branch that might have one or more carriage returns at the end. This comes from Windows of course, where, since some things transparently add/remove \r at the end of lines while other things don't, quite a mess can result. Luckily it was not hard or expensive to handle. If you are lucky enough not to use Windows, the release also has several more interesting improvements.

Posted
git-annex devblog
day 394 implicit vs explicit

git-annex has always balanced implicit and explicit behavior. Enabling a git repository to be used with git-annex needs an explicit init, to avoid foot-shooting; but a clone of a repository that is already using git-annex will be implicitly initialized. Git remotes are implicitly checked to see if they use git-annex, so the user can immediately follow git remote add with git annex get to get files from it.

There's a fine line here, and implicit git remote enabling sometimes crosses it; sometimes the remote doesn't have git-annex-shell, and so there's an ugly error message and annex-ignore has to be set to avoid trying to enable that git remote again. Sometimes the probe of a remote can occur when the user doesn't really expect it to (and it can involve a ssh password prompt).

Part of the problem is, there's not an explicit way to enable a git remote to be used by git-annex. So, today, I made git annex enableremote do that, when the remote name passed to it is a git remote rather than a special remote. This way, you can avoid the implicit behavior if you want to.

I also made git annex enableremote un-set annex-ignore, so if a remote got that set due to a transient configuration problem, it can be explicitly enabled.

Posted
git-annex devblog
day 393 fun and more fun

Over the weekend, I noticed that a relative path to GIT_INDEX_FILE is interpreted in several different, inconsistent ways by git. git-annex mostly used absolute paths, but did use a relative path in git annex view. Now it will only use absolute paths to avoid git's wacky behavior.

Integrated some patches to support building with ghc 8.0.1, which was recently released.

The gnupg-options git configs were not always passed to gpg. Fixing this involved quite a lot of plumbing to get the options to the right functions, and consumed half of today.

Also did some design work on the external special remote protocol to avoid backwards compatibility problems when adding new protocol features.

Posted
git-annex devblog
day 392 v6 fixes

Fixed several problems with v6 mode today. The assistant was doing some pretty wrong things when changes were synced into v6 repos, and that behavior is fixed. Also dealt with a race that caused updates made to the keys database by one process to not be seen by another process. And, made git annex add of an unlocked pointer file not annex the pointer file's content, but just add it to git as-is.

Also, Thowz pointed out that adjusted branches could be used to locally adjust where annex symlinks point to, when a repository's git directory is not in the usual location. I've added that, as git annex adjust --fix. It was quite easy to implement this, which makes me very happy with the adjusted branches code!

Posted
git-annex devblog
day 390 sharedpubkey

It's not every day I add a new special remote encryption mode to git-annex! The new encryption=sharedpubkey mode lets anyone with a clone of the git repository (and access to the remote) store files in the remote, but then only the private key owner can access those files. Which opens up some interesting new use cases...

Posted
git-annex devblog
day 388-389 various and windows

Lots of little fixes and improvements here and there over the past couple days.

The main thing was fixing several bugs with adjusted branches and Windows. They seem to work now, and commits made on the adjusted branch are propagated back to master correctly.

It would be good to finish up the last todos for v6 mode this month. The sticking point is I need a way to update the file stat in the git index when git-annex gets/drops/etc an unlocked file. I have not decided yet if it makes the most sense to add a dependency on libgit2 for that, or extend git update-index, or even write a pure haskell library to manipulate index files. Each has its pluses and its minuses.

Posted
git-annex devblog
day 387 release day

git-annex 6.20160419 has a rare security fix. A bug made encrypted special remotes that are configured to use chunks accidentally expose the checksums of content that is uploaded to the remote. Such information is supposed to be hidden from the remote's view by the encryption. The same bug also made resuming interrupted uploads to such remotes start over from the beginning.

After releasing that, I've been occupied today with fixing the Android autobuilder, which somehow got its build environment broken (unsure how), and fixing some other dependency issues.

Posted
type safe multi-OS Propellor

Propellor was recently ported to FreeBSD, by Evan Cofsky. This new feature led me down a two week long rabbit hole to make it type safe. In particular, Propellor needed to be taught that some properties work on Debian, others on FreeBSD, and others on both.

The user shouldn't need to worry about making a mistake like this; the type checker should tell them they're asking for something that can't fly.

-- Is this a Debian or a FreeBSD host? I can't remember, let's use both package managers!
host "example.com" $ props
    & aptUpgraded
    & pkgUpgraded

As of propellor 3.0.0 (in git now; to be released soon), the type checker will catch such mistakes.

Also, it's really easy to combine two OS-specific properties into a property that supports both OS's:

upgraded = aptUpgraded `pickOS` pkgUpgraded

type level lists and functions

The magick making this work is type-level lists. A property has a metatypes list as part of its type. (So called because it's additional types describing the type, and I couldn't find a better name.) This list can contain one or more OS's targeted by the property:

aptUpgraded :: Property (MetaTypes '[ 'Targeting 'OSDebian, 'Targeting 'OSBuntish ])

pkgUpgraded :: Property (MetaTypes '[ 'Targeting 'OSFreeBSD ])

In Haskell, type-level lists and other DataKinds are indicated by the leading ', if you have not seen that syntax before. There are some convenience aliases and type operators, which let the same types be expressed more cleanly:

aptUpgraded :: Property (Debian + Buntish)

pkgUpgraded :: Property FreeBSD

Whenever two properties are combined, their metatypes are combined using a type-level function. Combining aptUpgraded and pkgUpgraded will yield a metatypes list that targets no OS's, since they have none in common, and so it will fail to type check.
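A much simplified, self-contained sketch of that idea, written over plain OS lists rather than the full metatypes (all names here are illustrative, not propellor's internals):

{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, TypeOperators #-}

data TargetOS = OSDebian | OSBuntish | OSFreeBSD

-- combining two target lists keeps only the OS's they have in common
type family Combine (xs :: [TargetOS]) (ys :: [TargetOS]) :: [TargetOS] where
    Combine '[]       ys = '[]
    Combine (x ': xs) ys = ConsIf (Elem x ys) x (Combine xs ys)

type family Elem (x :: TargetOS) (ys :: [TargetOS]) :: Bool where
    Elem x '[]       = 'False
    Elem x (x ': ys) = 'True
    Elem x (y ': ys) = Elem x ys

type family ConsIf (b :: Bool) (x :: TargetOS) (xs :: [TargetOS]) :: [TargetOS] where
    ConsIf 'True  x xs = x ': xs
    ConsIf 'False x xs = xs

With definitions along those lines, Combine '[ 'OSDebian, 'OSBuntish ] '[ 'OSFreeBSD ] reduces to '[], an empty target list, which is what lets the bad combination be rejected.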

My implementation of the metatypes lists is hundreds of lines of code, consisting entirely of types and type families. It includes a basic implementation of singletons, and is portable back to ghc 7.6 to support Debian stable. While it takes some contortions to support such an old version of ghc, it's pretty awesome that the ghc in Debian stable supports this stuff.

extending beyond targeted OS's

Before this change, Propellor's Property type had already been slightly refined, tagging properties with HasInfo or NoInfo, as described in making propellor safer with GADTs and type families. I needed to keep that HasInfo in the type of properties.

But, it seemed unnecessarily verbose to have types like Property NoInfo Debian. Especially if I want to add even more information to Property types later. Property NoInfo Debian NoPortsOpen would be a real mouthful to need to write for every property.

Luckily I now have this handy type-level list. So, I can shove more types into it, so Property (HasInfo + Debian) is used where necessary, and Property Debian can be used everywhere else.

Since I can add more types to the type-level list, without affecting other properties, I expect to be able to implement type-level port conflict detection next. Should be fairly easy to do without changing the API except for properties that use ports.

singletons

As shown here, pickOS makes a property that decides which of two properties to use based on the host's OS.

aptUpgraded :: Property DebianLike
aptUpgraded = property "apt upgraded" (apt "upgrade" `requires` apt "update")

pkgUpgraded :: Property FreeBSD
pkgUpgraded = property "pkg upgraded" (pkg "upgrade")
    
upgraded :: Property UnixLike
upgraded = (aptUpgraded `pickOS` pkgUpgraded)
    `describe` "OS upgraded"

Any number of OS's can be chained this way, to build a property that is super-portable out of simple little non-portable properties. This is a sweet combinator!

Singletons are types that are inhabited by a single value. This lets the value be inferred from the type, which came in handy in building the pickOS property combinator.

Its implementation needs to be able to look at each of the properties at runtime, to compare the OS's they target with the actual OS of the host. That's done by stashing a target list value inside a property. The target list value is inferred from the type of the property, thanks to singletons, and so does not need to be passed in to property. That saves keyboard time and avoids mistakes.
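Here's a small self-contained sketch of that reflection trick (names are illustrative, not propellor's API): a type class reflects a promoted list of targeted OS's back into an ordinary runtime list, which is the kind of value pickOS needs to compare against the host's OS.

{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables, TypeOperators #-}

import Data.Proxy (Proxy(..))

data TargetOS = OSDebian | OSBuntish | OSFreeBSD
    deriving (Show, Eq)

-- reflect one promoted OS back to a value
class KnownOS (o :: TargetOS) where
    osVal :: Proxy o -> TargetOS

instance KnownOS 'OSDebian  where osVal _ = OSDebian
instance KnownOS 'OSBuntish where osVal _ = OSBuntish
instance KnownOS 'OSFreeBSD where osVal _ = OSFreeBSD

-- reflect a whole type-level list of targeted OS's to a runtime list
class KnownTargets (os :: [TargetOS]) where
    targets :: Proxy os -> [TargetOS]

instance KnownTargets '[] where
    targets _ = []

instance (KnownOS o, KnownTargets os) => KnownTargets (o ': os) where
    targets _ = osVal (Proxy :: Proxy o) : targets (Proxy :: Proxy os)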

is it worth it?

It's important to consider whether more complicated types are a net benefit. Of course, opinions vary widely on that question in general! But let's consider it in light of my main goals for Propellor:

  1. Help save the user from pushing a broken configuration to their machines at a time when they're down in the trenches dealing with some urgent problem at 3 am.
  2. Advance the state of the art in configuration management by taking advantage of the state of the art in strongly typed haskell.

This change definitely meets both criteria. But there is a tradeoff; it got a little bit harder to write new propellor properties. Not only do new properties need to have their type set to target appropriate systems, but the more polymorphic code is, the more likely the type checker can't figure out all the types without some help.

A simple example of this problem is as follows.

foo :: Property UnixLike
foo = p `requires` bar
  where
    p = property "foo" $ do
        ...

The type checker will complain that "The type variable ‘metatypes1’ is ambiguous". Problem is that it can't infer the type of p because many different types could be combined with the bar property and all would yield a Property UnixLike. The solution is simply to add a type signature like p :: Property UnixLike
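Spelled out, the fixed version looks like this (still with the body elided):

foo :: Property UnixLike
foo = p `requires` bar
  where
    p :: Property UnixLike
    p = property "foo" $ do
        ...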

Since this only affects creating new properties, and not combining existing properties (which have known types), it seems like a reasonable tradeoff.

things to improve later

There are a few warts that I'm willing to live with for now...

Currently, Property (HasInfo + Debian) is different from Property (Debian + HasInfo), but they should really be considered to be the same type. That is, I need type-level sets, not lists. While there's a type level sets library on hackage, it still seems to require a specific order of the set items when writing down a type signature.

Also, using ensureProperty, which runs one property inside the action of another property, got complicated by the need to pass it a type witness.

foo :: Property Debian
foo = property' $ \witness -> do
    ensureProperty witness (aptInstall "foo")

That witness is used to type check that the inner property targets every OS that the outer property targets. I think it might be possible to store the witness in the monad, and have ensureProperty read it, but it might complicate the type of the monad too much, since it would have to be parameterized on the type of the witness.

Oh no, I mentioned monads. While type level lists and type functions and generally bending the type checker to my will is all well and good, I know most readers stop reading at "monad". So, I'll stop writing. ;)

thanks

Thanks to David Miani who answered my first tentative question with a big hunk of example code that got me on the right track.

Also to many other people who answered increasingly esoteric Haskell type system questions.

Also thanks to the Shuttleworth foundation, which funded this work by way of a Flash Grant.

letsencrypt support in propellor

I've integrated letsencrypt into propellor today.

I'm using the reference letsencrypt client. While I've seen complaints that it has a lot of dependencies and is too complicated, it seemed to only need to pull in a few packages, and use only a few megabytes of disk space, and it has fewer options than ls does. So seems fine. (Although it would be nice to have some alternatives packaged in Debian.)

I ended up implementing this:

letsEncrypt :: AgreeTOS -> Domain -> WebRoot -> Property NoInfo

This property just makes the certificate available, it does not configure the web server to use it. This avoids relying on the letsencrypt client's apache config munging, which is probably useful for many people, but not those of us using configuration management systems. And so avoids most of the complicated magic that the letsencrypt client has a reputation for.

Instead, any property that wants to use the certificate can just use letsencrypt to get it and set up the server when it makes a change to the certificate:

letsEncrypt (LetsEncrypt.AgreeTOS (Just "me@my.domain")) "example.com" "/var/www"
    `onChange` setupthewebserver

(Took me a while to notice I could use onChange like that, and so divorce the cert generation/renewal from the server setup. onChange is awesome! This blog post has been updated accordingly.)

In practice, the http site has to be brought up first, and then letsencrypt run, and then the cert installed and the https site brought up using it. That dance is automated by this property:

Apache.httpsVirtualHost "example.com" "/var/www"
    (LetsEncrypt.AgreeTOS (Just "me@my.domain"))

That's about as simple a configuration as I can imagine for such a website!


The two parts of letsencrypt that are complicated are not the fault of the client really. Those are renewal and rate limiting.

I'm currently rate limited for the next week because I asked letsencrypt for several certificates for a domain, as I was learning how to use it and integrating it into propellor. So I've not quite managed to fully test everything. That's annoying. I also worry that rate limiting could hit at an inopportune time once I'm relying on letsencrypt. It's especially problematic that it only allows 5 certs for subdomains of a given domain per week. What if I use a lot of subdomains?

Renewal is complicated mostly because there's no good way to test it. You set up your cron job, or whatever, and wait three months, and hopefully it worked. Just as likely, you got something wrong, and your website breaks. Maybe letsencrypt could offer certificates that will only last an hour, or a day, for use when testing renewal.

Also, what if something goes wrong with renewal? Perhaps letsencrypt.org is not available when your certificate needs to be renewed.

What I've done in propellor to handle renewal is, it runs letsencrypt every time, with the --keep-until-expiring option. If this fails, propellor will report a failure. As long as propellor is run periodically by a cron job, this should result in multiple failure reports being sent (for 30 days I think) before a cert expires without getting renewed. But, I have not been able to test this.

Posted
propelling disk images

Following up on Then and Now ...

In quiet moments at ICFP last August, I finished teaching Propellor to generate disk images. With an emphasis on doing a whole lot with very little new code and an extreme amount of code reuse.

For example, let's make a disk image with nethack on it. First, we need to define a chroot. Disk image creation reuses propellor's chroot support, described back in propelling containers. Any propellor properties can be assigned to the chroot, so it's easy to describe the system we want.

nethackChroot :: FilePath -> Chroot
nethackChroot d = Chroot.debootstrapped (System (Debian Stable) "amd64") mempty d
    & Apt.installed ["linux-image-amd64"]
    & Apt.installed ["nethack-console"]
    & accountFor gamer
    & gamer `hasInsecurePassword` "hello"
    & gamer `hasLoginShell` "/usr/games/nethack"
  where gamer = User "gamer"

Now to make an image from that chroot, we just have to tell propellor where to put the image file, some partitioning information, and to make it boot using grub.

nethackImage :: RevertableProperty
nethackImage = imageBuilt "/srv/images/nethack.img" nethackChroot
    MSDOS (grubBooted PC)
    [ partition EXT2 `mountedAt` "/boot"
        `setFlag` BootFlag
    , partition EXT4 `mountedAt` "/"
        `addFreeSpace` MegaBytes 100
    , swapPartition (MegaBytes 256)
    ]

The disk image partitions default to being sized to fit exactly the files from the chroot that go into each partition, so the disk image is as small as possible by default. There's a little DSL to configure the partitions. To give control over the partition size, it has some functions, like addFreeSpace and setSize. Other functions like setFlag and extended can further adjust the partitions. I think that worked out rather well; the partition specification is compact and avoids unnecessary hardcoded sizes, while providing plenty of control.

By the end of ICFP, I had Propellor building complete disk images, but no boot loader installed on them.


Fast forward to today. After struggling with some strange grub behavior, I found a working method to install grub onto a disk image.

The whole disk image feature weighs in at:

203 lines to interface with parted
88 lines to format and mount partitions
90 lines for the partition table specification DSL and partition sizing
196 lines to generate disk images
75 lines to install grub on a disk image
652 lines of code total

Which is about half the size of vmdebootstrap, 1/4th the size of partman-base (probably 1/100th the size of partman in total), and 1/13th the size of live-build. All of which do similar things, in ways that seem to me to be much less flexible than Propellor.


One thing I'm considering doing is extending this so Propellor can use qemu-user-static to create disk images for eg, arm. Add some u-boot setup, and this could create bootable images for arm boards. A library of configs for various arm boards could then be included in Propellor. This would be a lot easier than running the Debian Installer on an arm board.

Oh! I only just now realized that if you have a propellor host configured, like this example for my dialup gateway, leech --

leech = host "leech.kitenet.net"
    & os (System (Debian (Stable "jessie")) "armel")
    & Apt.installed ["linux-image-kirkwood", "ppp", "screen", "iftop"]
    & privContent "/etc/ppp/peers/provider"
    & privContent "/etc/ppp/pap-secrets"
    & Ppp.onBoot
    & hasPassword (User "root")
    & Ssh.installed

-- the host's properties can be extracted from it, using eg hostProperties leech, and reused to create a disk image with the same properties as the host!

So, when my dialup gateway gets struck by lightning again, I could use this to build a disk image for its replacement:

import qualified Propellor.Property.Hardware.SheevaPlug as SheevaPlug

laptop = host "darkstar.kitenet.net"
    & SheevaPlug.diskImage "/srv/images/leech.img" (MegaBytes 2000)
        (& propertyList "has all of leech's properties"
            (hostProperties leech))

This also means you can start with a manually built system, write down the properties it has, and iteratively run Propellor against it until you think you have a full specification of it, and then use that to generate a new, clean disk image. Nice way to transition from sysadmin days of yore to a clean declaratively specified system.

Posted
propellor orchestration

With the disclaimer that I don't really know much about orchestration, I have added support for something resembling it to Propellor.

Until now, when using propellor to manage a bunch of hosts, you updated them one at a time by running propellor --spin $somehost, or maybe you set up a central git repository, and a cron job to run propellor on each host, pulling changes from git.

I like both of these ways to use propellor, but they only go so far...

  • Perhaps you have a lot of hosts, and would like to run propellor on them all concurrently.

      master = host "master.example.com"
          & concurrently conducts alotofhosts
    
  • Perhaps you want to run propellor on your dns server last, so when you add a new webserver host, it gets set up and working before the dns is updated to point to it.

      master = host "master.example.com"
          & conducts webservers
              `before` conducts dnsserver
    
  • Perhaps you have something more complex, with multiple subnets that propellor can run in concurrently, finishing up by updating that dnsserver.

      master = host "master.example.com"
          & concurrently conducts [sub1, sub2]
              `before` conducts dnsserver
    
      sub1 = "master.subnet1.example.com"
          & concurrently conducts webservers
          & conducts loadbalancers
    
      sub2 = "master.subnet2.example.com"
          & conducts dockerservers
    
  • Perhaps you need to first run some command that creates a VPS host, and then want to run propellor on that host to set it up.

      vpscreate h = cmdProperty "vpscreate" [hostName h]
          `before` conducts h
    

All those scenarios are supported by propellor now!

Well, I haven't actually implemented concurrently yet, but the point is that the conducts property can be used with any of propellor's property combinators, like before etc, to express all kinds of scenarios.

The conducts property works in combination with an orchestrate function to set up all the necessary stuff to let one host ssh into another and run propellor there.

main = defaultMain (orchestrate hosts)

hosts = 
    [ master
    , webservers 
    , ...
    ]

The orchestrate function does a bunch of stuff:

  • Builds up a graph of what conducts what.
  • Removes any cycles that might have snuck in by accident, before they cause foot shooting.
  • Arranges for the ssh keys to be accepted as necessary.
    Note that you need to add ssh key properties to all relevant hosts so it knows what keys to trust.
  • Arranges for the private data of a host to be provided to the hosts that conduct it, so they can pass it along.

I'm very pleased that I was able to add the Propellor.Property.Conductor module implementing this with only a tiny change to the rest of propellor. Almost everything needed to implement it was there in propellor's infrastructure already.

Also kind of cool that it only needed 13 lines of imperative code, the other several hundred lines of the implementation being all pure code.

Posted
it's a bird, it's a plane, it's a super monoid for propellor

I've been doing a little bit of dynamically typed programming in Haskell, to improve Propellor's Info type. The result is kind of interesting in a scary way.

Info started out as a big record type, containing all the different sorts of metadata that Propellor needed to keep track of. Host IP addresses, DNS entries, ssh public keys, docker image configuration parameters... This got quite out of hand. Info needed to have its hands in everything, even types that should have been private to their module.

To fix that, recent versions of Propellor let a single Info contain many different types of values. Look at it one way and it contains DNS entries; look at it another way and it contains ssh public keys, etc.

As an émigré from lands where you can never know what type of value is in a $foo until you look, this was a scary prospect at first, but I found it's possible to have the benefits of dynamic types and the safety of static types too.

The key to doing it is Data.Dynamic. Thanks to Joachim Breitner for suggesting I could use it here. What I arrived at is this type (slightly simplified):

newtype Info = Info [Dynamic]
    deriving (Monoid)

So Info is a monoid, and it holds a bunch of dynamic values, which could each be of any type at all. Eep!

So far, this is utterly scary to me. To tame it, the Info constructor is not exported, and so the only way to create an Info is to start with mempty and use this function:

addInfo :: (IsInfo v, Monoid v) => Info -> v -> Info
addInfo (Info l) v = Info (toDyn v : l)

The important part of that is that it only allows adding values that are in the IsInfo type class. That prevents the foot shooting associated with dynamic types, by only allowing use of types that make sense as Info. Otherwise arbitrary Strings etc could be passed to addInfo by accident, and all get concatenated together, and that would be a total dynamic programming mess.
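For reference, the shape that class must have is roughly this (a sketch inferred from how it's used in this post: toDyn needs Typeable, and the Controlling instance further down defines propigateInfo; the real class may have more to it):

import Data.Typeable (Typeable)

class Typeable v => IsInfo v where
    -- not explained in this post; the Controlling instance below returns True
    propigateInfo :: v -> Bool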

Anything you can add into an Info, you can get back out:

getInfo :: (IsInfo v, Monoid v) => Info -> v
getInfo (Info l) = mconcat (mapMaybe fromDynamic (reverse l))

Only monoids can be stored in Info, so if you ask for a type that an Info doesn't contain, you'll get back mempty.

Crucially, IsInfo is an open type class. Any module in Propellor can make a new data type and make it an instance of IsInfo, and then that new data type can be stored in the Info of a Property, and any Host that uses the Property will have that added to its Info, available for later introspection.


For example, this weekend I'm extending Propellor to have controllers: Hosts that are responsible for running Propellor on some other hosts. Useful if you want to run propellor once and have it update the configuration of an entire network of hosts.

There can be whole chains of controllers controlling other controllers etc. The problem is, what if host foo has the property controllerFor bar and host bar has the property controllerFor foo? I want to avoid a loop of foo running Propellor on bar, running Propellor on foo, ...

To detect such loops, each Host's Info should contain a list of the Hosts it's controlling. Which is not hard to accomplish:

newtype Controlling = Controlled [Host]
    deriving (Typeable, Monoid)

isControlledBy :: Host -> Controlling -> Bool
h `isControlledBy` (Controlled hs) = any (== hostName h) (map hostName hs)

instance IsInfo Controlling where
    propigateInfo _ = True

mkControllingInfo :: Host -> Info
mkControllingInfo controlled = addInfo mempty (Controlled [controlled])

getControlledBy :: Host -> Controlling
getControlledBy = getInfo . hostInfo

isControllerLoop :: Host -> Host -> Bool
isControllerLoop controller controlled = go S.empty controlled
  where
    go checked h
        | controller `isControlledBy` c = True
        -- avoid checking loops that have been checked before
        | hostName h `S.member` checked = False
        | otherwise = any (go (S.insert (hostName h) checked)) l
      where
        c@(Controlled l) = getControlledBy h

This is all internal to the module that needs it; the rest of propellor doesn't need to know that the Info is being used for this. And yet, the necessary information about Hosts is gathered as propellor runs.


So, that's a useful technique. I do wonder if I could somehow make addInfo combine together values in the list that have the same type; as it is the list can get long. And, to show Info, the best I could do was this:

instance Show Info where
    show (Info l) = "Info " ++ show (map dynTypeRep l)

The resulting long list of the types of values stored in a host's info is not as useful as it could be. Of course, getInfo can be used to get any particular type of value:

*Main> hostInfo kite
Info [InfoVal System,PrivInfo,PrivInfo,Controlling,DnsInfo,DnsInfo,DnsInfo,AliasesInfo, ...
*Main> getInfo (hostInfo kite) :: AliasesInfo
AliasesInfo (fromList ["downloads.kitenet.net","git.joeyh.name","imap.kitenet.net","nntp.olduse.net" ...
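Coming back to the thought above about making addInfo combine values of the same type, one possible shape for that is the following sketch (it assumes the Info and IsInfo definitions from earlier, and that IsInfo values are Typeable, which they must be for toDyn to work):

{-# LANGUAGE ScopedTypeVariables #-}

import Data.Dynamic (toDyn, fromDynamic, dynTypeRep)
import Data.List (partition)
import Data.Maybe (mapMaybe)
import Data.Proxy (Proxy(..))
import Data.Typeable (Typeable, typeRep)

-- like addInfo, but folds the new value into any existing value of the
-- same type, instead of letting the list grow
addInfoMerged :: forall v. (IsInfo v, Monoid v, Typeable v) => Info -> v -> Info
addInfoMerged (Info l) v = Info (toDyn merged : rest)
  where
    -- split out any existing values of v's type
    (same, rest) = partition (\d -> dynTypeRep d == typeRep (Proxy :: Proxy v)) l
    -- older values first, matching what getInfo would have produced, then the new one
    merged = mconcat (mapMaybe fromDynamic (reverse same) ++ [v])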

And finally, I keep trying to think of a better name than "Info".

then and now

It's 2004 and I'm in Oldenburg DE, working on the Debian Installer. Colin and I pair program on partman, its new partitioner, to get it into shape. We've somewhat reluctantly decided to use it. Partman is in some ways a beautiful piece of work, a mass of semi-object-oriented, super extensible shell code that sprang fully formed from the brow of Anton. And in many ways, it's mad, full of sector alignment twiddling math implemented in tens of thousands of lines of shell script scattered among hundreds of tiny files that are impossible to keep straight. In the tiny Oldenburg Developers Meeting, full of obscure hardware and crazy intensity of ideas like porting Debian to VAXen, we hack late into the night, night after night, and crash on the floor.

sepia toned hackers round a table

It's 2015 and I'm at a Chinese bakery, then at the Berkeley pier, then in a SF food truck lot, catching half an hour here and there in my vacation to add some features to Propellor. Mostly writing down data types for things like filesystem formats, partition layouts, and then some small amount of haskell code to use them in generic ways. Putting these pieces together and reusing stuff already in Propellor (like chroot creation).

Before long I have this, which is only 2 undefined functions away from (probably) working:

let chroot d = Chroot.debootstrapped (System (Debian Unstable) "amd64") mempty d
        & Apt.installed ["openssh-server"]
        & ...
    partitions = fitChrootSize MSDOS
        [ (Just "/boot", mkPartiton EXT2)
        , (Just "/", mkPartition EXT4)
        , (Nothing, const (mkPartition LinuxSwap (MegaBytes 256)))
        ]
 in Diskimage.built chroot partitions (grubBooted PC)

This is at least a replication of vmdebootstrap, generating a bootable disk image from that config and 400 lines of code, with enormous customizability of the disk image contents, using all the abilities of Propellor. But it is also, effectively, a replication of everything partman is used for (aside from UI and RAID/LVM).

sailboat on the SF bay

What a difference a decade and better choices of architecture make! In many ways, this is the loosely coupled, extensible, highly configurable system partman aspired to be. Plus elegance. And I'm writing it on a lark, because I have some spare half hours in my vacation.

Past Debian Installer team lead Tollef stops by for lunch, I show him the code, and we have the conversation old d-i developers always have about partman.

I can't say that partman was a failure, because it's been used by millions to install Debian and Ubuntu and etc for a decade. Anything that deletes that many Windows partitions is a success. But it's been an unhappy success. Nobody has ever had a good time writing partman recipes; the code has grown duplication and unmaintainability.

I can't say that these extensions to Propellor will be a success; there's no plan here to replace Debian Installer (although with a few hundred more lines of code, propellor is d-i 2.0); indeed I'm just adding generic useful stuff and building further stuff out of it without any particular end goal. Perhaps that's the real difference.

Posted
making propellor safer with GADTs and type families

Since July, I have been aware of an ugly problem with propellor. Certain propellor configurations could have a bug. I've tried to solve the problem at least a half-dozen times without success; it's eaten several weekends.

Today I finally managed to fix propellor so it's impossible to write code that has the bug, bending the Haskell type checker to my will with the power of GADTs and type-level functions.

the bug

Code with the bug looked innocuous enough. Something like this:

foo :: Property
foo = property "foo" $
    unlessM (liftIO $ doesFileExist "/etc/foo") $ do
        bar <- liftIO $ readFile "/etc/foo.template"
        ensureProperty $ setupFoo bar

The problem comes about because some properties in propellor have Info associated with them. This is used by propellor to introspect over the properties of a host, and do things like set up DNS, or decrypt private data used by the property.

At the same time, it's useful to let a Property internally decide to run some other Property. In the example above, that's the ensureProperty line, and the setupFoo Property is run only sometimes, and is passed data that is read from the filesystem.

This makes it very hard, indeed probably impossible for Propellor to look inside the monad, realize that setupFoo is being used, and add its Info to the host.

Probably, setupFoo doesn't have Info associated with it -- most properties do not. But, it's hard to tell, when writing such a Property if it's safe to use ensureProperty. And worse, setupFoo could later be changed to have Info.

Now, in most languages, once this problem was noticed, the solution would probably be to make ensureProperty notice when it's called on a Property that has Info, and print a warning message. That's Good Enough in a sense.

But it also really stinks as a solution. It means that building propellor isn't good enough to know you have a working system; you have to let it run on each host, and watch out for warnings. Ugh, no!

the solution

This screams for GADTs. (Well, it did once I learned what GADTs are and what they can do.)

With GADTs, Property NoInfo and Property HasInfo can be separate data types. Most functions will work on either type (Property i) but ensureProperty can be limited to only accept a Property NoInfo.

data Property i where
    IProperty :: Desc -> ... -> Info -> Property HasInfo
    SProperty :: Desc -> ... -> Property NoInfo

data HasInfo
data NoInfo

ensureProperty :: Property NoInfo -> Propellor Result

Then the type checker can detect the bug, and refuse to compile it.

Yay!

Except ...

Property combinators

There are a lot of Property combinators in propellor. These combine two or more properties in various ways. The most basic one is requires, which only runs the first Property after the second one has successfully been met.

So, what's its type when used with the GADT Property?

requires :: Property i1 -> Property i2 -> Property ???

It seemed I needed some kind of type class, to vary the return type.

class Combine x y r where
    requires :: x -> y -> r

Now I was able to write 4 instances of Combines, for each combination of 2 Properties with HasInfo or NoInfo.

It type checked. But, type inference was busted. A simple expression like

foo `requires` bar

blew up:

   No instance for (Requires (Property HasInfo) (Property HasInfo) r0)
      arising from a use of `requires'
    The type variable `r0' is ambiguous
    Possible fix: add a type signature that fixes these type variable(s)
    Note: there is a potential instance available:
      instance Requires
                 (Property HasInfo) (Property HasInfo) (Property HasInfo)
        -- Defined at Propellor/Types.hs:167:10

To avoid that, it needed ":: Property HasInfo" appended -- I didn't want the user to need to write that.

I got stuck here for a long time, well over a month.

type level programming

Finally today I realized that I could fix this with a little type-level programming.

class Combine x y where
    requires :: x -> y -> CombinedType x y

Here CombinedType is a type-level function, that calculates the type that should be used for a combination of types x and y. This turns out to be really easy to do, once you get your head around type level functions.

type family CInfo x y
type instance CInfo HasInfo HasInfo = HasInfo
type instance CInfo HasInfo NoInfo = HasInfo
type instance CInfo NoInfo HasInfo = HasInfo
type instance CInfo NoInfo NoInfo = NoInfo
type family CombinedType x y
type instance CombinedType (Property x) (Property y) = Property (CInfo x y)

And, with that change, type inference worked again! \o/

(Bonus: I added some more instances of CombinedType for combining things like RevertableProperties, so propellor's property combinators got more powerful too.)

Then I just had to make a massive pass over all of Propellor, fixing the types of each Property to be Property NoInfo or Property HasInfo. I frequently picked the wrong one, but the type checker was able to detect and tell me when I did.

A few of the type signatures got slightly complicated, to provide the type checker with sufficient proof to do its thing...

before :: (IsProp x, Combines y x, IsProp (CombinedType y x)) => x -> y -> CombinedType y x
before x y = (y `requires` x) `describe` (propertyDesc x)

onChange
    :: (Combines (Property x) (Property y))
    => Property x
    -> Property y
    -> CombinedType (Property x) (Property y)
onChange = -- 6 lines of code omitted

fallback :: (Combines (Property p1) (Property p2)) => Property p1 -> Property p2 -> Property (CInfo p1 p2)
fallback = -- 4 lines of code omitted

.. This mostly happened in property combinators, which is an acceptable tradeoff, when you consider that the type checker is now being used to prove that propellor can't have this bug.

Mostly, things went just fine. The only other annoying thing was that some things use a [Property], and since a haskell list can only contain a single type, while Property HasInfo and Property NoInfo are two different types, that needed to be dealt with. Happily, I was able to extend propellor's existing (&) and (!) operators to work in this situation, so a list can be constructed of properties of several different types (one possible mechanism for this is sketched after the example):

propertyList "foos" $ props
    & foo
    & foobar
    ! oldfoo    
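One way such a heterogeneous list can be made to work under the hood is to box each property behind an existential; this is purely an illustration of the idea, not propellor's actual code:

{-# LANGUAGE GADTs #-}

-- hypothetical names; assumes the GADT Property type from above is in scope
data BoxedProperty where
    Boxed :: Property i -> BoxedProperty

newtype PropList = PropList [BoxedProperty]

-- an empty starting point, like props above
props :: PropList
props = PropList []

-- (&) adds a property of either type to the list
(&) :: PropList -> Property i -> PropList
PropList ps & p = PropList (Boxed p : ps)

(The (!) operator for revertable properties, and propertyList consuming the list, are omitted from this sketch.)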

conclusion

The resulting 4000 lines of changes will be in the next release of propellor. Just as soon as I test that it always generates the same Info as before, and perhaps works when I run it. (eep)

These uses of GADTs and type families are not new; this is merely the first time I used them. It's another Haskell leveling up for me.

Anytime you can identify a class of bugs that can impact a complicated code base, and rework the code base to completely avoid that class of bugs, is a time to celebrate!

clean OS reinstalls with propellor

You have a machine someplace, probably in The Cloud, and it has Linux installed, but not to your liking. You want to do a clean reinstall, maybe switching the distribution, or getting rid of the cruft. But this requires running an installer, and it's too difficult to run d-i on remote machines.

Wouldn't it be nice if you could point a program at that machine and have it do a reinstall, on the fly, while the machine was running?

This is what I've now taught propellor to do! Here's a working configuration which will make propellor convert a system running Fedora (or probably many other Linux distros) to Debian:

testvm :: Host
testvm = host "testvm.kitenet.net"
        & os (System (Debian Unstable) "amd64")
        & OS.cleanInstallOnce (OS.Confirmed "testvm.kitenet.net")
                `onChange` propertyList "fixing up after clean install"
                        [ User.shadowConfig True
                        , OS.preserveRootSshAuthorized
                        , OS.preserveResolvConf
                        , Apt.update
                        , Grub.boots "/dev/sda"
                                `requires` Grub.installed Grub.PC
                        ]
        & Hostname.sane
        & Hostname.searchDomain
        & Apt.installed ["linux-image-amd64"]
        & Apt.installed ["ssh"]
        & User.hasSomePassword "root"

And here's a video of it in action.


It was surprisingly easy to build this. Propellor already knew how to create a chroot, so from there it basically just has to move files around until the chroot takes over from the old OS.

After the cleanInstallOnce property does its thing, propellor is running inside a freshly debootstrapped Debian system. Then we just need a few more Properties to get from there to a bootable, usable system: Install grub and the kernel, turn on shadow passwords, preserve a few config files from the old OS, etc.

It's really astounding to me how much easier this was to build than it was to build d-i. It took years to get d-i to the point of being able to install a working system. It took me a few part-days to add this capability to propellor (it's 200 lines of code), and I've probably spent less than 30 days total developing propellor in its entirety.

So, what gives? Why is this so much easier? There are a lot of reasons:

  • Technology is so much better now. I can spin up cloud VMs for testing in seconds; I use VirtualBox to restore a system from a snapshot. So testing is much much easier. The first work on d-i was done by booting real machines, and for a while I was booting them using floppies.

  • Propellor doesn't have a user interface. The best part of d-i is preseeding, but that was mostly an accident; when I started developing d-i the first thing I wrote was main-menu (which is invisible 99.9% of the time) and we had to develop cdebconf, and tons of other UI. Probably 90% of d-i work involves the UI. Jettisoning the UI entirely thus speeds up development enormously. And propellor's configuration file blows d-i preseeding out of the water in expressiveness and flexibility.

  • Propellor has a much more principled design and implementation. Separating things into Properties, which are composable and reusable gives enormous leverage. Strong type checking and a powerful programming language make it much easier to develop than d-i's mess of shell scripts calling underpowered busybox commands etc. Properties often Just Work the first time they're tested.

  • No separate runtime. d-i runs in its own environment, which is really a little custom linux distribution. Developing linux distributions is hard. Propellor drops into a live system and runs there. So I don't need to worry about booting up the system, getting it on the network, etc etc. This probably removes another order of magnitude of complexity from propellor as compared with d-i.

This seems like the opposite of the Second System effect to me. So perhaps d-i was the second system all along?

I don't know if I'm going to take this all the way to propellor is d-i 2.0. But in theory, all that's needed now is:

  • Teaching propellor how to build a bootable image, containing a live Debian system and propellor. (Yes, this would mean reimplementing debian-live, but I estimate 100 lines of code to do it in propellor; most of the Properties needed already exist.) That image would then be booted up and perform the installation.
  • Some kind of UI that generates the propellor config file.
  • Adding Properties to partition the disk.

cleanInstallOnce and associated Properties will be included in propellor's upcoming 1.1.0 release, and are available in git now.

Oh BTW, you could parameterize a few Properties by OS, and Propellor could be used to install not just Debian or Ubuntu, but whatever Linux distribution you want. Patches welcomed...

propelling containers

Propellor has supported docker containers for a "long" time, and it works great. This week I've worked on adding more container support.

docker containers (revisited)

The syntax for docker containers has changed slightly. Here's how it looks now:

example :: Host
example = host "example.com"
    & Docker.docked webserverContainer

webserverContainer :: Docker.Container
webserverContainer = Docker.container "webserver" (Docker.latestImage "joeyh/debian-stable")
    & os (System (Debian (Stable "wheezy")) "amd64")
    & Docker.publish "80:80"
    & Apt.serviceInstalledRunning "apache2"
    & alias "www.example.com"

That makes example.com have a web server in a docker container, as you'd expect, and when propellor is used to deploy the DNS server it'll automatically make www.example.com point to the host (or hosts!) where this container is docked.

I use docker a lot, but I have drunk little of the Docker KoolAid. I'm not keen on using random blobs created by random third parties using either unreproducible methods, or the weirdly underpowered dockerfiles. (As for vast complicated collections of containers that each run one program and talk to one another etc ... I'll wait and see.)

That's why propellor runs inside the docker container and deploys whatever configuration I tell it to, in a way that's replicable later and that lets me use the full power of Haskell.

Which turns out to be useful when moving on from docker containers to something else...

systemd-nspawn containers

Propellor now supports containers using systemd-nspawn. It looks a lot like the docker example.

example :: Host
example = host "example.com"
    & Systemd.persistentJournal
    & Systemd.nspawned webserverContainer

webserverContainer :: Systemd.Container
webserverContainer = Systemd.container "webserver" chroot
    & Apt.serviceInstalledRunning "apache2"
    & alias "www.example.com"
  where
    chroot = Chroot.debootstrapped (System (Debian Unstable) "amd64") Debootstrap.MinBase

Notice how I specified the Debian Unstable chroot that forms the basis of this container. Propellor sets up the container by running debootstrap, boots it up using systemd-nspawn, and then runs inside the container to provision it.

Unlike docker containers, systemd-nspawn containers use systemd as their init, and it all integrates rather beautifully. You can see the container listed in systemctl status, including the services running inside it, use journalctl to examine its logs, etc.

But no, systemd is the devil, and docker is too trendy...

chroots

Propellor now also supports deploying good old chroots. It looks a lot like the other containers. Rather than repeat myself a third time, and because we don't really run webservers inside chroots much, here's a slightly different example.

example :: Host
example = host "mylaptop"
    & Chroot.provisioned (buildDepChroot "git-annex")

buildDepChroot :: Apt.Package -> Chroot.Chroot
buildDepChroot pkg = Chroot.debootstrapped system Debootstrap.BuildD dir
    & Apt.buildDep pkg
  where
    dir = "/srv/chroot/builddep/" ++ pkg
    system = System (Debian Unstable) "amd64"

Again this uses debootstrap to build the chroot, and then it runs propellor inside the chroot to provision it (btw without bothering to install propellor there, thanks to the magic of bind mounts and completely linux distribution-independent packaging).

In fact, the systemd-nspawn container code reuses the chroot code, and so turns out to be really rather simple. 132 lines for the chroot support, and 167 lines for the systemd support (which goes somewhat beyond the nspawn containers shown above).

Which leads to the hardest part of all this...

debootstrap

Making a propellor property for debootstrap should be easy. And it was, for Debian systems. However, I have crazy plans that involve running propellor on non-Debian systems, to debootstrap something, and installing debootstrap on an arbitrary linux system is ... too hard.

In the end, I needed 253 lines of code to do it, which is barely an order of magnitude less code than debootstrap itself. I won't go into the ugly details, but this could be made a lot easier if debootstrap catered more to being used outside of Debian.

closing

Docker and systemd-nspawn have different strengths and weaknesses, and there are sure to be more container systems to come. I'm pleased that Propellor can add support for a new container system in a few hundred lines of code, and that it abstracts away all the unimportant differences between these systems.

PS

Seems likely that systemd-nspawn containers can be nested to any depth. So, here's a new kind of fork bomb!

infinitelyNestedContainer :: Systemd.Container
infinitelyNestedContainer = Systemd.container "evil-systemd"
    (Chroot.debootstrapped (System (Debian Unstable) "amd64") Debootstrap.MinBase)
    & Systemd.nspawned infinitelyNestedContainer

Strongly typed purely functional container deployment can only protect us against a certain subset of all badly thought out systems. ;)


Note that the above was written in 2014 and some syntax details have changed. See the documentation for Propellor.Property.Chroot, Propellor.Property.Debootstrap, Propellor.Property.Docker, Propellor.Property.Systemd for current examples.

using a debian package as the remote for a local config repo

Today I did something interesting with the Debian packaging for propellor, which seems like it could be a useful technique for other Debian packages as well.

Propellor is configured by a directory, which is maintained as a local git repository. In propellor's case, it's ~/.propellor/. This contains a lot of haskell files, in fact the entire source code of propellor! That's really unusual, but I think this can be generalized to any package whose configuration is maintained in its own git repository on the user's system. From now on, I'll refer to this as the config repo.

The config repo is set up the first time a user runs propellor. But, until now, I didn't provide an easy way to update the config repo when the propellor package was updated. Nothing would break, but the old version would be used until the user updated it themselves somehow (probably by pulling from a git repository over the network, bypassing apt's signature validation).

So, what I wanted was a way to update the config repo, merging in any changes from the new version of the Debian package, while preserving the user's local modifications. Ideally, the user could just run git merge upstream/master, where the upstream repo was included in the Debian package.

But, that can't work! The Debian package can't reasonably include the full git repository of propellor with all its history. So, any git repository included in the Debian binary package would need to be a synthetic one, that only contains probably one commit that is not connected to anything else. Which means that if the config repo was cloned from that repo in version 1.0, then when version 1.1 came around, git would see no common parent when merging 1.1 into the config repo, and the merge would fail horribly.

To solve this, let's assume that the config repo's master branch has a parent commit that can be identified, somehow, as coming from a past version of the Debian package. It doesn't matter which version, although the last one merged with will be best. (The easy way to do this is to set refs/heads/upstream/master to point to it when creating the config repo.)

Once we have that parent commit, we have three things:

  1. The current content of the config repo.
  2. The content from some old version of the Debian package.
  3. The new content of the Debian package.

Now git can be used to merge #3 onto #2, with -Xtheirs, so the result is a git commit with parents of #3 and #2, and content of #3. (This can be done using a temporary clone of the config repo to avoid touching its contents.)

Such a git commit can be merged into the config repo, without any conflicts other than those the user might have caused with their own edits.
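
To make the mechanics concrete, here's a rough sketch of that merge, written as Haskell shelling out to git (the ref name "newupstream" and the exact git invocations are my assumptions, not necessarily what propellor does):

-- A hedged sketch of the merge described above. It assumes we are already
-- inside a temporary clone of the config repo, and that the new Debian
-- package's content has been committed as the ref "newupstream".
import System.Process (callProcess)

mergeNewUpstream :: IO ()
mergeNewUpstream = do
    -- start from the old upstream commit (#2) recorded in the config repo
    callProcess "git" ["checkout", "--quiet", "upstream/master"]
    -- merge the new package content (#3) on top, preferring its content,
    -- which advances upstream/master to the new synthetic merge commit
    callProcess "git" ["merge", "-Xtheirs", "--quiet",
        "-m", "merging upstream version", "newupstream"]

Because this happens in a throwaway clone, the user's working tree is never touched; the resulting commit is what ends up as upstream/master in the config repo.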

So, propellor will tell the user when updates are available, and they can simply run git merge upstream/master to get them. The resulting history looks like this:

* Merge remote-tracking branch 'upstream/master'
|\  
| * merging upstream version
| |\  
| | * upstream version
* | user change
|/  
* upstream version

So, generalizing this, if a package has a lot of config files, and creates a git repository containing them when the user uses it (or automatically when it's installed), this method can be used to provide an easily mergeable branch that tracks the files as distributed with the package.

It would perhaps not be hard to get from here to a full git-backed version of ucf. Note that the Debian binary package doesn't have to ship a git repository; it can just as easily ship the current version of the config files somewhere in /usr, and check them into a new empty repository as part of the generation of the upstream/master branch.

Posted
how I wrote init by accident

I wrote my own init. I didn't mean to, and in the end, it took 2 lines of code. Here's how.

Propellor has the nice feature of supporting provisioning of Docker containers. Since Docker normally runs just one command inside the container, I made the command that docker runs be propellor, which runs inside the container and takes care of provisioning it according to its configuration.

For example, here's a real live configuration of a container:

        -- Exhibit: kite's 90's website.
        , standardContainer "ancient-kitenet" Stable "amd64"
                & Docker.publish "1994:80"
                & Apt.serviceInstalledRunning "apache2"
                & Git.cloned "root" "git://kitenet-net.branchable.com/" "/var/www"
                        (Just "remotes/origin/old-kitenet.net")

When propellor is run inside this container, it takes care of installing apache, and since the property states apache should be running, it also starts the daemon if necessary.

At boot, docker remembers the command that was used to start the container last time, and runs it again. This time, apache is already installed, so propellor simply starts the daemon.

This was surprising, but it was just what I wanted too! The only missing bit to make this otherwise entirely free implementation of init work properly was two lines of code:

    void $ async $ job reapzombies
  where
    reapzombies = void $ getAnyProcessStatus True False
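
For reference, here's what those two lines look like fleshed out into a minimal standalone program (a sketch using the async and unix libraries; I'm assuming propellor's job helper just keeps the action running, and approximating it with a retry loop):

-- Standalone sketch of the zombie reaping, not propellor's actual code.
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (async)
import Control.Exception (SomeException, handle)
import Control.Monad (forever, void)
import System.Posix.Process (getAnyProcessStatus)

main :: IO ()
main = do
    -- getAnyProcessStatus True False blocks until some child exits, so each
    -- iteration reaps one zombie; if there are currently no children it
    -- throws, so swallow that and wait a moment before trying again.
    void $ async $ forever $ handle ignore $
        void $ getAnyProcessStatus True False
    -- ... the rest of propellor-as-init would follow here; an init must
    -- never exit, so just idle.
    forever $ threadDelay 1000000
  where
    ignore :: SomeException -> IO ()
    ignore _ = threadDelay 1000000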

Propellor-as-init also starts up a simple equivalent of rsh on a named pipe (for communication between the propellor inside and outside the container), and also runs a root login shell (so the user can attach to the container and administer it). Also, running a compiled program from the host system inside a container, which might use a different distribution or architecture, was an interesting challenge (solved using the method described in completely linux distribution-independent packaging). So it wasn't entirely trivial, but as far as init goes, it's probably one of the simpler implementations out there.

I know that there are various other solutions in the space of an init for Docker -- personally I'd rather the host's systemd integrated with it so I could see the status of the container's daemons in systemctl status. If that does happen, perhaps I'll eventually be able to remove 2 lines of code from propellor.

propellor-driven DNS and backups

Took a while to get here, but Propellor 0.4.0 can deploy DNS servers and I just had it deploy mine. Including generating DNS zone files.

Configuration is dead simple, as far as DNS goes:

     & alias "ns1.example.com"
        & Dns.secondary hosts "joeyh.name"
                & Dns.primary hosts "example.com"
                        (Dns.mkSOA "ns1.example.com" 100)
                        [ (RootDomain, NS $ AbsDomain "ns1.example.com")
            , (RootDomain, NS $ AbsDomain "ns2.example.com")
                        ]

The awesome thing is that propellor fills in all the other information in the zone file by looking at the properties of the hosts it knows about.

 , host "blue.example.com"
        & ipv4 "192.168.1.1"
        & ipv6 "fe80::26fd:52ff:feea:2294"

        & alias "example.com"
        & alias "www.example.com"
        & alias "example.museum"
        & Docker.docked hosts "webserver"
                `requires` backedup "/var/www"
        
        & alias "ns2.example.com"
        & Dns.secondary hosts "example.com"

When it sees this host, Propellor adds its IP addresses to the example.com DNS zone file, for both its main hostname ("blue.example.com"), and also its relevant aliases. (The .museum alias would go into a different zone file.)

Multiple hosts can define the same alias, and then you automatically get round-robin DNS.

The web server part of the blue.example.com config can be cut and pasted to another host in order to move its web server to the other host, including updating the DNS. That's really all there is to it: just cut, paste, and commit!

I'm quite happy with how that worked out. And curious if Puppet etc have anything similar.


One tricky part of this was how to ensure that the serial number automatically updates when changes are made. The way this is handled is that Propellor starts with a base serial number (100 in the example above), and then it adds to it the number of commits in its git repository. The zone file is only updated when something in it besides the serial number needs to change.

The result is nice small serial numbers that don't risk overflowing the (so 90's) 32 bit limit, and will be consistent even if the configuration had Propellor setting up multiple independent master DNS servers for the same domain.
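
In code terms, the serial is just base plus commit count; here's a quick sketch (the function name and the use of git rev-list --count are my own, not necessarily how propellor computes it):

-- Sketch only: zone serial = base + number of commits in propellor's repo.
import System.Process (readProcess)

zoneSerial :: Integer -> IO Integer
zoneSerial base = do
    n <- readProcess "git" ["rev-list", "--count", "HEAD"] ""
    return (base + read n)

So with the base of 100 from the example and, say, 2500 commits, the serial would be 2600, nowhere near the 32 bit limit.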


Another recent feature in Propellor is that it can use Obnam to back up a directory. With the awesome feature that if the backed up directory is empty/missing, Propellor will automatically restore it from the backup.

Here's how the backedup property used in the example above might be implemented:

backedup :: FilePath -> Property
backedup dir = Obnam.backup dir daily
    [ "--repository=sftp://rsync.example.com/~/webserver.obnam"
    ] Obnam.OnlyClient
    `requires` Ssh.keyImported SshRsa "root"
    `requires` Ssh.knownHost hosts "rsync.example.com" "root"
    `requires` Gpg.keyImported "1B169BE1" "root"

Notice that the Ssh.knownHost makes root trust the ssh host key belonging to rsync.example.com. So Propellor needs to be told what that host key is, like so:

 , host "rsync.example.com"
        & ipv4 "192.168.1.4"
        & sshPubKey "ssh-rsa blahblahblah"

Which of course ties back into the DNS and gets this hostname set in it. But also, the ssh public key is available for this host and visible to the DNS zone file generator, and that could also be set in the DNS, in a SSHFP record. I haven't gotten around to implementing that, but hope at some point to make Propellor support DNSSEC, and then this will all combine even more nicely.


By the way, Propellor is now up to 3 thousand lines of code (not including Utility library). In 20 days, as a 10% time side project.

Posted
propellor introspection for DNS

In the just-released Propellor 0.3.0, I've improved Propellor's config file DSL significantly. Now properties can set attributes of a host, which can be looked up by its other properties, using a Reader monad.

This saves needing to repeat yourself:

hosts = [ host "orca.kitenet.net"
        & stdSourcesList Unstable
        & Hostname.sane -- uses hostname from above

And it simplifies docker setup, since there's no longer a need to differentiate between properties that configure docker and properties of the container:

 -- A generic webserver in a Docker container.
    , Docker.container "webserver" "joeyh/debian-unstable"
        & Docker.publish "80:80"
        & Docker.volume "/var/www:/var/www"
        & Apt.serviceInstalledRunning "apache2"

But the really useful thing is, it allows automating DNS zone file creation, using attributes of hosts that are set and used alongside their other properties:

hosts =
    [ host "clam.kitenet.net"
        & ipv4 "10.1.1.1"

        & cname "openid.kitenet.net"
        & Docker.docked hosts "openid-provider"

        & cname "ancient.kitenet.net"
        & Docker.docked hosts "ancient-kitenet"
    , host "diatom.kitenet.net"
        & Dns.primary "kitenet.net" hosts
    ]

Notice that hosts is passed into Dns.primary, inside the definition of hosts! Tying the knot like this is a fun haskell laziness trick. :)

Now I just need to write a little function to look over the hosts and generate a zone file from their hostname, cname, and address attributes:

extractZoneFile :: Domain -> [Host] -> ZoneFile
extractZoneFile = gen . map hostAttr
  where gen = -- TODO
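
To make the idea concrete, here's a toy version with made-up attribute types (not propellor's real Host, Attr, or ZoneFile types), emitting A and CNAME records for hosts in the zone's domain:

-- Toy sketch with invented types; propellor's actual implementation differs.
import Data.List (isSuffixOf)

type Domain = String

data Attr = Attr
    { attrHostName :: String
    , attrCnames   :: [String]
    , attrIPv4s    :: [String]
    }

extractZoneFile :: Domain -> [Attr] -> [String]
extractZoneFile domain = concatMap records . filter inDomain
  where
    inDomain a = ("." ++ domain) `isSuffixOf` attrHostName a
    records a =
        [ attrHostName a ++ ". IN A " ++ ip | ip <- attrIPv4s a ] ++
        [ c ++ ". IN CNAME " ++ attrHostName a ++ "."
        | c <- attrCnames a, ("." ++ domain) `isSuffixOf` c ]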

The eventual plan is that the cname property won't be defined as a property of the host, but of the container running inside it. Then I'll be able to cut-n-paste move docker containers between hosts, or duplicate the same container onto several hosts to deal with load, and propellor will provision them, and update the zone file appropriately.


Also, Chris Webber had suggested that Propellor be able to separate values from properties, so that eg, a web wizard could configure the values easily. I think this gets it much of the way there. All that's left to do is two easy functions:

overrideAttrsFromJSON :: Host -> JSON -> Host

exportJSONAttrs :: Host -> JSON

With these, propellor's configuration could be adjusted at run time using JSON from a file or other source. For example, here's a containerized webserver that publishes a directory from the external host, as configured by JSON that it exports:

demo :: Host
demo = Docker.container "webserver" "joeyh/debian-unstable"
    & Docker.publish "80:80"
    & dir_to_publish "/home/mywebsite" -- dummy default
    & Docker.volume (getAttr dir_to_publish ++":/var/www")
    & Apt.serviceInstalledRunning "apache2"

main = do
    json <- readJSON "my.json"
    let demo' = overrideAttrsFromJSON demo json
    writeJSON "my.json" (exportJSONAttrs demo')
    defaultMain [demo']
propellor type-safe reversions

Propellor ensures that a list of properties about a system are satisfied. But requirements change, and so you might want to revert a property that had been set up before.

For example, I had a system with a webserver container:

Docker.docked container hostname "webserver"

I don't want a web server there any more. Rather than having a separate property to stop it, wouldn't it be nice to be able to say:

revert (Docker.docked container hostname "webserver")

I've now gotten this working. The really fun part is, some properties support reversion, but other properties certainly do not. Maybe the code to revert them is not worth writing, or maybe the property does something that cannot be reverted.

For example, Docker.garbageCollected is a property that makes sure there are no unused docker images wasting disk space. It can't be reverted. Nor can my personal standardSystem Unstable property, which among other things upgrades the system to unstable and sets up my home directory..

I found a way to make Propellor statically check if a property can be reverted at compile time. So revert Docker.garbageCollected will fail to type check!

The tricky part about implementing this is that the user configures Propellor with a list of properties. But now there are two distinct types of properties, revertable ones and non-revertable ones. And Haskell does not support heterogeneous lists..

My solution to this is a typeclass and some syntactic sugar operators. To build a list of properties, with individual elements that might be revertable, and others not:

 props
        & standardSystem Unstable
        & revert (Docker.docked container hostname "webserver")
        & Docker.docked container hostname "amd64-git-annex-builder"
        & Docker.garbageCollected
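
For the curious, here's a rough sketch of the kind of typeclass trick involved (simplified, invented types, not propellor's real ones):

-- Simplified sketch, not propellor's actual code.
data Property = Property String (IO ())

-- a revertable property carries both the setup action and its undo
data RevertableProperty = RevertableProperty Property Property

revert :: RevertableProperty -> RevertableProperty
revert (RevertableProperty setup undo) = RevertableProperty undo setup

-- anything that can be flattened to a plain Property may go in the list
class IsProp p where
    toProp :: p -> Property

instance IsProp Property where
    toProp = id

instance IsProp RevertableProperty where
    toProp (RevertableProperty setup _) = setup

-- the syntactic sugar: snoc any kind of property onto a homogeneous list
infixl 1 &
(&) :: IsProp p => [Property] -> p -> [Property]
ps & p = ps ++ [toProp p]

props :: [Property]
props = []

With something along these lines, revert only accepts a RevertableProperty, so revert Docker.garbageCollected is rejected at compile time, while both kinds of property still collect into one ordinary list via &.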
Posted
adding docker support to propellor

Propellor development is churning away! (And leaving no few puns in its wake..)

Now it supports secure handling of private data like passwords (only the host that owns it can see it), and fully end-to-end secured deployment via gpg signed and verified commits.

And, I've just gotten support for Docker to build. It probably doesn't quite work yet, but it should only be a few bugs away at this point.

Here's how to deploy a dockerized webserver with propellor:

host hostname@"clam.kitenet.net" = Just
    [ Docker.configured
    , File.dirExists "/var/www"
    , Docker.hasContainer hostname "webserver" container
    ]

container _ "webserver" = Just $ Docker.containerFromImage "joeyh/debian-unstable"
        [ Docker.publish "80:80"
        , Docker.volume "/var/www:/var/www"
        , Docker.inside
            [ serviceRunning "apache2"
                `requires` Apt.installed ["apache2"]
            ]
        ]

Docker containers are set up using Properties too, just like regular hosts, but their Properties are run inside the container.

That means that, if I change the web server port above, Propellor will notice the container config is out of date, and stop the container, commit an image based on it, and quickly use that to bring up a new container with the new configuration.

If I change the web server to say, lighttpd, Propellor will run inside the container, and notice that it needs to install lighttpd to satisfy the new property, and so will update the container without needing to take it down.

Adding all this behavior took only 253 lines of code, and none of it impacts the core of Propellor at all; it's all in Propellor.Property.Docker. (Well, I did need another hundred lines to write a daemon that runs inside the container and reads commands to run over a named pipe... Docker makes running ad-hoc commands inside a container a PITA.)

So, I think that this vindicates the approach of making the configuration of Propellor be a list of Properties, which can be constructed by arbitrarily interesting Haskell code. I didn't design Propellor to support containers, but it was easy to find a way to express them as shown above.

Compare that with how Puppet supports Docker: http://docs.docker.io/en/latest/use/puppet/

docker::run { 'helloworld':
  image        => 'ubuntu',
  command      => '/bin/sh -c "while true; do echo hello world; sleep 1; done"',
  ports        => ['4444', '4555'],
...

All puppet manages is running the image and a simple static command inside it. All the complexities that puppet provides for configuring servers cannot easily be brought to bear inside the container, and a large reason for that is, I think, that its configuration file is just not expressive enough.

Posted
introducing propellor

Whups, I seem to have built a configuration management system this evening!

Propellor has similar goals to chef or puppet or ansible, but with an approach much more like slaughter. Except it's configured by writing Haskell code.

The name is because propellor ensures that a system is configured with the desired PROPerties, and also because it kind of pulls system configuration along after it. And you may not want to stand too close.

Disclaimer: I'm not really a sysadmin, except for on the scale of "diffuse administration of every Debian machine on planet earth or nearby", and so I don't really understand configuration management. (Well, I did write debconf, which claims to be the "Debian Configuration Management system".. But I didn't understand configuration management back then either.)

So, propellor makes some perhaps wacky choices. The least of these is that it's built from a git repository that any (theoretical) other users will fork and modify; a cron job can re-make it from time to time and pull down configuration changes, or something can be run to push changes.

A really simple configuration for a Tor bridge server using propellor looks something like this:

main = ensureProperties
    [ Apt.stdSourcesList Apt.Stable `onChange` Apt.upgrade
    , Apt.removed ["exim4"] `onChange` Apt.autoRemove
    , Hostname.set "bridget"
    , Ssh.uniqueHostKeys
    , Tor.isBridge
    ]

Since it's just haskell code, it's "easy" to refactor out common configurations for classes of servers, etc. Or perhaps integrate reclass? I don't know. I'm happy with just pure functions and type-safe refactorings of my configs, I think.
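
For instance, a common class of server could become an ordinary function returning its properties (a small sketch reusing the properties from the example above):

-- a hedged sketch of refactoring a shared configuration into a function
torBridge :: HostName -> [Property]
torBridge name =
    [ Apt.removed ["exim4"] `onChange` Apt.autoRemove
    , Hostname.set name
    , Ssh.uniqueHostKeys
    , Tor.isBridge
    ]

which any config that wants a bridge could then pass to ensureProperties.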

Properties are also written in Haskell of course. This one ensures that all the packages in a list are installed.

installed :: [Package] -> Property
installed ps = check (isInstallable ps) go
  where
        go = runApt $ [Param "-y", Param "install"] ++ map Param ps

Here's one that ensures the hostname is set to the desired value, which shows how to specify content for a file, and also how to run another action if a change needed to be made to satisfy a property.

set :: HostName -> Property
set hostname = "/etc/hostname" `File.hasContent` [hostname]
        `onChange` cmdProperty "hostname" [Param hostname]

Here's part of a custom one that I use to check out a user's home directory from git. Shows how to make a property require that some other property is satisfied first, and how to test if a property has already been satisfied.

installedFor :: UserName -> Property
installedFor user = check (not <$> hasGitDir user) $
        Property ("githome " ++ user) (go =<< homedir user)
                    `requires` Apt.installed ["git", "myrepos"]
  where
    go ... -- 12 lines elided

I'm about 37% happy with the overall approach to listing properties and combining properties into larger properties etc. I think that some unifying insight is missing -- perhaps there should be a Property monad? But as long as it yields a list of properties, any smarter thing should be able to be built on top of this.

Propellor is 564 lines of code, including 25 or so built-in properties like the examples above. It took around 4 hours to build.

I'm pretty sure it was easier to write it than it would have been to look into ansible and salt and slaughter (and also liw's human-readable configuration language whose name I've forgotten) in enough detail to pick one, and learn how its configuration worked, and warp it into something close to how I wanted this to work.

I think that's interesting.. It's partly about NIH and I-want-everything-in-Haskell, but it's also about a complicated system that is a lot of things to a lot of people -- of the kind I see when I look at ansible -- vs the tools and experience to build just the thing you want without the cruft. Nice to have the latter!

Posted
