Joey blogs about his work here on a semi-daily basis. For lower post frequency and wider-interest topics, see the main blog.

git-annex devblog
day 557 upgrade bugfixes

Fixed several bugs involving upgrade to v7 when the git repository already contained unlocked files. The worst of those involved direct mode and caused the whole file content to get checked into git. While that's a fairly unusual case, it's an ugly enough bug that I rushed out a release to fix it.

Also, LWN has posted a comparison of git-annex and git LFS.

Today's work was sponsored by Trenton Cronholm on Patreon.

day 556 snow day

Snowed in and without internet until now, I've been working through the backlog. This included adding git annex find --branch, and support for combining options like --include, --largerthan, etc. with --branch.

Today's work was sponsored by Jake Vosloo on Patreon.

day 552 v7 release prep

I fixed two reversions yesterday (neither related to v7 repos) during a day of triage in preparation for the release of git-annex 7.

One of the reversions broke adding remotes in the webapp, and was filed all the way back in January with lots of confirmations. I feel bad I didn't get around to even looking at that bug report until now.

My backlog is kind of large; it hovers around 400 messages most of the time now, and there needs to be a better way to make sure I notice such bad bugs. Would someone like to help with git-annex bug triage, picking out bugs that multiple users have confirmed, or that have good instructions to reproduce them, and helping me prioritize them? No coding required, and it would be a massive contribution to git-annex. Please get in touch.


Anyway, after that full day's work, I took a look at the autobuilders, and it was bad; the test suite was failing everywhere testing v7. For quite a while I've been seeing intermittent test suite failures involving the new repo version, that mostly only happened on the autobuilders. But now they were more reproducible; a recent change made them happen much more frequently. That was good; it made it easier to track down the problem.

Which was that git-annex was getting mtime information with only 1-second granularity. So when the test suite modified a file several times in the same second, git-annex could fail to notice some of the modifications. I think when I originally developed the inode cache module in 2013, for direct mode, there was no easy way to access high-precision mtimes from Haskell, but there is now, and git-annex will use them.
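A quick Python sketch of the granularity problem (illustrative only; git-annex is Haskell):

```python
import os
import tempfile

# Whole-second mtimes can't tell apart two modifications made within
# the same second; nanosecond stat fields can.
def coarse_mtime(path):
    return int(os.stat(path).st_mtime)    # 1-second granularity

def precise_mtime(path):
    return os.stat(path).st_mtime_ns      # nanosecond granularity

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name

# Two modification times falling inside the same second:
t1 = 1_500_000_000_111_111_111
t2 = 1_500_000_000_222_222_222
os.utime(path, ns=(t1, t1))
first = (coarse_mtime(path), precise_mtime(path))
os.utime(path, ns=(t2, t2))
second = (coarse_mtime(path), precise_mtime(path))
os.remove(path)

assert first[0] == second[0]   # coarse view: file looks unchanged
assert first[1] != second[1]   # precise view: modification detected
```

A test suite comparing only the coarse values would miss the second modification entirely, which is exactly the intermittent failure described above.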

That left one other failure in the test suite, an intermittent crash of sqlite with ErrorIO on Linux. May be related to the known sqlite crashes in WSL. I've been trying various things today to try to fix it, but have to run the test suite in a loop for several hours to reproduce it reliably.

day 551 v6 or v7

In the delaysmudge branch, I've implemented the delayed worktree update in the post-merge/post-checkout hooks for v6. It works very well!

In particular, with annex.thin set, checking out a branch containing a huge unlocked file does a fast hard link to the file.

The remaining problem before merging is: how do the new hooks get installed? Of course git annex init and git annex upgrade install them, but I know plenty of people already have v6 repositories without those hooks.

So, would it be better to bump up to v7 and install the hooks on that upgrade, or stay on v6 and say that it was, after all, experimental up until now, and so the minor bother of needing to run git annex init in existing v6 repositories is acceptable? If the version is bumped to v7, that will cause some pain for users of older versions of git-annex that won't support it, but those old versions also have pretty big gaps in their support for v6. I'm undecided, but leaning toward v7, even though it will also mean a lot of work to update all the documentation, as well as needing changes to projects like datalad that use git-annex. Feedback on this decision is welcomed below...

day 550 a plan to finish v6

Dreadfully early this morning I developed a plan for a way to finish the last v6 blocker, that works around most of the problems with git's smudge interface. The only problem with the plan is that it would make both git stash and git reset --hard leave unlocked annexed files in an unpopulated state when their content is available. The user would have to run git-annex afterwards to fix up after them. All other git checkout, merge, etc commands would work though.

Not sure how I feel about this plan, but it seems to be the best one so far, other than going off and trying to improve git's smudge interface again. I also wrote up git smudge clean interface suboptimal, which explains the problems with git's interface in detail.

day 549 operating on hidden files

Goal for today was to make git annex sync --content operate on files hidden by git annex adjust --hide-missing. However, this got into the weeds pretty quickly due to the problem of how to handle --content-of=path when either the whole path or some files within it may be hidden.

Eventually I discovered that git ls-files --with-tree can be used to get a combined list of files in the index plus files in another tree, which in git-annex's case is the original branch that got adjusted. It's not documented to work the way I'm using it (worrying), but it's perfect, because git-annex already uses git ls-files extensively and this could let lots of commands get support for operating on hidden files.

That said, I'm going to limit it to git annex sync for now, because it would be a lot of work to make lots of commands support them, and there could easily be commands where supporting them adds lots of complexity or room for confusion.

Demo time:

joey@darkstar:/tmp> git clone ~/lib/sound/
Cloning into 'sound'...
done.
Checking out files: 100% (45727/45727), done.
joey@darkstar:/tmp> cd sound/
joey@darkstar:/tmp/sound> git annex init --version=6
init  (merging origin/git-annex origin/synced/git-annex into git-annex...)
(scanning for unlocked files...)
ok
joey@darkstar:/tmp/sound> git annex adjust --hide-missing
adjust 
Switched to branch 'adjusted/master(hidemissing)'
ok
joey@darkstar:/tmp/sound#master(hidemissing)> ls
podcasts
joey@darkstar:/tmp/sound#master(hidemissing)> ls podcasts
feeds
joey@darkstar:/tmp/sound#master(hidemissing)> git annex sync origin --no-push -C podcasts
...
joey@darkstar:/tmp/sound> time git annex adjust --hide-missing
adjust
ok
15.03user 3.11system 0:14.95elapsed 121%CPU (0avgtext+0avgdata 93280maxresident)k
0inputs+88outputs (0major+12206minor)pagefaults 0swaps
joey@darkstar:/tmp/sound#master(hidemissing)> ls podcasts
Astronomy_Cast/                                     Hacking_Culture/
Benjamen_Walker_s_Theory_of_Everything/             In_Our_Time/
Clarkesworld_Magazine___Science_Fiction___Fantasy/  Lightspeed_Magazine___Science_Fiction___Fantasy/
DatCast/                                            Long_Now__Seminars_About_Long_term_Thinking/
Escape_Pod/                                         Love___Radio/
Gravy/                                              feeds

Close to being able to use this on my phone. ;-)

day 548 hiding missing files

At long last there's a way to hide annexed files whose content is missing from the working tree: git-annex adjust --hide-missing

And once you've run that command, git annex sync will update the tree to hide/unhide files whose content availability has changed. (So will running git annex adjust again with the same options.)

You can also combine --hide-missing with --unlock, which should prove useful in a lot of situations.

My implementation today is as simple as possible, which means that every time it updates the adjusted branch it does a full traversal of the original branch, checks content availability, and generates a new branch. So it may not be super fast in a large repo, but I was able to implement it in one day's work. It should be possible later to speed it up a lot, by maintaining more state.
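The simple approach can be sketched like so (hypothetical Python, nothing like git-annex's actual internals):

```python
# Hypothetical sketch of the simple full-traversal approach: each
# update walks every file in the original branch and keeps only those
# whose annexed content is locally present. All names here are
# illustrative, not git-annex internals.
def adjust_hide_missing(tree, content_present):
    """tree maps path -> key; content_present says if a key's content is here."""
    return {path: key for path, key in tree.items() if content_present(key)}

original = {"podcasts/a.ogg": "KEY-A", "podcasts/b.ogg": "KEY-B"}
present = {"KEY-A"}

adjusted = adjust_hide_missing(original, lambda key: key in present)
assert adjusted == {"podcasts/a.ogg": "KEY-A"}
```

Since the traversal visits every file on every update, keeping state about which keys changed availability would let later versions regenerate only the affected parts of the tree.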

Today's work was sponsored by Ethan Aubin.

day 547 v6 almost complete

No time to blog yesterday, but I somehow found the time to fix the second to last known major issue with v6 mode, a database inconsistency problem involving touching annexed files.

The only remaining blocker to v6 shedding its experimental status is that git checkout of large unlocked files can use a lot of memory (and doesn't honor annex.thin).

Also I finally have a rough plan for how to hide missing files: Have git annex sync update the working tree to only show visible files. Still details to work out, but it would be great to finally get this often-requested feature.

day 546 deleted 40 thousand lines of code

Pulled the trigger on removing the old Android builds, and made a massive commit deleting all the cruft that had built up to enable them. Running in Termux is just better. It's important to note this does not mean I've given up on more native git-annex Android stuff; indeed, there are promising developments in ghc Android support that I'm keeping an eye on.

I'll kind of miss the EvilSplicer, that was 750 lines of crazy code to be proud of. But really, it's going to be great to not have hanging over me the prospect that any change could break the Android build and end up needing tons of work to resolve.

Today's work was sponsored by Trenton Cronholm on Patreon.

day 545 termux improvements

I've improved the Termux installation, adding an installer script to make it easier, and fixing some issues that have been reported. It now supports arm64, and should also work on Intel Android devices. This feels very close to being able to remove the old deprecated Android apps.

I'm temporarily running the arm64 builds on my phone, in a Debian chroot. But it overheats, so this is a stopgap and it won't autobuild daily, only manually at release time.

Released git-annex 6.20181011.

Today's work was sponsored by Jake Vosloo on Patreon.

day 524 new phone

Been making some improvements to git-annex export over the past couple days, but took time off this afternoon to set up a new phone, and try git-annex in termux on it. Luckily, I was able to reproduce the signal 11 on arm64 problem that several users have reported earlier, and also found a fix, which is simply to build git-annex for arm64.

So I want to set up an arm64 autobuilder, and if someone has an arm64 server that could host it, that would be great. Otherwise, I could use Scaleway, but I'd rather avoid that ongoing expense.

Also fixed a recent reversion in the linux standalone runshell that broke git-annex in termux, and probably on some other systems.

Today's work was sponsored by Trenton Cronholm on Patreon.

day 523 backlog

So I've been catching up on backlog for a couple of days. Including reading all the old todos, and closing a bunch of them that turned out to have been implemented already.

Today I added an annex.jobs setting, fixed annex.web-options which was broken in the semi-recent security update, and fixed a very tricky bug in rmurl.

(What happened to the http://git-annex.branchable.com/todo/to_and_from_multiple_remotes/ I was working on earlier this week? When I looked at the details, it was much more complicated than I had thought. Back burnered.)

Today's work was sponsored by Jake Vosloo on Patreon.

day 521 newlines in filenames

Unix would be better if filenames could not contain newlines. But they can, and so today was spent dealing with some technical debt.

The main problem with using git-annex with filenames with newlines is that git cat-file --batch uses a line-based protocol. It would be nice if that were extended to support -z like most of the rest of git does, but I realized I could work around this by not using batch mode for the rare filename with a newline. Handling such files will be slower than other files, but at least it will work.

Then I realized that git-annex has its own problems with its --batch option and files with newlines. So I added support for -z to every batchable command in git-annex, including a couple of commands that did batch input without a --batch option.

Now git-annex should fully support filenames containing newlines, as well as anything else. The best thing to do if you have such a file is to commit it and then git mv it to a better name.
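The underlying framing problem is easy to demonstrate (Python sketch):

```python
# A filename containing a newline is wrongly split by line-based
# framing, but survives NUL-delimited (-z style) framing intact,
# because NUL can never appear in a filename.
tricky = "new\nline.txt"

line_framed = tricky.encode() + b"\n"
assert line_framed.rstrip(b"\n").split(b"\n") == [b"new", b"line.txt"]

nul_framed = tricky.encode() + b"\0"
assert nul_framed.rstrip(b"\0").split(b"\0") == [b"new\nline.txt"]
```

This is why a -z mode matters for batch interfaces: the newline-delimited reader sees two bogus entries where the NUL-delimited reader sees one filename.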

Today's work was sponsored by Trenton Cronholm on Patreon.

day 520 storm before the calm

Well, it took the whole day to finish the release, including fixing a deadlock when the new v6 code runs with an older git, and some build errors. And there's still an intermittent test suite failure involving v6 on one autobuilder, which will need to be dealt with later.

This is a big release, lots of bug fixes, lots of v6 improvements, and significant S3 improvements.

Today's work was sponsored by Paul Walmsley on Patreon.

day 519 release prep

I'm in release prep mode now, fixing build problems and a few bugs, but the v6 sprint is well over, though v6 still has its issues. Might as well release all the last month's work.

Yesterday was taken up with dealing with some very ugly git interface stuff that changed between versions. A July workaround for a bug in git turned out to have caused reversions with older git versions, and was not a complete fix either. Tuned it to hopefully work better.

This was sponsored by Jake Vosloo on Patreon.

day 518 S3 versioning finishing touches

Got git-annex downloading versioned files from S3, without needing S3 credentials. This makes an S3 special remote just as capable as a git-annex repository exported over http, other than of course not including the git objects.

An example of this new feature:

AWS_SECRET_ACCESS_KEY=... AWS_ACCESS_KEY_ID=...
git annex initremote s3 type=S3 public=yes exporttree=yes versioning=yes
git annex export --tracking master --to s3
git tag 1.0
# modify some files here
git annex sync --content s3

And then in a clone without the credentials:

git annex enableremote s3
git checkout 1.0
git annex get somefile

This is nice; I only wish it were supported by other special remotes. It seems that any special remote could be made to support it, but ones not supporting some kind of versioning would need to store each file twice, and many would also need each file to be uploaded to them twice. But perhaps there are others that do have a form of versioning. WebDAV for one has a versioning extension in RFC 3253.

Also did a final review of a patch Antoine Beaupré is working on to backport the recent git-annex security fixes to debian oldstable, git-annex 5.20141125. He described the backport in his blog:

This time again, Haskell was nice to work with: by changing type configurations and APIs, the compiler makes sure that everything works out and there are no inconsistencies. This logic is somewhat backwards to what we are used to: normally, in security updates, we avoid breaking APIs at all costs. But in Haskell, it's a fundamental way to make sure the system is still coherent.

Today's work was sponsored by Trenton Cronholm on Patreon.

day 517 return to crowdfunding

Back to being only crowdfunded now.

Several little things today, including a git-annex.cabal patch from fftehnik that fixed building without the assistant, and support for AWS_SESSION_TOKEN. The main work was on making git annex drop --dead prune obsolete per-remote metadata, and on fixing a bug in v6 mode that left git-annex object files writable.

Today's work was sponsored by Paul Walmsley in honor of Mark Phillips.

day 516 S3 exporttree with versioning continued

Finished this feature, and I'm liking it quite a lot! Though I had to put off supporting public versioned S3 access until some other time, as I'm all out of time and energy.

The storage of S3 version IDs got rethought -- I was not comfortable with using per-remote state in the git-annex branch which would have caused problems if dropping from these remotes later gets supported.

So, I added per-remote metadata in about 1 hour! It's like git-annex's regular metadata, but scoped so only the remote that owns it can see it. This is perfect for storing things like S3 version IDs. It probably ought to be added to the external special remote interface since it could be used for lots of stuff.

Here's how that looks when S3 version IDs are stored in it:

1535737778.867692782s 31ea6c94-fba3-4952-99b5-285ae192d92a:V +woYHK59DD2VUkJfg527mEBBqtCaPlSXn#myfile
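Pulling that line apart (a hedged Python sketch that only assumes the visible layout "timestamp, uuid:field, value"; the actual semantics are internal to git-annex):

```python
# Hedged sketch: split the log line shown above into its visible
# fields. Only the layout is assumed; what git-annex does with each
# field is internal to it.
line = ("1535737778.867692782s "
        "31ea6c94-fba3-4952-99b5-285ae192d92a:V "
        "+woYHK59DD2VUkJfg527mEBBqtCaPlSXn#myfile")

timestamp, scoped_field, value = line.split(" ")
remote_uuid, field = scoped_field.split(":", 1)

assert timestamp == "1535737778.867692782s"
assert remote_uuid == "31ea6c94-fba3-4952-99b5-285ae192d92a"
assert field == "V"            # the per-remote metadata field name
assert value.startswith("+")   # the stored value for that field
```

The scoping is visible in the layout itself: the field name is prefixed by the UUID of the remote that owns it.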

This work was supported by the NSF-funded DataLad project.

day 515 S3 exporttree with versioning

Most of the way done with implementing support for export to S3 buckets with versioning enabled. This will make the files from the most recent git annex export be visible to users browsing the bucket, while letting git-annex download any of the content from previous exports too.

Still need to test it. And, deletion of old content from such a bucket is not supported, and my initial thoughts are that it might not be possible in a multi-writer situation. I need to think about it more.

This work is supported by the NSF-funded DataLad project.

day 514 v6 bug review

Looked over bugs filed about v6 mode and did some triage and analysis. smudge has the details.

This led to changing what's done by git add and git commit of a file when annex.largefiles is not configured. Rather than behaving like git annex add and always storing the file in the annex, it will store it in the annex if the old version was annexed, and in git if the old version was stored in git. This avoids accidental conversions.

It might make sense to have git annex add also do this, even in v5 repositories, but I want to concentrate on v6 for now, and also don't think that git add and git annex add necessarily need to behave identically in v6 mode. While using git commit -a doesn't imply anything about whether you want the file in git or the annex, using git annex add seems to imply that you want it in the annex, unless you've gone out of your way to configure otherwise.


Also did some design work on supporting versioned S3 buckets with git-annex export.


This work is supported by the NSF-funded DataLad project.

day 514 fixed 5 races in v6

That's a lot of races! Well, 4 of them were all related, but the fixes to them had to be made in two different places.

Hopefully that's all the v6 races fixed. I've been finding these races by inspection, who knows if I missed some. Anyway, I'm now down to one todo item left on the v6 sprint. Gonna take a break for a couple days before tackling it.

This work is supported by the NSF-funded DataLad project.

day 513 v6 reconciling staged changes

More v6 work. Got most of the way to a solution to the problem of updating the associated files database for staged changes to unlocked files, eg a git mv.

While writing the test case, I was surprised to find that the problem is timing dependent. If a git mv is run less than a second after git add, git runs the smudge filter for whatever reason, which avoids the problem. With a longer delay, it doesn't run the smudge filter. Seems this could be the cause of intermittent glitches with v6 mode, and I've seen a few such glitches before.

Anyway, I developed an inexpensive way to find the relevant staged changes, using git diff with a full page of options to tweak its behavior just right. Still need to make that only run when the index has changed, not every time git-annex runs.

There's still a race between a command like git mv and git annex drop/get, that can result in the unlocked file's content not being updated. Don't have a solution to that yet.

This work is supported by the NSF-funded DataLad project.

day 512 fixed race

Sleeping on that race from yesterday, I realized there is a way to fix it, and have implemented the fix. It doubled the overhead of updating the index, but that's worth it to not have a race condition to worry about.

This work is supported by the NSF-funded DataLad project.

day 511 v6 improved index update

Found a better way to update the index after get/drop in v6 repositories. I was able to close all the todos around that.

Only problem is there is a race where a modification that happens to a file soon after get/drop gets unexpectedly staged by the index update. I made this race's window as small as I reasonably can. Fully fixing it would involve improvements to the git update-index interface, or another way to update the index.

Only two todos remain in smudge that I want to fix in the remainder of this v6 sprint.

This work is supported by the NSF-funded DataLad project.

day 510 v6 get drop index

I've now fixed the worst problem with v6 mode, which was that get/drop of unlocked files would cause git to think that the files were modified.

Since the clean filter now runs quite fast, I was able to fix that by, after git-annex updates the worktree, restaging the not-really-modified file in the index.

This approach is not optimal; index file updates have overhead; and only one process can update the index file at one time. smudge has a bunch of new todo items for cases where this change causes problems. Still, it seems a lot better than the old behavior, which made v6 mode nearly unusable IMHO.

This work is supported by the NSF-funded DataLad project.

day 509 filterdriver

Working on a "filterdriver" branch, I've implemented support for the long-running smudge/clean process interface.

It works, but not really any better than the old smudge/clean interface. Unfortunately git leaks memory just as badly in the new interface as it did in the old interface when sending large data to the smudge filter. Also, the new interface requires that the clean filter read all the content of the file from git, even when it's just going to look at the file on disk, so that's worse performance.

So, I don't think I'll be merging that branch yet, but git's interface does support adding capabilities, and perhaps a capability could be added that avoids it schlepping the file content over the pipe. Same as my old git patches tried to do with the old smudge/clean interface.

This work is supported by the NSF-funded DataLad project.

day 508 git-protocol

Spent today implementing the git pkt-line protocol. Git uses it for a bunch of internal stuff, but also to talk to long-running filter processes.

This was my first time using attoparsec, which I quite enjoyed aside from some difficulty in parsing a 4 byte hex number. Even though parsing to a Word16 should naturally only consume 4 bytes, attoparsec will actually consume subsequent bytes that look like hex. And it may parse fewer than 4 bytes too. So my parser had to take 4 bytes and feed them back into a call to attoparsec. Which seemed weird, but works. I also used bytestring-builder, and between the two libraries, this should be quite a fast implementation of the protocol.
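The framing itself is simple enough to sketch in a few lines; here's an illustrative Python version (the real implementation is Haskell, using attoparsec and bytestring-builder):

```python
# Sketch of git's pkt-line framing: each packet starts with exactly
# four lowercase hex digits giving the total length (including the
# four length bytes themselves); "0000" is a flush packet.
def parse_pkt_line(data: bytes):
    """Return payloads from a pkt-line stream; None marks a flush packet."""
    packets = []
    while data:
        # Take exactly 4 bytes for the length -- the pitfall described
        # above: a naive hex parser would happily consume a 5th byte
        # that happens to look like hex.
        size = int(data[:4], 16)
        if size == 0:                 # flush-pkt
            packets.append(None)
            data = data[4:]
        else:
            packets.append(data[4:size])
            data = data[size:]
    return packets

pkts = parse_pkt_line(b"0009hello0000")
assert pkts == [b"hello", None]
```

Note how the length "0009" counts its own four bytes plus the five payload bytes, which is the part that makes a generic hex-number parser the wrong tool.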

With that 300 lines of code written, it should be easy to implement support for the rest of the long-running filter process protocol. Which will surely speed up v6 a bit, since at least git won't be running git-annex over and over again for each file in the worktree. I hope it will also avoid a memory leak in git. That'll be the rest of the low-hanging fruit, before v6 improvements get really interesting.

This work is supported by the NSF-funded DataLad project.

day 507 v6 revisited

Plan is to take some time this August and revisit v6, hoping to move it toward being production ready.

Today I studied the "Long Running Filter Process" documentation in gitattributes(5), as well as the supplemental documentation in git about the protocol they use. This interface was added to git after v6 mode was implemented, and hopefully some of v6's issues can be fixed by using it in some way. But I don't know how yet; it's not as simple as using this interface as-is (it was designed for something different), but rather finding a creative trick using it.

So far I have this idea to explore. It's promising, might fix the worst of the problems.

Also, reading over all the notes in smudge, I finally checked and yes, git doesn't require filters to consume all stdin anymore, and when they don't consume stdin, git doesn't leak memory anymore either. Which let me massively speed up git add in v6 repos. While before git add of a gigabyte file made git grow to a gigabyte in memory and copied a gigabyte through a pipe, it's now just as fast as git annex add in v5 mode is.

This work is supported by the NSF-funded DataLad project.

day 506 summer features

After the big security fix push, I've had a bit of a vacation. Several new features have also landed in git-annex though.

git-worktree support is a feature I'm fairly excited by. It turned out to be possible to make git-annex just work in working trees set up by git worktree, and they share the same object files. So, if you need several checkouts of a repository for whatever reason, this makes it really efficient to do. It's much better than the old method of using git clone --shared.

A new --accessedwithin option matches files whose content was accessed within a given amount of time. (Using the atime.) Of course it can be combined with other options, for example git annex move --to archive --not --accessedwithin=30d
There are a few open requests for other new file matching options that I hope to get to soon.

A small configuration addition, remote.name.annex-speculate-present, makes git-annex try to get content from a remote even if its records don't indicate the remote contains the content. This allows setting up an interesting kind of local cache of annexed files, which can even be shared between unrelated git-annex repositories, with inter-repository deduplication.

I suspect that remote.name.annex-speculate-present may also have other uses. It warps git-annex's behavior in a small but fundamental way which could let it fit into new places. Will be interesting to see.

There's also an annex.commitmessage config, which I am much less excited by, but enough people have asked for it over the years.

Also fixed a howler of a bug today: In -J mode, remotes were sorted not by cost, but by UUID! How did that not get noticed for 2 years?

Much of this work was sponsored by the NSF-funded DataLad project at Dartmouth College, as has been the case for the past 4 years. All told they've funded over 1000 hours of work on git-annex. This is the last month of that funding.

day 505 security fix release

Just released git-annex 6.20180626 with important security fixes!

Please go upgrade now, read the release notes for details about some necessary behavior changes, and if you're curious about the details of the security holes, see the advisory.

I've been dealing with these security holes for the past week and a half, and decided to use a security embargo while fixes were being developed due to the complexity of addressing security holes that impact both git-annex and external special remote programs. For the full story see past 5 posts in this devblog, which are being published all together now that the embargo is lifted.

day 504 security hole part 6

Was getting dangerously close to burnt out, or exhaustion leading to mistakes, so yesterday I took the day off, aside from spending the morning babysitting the android build every half hour. (It did finally succeed.)

Today, got back into it, and implemented a fix for CVE-2018-10859 and also the one case of CVE-2018-10857 that had not been dealt with before. This fix was really a lot easier than the previous fixes for CVE-2018-10857. Unfortunately this did mean not letting URL and WORM keys be downloaded from many special remotes by default, which is going to be painful for some.

day 503 security hole part 5

Started testing that the security fix will build everywhere on release day. This is being particularly painful for the android build, which has very old libraries and needed http-client updated, with many follow-on changes, and is not successfully building yet after 5 hours. I really need to finish deprecating the android build.

Pretty exhausted from all this, and thinking about what to do about external special remotes, I elaborated on an idea that Daniel Dent had raised in discussions about the vulnerability, and realized that git-annex has a second, worse vulnerability. This new one could be used to trick a git-annex user into decrypting gpg-encrypted data that they had never stored in git-annex. The attacker needs to have control of both an encrypted special remote and a git remote, so it's not an easy exploit to pull off, but it's still super bad.

This week is going to be a lot longer than I thought, and it's already feeling kind of endless..

day 502 security hole part 4

Spent several hours dealing with the problem of http proxies, which bypassed the IP address checks added to prevent the security hole. Eventually got it filtering out http proxies located on private IP addresses.

Other than the question of what to do about external special remotes that may be vulnerable to related problems, it looks like the security hole is all closed off in git-annex now.

Added a new page security with details of this and past security holes in git-annex.

Several people I reached out to for help with special remotes have gotten back to me, and we're discussing how the security hole may affect them and what to do. Thanks especially to Robie Basak and Daniel Dent for their work on security analysis.

Also prepared a minimal backport of the security fixes for the git-annex in Debian stable, which will probably be more palatable to their security team than the full 2000+ lines of patches I've developed so far. The minimal fix is secure, but suboptimal; it prevents even safe urls from being downloaded from the web special remote by default.

day 501 security hole part 3

Got the IP address restrictions for http implemented. (Except for http proxies.)

Unfortunately, as part of this, had to make youtube-dl and curl not be used by default. The annex.security.allowed-http-addresses config has to be opened up by the user in order to use those external commands, since they can follow arbitrary redirects.

Also thought some more about how external special remotes might be affected, and sent their authors a heads-up.

day 500 security hole part 2

Most of the day was spent staring at the http-client source code and trying to find a way to add the IP address checks to it that I need to fully close the security hole.

In the end, I did find a way, with the duplication of a couple dozen lines of code from http-client. It will let the security fix be used with libraries like aws and DAV that build on top of http-client, too.

While the code is in git-annex for now, it's fully disconnected and would also be useful if a web browser were implemented in Haskell, to implement same-origin restrictions while avoiding DNS rebinding attacks.

Looks like http proxies and curl will need to be disabled by default, since this fix can't support either of them securely. I wonder how web browsers deal with http proxies, DNS rebinding attacks and same-origin? I can't think of a secure way.

Next I need a function that checks if an IP address is a link-local address or a private network address. For both ipv4 and ipv6. Could not find anything handy on hackage, so I'm gonna have to stare at some RFCs. Perhaps this evening, for now, it's time to swim in the river.
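For illustration, here's what such a check looks like in Python, whose standard ipaddress module already knows about these ranges (a sketch of the predicate needed, not the Haskell code that ended up in git-annex):

```python
import ipaddress

# Sketch of the needed predicate: disallow private, link-local, and
# loopback addresses, for both IPv4 and IPv6.
def address_allowed(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return not (ip.is_private or ip.is_link_local or ip.is_loopback)

assert not address_allowed("192.168.1.1")    # RFC 1918 private
assert not address_allowed("169.254.0.5")    # IPv4 link-local
assert not address_allowed("fe80::1")        # IPv6 link-local
assert not address_allowed("127.0.0.1")      # loopback
assert address_allowed("93.184.216.34")      # a public address
```

In Haskell, without such a library to hand, the same ranges have to be transcribed from the RFCs (1918, 3927, 4193, 4291, and friends) into explicit prefix checks.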

Today's work was sponsored by Jake Vosloo on Patreon

Posted
git-annex devblog
day 499 security hole

I'm writing this on a private branch, it won't be posted until a week from now when the security hole is disclosed.

Security is not compositional. You can have one good feature, and add another good feature, and the result is not two good features, but a new security hole. In this case, the security hole is private data exposure via addurl (CVE-2018-10857). This kind of security hole can be hard to spot, but once it's known it seems blindingly obvious.

It came to me last night and by this morning I had decided the potential impact was large enough to do a coordinated disclosure. Spent the first half of the day thinking through ways to fix it that don't involve writing my own http library. Then started getting in touch with all the distributions' security teams. And then coded up a fairly complete fix for the worst part of the security hole, although a secondary part is going to need considerably more work.

It looks like the external special remotes are going to need at least some security review too, and I'm still thinking that part of the problem over.

Exhausted.

Today's work was sponsored by Trenton Cronholm on Patreon.

Posted
git-annex devblog
day 498 unexpected release prep

I'm unexpectedly preparing for a release soon, because the last release turned out to have a crasher bug when using a bare repository or --all, and a bug that prevented the webapp starting on OSX.

As well as fixing those, the new release will have several smaller improvements and fixes all done today. It's been a rather productive day.

And, using git-annex in Termux is now working even on newer versions of Android, which use seccomp filtering to block system calls that the ghc runtime uses. The proot program on Termux worked around that nasty problem.

The old Android app is now deprecated, and I'll probably remove it entirely within a few months unless I find a reason not to. So, I also closed almost all the old Android-specific bug reports today. I don't normally do mass bug closures without followup, but it was warranted here; almost all of those bugs are specific to the old Android app.

Today's work was sponsored by Trenton Cronholm on Patreon

Posted
git-annex devblog
day 497 rethinking the android port

I've long been unsatisfied with the amount of effort needed to maintain the Android port in its current state; the hacky cross-compiler toolchain needs days of wasted work to update, and is constantly out of date and breaking in one way or another. This sucks up any time I might spare to actually improve the Android port.

So, it was quite a surprise yesterday when I downloaded the git-annex standalone Linux tarball into the Termux Android shell and unpacked it, and it more or less worked!

The result, after a few minor fixes, works just as well as the git-annex Android app, and probably better. Even the webapp works well, and with the Termux:Boot app, it can even autostart the assistant on boot as a daemon. If you want to give it a try, see install on Android in Termux.

So, I am leaning toward deprecating the android port for this, removing 14 thousand lines of patches and android-specific code. Not going to do it just yet, but I feel a weight lifting...


Today's work was sponsored by Jake Vosloo on Patreon.

Posted
git-annex devblog
day 496 move numcopies safety revisited

After talking it over in move violates numcopies, we found a nicer compromise for git annex move. Rather than strictly enforcing numcopies, it avoids making any bad situations worse. For example, when there's only one copy of a file, it can be moved even if numcopies is higher. But, when numcopies is 2 and the source and destination repos have a copy, move will not drop from the source repo, since that would make it worse.
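That rule can be sketched as a small decision function (a hypothetical Python illustration of the compromise described above, not git-annex's actual code):

```python
# Hypothetical sketch of the "don't make things worse" rule for
# git annex move (git-annex itself is Haskell; names here are invented).
def can_drop_from_source(numcopies: int, total_copies: int,
                         dest_already_had_copy: bool) -> bool:
    """After copying to the destination, may the source's copy be dropped?

    A move copies to the destination and then drops from the source.
    If the destination did not already have a copy, the total number of
    copies is unchanged by the move, so nothing gets worse. If it did,
    dropping the source reduces the total, which is only ok if numcopies
    is still satisfied afterwards.
    """
    if not dest_already_had_copy:
        return True
    return total_copies - 1 >= numcopies

# Sole copy, numcopies 2: move is still allowed (count stays at 1).
print(can_drop_from_source(2, 1, False))  # True
# numcopies 2, source and destination each have a copy: don't drop.
print(can_drop_from_source(2, 2, True))   # False
```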

Implemented that today. While doing so I got bit by the inverted Ord instance for TrustLevel, so spent a while cleaning that up.

Today's work was sponsored by Jake Vosloo on Patreon.

Posted
git-annex devblog
day 495 move numcopies safety

New version released today with adb special remote, http connection caching, improved progress displays, annex.retry, and other changes.

I've been rethinking git annex move in the context of numcopies checking, thanks to a user posting git-annex move does not appear to respect numcopies. Of course, move is known not to do that, but it's useful to get the perspective that this is surprising behavior, not wanted by that user, and poorly documented besides.

So, I added git annex move --safe which does honor numcopies, so it only does a copy when there are not enough copies to move.

I'm leaning toward making that the default behavior, and needing git annex move --unsafe to get the current behavior of moving without a net. Of course, lots of us probably use move and like the current behavior, and such a change can break workflows and scripts. There might be a transition period where move warns when run without --safe or --unsafe. Feedback welcomed on the bug report move violates numcopies.

Posted
git-annex devblog
day 494 url download changes

To make git-annex faster when it's dealing with a lot of urls, I decided to make it use the http-conduit library for all url access by default. That way, http pipelining will speed up repeated requests to the same web servers. This is kind of a follow-up to the recent elimination of rsync.

Some users rely on some annex.web-options or a .netrc file to configure how git-annex downloads urls. To keep that supported, when annex.web-options is set, git-annex will use curl. To use a .netrc file, curl needs an option, so you would configure:

git config annex.web-options --netrc

I get the feeling that nobody has implemented resuming interrupted downloads of files using http-conduit before, because it was unexpectedly kind of hard and http-types lacks support for some of the necessary range-related HTTP stuff.
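The resuming itself relies on standard HTTP range requests; a minimal sketch of the request side (not git-annex's code, just the HTTP mechanics):

```python
# A standard HTTP Range header asking for bytes from an offset onward,
# which is what resuming an interrupted download boils down to.
def resume_range_header(bytes_already_downloaded: int) -> dict:
    return {"Range": "bytes=%d-" % bytes_already_downloaded}

# A server that supports resuming replies with 206 Partial Content and a
# Content-Range header; otherwise it replies 200 with the whole file,
# and the client has to notice and start over.
print(resume_range_header(1048576))  # {'Range': 'bytes=1048576-'}
```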

Today's work was supported by the NSF-funded DataLad project.


Stewart V. Wright announced recastex, a program that publishes podcasts and other files from git-annex to your phone.

Posted
git-annex devblog
day 493 two new special remotes

I've been traveling and at conferences.

In the meantime, Lykos has released git-annex-remote-googledrive, a replacement for an older, unmaintained Google Drive special remote.

Today I added a special remote that stores files on an Android device using adb. It supports git annex export, so the files stored on the Android device can have the same filenames as in the git-annex repository. I have plans for making git annex import support special remotes, and other features to make bi-directional sync with Android work well.

Of course, there is some overlap between that and the Android port, but they probably serve different use cases.

Today's work was sponsored by Trenton Cronholm on Patreon

Posted
git-annex devblog
day 492 concurrency is hard

In the past 24 hours, I've fixed two extremely hairy problems with git annex get -J. One was a locking problem. And the other involved thundering herds and ssh connection multiplexing and inherited file descriptors and races, and ... That took 4 hours of investigation to understand well enough to fix it.

Neither of those involved the ssh P2P changes, other than perhaps they exposed one of the issues more than it was exposed before. But on the plus side, I've been testing that new code quite a lot as I worked on them.

Today's work was supported by the NSF-funded DataLad project.

Posted
git-annex devblog
day 491 annex.verify redux

With fresh eyes I stopped being confused by P2P protocol free monad stuff, and got annex.verify=false supported when it's safe to skip verification.

And, I found some cases where resuming a download with annex.verify=false could let corrupt data into the repository. This is not a new problem; as well as with the P2P protocol, it could happen when downloading from the web, and possibly with some external special remotes that support resuming. So, it seemed best to override annex.verify configuration when resuming a download.

Also fixed up some progress bar stuff related to the P2P protocol. Including dealing with the case where the size of a key being downloaded is not known until the peer starts sending its data. The progress bar will now be updated with the size from the P2P protocol, so it can display a percentage even in this case.
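The unknown-size case boils down to something like this (a Python sketch for illustration; git-annex itself is Haskell):

```python
# A percentage can only be shown once the total size is known, which over
# the P2P protocol may not be until the peer starts sending its data.
from typing import Optional

def percent_complete(total: Optional[int], done: int) -> Optional[int]:
    if total is None or total <= 0:
        return None  # size unknown: no percentage can be shown yet
    return done * 100 // total

print(percent_complete(None, 500))   # None (still waiting for the size)
print(percent_complete(2000, 500))   # 25
```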

I hope that's the end of the P2P protocol stuff for now.

Today's work was supported by the NSF-funded DataLad project.

Posted
git-annex devblog
day 490 kind of annoying

Working on getting the git-annex-shell P2P protocol into a releasable state. This was kind of annoying.

I started out wanting to make annex.verify=false disable verification when using the P2P protocol. But, that needed protocol changes, and unfortunately the protocol was not extensible. I thought it was supposed to reject unknown commands and keep the connection open, which would make extensions easy, but unfortunately it actually closed the connection after an unknown command.

So, I added a version negotiation to the P2P protocol, but it's not done for tor remotes yet, and will be turned on for them in some future flag day, once all of them get upgraded.
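The general shape of such a negotiation might look like this (a Python sketch for illustration; the message name and fallback behavior here are my invention, not the actual wire format):

```python
# Hypothetical version negotiation: each side advertises the highest
# protocol version it supports, and both proceed with the minimum of the
# two, so older peers keep working.
def negotiate(our_max: int, peer_message: str) -> int:
    parts = peer_message.split()
    if len(parts) == 2 and parts[0] == "VERSION" and parts[1].isdigit():
        return min(our_max, int(parts[1]))
    return 0  # peer predates negotiation: fall back to the original protocol

print(negotiate(1, "VERSION 2"))              # 1
print(negotiate(1, "ERROR unknown command"))  # 0
```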

After all that, I got completely stuck on the annex.verify change. Multiple problems are preventing me from seeing a way to do it at all; see support disabling verification of transfer over p2p protocol. This must be why I didn't support it in the first place when building the P2P protocol two years ago.

Also fixed performance when an ssh remote is unavailable, where it was trying to connect twice to the remote for each action. And confirmed that the assistant will behave ok when moving between networks while it has P2P connections open. So, other than annex.verify not being supported, I feel fairly ready to release this new feature.

Today's work was supported by an anonymous bitcoin donor.

Posted
git-annex devblog
day 489 zooming

Andrew Wringler has released git-annex-turtle which provides Apple Finder integration for git-annex on macOS, including custom badge icons, contextual menus and a Menubar icon. This looks really nice!


I've completed the P2P protocol with git-annex-shell. It turned out just as fast and good as I'd hoped. accellerate ssh remotes with git-annex-shell mass protocol has the benchmark details.

Even transferring of large files speeds up somewhat; git-annex is actually faster than rsync at shoving bytes down a pipe. (Though rsync still wins in lots of other benchmarks I'm sure.)

Surprisingly, in one benchmark, I found accessing a repository on localhost via ssh is now slightly faster than accessing that same repository by path. I think that this is because when git-annex is talking to git-annex-shell, the programs run on different CPU cores, so there's some extra concurrency.

There are still some implementation todos, some of which will make it faster yet, and others involving potential edge cases. This is a big change and will need some time to be considered stable.


Today's work was sponsored by Jake Vosloo on Patreon

Posted
git-annex devblog
day 488 groundwork for using p2pstdio

Spent most of the day laying groundwork for using git-annex-shell p2pstdio. Implemented pools of ssh connections to it, and added uuid verification. Then generalized code from the p2p remote so it can be reused in the git remote. The types got super hairy in there, but the code reuse level is excellent.

Finally it was time to convert the first ssh remote method to use the P2P protocol. I chose key removal, since benchmarking it doesn't involve the size of annexed objects.

Here's the P2P protocol in action over ssh:

[2018-03-08 17:02:47.688627136] chat: ssh ["localhost","-S",".git/annex/ssh/localhost","-o","ControlMaster=auto","-o","ControlPersist=yes","-T","git-annex-shell 'p2pstdio' '/~/tmp/bench/a' '--debug' 'da72c285-2615-4a67-828f-eaae4f42fc3d' --uuid db017fac-eb8f-42d9-9d09-2780b193cef1"]
[2018-03-08 17:02:47.901897195] P2P < AUTH-SUCCESS db017fac-eb8f-42d9-9d09-2780b193cef1
[2018-03-08 17:02:47.902025504] P2P > REMOVE SHA256E-s4--97b912eb4a61df5f806ca6239dde3e1a4f51ad20aced1642cbb83dc510a5fa6b
[2018-03-08 17:02:47.910074003] P2P < SUCCESS
[2018-03-08 17:02:47.914181701] P2P > REMOVE SHA256E-s4--6af2f5b785a8930f0bd3edc833e18fa191167ab0535ef359b19a1982a6984e96
[2018-03-08 17:02:47.918699806] P2P < SUCCESS

For a benchmark, I set up a repository with 1000 annexed files, and cloned it from localhost, then ran git annex drop --from origin.

before: 41 seconds
after: 10 seconds

A 4x speedup for dropping is pretty great. And when there's more latency than loopback has, the improvement should be more pronounced. Will test it this evening over my satellite internet. :)

Today's work was sponsored by Trenton Cronholm on Patreon.

Posted
git-annex devblog
day 487 git-annex-shell p2pstdio

It was rather easy to implement git-annex-shell p2pstdio, the P2P protocol implementation done for tor is quite generic and easy to adapt for this.

The only complication was that git-annex-shell has a readonly mode, so the protocol server needed modifications to support that. Well, there's also some inefficiency around unnecessary verification of transferred content in some cases, which will probably need extensions to the P2P protocol later.

Also wrote up some documentation of what the P2P protocol looks like, for anyone who might want to communicate with git-annex-shell using it, for some reason, and doesn't understand Haskell and free monads: P2P protocol.
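For a flavor of what that involves, the protocol is line-based; here is a toy sketch of the framing (Python for illustration, using messages that appear in git-annex's debug output, such as REMOVE and AUTH-SUCCESS; the real specification is on that P2P protocol page):

```python
# Toy line-based message framing: a command word followed by
# space-separated parameters, one message per line.
def serialize(command: str, *params: str) -> str:
    return " ".join((command,) + params)

def parse(line: str) -> tuple:
    command, _, rest = line.partition(" ")
    return (command, rest) if rest else (command,)

print(serialize("REMOVE", "SHA256E-s4--6af2f5b785a8930f0bd3edc833e18fa191167ab0535ef359b19a1982a6984e96"))
print(parse("AUTH-SUCCESS db017fac-eb8f-42d9-9d09-2780b193cef1"))
print(parse("SUCCESS"))  # ('SUCCESS',)
```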

While comparing the code of the P2P server and git-annex-shell commands, I noticed that the P2P server didn't check inAnnex when locking content, while git-annex-shell lockcontent did check inAnnex. This turned out to be an ugly data loss bug involving direct mode repositories, where a modified direct mode file could be treated as a good copy when locking a key in the repository. The bug happened when locking a file over tor in order to drop it locally, but also when locking a file locally in a direct mode repository to allow dropping it from any remote.

Very glad I noticed that, and I've changed the API to prevent that class of bug. I feel this is not a severe data loss bug, because when a direct mode repository is involved, dropping from somewhere and then modifying the file in the direct mode repository can have the same effect of losing the old copy. The bug just made data loss happen when running the same operations in the other order.

Next will be making git-annex use this new git-annex-shell feature when available.


Today's work was sponsored by Trenton Cronholm on Patreon

Posted
git-annex devblog
day 486 time to ditch rsync

I'm excited by this new design accellerate ssh remotes with git-annex-shell mass protocol.

git-annex's use of rsync got transfers over ssh working quickly early on, but other than resuming interrupted transfers, using rsync doesn't really gain git-annex much, since annexed objects don't change over time. And rsync has always involved a certain amount of overhead that a custom protocol would avoid.

It's especially handy that such a protocol was already developed for git-annex p2p when using tor. I've not heard of a lot of people using that feature (but maybe people who do have reason not to talk about it), but it's a good solid thing, implemented very generically with a free monad, and reusing it for git-annex-shell would be great.

Posted
git-annex devblog
day 485 slow and steady

I've been recovering from some stuff over the past month, so progress lately has been slow, but still steadily progressing. Yesterday's release of git-annex had a month and a half of improvements, including JSON enhancements, adding extension support to the external special remote protocol, and making fsck warn about required content that's missing.

Today I've been working on git annex export to rsync special remotes.

Today's work was sponsored by Jake Vosloo on Patreon.

Posted
git-annex devblog
day 484 special remote protocol extensions

The external special remote protocol had extensibility built into it for messages git-annex sends, but not for messages that the remote sends back to git-annex. To fix this asymmetry, I've added a new EXTENSIONS to the protocol, which can be used to find out about what new protocol extensions are supported.
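The general idea of such a handshake can be sketched like this (Python for illustration; the extension names here are made up, not the protocol's actual ones):

```python
# Hypothetical sketch: after each side lists the extensions it supports,
# only the intersection can safely be used.
def agreed_extensions(ours: set, theirs: set) -> set:
    return ours & theirs

print(sorted(agreed_extensions({"INFO", "ASYNC"}, {"INFO"})))  # ['INFO']
```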

There was the possibility that adding that might break some external special remote that hardcoded the initial protocol messages. So, I checked all of them that I know of, and all were ok, except for older versions of datalad, which we were able to deal with. If you have your own external special remote implementation, now would be a good time to check it.

Posted
git-annex devblog
day 483 faster start with removable drives

git-annex does a little bit of work at startup to learn about the git repository it's running in. That's been optimised some before, but not entirely eliminated; it's just too useful to have that information always available inside git-annex. But it turned out that it was doing more work than needed for many commands, by checking the git config of local remotes. That caused unnecessary spin-up of removable drives, or automount timeouts, or generally more work than needed when running commands like git annex find, and even when tab completing git-annex. That's fixed now, so it avoids checking the git config of remotes except when running commands that access some remote.

There's also a new config setting, remote.<name>.annex-checkuuid that can be set to false to defer checking the uuid of local repositories until git-annex actually uses them. That can avoid even more spinup/automounts, but that config prevents git-annex from transparently handling the case where different removable drives get mounted to the same place at different times.

Speaking of speed, I benchmarked linux kernel mitigation for the meltdown attack making git status 5% slower from a warm cache. It did not slow down git annex find or git annex find --in remote enough to be measured by my benchmark. I expect that git-annex commands that transfer data are bottlenecked on IO and won't be slowed down appreciably by the meltdown mitigation either.

Today's work was sponsored by Jake Vosloo on Patreon.

Posted
git-annex devblog
day 482 website login problem

I noticed a large drop in bug reports and comments on the git-annex website over the holiday period of December. At first I thought this was just due to the holidays, even though holidays are often busy times for free software projects since lots of people have more time. But, traffic is still down this week, and several people emailed me about problems logging into the website.

So, lacking much detail at all about what people were doing that didn't work, I've spent the past day and then some trying to guess at and reproduce the problem. And I think I have, http://ikiwiki.info/bugs/login_problem/, and once reproduced it was of course easily fixed.

If you tried to post something and got a login prompt instead of seeing it on the website, now would be a good time to post it again.

If you still have login problems with the website (other than openid, which has a lot of broken providers and a badly specified protocol), please get in touch and try to provide enough detail to reproduce the problem, cuz my guessing muscles are feeling sprained after this experience.

In the meantime, there has still been git-annex development happening. I added a new git annex inprogress command over the holidays that allows doing things like streaming videos while git annex get is still downloading them. Several problems with the switch to youtube-dl are fixed, core.sharedRepository is handled better, and the cabal file's custom-setup stanza was added back after quite a lot of refactoring of library code.


Today's work was sponsored by an anonymous bitcoin donor.

Posted
propellor arm boards testing

Took a while to find the necessary serial cables and SD cards to test propellor's ARM disk image generation capabilities.

Ended up adding support for the SheevaPlug because it was the first board I found the hardware to test. And after fixing a couple oversights, it worked on the second boot!

Then after a trip to buy a microSD card, Olimex Lime worked on the first boot! So did CubieTruck and Banana Pi. I went ahead and added a dozen other sunxi boards that Debian supports, which will probably all work.

(Unfortunately, I accidentally corrupted the disk of my home server (router/solar power monitor/git-annex build box) while testing the CubieTruck. Luckily, that server was the first ARM board I wanted to rebuild cleanly with propellor anyway, and its configuration was almost entirely in propellor already, so I'm rebuilding it now.)


Today's work was sponsored by Trenton Cronholm on Patreon.

Posted
propellor arm boards

Working today on adding support for ARM boards to propellor.

I started by adding support for generating non-native chroots. qemu-debootstrap makes that fairly simple. Propellor is able to run inside a non-native chroot too, to ensure properties in there; the way it injects itself into a chroot wins again as that just worked.

Then, added support for flash-kernel, and for u-boot installation. Installing u-boot to the boot sector of SD cards used by common ARM boards does not seem to be automated anywhere in Debian; README.Debian files just document dd commands. It may be that it's not needed for a lot of boards (my CubieTruck boots without it), but I implemented it anyway.

And, Propellor.Property.Machine is a new module with properties for different ARM boards, to get the right kernel/flash-kernel/u-boot/etc configuration easily.

This all works, and propellor can update a bootable disk image for an ARM system in 30 seconds. I have not checked yet if it's really bootable.

Tomorrow, I'm going to dust off my ARM boards and try to get at least 3 boards tested, and if that goes well, will probably add untested properties for all the rest of the sunxi boards.


Today's work was sponsored by Jake Vosloo on Patreon.

Posted
haskell-scuttlebutt-types

Built a new haskell library, http://hackage.haskell.org/package/scuttlebutt-types

I've been using Secure Scuttlebutt for 6 months or so, and think it's a rather good peer-to-peer social network. But it's very Javascript centric, and I want to be able to play with its data from Haskell.

The scuttlebutt-types library is based on some earlier work by Peter Hajdu. I expanded it to have all the core Scuttlebutt data types, and got it parsing most of the corpus of Scuttlebutt messages. That took most of yesterday and all of today. The next thing to tackle would be generating JSON for messages, formatted so the network accepts it.

I don't know what scuttlebutt-types will be used for. Maybe looking up stats, or building bots, or perhaps even a Scuttlebutt client? We'll see..


Today's work was sponsored by Trenton Cronholm on Patreon.

Posted
secret project progress bar

One of my initial goals for secret-project was for it to not need to implement a progress bar.

So, of course, today I found myself implementing a progress bar to finish up secret-project. As well as some other UI polishing, and fixing a couple of bugs in propellor that impacted it.

Ok, I'm entirely done with secret-project now, except for an unveiling later this month. Looking back over the devblog, it took only around 14 days total to build it all. Feels longer, but not bad!

Posted
secret project its aliiive

After a rather interesting morning, the secret-project is doing exactly what I set out to accomplish! It's alllive!

(I found a way to segfault the ghc runtime first. And then I also managed to crash firefox, and finally managed to crash and hard-hang rsync.)

All that remains to be done now is to clean up the user interface. I made propellor's output be displayed in the web browser, but currently it contains ansi escape sequences which don't look good.

This would probably be a bad time to implement a in-browser terminal emulator in haskell, and perhaps a good time to give propellor customizable output backends. I have 3 days left to completely finish this project.

Posted
secret project close but no cigar

One last detour: Had to do more work than I really want to at this stage, to make the secret-project pick a good disk to use. Hardcoding a disk device was not working reliably enough even for a demo. Ended up with some fairly sophisticated heuristics to pick the right disk, taking disk size and media into account.

Then finally got on with grub installation to the target disk. Installing grub to a disk from a chroot is a fiddly process that's hard to get right. But, I remembered writing similar code before; propellor installs grub to a disk image from a chroot. So I generalized that special-purpose code into something the secret-project can also use.

It was a very rainy day, and rain fade on the satellite internet prevented me from testing it quickly. There were some dumb bugs. But at 11:30 pm, it Just Worked! Well, at least the target booted. /etc/fstab is not 100% right.

Posted
end in sight

Late Friday evening, I realized that the secret-project's user interface should be a specialized propellor config file editor. Eureka! With that in mind, I reworked how the UserInput value is passed from the UI to propellor; now there's a UserInput.hs module that starts out with an unconfigured value, and the UI rewrites that file. This has a lot of benefits, including being able to test it without going through the UI, and letting the UI be decoupled into a separate program.

Also, sped up propellor's disk image building a lot. It already had some efficiency hacks, to reuse disk image files and rsync files to the disk image, but it was still repartitioning the disk image every time, and the whole raw disk image was being copied to create the vmdk file for VirtualBox. Now it only repartitions when something has changed, and the vmdk file references the disk image, which sped up the secret-project's 5 gigabyte image build time from around 30 minutes to 30 seconds.

With that procrastination^Wgroundwork complete, I was finally at the point of testing the secret-project running on the disk image. There were a few minor problems, but within an hour it was successfully partitioning, mounting, and populating the target disk.

Still have to deal with boot loader installation and progress display, but the end of the secret-project is in sight.

Today's work was sponsored by Trenton Cronholm on Patreon.

Posted
disk partitioning nitty gritty

The secret-project can probably partition disks now. I have not tried it yet.

I generalized propellor's PartSpec DSL, which had been used for only auto-fitting disk images to chroot sizes, to also support things like partitions that use some percentage of the disk.

A sample partition table using that, that gives / and /srv each 20% of the disk, has a couple of fixed size partitions, and uses the rest for /home:

[ partition EXT2 `mountedAt` "/boot"
    `setFlag` BootFlag
    `setSize` MegaBytes 512
, partition EXT4 `mountedAt` "/"
    `useDiskSpace` (Percent 20)
, partition EXT4 `mountedAt` "/srv"
    `useDiskSpace` (Percent 20)
, partition EXT4 `mountedAt` "/home"
    `useDiskSpace` RemainingSpace
, swapPartition (MegaBytes 1024)
    ]

It would probably be good to extend that with a combinator that sets a minimum allowed size, so eg / can be made no smaller than 4 GB. The implementation should make it simple enough to add such combinators.
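The logic of such a combinator would be simple; a purely hypothetical sketch in Python (the real PartSpec DSL is Haskell and its types differ):

```python
# Clamp a computed partition size to a floor, in megabytes.
def at_least(minimum_mb: int, computed_mb: int) -> int:
    """Eg: / gets 20% of the disk, but never less than 4 GB."""
    return max(minimum_mb, computed_mb)

print(at_least(4096, 1000))  # 4096 (small disk: the floor wins)
print(at_least(4096, 8000))  # 8000 (big disk: the percentage wins)
```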

I thought about reusing the disk image auto-fitting code, so the target's / would start at the size of the installer's /. May yet do that; it would make some sense when the target and installer are fairly similar systems.

Posted
a plan comes together

After building the neato Host versioning described in Functional Reactive Propellor this weekend, I was able to clean up secret-project's config file significantly. It's feeling close to the final version now.

At this point, the disk image it builds is almost capable of installing the target system, and will try to do so when the user tells it to. But, choosing and partitioning of the target system disk is not implemented yet, so it installs to a chroot which will fail due to lack of disk space there. So, I have not actually tested it yet.

Posted
yak attack

Integrating propellor and secret-project stalled out last week. The problem was that secret-project can only be built with stack right now (until my patch to threepenny-gui to fix drag and drop handling gets accepted), but propellor did not support building itself with stack.

I didn't want to get sucked into yak shaving on that, and tried to find another approach, but finally gave in, and taught propellor how to build itself with stack. There was an open todo about that, with a hacky implementation by Arnaud Bailly, which I cleaned up.

Then the yak shaving continued as I revisited a tricky intermittent propellor bug. Think I've finally squashed that.

Finally, got back to where I left off last week, and at last here's a result! This is a disk image that was built entirely from a propellor config file, that contains a working propellor installation, and that starts up secret-project on boot.

Now to make this secret-project actually do something more than looking pretty..


Today's work was sponsored by Trenton Cronholm on Patreon.

Posted
mashup

This was a one step forward, one step back kind of day, as I moved the working stuff from yesterday out of my personal propellor config file and into the secret-project repository, and stumbled over some issues while doing so.

But, I had a crucial idea last night. When propellor is used to build an installer image, the installer does not need to bootstrap the target system from scratch. It can just copy the installer to the target system, and then run propellor there, with a configuration that reverts any properties that the installer had but the installed system should not. This will be a lot faster and avoid duplicate downloads.

That's similar to how d-i's live-installer works, but simpler, since with propellor there's a short list of all the properties that the installer has, and propellor knows if a property can be reverted or not.

Today's work was sponsored by Riku Voipio.

Posted
high bandwidth propellor hacking

Doing a bunch of work on propellor this week. Some bug fixes and improvements to disk image building. Also some properties involving the XFCE desktop environment.

Putting it all together, I have 28 lines of propellor config file that generates a disk image that boots to a XFCE desktop and also has propellor installed. I wonder where it will go from here? ;-)

darkstar :: Host
darkstar = host "darkstar.kitenet.net" $ props
    ...
        & imageBuilt "/srv/propellor-disk.img"
                (Chroot.hostChroot demo (Chroot.Debootstrapped mempty))
                MSDOS (grubBooted PC)
                [ partition EXT2 `mountedAt` "/boot"
                        `setFlag` BootFlag
                , partition EXT4 `mountedAt` "/"
                        `mountOpt` errorReadonly
                        `addFreeSpace` MegaBytes 256
                , swapPartition (MegaBytes 256)
                ]

demo :: Host
demo = host "demo" $ props
        & osDebian Unstable X86_64
        & Apt.installed ["linux-image-amd64"]
        & bootstrappedFrom GitRepoOutsideChroot
        & User.accountFor user
        & root `User.hasInsecurePassword` "debian"
        & user `User.hasInsecurePassword` "debian"
        & XFCE.installedMin
        & XFCE.networkManager
        & XFCE.defaultPanelFor user File.OverwriteExisting
        & LightDM.autoLogin user
        & Apt.installed ["firefox"]
  where
        user = User "user"
        root = User "root"

Incidentally, I have power and bandwidth to work on this kind of propellor stuff from home all day, for the first time! Up until now, all propellor features involving disk images were either tested remotely, or developed while I was away from home. It's a cloudy, rainy day; the solar upgrade and satellite internet paid off.


Today's work was sponsored by Riku Voipio.

Posted
prototype gamified interface

And now for something completely different..

What is this strange thing? It's a prototype, thrown together with open clip art in a long weekend. It's an exploration of how far an interface can diverge from the traditional and still work. And it's a mini-game.

Watch the video, and at the end, try to answer these questions:

  • What will you do then?
  • What happens next?

Spoilers below...

What I hope you might have answered to those questions, after watching the video, is something like this:

  • What will you do then?
    I'll move the egg into the brain-tree.
  • What happens next?
    It will throw away the junk I had installed and replace it with what's in the egg.

The interface I'm diverging from is this kind of thing:

My key design points are these:

  • Avoid words entirely

    One of my takeaways from the Debian installer project is that it's important to make the interface support non-English users, but maintaining translations massively slows down development. I want an interface that can be quickly iterated on or thrown away.

  • Massively simplify

    In the Debian installer, we never managed to get rid of as many questions as we wanted to. I'm starting from the other end, and only putting in the absolute most essential questions.

    1. Do you want to delete everything that is on this computer?
    2. What's the hostname?
    3. What account to make?
    4. What password to use?

    I hope to stop at the first question, which is all I've implemented so far. It should be possible to make the hostname easy to change after installation; for an end user installation, the username doesn't matter much, and the password generally adds little or no security (and desktop environments should make it easy to add a password later).

  • Don't target all the users

    Trying to target all users constrained the Debian installer in weird ways while complicating it massively.

    This is not for installing a server or an embedded system. This interface is targeting end users who want a working desktop system with minimum fuss, and are capable of seeing and dragging. There can be other interfaces for other users.

  • Make it fun

    Fun to use, and also fun to develop.

    I'm using threepenny-gui to build the interface. This lets Haskell code be written that directly manipulates the web browser's DOM. I'm having a lot of fun with that and can already think of other projects I can use threepenny-gui with!

Previously: propellor is d-i 2.0

Posted
debug me keyrings

Releasing debug-me 1.20170520 today with a major improvement.

Now it will look for the gpg key of the developer in keyring files in /usr/share/debug-me/keyring/, and tell the user which project(s) the developer is known to be a member of. So, for example, Debian developers who connect to debug-me sessions of Debian users will be identified as a Debian developer. And when I connect to a debug-me user's session, debug-me will tell them that I'm the developer of debug-me.

This should help the user decide when to trust a developer to connect to their debug-me session. If they develop software that you're using, you already necessarily trust them and letting them debug your machine is not a stretch, as long as it's securely logged.

Thanks to Sean Whitton for the idea of checking the Debian keyring, which I've generalized to this.

Also, debug-me is now just an apt-get away for Debian unstable users, and I think it may end up in the next Debian release.

Posted
debug me released

debug-me is released! https://debug-me.branchable.com/

I made one last protocol compatibility breaking change before the release. Realized that while the websocket framing protocol has a version number, the higher-level protocol did not, which would have made extending it very hard later. So, added in a protocol version number.

The release includes a tarball that should let debug-me run on most linux systems. I adapted the code from git-annex for completely linux distribution-independent packaging. That added 300 lines of code to debug-me's source tree, which is suboptimal. But all the options in the Linux app packaging space are suboptimal. Flatpak and snappy would both be ok -- if the other one didn't exist and they were installed everywhere. Appimage needs the program to be built against an older version of libraries.

Posted
debug me screencast recording and server

Recorded a 7 minute screencast demoing debug-me, and another 3 minute screencast talking about its log files. That took around 5 hours of work actually, between finding a screencast program that works well (I used kazam), writing the "script", and setting the scenes for the user and developer desktops shown in the screencast.

While recording, I found a bug in debug-me. The gpg key was not available when it tried to verify it. I thought that I had seen gpg --verify --keyserver download a key from a keyserver and use it to verify, but it seems I was mistaken. So I changed the debug-me protocol to include the gpg public key, so it does not need to rely on accessing a keyserver.

Also deployed a debug-me server, http://debug-me.joeyh.name:8081/. It's not ideal for me to be running the debug-me server, because when a user and I are using that server with debug-me, I could use my control of the server to prevent the debug-me log being emailed to them, and also delete their local copy of the log.

I have a plan to https://debug-me.branchable.com/todo/send_to_multiple_servers/ that will avoid that problem but it needs a debug-me server run by someone else. Perhaps that will happen once I release debug-me; for now the single debug-me server will have to do.

Finally, made debug-me --verify log check all the signatures and hashes in the log file, and display the gpg keys of the participants in the debug-me session.

Today's work was sponsored by Jake Vosloo on Patreon.

(Also, Sean Whitton has stepped up to get debug-me into Debian!)

Posted
debug me one last big change

Added an additional hash chain of entered values to the debug-me data types. This fixes the known problem with debug-me's proof chains, that the order of two entered values could be ambiguous.

And also, it makes for nicer graphs! In this one, I typed "qwertyuiop" with high network lag, and you can see that the "r" didn't echo back until "t", "y", "u" had been entered. Then the two diverged states merged back together when "i" was entered, chaining back to the last seen "r" and last entered "u".

debug-me.png

(Compare with the graph of the same kind of activity back in debug me first stage complete.)

Having debug-me generate these graphs has turned out to be a very good idea. Makes analyzing problems much easier.


Also made /quit in the control window quit the debug-me session. I want to also let the user pause and resume entry in the session, but it seems that could lead to more ambiguity problems in the proof chain, so I'll have to think that over carefully.

debug-me seems ready for release now, except it needs some servers, and some packages, and a screencast showing how it works.


Today's work was sponsored by Trenton Cronholm on Patreon.

Posted
debug me final polishing

Fixed a tricky race condition that I think was responsible for some recent instability seen in debug-me when connecting to a session. It's a bit hard to tell because it caused at least 3 different types of crashes, and it was a race condition.

Made debug-me prompt for the user's email address when it starts up, and then the server will email the session log to the user at the end. There are two reasons to do this. First, it guards against the developer connecting to the session, doing something bad, and deleting the user's local log file to cover their tracks. Second, it means the server doesn't have to retain old log files, which could be abused to store other data on servers.

Also put together a basic web site, https://debug-me.branchable.com/.

Posted
debug me gpg key verification

Working on making debug-me verify the developer's gpg key. Here's what the user sees when I connect to them:

** debug-me session control and chat window
Someone wants to connect to this debug-me session.
Checking their Gnupg signature ...
gpg: Signature made Sat Apr 29 14:31:37 2017 JEST
gpg:                using RSA key 28A500C35207EAB72F6C0F25DB12DB0FF05F8F38
gpg: Good signature from "Joey Hess <joeyh@joeyh.name>" [unknown]
gpg:                 aka "Joey Hess <id@joeyh.name>" [unknown]
gpg:                 aka "Joey Hess <joey@kitenet.net>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Checking the Gnupg web of trust ...
Joey Hess's identity has been verified by as many as 111 people, including:
Martin Michlmayr, Kurt Gramlich, Luca Capello, Christian Perrier, Axel Beckert,
Stefano Zacchiroli, Gerfried Fuchs, Eduard Bloch, Anibal Monsalve Salazar

Joey Hess is probably a real person.

Let them connect to the debug-me session and run commands? [y/n] 

And here's what the user sees when a fake person connects:

** debug-me session control and chat window
Someone wants to connect to this debug-me session.
Checking their Gnupg signature ...
gpg: Signature made Sat Apr 29 14:47:29 2017 JEST
gpg:                using RSA key
gpg: Good signature from "John Doe" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: B2CF F6EF 2F01 96B1 CD2C  5A03 16A1 2F05 4447 4791
Checking the Gnupg web of trust ...

Their identity cannot be verified!

Let them connect to the debug-me session and run commands? [y/n] 

The debug-me user is likely not connected to the gpg web of trust, so debug-me will download the developer's key from a keyserver, and use the https://pgp.cs.uu.nl/ service to check if the developer's key is in the strong set of the web of trust. It prints out the best-connected people who have signed the developer's key, since the user might recognise some of those names.

While relying on a server to determine if the developer is in the strong set is not ideal, it would be no better to have debug-me depend on wotsap, because wotsap still has to download the WoT database. (Also, the version of wotsap in Debian is outdated and insecure.) The decentralized way is for the user to do some key signing, get into the WoT, and then gpg itself can tell them if the key is trusted.


debug-me is now nearly feature-complete!

It has some bugs, and a known problem with the evidence chain that needs to be fixed. And, I want to make debug-me servers email logs back to users, which will change the websockets protocol, so it ought to be done before making any release.

Posted
debug me control window

I've been trying to write a good description of debug-me, and here's what I've got so far:

Short description: "secure remote debugging"

Long description:

Debugging a problem over email is slow, tedious, and hard. The developer needs to see your problem to understand it. Debug-me aims to make debugging fast, fun, and easy, by letting the developer access your computer remotely, so they can immediately see and interact with the problem. Making your problem their problem gets it fixed fast.

A debug-me session is logged and signed with the developer's Gnupg key, producing a chain of evidence of what they saw and what they did. So the developer's good reputation is leveraged to make debug-me secure.

When you start debug-me without any options, it will connect to a debug-me server, and print out an url that you can give to the developer to get them connected to you. Then debug-me will show you their Gnupg key and who has signed it. If the developer has a good reputation, you can proceed to let them type into your console in a debug-me session. Once the session is done, the debug-me server will email you the signed evidence of what the developer did in the session.

If the developer did do something bad, you'd have proof that they cannot be trusted, which you can share with the world. Knowing that is the case will keep most developers honest.

I think that's pretty good, and would like to know your thoughts, reader, as a prospective debug-me user.


Most of today was spent making debug-me --control communicate over a socket with the main debug-me process. That is run in a separate window, which is the debug-me session control and chat window. Things typed there can be seen by the other people involved in a debug-me session. And, the gnupg key prompting stuff will be done in that window eventually.

Screenshot of the first time that worked. The "user" is on the left and the "developer" is on the right.

screenshot.png

Posted
debug me finalizing wire format

Went ahead and made debug-me use protocol buffers for its wire protocol. There's a nice haskell library for this that doesn't depend on anything else, and can generate them directly from data types, but I had to write a shim between the protobuf style data types and debug-me's internal data types. Took 250 lines of quite tedious code.

Then I finally implemented the trick I thought of to leave out the previous hash from debug-me messages on the wire, while still including cryptographically secure proof of what the previous hash was. That reduced the overhead of a debug-me message from 168 bytes to 74 bytes!

I doubt debug-me's wire format will see any more major changes.

How does debug-me compare with ssh? I tried some experiments, and typing a character in ssh sends 180 bytes over the wire, while doing the same in debug-me sends 326 bytes. The extra overhead must be due to using websockets, I guess. At least debug-me is in the same ballpark as ssh.

Today's work was sponsored by Riku Voipio.

Posted
debug me polishing

Working on polishing up debug-me's features to not just work, but work well.

On Monday, I spent much longer than expected on the problem that when a debug-me session ended, clients attached to it did not shut down. The session shutdown turned out to have worked by accident in one case, but it was lacking a proper implementation, and juggling all the threads and channels and websockets to get everything to shut down cleanly was nontrivial.

Today, I fixed a bug that made debug-me --download fail while downloading a session, because the server didn't relay messages from developers, and so the proof chain was invalid. After making the server relay those messages, and handling them, that was fixed -- and I got a great feature for free: Multiple developers can connect to a debug-me session and all interact with it at the same time!

Also, added timing information to debug-me messages. While time is relative and so it can't be proved how long the time was between messages in the debug-me proof chain, including that information lets debug-me --download download a session and then debug-me --replay can replay the log file with realistic pauses.

Started on gpg signing and signature verification, but that has a user interface problem. If the developer connects after the debug-me session has started, prompting the user on the same terminal that is displaying the session would not be good. This is where it'd be good to have a library for multi-terminal applications. Perhaps I should go build a prototype of that. Or perhaps I'll make debug-me wait for one developer to connect and prompt the user before starting the session.

Posted
debug me client-server working

Got debug-me fully working over the network today. It's allllive!

Hardest thing today was when a developer connects and the server needs to send them the backlog of the session before they start seeing current activity. Potentially full of races. My implementation avoids race conditions, but might cause other connected developers to see a stall in activity at that point. A stall-free version is certainly doable, but this is good enough for now.

There are quite a few bugs to fix. Including a security hole in the proof chain design, which I realized it had when thinking about what happens when multiple people are connected to a debug-me session and are all typing at once.

(There have actually been 3 security holes spotted over the past day; the one above, a lacking sanitization of session IDs, and a bug in the server that let a developer truncate logs.)

So I need to spend several more days bugfixing, and also make it only allow connections signed by trusted gpg keys. Still, an initial release does not seem far off now.

Posted
debug me websockets

Worked today on making debug-me run as a client/server, communicating using websockets.

I decided to use the "binary" library to get an efficient serialization of debug-me's messages to send over the websockets, rather than using JSON. A typical JSON message was 341 bytes, and this only uses 165 bytes, which is fairly close to the actual data size of ~129 bytes. I may later use protocol buffers to make it less of a haskell-specific wire format.

Currently, the client and server basically work; the client can negotiate a protocol version with the server and send messages to it, which the server logs.


Also, added two additional modes to debug-me. debug-me --download url will download a debug-me log file. If that session is still running, it keeps downloading until it's gotten the whole session. debug-me --watch url connects to a debug-me session, and displays it in non-interactive mode. These were really easy to implement, reusing existing code.

Posted
debug me signatures

Added signatures to the debug-me protocol today. All messages are signed using a ed25519 session key, and the protocol negotiates these keys.

Here's a dump of a debug-me session, including session key exchange:

{"ControlMessage":{"control":{"SessionKey":[{"b64":"it8RIgswI8IZGjjQ+/INPjGYPAcGCwN9WmGZNlMFoX0="},null]},"controlSignature":{"Ed25519Signature":{"b64":"v80m5vQbgw87o88+oApg0syUk/vg88t14nIfXzahwAqEes/mqY4WWFIbMR46WcsEKP2fwfXQEN5/nc6UOagBCQ=="}}}}
{"ActivityMessage":{"prevActivity":null,"activitySignature":{"Ed25519Signature":{"b64":"HNPk/8QF7iVtsI+hHuO1+J9CFnIgsSrqr1ITQ2eQ4VM7rRPG7i07eKKpv/iUwPP4OdloSmoHLWZeMXZNvqnCBQ=="}},"activity":{"seenData":{"v":">>> debug-me session starting\r\n"}}}}
{"ActivityMessage":{"prevActivity":{"hashValue":{"v":"63d31b25ca262d7e9fc5169d137f61ecef20fb65c23c493b1910443d7a5514e4"},"hashMethod":"SHA256"},"activitySignature":{"Ed25519Signature":{"b64":"+E0N7j9MwWgFp+LwdzNyByA5W6UELh6JFxVCU7+ByuhcerVO/SC2ZJJJMq8xqEXSc9rMNKVaAT3Z6JmidF+XAw=="}},"activity":{"seenData":{"v":"$ "}}}}
{"ControlMessage":{"control":{"SessionKey":[{"b64":"dlaIEkybI5j3M/WU97RjcAAr0XsOQQ89ffZULVR82pw="},null]},"controlSignature":{"Ed25519Signature":{"b64":"hlyf7SZ5ZyDrELuTD3ZfPCWCBcFcfG9LP7Zuy+roXwlkFAv2VtpYrFAAcnWSvhloTmYIfqo5LWakITPI0ITtAQ=="}}}}
{"ControlMessage":{"control":{"SessionKeyAccepted":[{"b64":"dlaIEkybI5j3M/WU97RjcAAr0XsOQQ89ffZULVR82pw="},null]},"controlSignature":{"Ed25519Signature":{"b64":"kJ7AdhBgoiYfsXOM//4mcMcU5sL0oyqulKQHUPFo2aYYPJnu5rKUHlfNsfQbGDDrdsl9SocZaacUpm+FoiDCCg=="}}}}
{"ActivityMessage":{"prevActivity":{"hashValue":{"v":"2250d8b902683053b3faf5bdbe3cfa27517d4ede220e4a24c8887ef42ab506e0"},"hashMethod":"SHA256"},"activitySignature":{"Ed25519Signature":{"b64":"hlF7oFhFealsf8+9R0Wj+vzfb3rBJyQjUyy7V0+n3zRLl5EY88XKQzTuhYb/li+WoH/QNjugcRLEBjfSXCKJBQ=="}},"activity":{"echoData":{"v":""},"enteredData":{"v":"l"}}}}

Ed25519 signatures add 64 bytes overhead to each message, on top of the 64 bytes for the hash pointer to the previous message. But, last night I thought of a cunning plan to remove that hash pointer from the wire protocol, while still generating a provable hash chain. Just leave it out of the serialized message, but include it in the data that's signed. debug-me will then just need to try the hashes of recent messages until it finds one for which the signature verifies, and then it will know what the hash pointer is supposed to point to, without it ever having been sent over the wire! Will implement this trick eventually.
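That search-over-recent-hashes idea can be sketched like so. This is a toy, with a stand-in hash and "signature" instead of real SHA256 and Ed25519, and all the names here are mine, not debug-me's:

```haskell
import Data.List (find)

type Hash = Int

-- toy stand-in for SHA256
toyHash :: String -> Hash
toyHash = foldl (\h c -> h * 31 + fromEnum c) 7

-- toy stand-in for an Ed25519 signature: the previous message's hash
-- is mixed into the signed data, but is never sent on the wire
toySign :: Hash -> String -> Hash
toySign prevHash body = toyHash (show prevHash ++ body)

-- the receiver tries the hashes of recent messages until it finds the
-- one the signature verifies against, recovering the hash pointer
-- without it having been transmitted
recoverPrev :: [Hash] -> String -> Hash -> Maybe Hash
recoverPrev recentHashes body sig =
  find (\h -> toySign h body == sig) recentHashes

main :: IO ()
main = do
  let h1 = toyHash "first message"
      h2 = toyHash "second message"
      -- the wire carries only the body and the signature
      sig = toySign h2 "third message"
  print (recoverPrev [h1, h2] "third message" sig == Just h2)  -- prints True
```

The real protocol signs with the session key and hashes with SHA256, but the recovery step is the same shape: try candidates until one verifies.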

Next though, I need to make debug-me communicate over the network.

Posted
debug me first stage complete

Solved that bug I was stuck on yesterday. I had been looking in the code for the developer side for a bug, but that side was fine; the bug was excessive backlog trimming on the user side.

Now I'm fairly happy with how debug-me's activity chains look, and the first stage of developing debug-me is complete. It still doesn't do anything more than the script command, but all the groundwork for the actual networked debug-me is done now. I only have to add signing, verification of gpg key trust, and http client-server to finish debug-me.

(Also, I made debug-me --replay debug-me.log replay the log with realistic delays, like scriptreplay or ttyplay. Only took a page of code to add that feature.)


I'm only "fairly happy" with the activity chains because there is a weird edge case.. At high latency, when typing "qwertyuiop", this happens:

debug-me.log.png

That looks weird, and is somewhat hard to follow in graph form, but it's "correct" as far as debug-me's rules for activity chains go. Due to the lag, the chain forks:

  • It sends "wer" before the "q" echoes back.
  • It replies to the "q" echo with "tyuio" before the "w" echoes back.
  • It replies to the "w" echo with "p".
  • Finally, all the delayed echoes come in, and it sends a carriage return, resulting in the command being run.

I'd be happier if the forked chain explicitly merged back together, but to do that and add any provable information, the developer would have to wait for all the echoes to arrive before sending the carriage return, or something like that, which would make type-ahead worse. So I think I'll leave it like this. Most of the time, latency is not so high, and so this kind of forking doesn't happen much, or is much simpler to understand when it does happen.

Posted
debug me chain issues

Working on getting the debug-me proof chain to be the right shape, and be checked at all points for validity. This graph of a session shows today's progress, but also a bug.

debug-me.log.png

At the top, everything is synchronous while "ls" is entered and echoed back. Then, things go asynchronous when " -la" is entered, and the expected echos (in brackets) match up with what really gets echoed, so that input is also accepted.

Finally, the bit in red where "|" is entered is a bug on the developer side, and it gets (correctly) rejected on the user side due to having forked the proof chain. Currently stuck on this bug.

The code for this, especially on the developer side, is rather hairy, I wonder if I am missing a way to simplify it.

Posted
debug me half days

Two days only partially spent on debug-me..

Yesterday a few small improvements, but mostly I discovered the posix-pty library, and converted debug-me to use it rather than wrangling ptys itself. Which was nice because it let me fix resizing. However, the library had a bug with how it initializes the terminal, and investigating and working around that bug used up too much time. Oh well, probably still worth it.


Today, made debug-me serialize to and from JSON.

{"signature":{"v":""},"prevActivity":null,"activity":{"seenData":{"v":">>> debug-me session starting\r\n"}}}
{"signature":{"v":""},"prevActivity":{"hashValue":{"v":"fb4401a717f86958747d34f98c079eaa811d8af7d22e977d733f1b9e091073a6"},"hashMethod":"SHA256"},"activity":{"seenData":{"v":"$ "}}}
{"signature":{"v":""},"prevActivity":{"hashValue":{"v":"cfc629125d93f55d2a376ecb9e119c89fe2cc47a63e6bc79588d6e7145cb50d2"},"hashMethod":"SHA256"},"activity":{"echoData":{"v":""},"enteredData":{"v":"l"}}}
{"signature":{"v":""},"prevActivity":{"hashValue":{"v":"cfc629125d93f55d2a376ecb9e119c89fe2cc47a63e6bc79588d6e7145cb50d2"},"hashMethod":"SHA256"},"activity":{"seenData":{"v":"l"}}}
{"signature":{"v":""},"prevActivity":{"hashValue":{"v":"3a0530c7739418e22f20696bb3798f8c3b2caf7763080f78bfeecc618fc5862e"},"hashMethod":"SHA256"},"activity":{"echoData":{"v":""},"enteredData":{"v":"s"}}}
{"signature":{"v":""},"prevActivity":{"hashValue":{"v":"3a0530c7739418e22f20696bb3798f8c3b2caf7763080f78bfeecc618fc5862e"},"hashMethod":"SHA256"},"activity":{"seenData":{"v":"s"}}}
{"signature":{"v":""},"prevActivity":{"hashValue":{"v":"91ac86c7dc2445c18e9a0cfa265585b55e01807e377d5f083c90ef307124d8ab"},"hashMethod":"SHA256"},"activity":{"echoData":{"v":""},"enteredData":{"v":"\r"}}}
{"signature":{"v":""},"prevActivity":{"hashValue":{"v":"91ac86c7dc2445c18e9a0cfa265585b55e01807e377d5f083c90ef307124d8ab"},"hashMethod":"SHA256"},"activity":{"seenData":{"v":"\r\n"}}}
{"signature":{"v":""},"prevActivity":{"hashValue":{"v":"cc97177983767a5ab490d63593011161e2bd4ac2fe00195692f965810e6cf3bf"},"hashMethod":"SHA256"},"activity":{"seenData":{"v":"AGPL\t    Pty.hs    Types.hs\t  debug-me.cabal  dist\r\nCmdLine.hs  Setup.hs  Val.hs\t  debug-me.hs\t  stack.yaml\r\n"}}}

That's a pretty verbose way of saying: I typed "ls" and saw the list of files. But it compresses well. Each packet for a single keystroke will take only 37 bytes to transmit as part of a compressed stream of JSON, and 32 of those bytes are needed for the SHA256 hash. So, this is probably good enough to use as debug-me's wire format.

(Some more bytes will be needed once the signature field is not empty..)

It's also a good logging format, and can be easily analyzed to, e.g., prove when a person used debug-me to do something bad.

Wrote a quick visualizer for debug-me logs using graphviz. This will be super useful for debug-me development if nothing else.

debug-me.log.png

Posted
debug me day 2

Proceeding as planned, I wrote 170 lines of code to make debug-me have separate threads for the user and developer sides, which send one-another updates to the activity chain, and check them for validity. This was fun to implement! And it's lacking only signing to be a full implementation of the debug-me proof chain.

Then I added a network latency simulation to it and tried different latencies, up to the latency I measure on my satellite internet link (800 ms or so).

That helped me find two bugs, where it was not handling echo simulation correctly. Something is still not handled quite right, because when I put a network latency delay before sending output from the user side to the developer side, it causes some developer input to get rejected. So I'm for now only inserting latency when the developer is sending input to the user side. Good enough for proof-of-concept.

Result is that, even with a high latency, it feels "natural" to type commands into debug-me. The echo emulation works, so it accepts typeahead.

Using backspace to delete several letters in a row feels "wrong"; the synchronousness requirements prevent that working when latency is high. Same problem for moving around with the arrow keys. Down around 200 ms latency, these problems are not apparent, unless you mash down the backspace or arrow key.

How about using an editor? It seemed reasonably non-annoying at 200 ms latency, although here I do tend to mash down arrow keys and then it moves too fast for debug-me to keep up, and so the cursor movement stalls.

At higher latencies, using an editor was pretty annoying. Where I might normally press the down arrow key N distinct times to get to the line I wanted, that doesn't work in debug-me at 800 ms latency. Of course, over such a slow connection, using an editor is the last thing you want to do anyway, and vi key combos like 9j start to become necessary (and work in debug-me).

Based on these experiments, the synchronousness requirements are not as utterly annoying as I'd feared, especially at typical latencies.

And, it seems worth making debug-me detect when several keys are pressed close together, and send a single packet over the network combining those. That should make it behave better when mashing down a key.


Today's work was sponsored by Jake Vosloo on Patreon

Posted
debug me day 1

Started some exploratory programming on the debug-me idea.

First, wrote down some data types for debug-me's proof of developer activity.

Then, some terminal wrangling, to get debug-me to allocate a pseudo-terminal, run an interactive shell in it, and pass stdin and stdout back and forth to the terminal it was started in. At this point, debug-me is very similar to script, except it doesn't log the data it intercepts to a typescript file.

Terminals are complicated, so this took a while, and it's still not perfect, but good enough for now. Needs to have resize handling added, and for some reason when the program exits, the terminal is left in a raw state, despite the program apparently resetting its attributes.

Next goal is to check how annoying debug-me's insistence on a synchronous activity proof chain will be when using debug-me across a network link with some latency. If that's too annoying, the design will need to be changed, or perhaps won't work.

To do that, I plan to make debug-me simulate a network between the user and developer's processes, using threads inside a single process for now. The user thread will build up an activity chain, and only accept inputs from the developer thread when they meet the synchronicity requirements. Ran out of time to finish that today, so next time.
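The skeleton of that simulation might look something like this, using a Chan as the fake network link between the two threads. This is a minimal sketch, not debug-me's code; the names and the latency value are illustrative:

```haskell
import Control.Concurrent
import Control.Monad

-- send each character of input across a Chan with a simulated
-- per-keystroke latency, and return what the other side received
simulate :: Int -> String -> IO String
simulate latencyMicros input = do
  net <- newChan                -- the simulated network link
  done <- newEmptyMVar
  -- "developer" thread: each keystroke crosses the link with latency
  _ <- forkIO $ forM_ input $ \c -> do
        threadDelay latencyMicros
        writeChan net c
  -- "user" thread: accepts the input and records what it saw
  _ <- forkIO $ do
        seen <- replicateM (length input) (readChan net)
        putMVar done seen
  takeMVar done

main :: IO ()
main = do
  -- I measure ~800 ms on satellite; 50 ms keeps this sketch quick
  seen <- simulate 50000 "ls\n"
  putStrLn ("user side saw: " ++ show seen)
```

The real version additionally has the user thread check each input against the activity chain before accepting it, and can delay either direction of the link.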

debug-me's git repository is available from https://git.joeyh.name/index.cgi/debug-me.git/


Today's work was sponsored by andrea rota.

Posted
propellor self bootstrap property

Worked for a while today on http://propellor.branchable.com/todo/property_to_install_propellor/, with the goal of making propellor build a disk image that itself contains propellor.

The hard part of that turned out to be that inside the chroot it's building, /usr/local/propellor is bind mounted to the one outside the chroot. But this new property needs to populate that directory in the chroot. Simply unmounting the bind mount would break later properties, so some way to temporarily expose the underlying directory was called for.

At first, I thought unshare -m could be used to do this, but for some reason that does not work in a chroot. Pity. Ended up going with a complicated dance, where the bind mount is bind mounted to a temp dir, then unmounted to expose the underlying directory, and once it's set up, the temp dir is re-bind-mounted back over it. Ugh.

I was able to reuse Propellor.Bootstrap to bootstrap propellor inside the chroot, which was nice.

Also nice that I'm able to work on this kind of thing at home despite it involving building chroots -- yay for satellite internet!


Today's work was sponsored by Riku Voipio.

Posted
type safe multi-OS Propellor

Propellor was recently ported to FreeBSD, by Evan Cofsky. This new feature led me down a two week long rabbit hole to make it type safe. In particular, Propellor needed to be taught that some properties work on Debian, others on FreeBSD, and others on both.

The user shouldn't need to worry about making a mistake like this; the type checker should tell them they're asking for something that can't fly.

-- Is this a Debian or a FreeBSD host? I can't remember, let's use both package managers!
host "example.com" $ props
    & aptUpgraded
    & pkgUpgraded

As of propellor 3.0.0 (in git now; to be released soon), the type checker will catch such mistakes.

Also, it's really easy to combine two OS-specific properties into a property that supports both OS's:

upgraded = aptUpgraded `pickOS` pkgUpgraded

type level lists and functions

The magic making this work is type-level lists. A property has a metatypes list as part of its type. (So called because it's additional types describing the type, and I couldn't find a better name.) This list can contain one or more OS's targeted by the property:

aptUpgraded :: Property (MetaTypes '[ 'Targeting 'OSDebian, 'Targeting 'OSBuntish ])

pkgUpgraded :: Property (MetaTypes '[ 'Targeting 'OSFreeBSD ])

If you have not seen it before: in Haskell, type-level lists and other DataKinds are indicated by the leading '. There are some convenience aliases and type operators, which let the same types be expressed more cleanly:

aptUpgraded :: Property (Debian + Buntish)

pkgUpgraded :: Property FreeBSD

Whenever two properties are combined, their metatypes are combined using a type-level function. Combining aptUpgraded and pkgUpgraded will yield a metatypes list that targets no OS's, since they have none in common, and so it will fail to type check.
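
To give the flavor of such a type-level function, here is a stripped-down, self-contained sketch of combining two target lists by intersection. The names here (OS, Intersect, Elem) are invented for illustration; propellor's real metatypes machinery is more general.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, TypeOperators, PolyKinds, UndecidableInstances #-}
import Data.Type.Equality

data OS = OSDebian | OSBuntish | OSFreeBSD

-- combine two type-level target lists by intersection
type family Intersect (xs :: [OS]) (ys :: [OS]) :: [OS] where
    Intersect '[] ys = '[]
    Intersect (x ': xs) ys = If (Elem x ys) (x ': Intersect xs ys) (Intersect xs ys)

type family Elem (x :: OS) (ys :: [OS]) :: Bool where
    Elem x '[] = 'False
    Elem x (x ': ys) = 'True
    Elem x (y ': ys) = Elem x ys

type family If (c :: Bool) (t :: k) (e :: k) :: k where
    If 'True t e = t
    If 'False t e = e

-- compile-time checks: Refl only type checks when both sides reduce to the same list
common :: Intersect '[ 'OSDebian, 'OSBuntish ] '[ 'OSDebian ] :~: '[ 'OSDebian ]
common = Refl

disjoint :: Intersect '[ 'OSFreeBSD ] '[ 'OSDebian, 'OSBuntish ] :~: '[]
disjoint = Refl

main :: IO ()
main = case (common, disjoint) of
    (Refl, Refl) -> putStrLn "intersection type checks"
```

A combination of two properties with no OS in common reduces to the empty list, which is what makes the mistake in the earlier example a type error.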

My implementation of the metatypes lists is hundreds of lines of code, consisting entirely of types and type families. It includes a basic implementation of singletons, and is portable back to ghc 7.6 to support Debian stable. While it takes some contortions to support such an old version of ghc, it's pretty awesome that the ghc in Debian stable supports this stuff.

extending beyond targeted OS's

Before this change, Propellor's Property type had already been slightly refined, tagging them with HasInfo or NoInfo, as described in making propellor safer with GADTs and type families. I needed to keep that HasInfo in the type of properties.

But, it seemed unnecessarily verbose to have types like Property NoInfo Debian. Especially if I want to add even more information to Property types later. Property NoInfo Debian NoPortsOpen would be a real mouthful to need to write for every property.

Luckily I now have this handy type-level list. So, I can shove more types into it, so Property (HasInfo + Debian) is used where necessary, and Property Debian can be used everywhere else.

Since I can add more types to the type-level list, without affecting other properties, I expect to be able to implement type-level port conflict detection next. Should be fairly easy to do without changing the API except for properties that use ports.

singletons

As shown here, pickOS makes a property that decides which of two properties to use based on the host's OS.

aptUpgraded :: Property DebianLike
aptUpgraded = property "apt upgraded" (apt "upgrade" `requires` apt "update")

pkgUpgraded :: Property FreeBSD
pkgUpgraded = property "pkg upgraded" (pkg "upgrade")
    
upgraded :: Property UnixLike
upgraded = (aptUpgraded `pickOS` pkgUpgraded)
    `describe` "OS upgraded"

Any number of OS's can be chained this way, to build a property that is super-portable out of simple little non-portable properties. This is a sweet combinator!

Singletons are types that are inhabited by a single value. This lets the value be inferred from the type, which came in handy in building the pickOS property combinator.

Its implementation needs to be able to look at each of the properties at runtime, to compare the OS's they target with the actual OS of the host. That's done by stashing a target list value inside a property. The target list value is inferred from the type of the property, thanks to singletons, and so does not need to be passed in to property. That saves keyboard time and avoids mistakes.
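
A minimal, self-contained sketch of the singleton idea, with invented names (not propellor's actual code): a type class reflects a type-level target list down to an ordinary value, so a combinator like pickOS can inspect it at runtime.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables, TypeOperators, FlexibleInstances #-}
import Data.Proxy

data TargetOS = OSDebian | OSFreeBSD deriving (Show, Eq)

-- a singleton class: the value is inferred from the type
class SingOS (os :: TargetOS) where
    singOS :: Proxy os -> TargetOS
instance SingOS 'OSDebian where singOS _ = OSDebian
instance SingOS 'OSFreeBSD where singOS _ = OSFreeBSD

-- reflect a whole type-level list of targets to a value-level list
class SingTargets (xs :: [TargetOS]) where
    singTargets :: Proxy xs -> [TargetOS]
instance SingTargets '[] where
    singTargets _ = []
instance (SingOS x, SingTargets xs) => SingTargets (x ': xs) where
    singTargets _ = singOS (Proxy :: Proxy x) : singTargets (Proxy :: Proxy xs)

main :: IO ()
main = print (singTargets (Proxy :: Proxy '[ 'OSDebian, 'OSFreeBSD ]))
```

No target list is ever written down as a value; it falls out of the property's type.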

is it worth it?

It's important to consider whether more complicated types are a net benefit. Of course, opinions vary widely on that question in general! But let's consider it in light of my main goals for Propellor:

  1. Help save the user from pushing a broken configuration to their machines at a time when they're down in the trenches dealing with some urgent problem at 3 am.
  2. Advance the state of the art in configuration management by taking advantage of the state of the art in strongly typed haskell.

This change definitely meets both criteria. But there is a tradeoff; it got a little bit harder to write new propellor properties. Not only do new properties need to have their type set to target appropriate systems, but the more polymorphic code is, the more likely the type checker can't figure out all the types without some help.

A simple example of this problem is as follows.

foo :: Property UnixLike
foo = p `requires` bar
  where
    p = property "foo" $ do
        ...

The type checker will complain that "The type variable ‘metatypes1’ is ambiguous". The problem is that it can't infer the type of p, because many different types could be combined with the bar property and all would yield a Property UnixLike. The solution is simply to add a type signature like p :: Property UnixLike.

Since this only affects creating new properties, and not combining existing properties (which have known types), it seems like a reasonable tradeoff.

things to improve later

There are a few warts that I'm willing to live with for now...

Currently, Property (HasInfo + Debian) is a different type than Property (Debian + HasInfo), but they should really be considered to be the same type. That is, I need type-level sets, not lists. While there's a type-level sets library on hackage, it still seems to require a specific order of the set items when writing down a type signature.

Also, using ensureProperty, which runs one property inside the action of another property, got complicated by the need to pass it a type witness.

foo :: Property Debian
foo = property' $ \witness -> do
    ensureProperty witness (aptInstall "foo")

That witness is used to type check that the inner property targets every OS that the outer property targets. I think it might be possible to store the witness in the monad, and have ensureProperty read it, but it might complicate the type of the monad too much, since it would have to be parameterized on the type of the witness.

Oh no, I mentioned monads. While type level lists and type functions and generally bending the type checker to my will is all well and good, I know most readers stop reading at "monad". So, I'll stop writing. ;)

thanks

Thanks to David Miani who answered my first tentative question with a big hunk of example code that got me on the right track.

Also to many other people who answered increasingly esoteric Haskell type system questions.

Also thanks to the Shuttleworth foundation, which funded this work by way of a Flash Grant.

letsencrypt support in propellor

I've integrated letsencrypt into propellor today.

I'm using the reference letsencrypt client. While I've seen complaints that it has a lot of dependencies and is too complicated, it seemed to only need to pull in a few packages, and use only a few megabytes of disk space, and it has fewer options than ls does. So seems fine. (Although it would be nice to have some alternatives packaged in Debian.)

I ended up implementing this:

letsEncrypt :: AgreeTOS -> Domain -> WebRoot -> Property NoInfo

This property just makes the certificate available; it does not configure the web server to use it. This avoids relying on the letsencrypt client's apache config munging, which is probably useful for many people, but not those of us using configuration management systems. And so avoids most of the complicated magic that the letsencrypt client has a reputation for.

Instead, any property that wants to use the certificate can just use letsencrypt to get it, and set up the server when it makes a change to the certificate:

letsEncrypt (LetsEncrypt.AgreeTOS (Just "me@my.domain")) "example.com" "/var/www"
    `onChange` setupthewebserver

(Took me a while to notice I could use onChange like that, and so divorce the cert generation/renewal from the server setup. onChange is awesome! This blog post has been updated accordingly.)

In practice, the http site has to be brought up first, and then letsencrypt run, and then the cert installed and the https site brought up using it. That dance is automated by this property:

Apache.httpsVirtualHost "example.com" "/var/www"
    (LetsEncrypt.AgreeTOS (Just "me@my.domain"))

That's about as simple a configuration as I can imagine for such a website!


The two parts of letsencrypt that are complicated are not the fault of the client really. Those are renewal and rate limiting.

I'm currently rate limited for the next week because I asked letsencrypt for several certificates for a domain, as I was learning how to use it and integrating it into propellor. So I've not quite managed to fully test everything. That's annoying. I also worry that rate limiting could hit at an inopportune time once I'm relying on letsencrypt. It's especially problematic that it only allows 5 certs for subdomains of a given domain per week. What if I use a lot of subdomains?

Renewal is complicated mostly because there's no good way to test it. You set up your cron job, or whatever, and wait three months, and hopefully it worked. Just as likely, you got something wrong, and your website breaks. Maybe letsencrypt could offer certificates that will only last an hour, or a day, for use when testing renewal.

Also, what if something goes wrong with renewal? Perhaps letsencrypt.org is not available when your certificate needs to be renewed.

What I've done in propellor to handle renewal is, it runs letsencrypt every time, with the --keep-until-expiring option. If this fails, propellor will report a failure. As long as propellor is run periodically by a cron job, this should result in multiple failure reports being sent (for 30 days I think) before a cert expires without getting renewed. But, I have not been able to test this.

Posted
propelling disk images

Following up on Then and Now ...

In quiet moments at ICFP last August, I finished teaching Propellor to generate disk images. With an emphasis on doing a whole lot with very little new code and an extreme amount of code reuse.

For example, let's make a disk image with nethack on it. First, we need to define a chroot. Disk image creation reuses propellor's chroot support, described back in propelling containers. Any propellor properties can be assigned to the chroot, so it's easy to describe the system we want.

nethackChroot :: FilePath -> Chroot
nethackChroot d = Chroot.debootstrapped (System (Debian Stable) "amd64") mempty d
    & Apt.installed ["linux-image-amd64"]
    & Apt.installed ["nethack-console"]
    & accountFor gamer
    & gamer `hasInsecurePassword` "hello"
    & gamer `hasLoginShell` "/usr/games/nethack"
  where gamer = User "gamer"

Now to make an image from that chroot, we just have to tell propellor where to put the image file, some partitioning information, and to make it boot using grub.

nethackImage :: RevertableProperty
nethackImage = imageBuilt "/srv/images/nethack.img" nethackChroot
    MSDOS (grubBooted PC)
    [ partition EXT2 `mountedAt` "/boot"
        `setFlag` BootFlag
    , partition EXT4 `mountedAt` "/"
        `addFreeSpace` MegaBytes 100
    , swapPartition (MegaBytes 256)
    ]

The disk image partitions default to being sized to fit exactly the files from the chroot that go into each partition, so, the disk image is as small as possible by default. There's a little DSL to configure the partitions. To give control over the partition size, it has some functions, like addFreeSpace and setSize. Other functions like setFlag and extended can further adjust the partitions. I think that worked out rather well; the partition specification is compact and avoids unnecessary hardcoded sizes, while providing plenty of control.
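
To show the flavor of such sizing combinators, here's a toy model of the DSL. The types and defaults here are invented for illustration; propellor's real Partition type carries more information.

```haskell
-- toy model of the partition sizing combinators; not propellor's actual types
newtype MegaBytes = MegaBytes Integer deriving (Show, Eq)

data Fs = EXT2 | EXT4 deriving Show

data Partition = Partition
    { partFs :: Fs
    , partMountPoint :: Maybe FilePath
    , partSize :: MegaBytes  -- defaults to the size of the files that go in it
    } deriving Show

partition :: Fs -> Partition
partition fs = Partition fs Nothing (MegaBytes 0)

mountedAt :: Partition -> FilePath -> Partition
mountedAt p mp = p { partMountPoint = Just mp }

-- grow the partition beyond its content size
addFreeSpace :: Partition -> MegaBytes -> Partition
addFreeSpace p (MegaBytes n) = case partSize p of
    MegaBytes s -> p { partSize = MegaBytes (s + n) }

-- override the size entirely
setSize :: Partition -> MegaBytes -> Partition
setSize p sz = p { partSize = sz }

-- usage mirrors the specification above
rootPart :: Partition
rootPart = partition EXT4 `mountedAt` "/" `addFreeSpace` MegaBytes 100

main :: IO ()
main = print (partSize rootPart)
```

Each combinator is just a record update, which is why the specification stays compact.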

By the end of ICFP, I had Propellor building complete disk images, but no boot loader installed on them.


Fast forward to today. After struggling with some strange grub behavior, I found a working method to install grub onto a disk image.

The whole disk image feature weighs in at:

203 lines to interface with parted
88 lines to format and mount partitions
90 lines for the partition table specification DSL and partition sizing
196 lines to generate disk images
75 lines to install grub on a disk image
652 lines of code total

Which is about half the size of vmdebootstrap, 1/4th the size of partman-base (probably 1/100th the size of total partman), and 1/13th the size of live-build. All of which do similar things, in ways that seem to me to be much less flexible than Propellor.


One thing I'm considering doing is extending this so Propellor can use qemu-user-static to create disk images for eg, arm. Add some u-boot setup, and this could create bootable images for arm boards. A library of configs for various arm boards could then be included in Propellor. This would be a lot easier than running the Debian Installer on an arm board.

Oh! I only just now realized that if you have a propellor host configured, like this example for my dialup gateway, leech --

leech = host "leech.kitenet.net"
    & os (System (Debian (Stable "jessie")) "armel")
    & Apt.installed ["linux-image-kirkwood", "ppp", "screen", "iftop"]
    & privContent "/etc/ppp/peers/provider"
    & privContent "/etc/ppp/pap-secrets"
    & Ppp.onBoot
    & hasPassword (User "root")
    & Ssh.installed

-- The host's properties can be extracted from it, using eg hostProperties leech, and reused to create a disk image with the same properties as the host!

So, when my dialup gateway gets struck by lightning again, I could use this to build a disk image for its replacement:

import qualified Propellor.Property.Hardware.SheevaPlug as SheevaPlug

laptop = host "darkstar.kitenet.net"
    & SheevaPlug.diskImage "/srv/images/leech.img" (MegaBytes 2000)
        (& propertyList "has all of leech's properties"
            (hostProperties leech))

This also means you can start with a manually built system, write down the properties it has, and iteratively run Propellor against it until you think you have a full specification of it, and then use that to generate a new, clean disk image. Nice way to transition from sysadmin days of yore to a clean declaratively specified system.

Posted
propellor orchestration

With the disclaimer that I don't really know much about orchestration, I have added support for something resembling it to Propellor.

Until now, when using propellor to manage a bunch of hosts, you updated them one at a time by running propellor --spin $somehost, or maybe you set up a central git repository, and a cron job to run propellor on each host, pulling changes from git.

I like both of these ways to use propellor, but they only go so far...

  • Perhaps you have a lot of hosts, and would like to run propellor on them all concurrently.

      master = host "master.example.com"
          & concurrently conducts alotofhosts
    
  • Perhaps you want to run propellor on your dns server last, so when you add a new webserver host, it gets set up and working before the dns is updated to point to it.

      master = host "master.example.com"
          & conducts webservers
              `before` conducts dnsserver
    
  • Perhaps you have something more complex, with multiple subnets that propellor can run in concurrently, finishing up by updating that dnsserver.

      master = host "master.example.com"
          & concurrently conducts [sub1, sub2]
              `before` conducts dnsserver
    
      sub1 = "master.subnet1.example.com"
          & concurrently conducts webservers
          & conducts loadbalancers
    
      sub2 = "master.subnet2.example.com"
          & conducts dockerservers
    
  • Perhaps you need to first run some command that creates a VPS host, and then want to run propellor on that host to set it up.

      vpscreate h = cmdProperty "vpscreate" [hostName h]
          `before` conducts h
    

All those scenarios are supported by propellor now!

Well, I haven't actually implemented concurrently yet, but the point is that the conducts property can be used with any of propellor's property combinators, like before etc, to express all kinds of scenarios.

The conducts property works in combination with an orchestrate function to set up all the necessary stuff to let one host ssh into another and run propellor there.

main = defaultMain (orchestrate hosts)

hosts = 
    [ master
    , webservers 
    , ...
    ]

The orchestrate function does a bunch of stuff:

  • Builds up a graph of what conducts what.
  • Removes any cycles that might have snuck in by accident, before they cause foot shooting.
  • Arranges for the ssh keys to be accepted as necessary.
    Note that you need to add ssh key properties to all relevant hosts so it knows what keys to trust.
  • Arranges for the private data of a host to be provided to the hosts that conduct it, so they can pass it along.
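
The cycle removal step can be sketched as a small pure graph fold (a simplified stand-in for what Propellor.Property.Conductor actually does, with invented names): keep each conducts edge only if it doesn't close a loop with the edges kept so far.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

type Graph = M.Map String [String]

-- is there a path from x to y in g?
reaches :: Graph -> String -> String -> Bool
reaches g x y = go (S.singleton x) [x]
  where
    go _ [] = False
    go seen (n:ns)
        | n == y = True
        | otherwise =
            let next = [ m | m <- M.findWithDefault [] n g, m `S.notMember` seen ]
            in go (foldr S.insert seen next) (next ++ ns)

-- fold over the edges, dropping any that would create a cycle
dropCycles :: [(String, String)] -> Graph
dropCycles = foldl add M.empty
  where
    add g (a, b)
        | reaches g b a = g  -- this edge would close a loop; drop it
        | otherwise = M.insertWith (++) a [b] g

main :: IO ()
main = print (dropCycles [("master", "sub1"), ("sub1", "master"), ("master", "dns")])
```

Here the accidental back edge from sub1 to master is silently dropped, so master still conducts sub1 and dns without any foot shooting.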

I'm very pleased that I was able to add the Propellor.Property.Conductor module implementing this with only a tiny change to the rest of propellor. Almost everything needed to implement it was there in propellor's infrastructure already.

Also kind of cool that it only needed 13 lines of imperative code, the other several hundred lines of the implementation being all pure code.

Posted
it's a bird, it's a plane, it's a super monoid for propellor

I've been doing a little bit of dynamically typed programming in Haskell, to improve Propellor's Info type. The result is kind of interesting in a scary way.

Info started out as a big record type, containing all the different sorts of metadata that Propellor needed to keep track of. Host IP addresses, DNS entries, ssh public keys, docker image configuration parameters... This got quite out of hand. Info needed to have its hands in everything, even types that should have been private to their module.

To fix that, recent versions of Propellor let a single Info contain many different types of values. Look at it one way and it contains DNS entries; look at it another way and it contains ssh public keys, etc.

As an émigré from lands where you can never know what type of value is in a $foo until you look, this was a scary prospect at first, but I found it's possible to have the benefits of dynamic types and the safety of static types too.

The key to doing it is Data.Dynamic. Thanks to Joachim Breitner for suggesting I could use it here. What I arrived at is this type (slightly simplified):

newtype Info = Info [Dynamic]
    deriving (Monoid)

So Info is a monoid, and it holds a bunch of dynamic values, which could each be of any type at all. Eep!

So far, this is utterly scary to me. To tame it, the Info constructor is not exported, and so the only way to create an Info is to start with mempty and use this function:

addInfo :: (IsInfo v, Monoid v) => Info -> v -> Info
addInfo (Info l) v = Info (toDyn v : l)

The important part is that it only allows adding values that are in the IsInfo type class. That prevents the foot shooting associated with dynamic types, by only allowing use of types that make sense as Info. Otherwise arbitrary Strings etc could be passed to addInfo by accident, and all get concatenated together, and that would be a total dynamic programming mess.

Anything you can add into an Info, you can get back out:

getInfo :: (IsInfo v, Monoid v) => Info -> v
getInfo (Info l) = mconcat (mapMaybe fromDynamic (reverse l))

Only monoids can be stored in Info, so if you ask for a type that an Info doesn't contain, you'll get back mempty.

Crucially, IsInfo is an open type class. Any module in Propellor can make a new data type and make it an instance of IsInfo, and then that new data type can be stored in the Info of a Property, and any Host that uses the Property will have that added to its Info, available for later introspection.
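
Putting the pieces together, here is a self-contained toy version of the pattern, with a made-up Aliases info type standing in for propellor's real ones (the real IsInfo class and Info type differ in detail):

```haskell
import Data.Dynamic (Dynamic, toDyn, fromDynamic)
import Data.Maybe (mapMaybe)
import Data.Typeable (Typeable)

newtype Info = Info [Dynamic]

instance Semigroup Info where
    Info a <> Info b = Info (a <> b)
instance Monoid Info where
    mempty = Info []

-- only types in IsInfo can go into an Info
class (Typeable v, Monoid v) => IsInfo v

addInfo :: IsInfo v => Info -> v -> Info
addInfo (Info l) v = Info (toDyn v : l)

-- ask for a type the Info doesn't contain, and mempty comes back
getInfo :: IsInfo v => Info -> v
getInfo (Info l) = mconcat (mapMaybe fromDynamic (reverse l))

-- a made-up info type for demonstration
newtype Aliases = Aliases [String] deriving Show
instance Semigroup Aliases where
    Aliases a <> Aliases b = Aliases (a <> b)
instance Monoid Aliases where
    mempty = Aliases []
instance IsInfo Aliases

main :: IO ()
main = do
    let i = addInfo (addInfo mempty (Aliases ["foo.example.com"])) (Aliases ["bar.example.com"])
    print (getInfo i :: Aliases)
```

The two Aliases values go in separately and come back out merged by their Monoid instance, which is the whole trick.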


For example, this weekend I'm extending Propellor to have controllers: Hosts that are responsible for running Propellor on some other hosts. Useful if you want to run propellor once and have it update the configuration of an entire network of hosts.

There can be whole chains of controllers controlling other controllers etc. The problem is, what if host foo has the property controllerFor bar and host bar has the property controllerFor foo? I want to avoid a loop of foo running Propellor on bar, running Propellor on foo, ...

To detect such loops, each Host's Info should contain a list of the Hosts it's controlling. Which is not hard to accomplish:

newtype Controlling = Controlled [Host]
    deriving (Typeable, Monoid)

isControlledBy :: Host -> Controlling -> Bool
h `isControlledBy` (Controlled hs) = any (== hostName h) (map hostName hs)

instance IsInfo Controlling where
    propigateInfo _ = True

mkControllingInfo :: Host -> Info
mkControllingInfo controlled = addInfo mempty (Controlled [controlled])

getControlledBy :: Host -> Controlling
getControlledBy = getInfo . hostInfo

isControllerLoop :: Host -> Host -> Bool
isControllerLoop controller controlled = go S.empty controlled
  where
    go checked h
        | controller `isControlledBy` c = True
        -- avoid checking loops that have been checked before
        | hostName h `S.member` checked = False
        | otherwise = any (go (S.insert (hostName h) checked)) l
      where
        c@(Controlled l) = getControlledBy h

This is all internal to the module that needs it; the rest of propellor doesn't need to know that the Info is being used for this. And yet, the necessary information about Hosts is gathered as propellor runs.


So, that's a useful technique. I do wonder if I could somehow make addInfo combine together values in the list that have the same type; as it is the list can get long. And, to show Info, the best I could do was this:

instance Show Info where
    show (Info l) = "Info " ++ show (map dynTypeRep l)

The resulting long list of the types of values stored in a host's info is not as useful as it could be. Of course, getInfo can be used to get any particular type of value:

*Main> hostInfo kite
Info [InfoVal System,PrivInfo,PrivInfo,Controlling,DnsInfo,DnsInfo,DnsInfo,AliasesInfo, ...
*Main> getInfo (hostInfo kite) :: AliasesInfo
AliasesInfo (fromList ["downloads.kitenet.net","git.joeyh.name","imap.kitenet.net","nntp.olduse.net" ...

And finally, I keep trying to think of a better name than "Info".

then and now

It's 2004 and I'm in Oldenburg DE, working on the Debian Installer. Colin and I pair program on partman, its new partitioner, to get it into shape. We've somewhat reluctantly decided to use it. Partman is in some ways a beautiful piece of work, a mass of semi-object-oriented, super extensible shell code that sprang fully formed from the brow of Anton. And in many ways, it's mad, full of sector alignment twiddling math implemented in tens of thousands of lines of shell script scattered among hundreds of tiny files that are impossible to keep straight. In the tiny Oldenburg Developers Meeting, full of obscure hardware and crazy intensity of ideas like porting Debian to VAXen, we hack late into the night, night after night, and crash on the floor.

sepia toned hackers round a table

It's 2015 and I'm at a Chinese bakery, then at the Berkeley pier, then in a SF food truck lot, catching half an hour here and there in my vacation to add some features to Propellor. Mostly writing down data types for things like filesystem formats, partition layouts, and then some small amount of haskell code to use them in generic ways. Putting these pieces together and reusing stuff already in Propellor (like chroot creation).

Before long I have this, which is only 2 undefined functions away from (probably) working:

let chroot d = Chroot.debootstrapped (System (Debian Unstable) "amd64") mempty d
        & Apt.installed ["openssh-server"]
        & ...
    partitions = fitChrootSize MSDOS
        [ (Just "/boot", mkPartition EXT2)
        , (Just "/", mkPartition EXT4)
        , (Nothing, const (mkPartition LinuxSwap (MegaBytes 256)))
        ]
 in Diskimage.built chroot partitions (grubBooted PC)

This is at least a replication of vmdebootstrap, generating a bootable disk image from that config and 400 lines of code, with enormous customizability of the disk image contents, using all the abilities of Propellor. But it is also, effectively, a replication of everything partman is used for (aside from UI and RAID/LVM).

sailboat on the SF bay

What a difference a decade and better choices of architecture make! In many ways, this is the loosely coupled, extensible, highly configurable system partman aspired to be. Plus elegance. And I'm writing it on a lark, because I have some spare half hours in my vacation.

Past Debian Installer team lead Tollef stops by for lunch, I show him the code, and we have the conversation old d-i developers always have about partman.

I can't say that partman was a failure, because it's been used by millions to install Debian and Ubuntu and etc for a decade. Anything that deletes that many Windows partitions is a success. But it's been an unhappy success. Nobody has ever had a good time writing partman recipes; the code has grown duplication and unmaintainability.

I can't say that these extensions to Propellor will be a success; there's no plan here to replace Debian Installer (although with a few hundred more lines of code, propellor is d-i 2.0); indeed I'm just adding generic useful stuff and building further stuff out of it without any particular end goal. Perhaps that's the real difference.

Posted
making propellor safer with GADTs and type families

Since July, I have been aware of an ugly problem with propellor. Certain propellor configurations could have a bug. I've tried to solve the problem at least a half-dozen times without success; it's eaten several weekends.

Today I finally managed to fix propellor so it's impossible to write code that has the bug, bending the Haskell type checker to my will with the power of GADTs and type-level functions.

the bug

Code with the bug looked innocuous enough. Something like this:

foo :: Property
foo = property "foo" $
    unlessM (liftIO $ doesFileExist "/etc/foo") $ do
        bar <- liftIO $ readFile "/etc/foo.template"
        ensureProperty $ setupFoo bar

The problem comes about because some properties in propellor have Info associated with them. This is used by propellor to introspect over the properties of a host, and do things like set up DNS, or decrypt private data used by the property.

At the same time, it's useful to let a Property internally decide to run some other Property. In the example above, that's the ensureProperty line, and the setupFoo Property is run only sometimes, and is passed data that is read from the filesystem.

This makes it very hard, indeed probably impossible for Propellor to look inside the monad, realize that setupFoo is being used, and add its Info to the host.

Probably, setupFoo doesn't have Info associated with it -- most properties do not. But, it's hard to tell, when writing such a Property if it's safe to use ensureProperty. And worse, setupFoo could later be changed to have Info.

Now, in most languages, once this problem was noticed, the solution would probably be to make ensureProperty notice when it's called on a Property that has Info, and print a warning message. That's Good Enough in a sense.

But it also really stinks as a solution. It means that building propellor isn't good enough to know you have a working system; you have to let it run on each host, and watch out for warnings. Ugh, no!

the solution

This screams for GADTs. (Well, it did once I learned what GADTs are and what they can do.)

With GADTs, Property NoInfo and Property HasInfo can be separate data types. Most functions will work on either type (Property i) but ensureProperty can be limited to only accept a Property NoInfo.

data Property i where
    IProperty :: Desc -> ... -> Info -> Property HasInfo
    SProperty :: Desc -> ... -> Property NoInfo

data HasInfo
data NoInfo

ensureProperty :: Property NoInfo -> Propellor Result

Then the type checker can detect the bug, and refuse to compile it.
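
Boiled down to a runnable toy (real propellor properties carry much more than a description, and ensureProperty runs in the Propellor monad rather than returning a String):

```haskell
{-# LANGUAGE GADTs, EmptyDataDecls #-}

-- phantom types marking whether a property carries Info
data HasInfo
data NoInfo

-- a toy Property: a description, plus an Info string in the HasInfo case
data Property i where
    IProperty :: String -> String -> Property HasInfo
    SProperty :: String -> Property NoInfo

-- only properties without Info can be ensured inside another property
ensureProperty :: Property NoInfo -> String
ensureProperty (SProperty d) = "ensured " ++ d

main :: IO ()
main = putStrLn (ensureProperty (SProperty "foo"))
-- ensureProperty (IProperty "bar" "some info") is rejected at compile time
```

The buggy pattern from the start of the post simply no longer compiles.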

Yay!

Except ...

Property combinators

There are a lot of Property combinators in propellor. These combine two or more properties in various ways. The most basic one is requires, which only runs the first Property after the second one has successfully been met.

So, what's its type when used with GADT Property?

requires :: Property i1 -> Property i2 -> Property ???

It seemed I needed some kind of type class, to vary the return type.

class Combine x y r where
    requires :: x -> y -> r

Now I was able to write 4 instances of Combine, for each combination of 2 Properties with HasInfo or NoInfo.

It type checked. But, type inference was busted. A simple expression like

foo `requires` bar

blew up:

   No instance for (Requires (Property HasInfo) (Property HasInfo) r0)
      arising from a use of `requires'
    The type variable `r0' is ambiguous
    Possible fix: add a type signature that fixes these type variable(s)
    Note: there is a potential instance available:
      instance Requires
                 (Property HasInfo) (Property HasInfo) (Property HasInfo)
        -- Defined at Propellor/Types.hs:167:10

To avoid that, it needed ":: Property HasInfo" appended -- I didn't want the user to need to write that.

I got stuck here for a long time, well over a month.

type level programming

Finally today I realized that I could fix this with a little type-level programming.

class Combine x y where
    requires :: x -> y -> CombinedType x y

Here CombinedType is a type-level function, that calculates the type that should be used for a combination of types x and y. This turns out to be really easy to do, once you get your head around type level functions.

type family CInfo x y
type instance CInfo HasInfo HasInfo = HasInfo
type instance CInfo HasInfo NoInfo = HasInfo
type instance CInfo NoInfo HasInfo = HasInfo
type instance CInfo NoInfo NoInfo = NoInfo
type family CombinedType x y
type instance CombinedType (Property x) (Property y) = Property (CInfo x y)

And, with that change, type inference worked again! \o/

(Bonus: I added some more instances of CombinedType for combining things like RevertableProperties, so propellor's property combinators got more powerful too.)

Then I just had to make a massive pass over all of Propellor, fixing the types of each Property to be Property NoInfo or Property HasInfo. I frequently picked the wrong one, but the type checker was able to detect and tell me when I did.

A few of the type signatures got slightly complicated, to provide the type checker with sufficient proof to do its thing...

before :: (IsProp x, Combines y x, IsProp (CombinedType y x)) => x -> y -> CombinedType y x
before x y = (y `requires` x) `describe` (propertyDesc x)

onChange
    :: (Combines (Property x) (Property y))
    => Property x
    -> Property y
    -> CombinedType (Property x) (Property y)
onChange = -- 6 lines of code omitted

fallback :: (Combines (Property p1) (Property p2)) => Property p1 -> Property p2 -> Property (CInfo p1 p2)
fallback = -- 4 lines of code omitted

.. This mostly happened in property combinators, which is an acceptable tradeoff, when you consider that the type checker is now being used to prove that propellor can't have this bug.

Mostly, things went just fine. The only other annoying thing was that some things use a [Property], and since a haskell list can only contain a single type, while Property HasInfo and Property NoInfo are two different types, that needed to be dealt with. Happily, I was able to extend propellor's existing (&) and (!) operators to work in this situation, so a list can be constructed of properties of several different types:

propertyList "foos" $ props
    & foo
    & foobar
    ! oldfoo    
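
One way to see how such a mixed list can work under the hood is an existential wrapper that hides the HasInfo/NoInfo index. This is a simplified illustration, not propellor's actual props machinery; the ChildProperty name is invented here.

```haskell
{-# LANGUAGE GADTs, ExistentialQuantification #-}

data HasInfo
data NoInfo

data Property i where
    IProperty :: String -> Property HasInfo
    SProperty :: String -> Property NoInfo

-- the wrapper hides the index, so both kinds of property fit in one list
data ChildProperty = forall i. ChildProperty (Property i)

desc :: ChildProperty -> String
desc (ChildProperty (IProperty d)) = d
desc (ChildProperty (SProperty d)) = d

props :: [ChildProperty]
props = [ChildProperty (IProperty "foo"), ChildProperty (SProperty "oldfoo")]

main :: IO ()
main = print (map desc props)
```

The (&) and (!) operators can then do the wrapping behind the scenes, so the user never sees it.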

conclusion

The resulting 4000 lines of changes will be in the next release of propellor. Just as soon as I test that it always generates the same Info as before, and perhaps works when I run it. (eep)

These uses of GADTs and type families are not new; this is merely the first time I used them. It's another Haskell leveling up for me.

Anytime you can identify a class of bugs that can impact a complicated code base, and rework the code base to completely avoid that class of bugs, is a time to celebrate!

clean OS reinstalls with propellor

You have a machine someplace, probably in The Cloud, and it has Linux installed, but not to your liking. You want to do a clean reinstall, maybe switching the distribution, or getting rid of the cruft. But this requires running an installer, and it's too difficult to run d-i on remote machines.

Wouldn't it be nice if you could point a program at that machine and have it do a reinstall, on the fly, while the machine was running?

This is what I've now taught propellor to do! Here's a working configuration which will make propellor convert a system running Fedora (or probably many other Linux distros) to Debian:

testvm :: Host
testvm = host "testvm.kitenet.net"
        & os (System (Debian Unstable) "amd64")
        & OS.cleanInstallOnce (OS.Confirmed "testvm.kitenet.net")
                `onChange` propertyList "fixing up after clean install"
                        [ User.shadowConfig True
                        , OS.preserveRootSshAuthorized
                        , OS.preserveResolvConf
                        , Apt.update
                        , Grub.boots "/dev/sda"
                                `requires` Grub.installed Grub.PC
                        ]
        & Hostname.sane
        & Hostname.searchDomain
        & Apt.installed ["linux-image-amd64"]
        & Apt.installed ["ssh"]
        & User.hasSomePassword "root"

And here's a video of it in action.


It was surprisingly easy to build this. Propellor already knew how to create a chroot, so from there it basically just has to move files around until the chroot takes over from the old OS.

After the cleanInstallOnce property does its thing, propellor is running inside a freshly debootstrapped Debian system. Then we just need a few more Properties to get from there to a bootable, usable system: install grub and the kernel, turn on shadow passwords, preserve a few config files from the old OS, etc.

It's really astounding to me how much easier this was to build than d-i was. It took years to get d-i to the point of being able to install a working system. It took me a few part-time days to add this capability to propellor (it's 200 lines of code), and I've probably spent less than 30 days total developing propellor in its entirety.

So, what gives? Why is this so much easier? There are a lot of reasons:

  • Technology is so much better now. I can spin up cloud VMs for testing in seconds; I use VirtualBox to restore a system from a snapshot. So testing is much much easier. The first work on d-i was done by booting real machines, and for a while I was booting them using floppies.

  • Propellor doesn't have a user interface. The best part of d-i is preseeding, but that was mostly an accident; when I started developing d-i the first thing I wrote was main-menu (which is invisible 99.9% of the time) and we had to develop cdebconf, and tons of other UI. Probably 90% of d-i work involves the UI. Jettisoning the UI entirely thus speeds up development enormously. And propellor's configuration file blows d-i preseeding out of the water in expressiveness and flexibility.

  • Propellor has a much more principled design and implementation. Separating things into Properties, which are composable and reusable, gives enormous leverage. Strong type checking and a powerful programming language make it much easier to develop than d-i's mess of shell scripts calling underpowered busybox commands, etc. Properties often Just Work the first time they're tested.

  • No separate runtime. d-i runs in its own environment, which is really a little custom linux distribution. Developing linux distributions is hard. Propellor drops into a live system and runs there. So I don't need to worry about booting up the system, getting it on the network, etc etc. This probably removes another order of magnitude of complexity from propellor as compared with d-i.

This seems like the opposite of the Second System effect to me. So perhaps d-i was the second system all along?

I don't know if I'm going to take this all the way to "propellor is d-i 2.0". But in theory, all that's needed now is:

  • Teaching propellor how to build a bootable image, containing a live Debian system and propellor. (Yes, this would mean reimplementing debian-live, but I estimate 100 lines of code to do it in propellor; most of the Properties needed already exist.) That image would then be booted up and perform the installation.
  • Some kind of UI that generates the propellor config file.
  • Adding Properties to partition the disk.

cleanInstallOnce and associated Properties will be included in propellor's upcoming 1.1.0 release, and are available in git now.

Oh BTW, you could parameterize a few Properties by OS, and Propellor could be used to install not just Debian or Ubuntu, but whatever Linux distribution you want. Patches welcomed...

propelling containers

Propellor has supported docker containers for a "long" time, and it works great. This week I've worked on adding more container support.

docker containers (revisited)

The syntax for docker containers has changed slightly. Here's how it looks now:

example :: Host
example = host "example.com"
    & Docker.docked webserverContainer

webserverContainer :: Docker.Container
webserverContainer = Docker.container "webserver" (Docker.latestImage "joeyh/debian-stable")
    & os (System (Debian (Stable "wheezy")) "amd64")
    & Docker.publish "80:80"
    & Apt.serviceInstalledRunning "apache2"
    & alias "www.example.com"

That makes example.com have a web server in a docker container, as you'd expect, and when propellor is used to deploy the DNS server it'll automatically make www.example.com point to the host (or hosts!) where this container is docked.

I use docker a lot, but I have drunk little of the Docker Kool-Aid. I'm not keen on using random blobs created by random third parties using either unreproducible methods, or the weirdly underpowered dockerfiles. (As for vast complicated collections of containers that each run one program and talk to one another etc ... I'll wait and see.)

That's why propellor runs inside the docker container and deploys whatever configuration I tell it to, in a way that's both replicatable later and lets me use the full power of Haskell.

Which turns out to be useful when moving on from docker containers to something else...

systemd-nspawn containers

Propellor now supports containers using systemd-nspawn. It looks a lot like the docker example.

example :: Host
example = host "example.com"
    & Systemd.persistentJournal
    & Systemd.nspawned webserverContainer

webserverContainer :: Systemd.Container
webserverContainer = Systemd.container "webserver" chroot
    & Apt.serviceInstalledRunning "apache2"
    & alias "www.example.com"
  where
    chroot = Chroot.debootstrapped (System (Debian Unstable) "amd64") Debootstrap.MinBase

Notice how I specified the Debian Unstable chroot that forms the basis of this container. Propellor sets up the container by running debootstrap, boots it up using systemd-nspawn, and then runs inside the container to provision it.

Unlike docker containers, systemd-nspawn containers use systemd as their init, and it all integrates rather beautifully. You can see the container listed in systemctl status, including the services running inside it, use journalctl to examine its logs, etc.

But no, systemd is the devil, and docker is too trendy...

chroots

Propellor now also supports deploying good old chroots. It looks a lot like the other containers. Rather than repeat myself a third time, and because we don't really run webservers inside chroots much, here's a slightly different example.

example :: Host
example = host "mylaptop"
    & Chroot.provisioned (buildDepChroot "git-annex")

buildDepChroot :: Apt.Package -> Chroot.Chroot
buildDepChroot pkg = Chroot.debootstrapped system Debootstrap.BuildD dir
    & Apt.buildDep pkg
  where
    dir = "/srv/chroot/builddep/" ++ pkg
    system = System (Debian Unstable) "amd64"

Again this uses debootstrap to build the chroot, and then it runs propellor inside the chroot to provision it (btw without bothering to install propellor there, thanks to the magic of bind mounts and completely linux distribution-independent packaging).

In fact, the systemd-nspawn container code reuses the chroot code, and so turns out to be really rather simple. 132 lines for the chroot support, and 167 lines for the systemd support (which goes somewhat beyond the nspawn containers shown above).

Which leads to the hardest part of all this...

debootstrap

Making a propellor property for debootstrap should be easy. And it was, for Debian systems. However, I have crazy plans that involve running propellor on non-Debian systems, to debootstrap something, and installing debootstrap on an arbitrary linux system is ... too hard.

In the end, I needed 253 lines of code to do it, which is barely an order of magnitude less code than debootstrap itself. I won't go into the ugly details, but this could be made a lot easier if debootstrap catered more to being used outside of Debian.

closing

Docker and systemd-nspawn have different strengths and weaknesses, and there are sure to be more container systems to come. I'm pleased that Propellor can add support for a new container system in a few hundred lines of code, and that it abstracts away all the unimportant differences between these systems.

PS

Seems likely that systemd-nspawn containers can be nested to any depth. So, here's a new kind of fork bomb!

infinitelyNestedContainer :: Systemd.Container
infinitelyNestedContainer = Systemd.container "evil-systemd"
    (Chroot.debootstrapped (System (Debian Unstable) "amd64") Debootstrap.MinBase)
    & Systemd.nspawned infinitelyNestedContainer

Strongly typed purely functional container deployment can only protect us against a certain subset of all badly thought out systems. ;)


Note that the above was written in 2014 and some syntactic details have changed. See the documentation for Propellor.Property.Chroot, Propellor.Property.Debootstrap, Propellor.Property.Docker, Propellor.Property.Systemd for current examples.

using a debian package as the remote for a local config repo

Today I did something interesting with the Debian packaging for propellor, which seems like it could be a useful technique for other Debian packages as well.

Propellor is configured by a directory, which is maintained as a local git repository. In propellor's case, it's ~/.propellor/. This contains a lot of Haskell files, in fact the entire source code of propellor! That's really unusual, but I think this can be generalized to any package whose configuration is maintained in its own git repository on the user's system. From now on, I'll refer to this as the config repo.

The config repo is set up the first time a user runs propellor. But, until now, I didn't provide an easy way to update the config repo when the propellor package was updated. Nothing would break, but the old version would be used until the user updated it themselves somehow (probably by pulling from a git repository over the network, bypassing apt's signature validation).

So, what I wanted was a way to update the config repo, merging in any changes from the new version of the Debian package, while preserving the user's local modifications. Ideally, the user could just run git merge upstream/master, where the upstream repo was included in the Debian package.

But, that can't work! The Debian package can't reasonably include the full git repository of propellor with all its history. So, any git repository included in the Debian binary package would need to be a synthetic one, that only contains probably one commit that is not connected to anything else. Which means that if the config repo was cloned from that repo in version 1.0, then when version 1.1 came around, git would see no common parent when merging 1.1 into the config repo, and the merge would fail horribly.

To solve this, let's assume that the config repo's master branch has a parent commit that can be identified, somehow, as coming from a past version of the Debian package. It doesn't matter which version, although the last one merged with will be best. (The easy way to do this is to set refs/heads/upstream/master to point to it when creating the config repo.)

Once we have that parent commit, we have three things:

  1. The current content of the config repo.
  2. The content from some old version of the Debian package.
  3. The new content of the Debian package.

Now git can be used to merge #3 onto #2, with -Xtheirs, so the result is a git commit with parents of #3 and #2, and content of #3. (This can be done using a temporary clone of the config repo to avoid touching its contents.)

Such a git commit can be merged into the config repo, without any conflicts other than those the user might have caused with their own edits.

So, propellor will tell the user when updates are available, and they can simply run git merge upstream/master to get them. The resulting history looks like this:

* Merge remote-tracking branch 'upstream/master'
|\  
| * merging upstream version
| |\  
| | * upstream version
* | user change
|/  
* upstream version
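
The merge dance described above can be sketched in a few git commands. This is an illustrative sketch, not propellor's actual code; the function name and parameters are made up, and --allow-unrelated-histories (which modern git requires to merge the synthetic, disconnected upstream commit) postdates the original post.

```shell
# Illustrative sketch of the merge dance; not propellor's actual code.
# CONFREPO is the user's config repo, whose upstream/master branch points
# at a commit from an old package version. PKGREPO is the synthetic
# single-commit repository shipped by the new package.
merge_upstream() {
    confrepo=$1; pkgrepo=$2
    tmp=$(mktemp -d)
    # Work in a temporary clone so the config repo's checkout is untouched.
    git clone --quiet "$confrepo" "$tmp/work"
    (
        cd "$tmp/work"
        # Start from the old upstream commit (#2)...
        git checkout --quiet -b newupstream origin/upstream/master
        # ...and merge the new package content (#3) onto it, taking #3's
        # content wherever the two conflict.
        git fetch --quiet "$pkgrepo" master
        git -c user.name=pkg -c user.email=pkg@localhost \
            merge --quiet --allow-unrelated-histories -Xtheirs \
            -m "merging upstream version" FETCH_HEAD
        # Publish the result; the user can now run: git merge upstream/master
        git push --quiet origin newupstream:refs/heads/upstream/master
    )
    rm -rf "$tmp"
}
```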

So, generalizing this, if a package has a lot of config files, and creates a git repository containing them when the user uses it (or automatically when it's installed), this method can be used to provide an easily mergeable branch that tracks the files as distributed with the package.

It would perhaps not be hard to get from here to a full git-backed version of ucf. Note that the Debian binary package doesn't have to ship a git repository; it can just as easily ship the current version of the config files somewhere in /usr, and check them into a new empty repository as part of the generation of the upstream/master branch.

how I wrote init by accident

I wrote my own init. I didn't mean to, and in the end, it took 2 lines of code. Here's how.

Propellor has the nice feature of supporting provisioning of Docker containers. Since Docker normally runs just one command inside the container, I made the command that docker runs be propellor, which runs inside the container and takes care of provisioning it according to its configuration.

For example, here's a real live configuration of a container:

        -- Exhibit: kite's 90's website.
        , standardContainer "ancient-kitenet" Stable "amd64"
                & Docker.publish "1994:80"
                & Apt.serviceInstalledRunning "apache2"
                & Git.cloned "root" "git://kitenet-net.branchable.com/" "/var/www"
                        (Just "remotes/origin/old-kitenet.net")

When propellor is run inside this container, it takes care of installing apache, and since the property states apache should be running, it also starts the daemon if necessary.

At boot, docker remembers the command that was used to start the container last time, and runs it again. This time, apache is already installed, so propellor simply starts the daemon.

This was surprising, but it was just what I wanted too! The only missing bit to make this otherwise entirely free implementation of init work properly was two lines of code:

                -- getAnyProcessStatus comes from System.Posix.Process,
                -- async from Control.Concurrent.Async; job is propellor's
                -- helper that keeps rerunning an action.
                void $ async $ job reapzombies
  where
        reapzombies = void $ getAnyProcessStatus True False

Propellor-as-init also starts up a simple equivalent of rsh on a named pipe (for communication between the propellor inside and outside the container), and also runs a root login shell (so the user can attach to the container and administer it). Also, running a compiled program from the host system inside a container, which might use a different distribution or architecture, was an interesting challenge (solved using the method described in completely linux distribution-independent packaging). So it wasn't entirely trivial, but as far as init goes, it's probably one of the simpler implementations out there.
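
The zombie-reaping core of those two lines can be shown standalone. This is a sketch using only the System.Posix API from the unix package, not propellor's code: it forks a child that exits immediately, then reaps it with a blocking wait, which is what a pid-1 process must do for every orphan reparented to it.

```haskell
import Control.Monad (void)
import System.Exit (ExitCode (ExitSuccess))
import System.Posix.Process (exitImmediately, forkProcess, getAnyProcessStatus)

-- Block until any child changes state and reap it; an init would run this
-- in a loop for the life of the system.
reapOne :: IO Bool
reapOne = do
    -- True: block waiting for a child; False: don't report stopped children.
    status <- getAnyProcessStatus True False
    return (maybe False (const True) status)

main :: IO ()
main = do
    -- Fork a child that exits at once, leaving a zombie to reap.
    void $ forkProcess (exitImmediately ExitSuccess)
    reaped <- reapOne
    print reaped
```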

I know that there are various other solutions in the space of an init for Docker -- personally I'd rather the host's systemd integrated with it so I could see the status of the container's daemons in systemctl status. If that does happen, perhaps I'll eventually be able to remove 2 lines of code from propellor.

propellor-driven DNS and backups

Took a while to get here, but Propellor 0.4.0 can deploy DNS servers and I just had it deploy mine. Including generating DNS zone files.

Configuration is dead simple, as far as DNS goes:

        & alias "ns1.example.com"
        & Dns.secondary hosts "joeyh.name"
        & Dns.primary hosts "example.com"
                (Dns.mkSOA "ns1.example.com" 100)
                [ (RootDomain, NS $ AbsDomain "ns1.example.com")
                , (RootDomain, NS $ AbsDomain "ns2.example.com")
                ]

The awesome thing is that propellor fills in all the other information in the zone file by looking at the properties of the hosts it knows about.

    , host "blue.example.com"
        & ipv4 "192.168.1.1"
        & ipv6 "fe80::26fd:52ff:feea:2294"

        & alias "example.com"
        & alias "www.example.com"
        & alias "example.museum"
        & Docker.docked hosts "webserver"
            `requires` backedup "/var/www"

        & alias "ns2.example.com"
        & Dns.secondary hosts "example.com"
When it sees this host, Propellor adds its IP addresses to the example.com DNS zone file, for both its main hostname ("blue.example.com"), and also its relevant aliases. (The .museum alias would go into a different zone file.)

Multiple hosts can define the same alias, and then you automatically get round-robin DNS.

The web server part of the blue.example.com config can be cut and pasted to another host in order to move its web server to the other host, including updating the DNS. That's really all there is to it: just cut, paste, and commit!

I'm quite happy with how that worked out. And curious if Puppet etc have anything similar.


One tricky part of this was how to ensure that the serial number automatically updates when changes are made. The way this is handled is that Propellor starts with a base serial number (100 in the example above), and adds to it the number of commits in its git repository. The zone file is only updated when something in it besides the serial number needs to change.

The result is nice small serial numbers that don't risk overflowing the (so 90's) 32 bit limit, and will be consistent even if the configuration had Propellor setting up multiple independent master DNS servers for the same domain.
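
A toy rendering of that scheme (the helper name is hypothetical): the serial is just the base plus the commit count, so even decades of constant committing leave it far below the 32-bit SOA ceiling.

```haskell
-- Hypothetical helper mirroring the scheme described above: a fixed base
-- serial plus the number of commits in the config repository.
zoneSerial :: Integer -> Integer -> Integer
zoneSerial base commitCount = base + commitCount

-- A century of hourly commits still yields a serial under a million,
-- nowhere near the 2^32 - 1 SOA limit.
example :: Integer
example = zoneSerial 100 (24 * 365 * 100)
```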


Another recent feature in Propellor is that it can use Obnam to back up a directory. With the awesome feature that if the backed up directory is empty/missing, Propellor will automatically restore it from the backup.

Here's how the backedup property used in the example above might be implemented:

backedup :: FilePath -> Property
backedup dir = Obnam.backup dir daily
    [ "--repository=sftp://rsync.example.com/~/webserver.obnam"
    ] Obnam.OnlyClient
    `requires` Ssh.keyImported SshRsa "root"
    `requires` Ssh.knownHost hosts "rsync.example.com" "root"
    `requires` Gpg.keyImported "1B169BE1" "root"

Notice that the Ssh.knownHost makes root trust the ssh host key belonging to rsync.example.com. So Propellor needs to be told what that host key is, like so:

 , host "rsync.example.com"
        & ipv4 "192.168.1.4"
        & sshPubKey "ssh-rsa blahblahblah"

Which of course ties back into the DNS and gets this hostname set in it. But also, the ssh public key is available for this host and visible to the DNS zone file generator, and that could also be set in the DNS, in a SSHFP record. I haven't gotten around to implementing that, but hope at some point to make Propellor support DNSSEC, and then this will all combine even more nicely.


By the way, Propellor is now up to 3 thousand lines of code (not including Utility library). In 20 days, as a 10% time side project.

propellor introspection for DNS

In the just-released Propellor 0.3.0, I've significantly improved Propellor's config file DSL. Now properties can set attributes of a host, that can be looked up by its other properties, using a Reader monad.

This saves needing to repeat yourself:

hosts = [ host "orca.kitenet.net"
        & stdSourcesList Unstable
        & Hostname.sane -- uses hostname from above

And it simplifies docker setup, with no longer a need to differentiate between properties that configure docker vs properties of the container:

    -- A generic webserver in a Docker container.
    , Docker.container "webserver" "joeyh/debian-unstable"
        & Docker.publish "80:80"
        & Docker.volume "/var/www:/var/www"
        & Apt.serviceInstalledRunning "apache2"

But the really useful thing is, it allows automating DNS zone file creation, using attributes of hosts that are set and used alongside their other properties:

hosts =
    [ host "clam.kitenet.net"
        & ipv4 "10.1.1.1"

        & cname "openid.kitenet.net"
        & Docker.docked hosts "openid-provider"

        & cname "ancient.kitenet.net"
        & Docker.docked hosts "ancient-kitenet"
    , host "diatom.kitenet.net"
        & Dns.primary "kitenet.net" hosts
    ]

Notice that hosts is passed into Dns.primary, inside the definition of hosts! Tying the knot like this is a fun haskell laziness trick. :)
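
A toy version of that knot-tying, with illustrative types rather than propellor's real ones: the list is passed into the construction of its own elements, and laziness means nothing is forced until it's consulted.

```haskell
-- Illustrative types; propellor's real Host carries properties and attrs.
data Host = Host { hostName :: String, knownHosts :: [Host] }

-- The list refers to itself: each host is handed the whole list while it
-- is still being defined, which is fine under lazy evaluation.
hosts :: [Host]
hosts =
    [ Host "clam.kitenet.net" hosts
    , Host "diatom.kitenet.net" hosts
    ]
```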

Now I just need to write a little function to look over the hosts and generate a zone file from their hostname, cname, and address attributes:

extractZoneFile :: Domain -> [Host] -> ZoneFile
extractZoneFile = gen . map hostAttr
  where gen = -- TODO

The eventual plan is that the cname property won't be defined as a property of the host, but of the container running inside it. Then I'll be able to cut-n-paste move docker containers between hosts, or duplicate the same container onto several hosts to deal with load, and propellor will provision them, and update the zone file appropriately.


Also, Chris Webber had suggested that Propellor be able to separate values from properties, so that eg, a web wizard could configure the values easily. I think this gets it much of the way there. All that's left to do is two easy functions:

overrideAttrsFromJSON :: Host -> JSON -> Host

exportJSONAttrs :: Host -> JSON

With these, propellor's configuration could be adjusted at run time using JSON from a file or other source. For example, here's a containerized webserver that publishes a directory from the external host, as configured by JSON that it exports:

demo :: Host
demo = Docker.container "webserver" "joeyh/debian-unstable"
    & Docker.publish "80:80"
    & dir_to_publish "/home/mywebsite" -- dummy default
    & Docker.volume (getAttr dir_to_publish ++":/var/www")
    & Apt.serviceInstalledRunning "apache2"

main = do
    json <- readJSON "my.json"
    let demo' = overrideAttrsFromJSON demo json
    writeJSON "my.json" (exportJSONAttrs demo')
    defaultMain [demo']

propellor type-safe reversions

Propellor ensures that a list of properties about a system are satisfied. But requirements change, and so you might want to revert a property that had been set up before.

For example, I had a system with a webserver container:

Docker.docked container hostname "webserver"

I don't want a web server there any more. Rather than having a separate property to stop it, wouldn't it be nice to be able to say:

revert (Docker.docked container hostname "webserver")

I've now gotten this working. The really fun part is, some properties support reversion, but other properties certainly do not. Maybe the code to revert them is not worth writing, or maybe the property does something that cannot be reverted.

For example, Docker.garbageCollected is a property that makes sure there are no unused docker images wasting disk space. It can't be reverted. Nor can my personal standardSystem Unstable property, which among other things upgrades the system to unstable and sets up my home directory.

I found a way to make Propellor statically check if a property can be reverted at compile time. So revert Docker.garbageCollected will fail to type check!

The tricky part about implementing this is that the user configures Propellor with a list of properties. But now there are two distinct types of properties, revertable ones and non-revertable ones. And Haskell does not support heterogeneous lists.

My solution to this is a typeclass and some syntactic sugar operators. To build a list of properties, with individual elements that might be revertable, and others not:

 props
        & standardSystem Unstable
        & revert (Docker.docked container hostname "webserver")
        & Docker.docked container hostname "amd64-git-annex-builder"
        & Docker.garbageCollected
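
The shape of that solution can be sketched with simplified, hypothetical types (propellor's real ones do more): a revertable property pairs an action with its undo, and a typeclass lets (&) accept either flavor by converting it to a common type.

```haskell
-- Simplified, hypothetical types, not propellor's real API.
newtype Property = Property { propDesc :: String }

-- A revertable property pairs a setup action with its undo.
data RevertableProperty = RevertableProperty Property Property

-- Swapping the pair is all reversion takes; a plain Property simply has
-- no such function, so reverting one fails to type check.
revert :: RevertableProperty -> RevertableProperty
revert (RevertableProperty setup undo) = RevertableProperty undo setup

-- The typeclass that lets (&) take both flavors.
class IsProp p where
    toProp :: p -> Property

instance IsProp Property where
    toProp = id

instance IsProp RevertableProperty where
    toProp (RevertableProperty setup _) = setup

(&) :: IsProp p => [Property] -> p -> [Property]
ps & p = ps ++ [toProp p]

props :: [Property]
props = [] & Property "standard system"
           & revert (RevertableProperty (Property "docked webserver")
                                        (Property "undocked webserver"))
```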
adding docker support to propellor

Propellor development is churning away! (And leaving no few puns in its wake..)

Now it supports secure handling of private data like passwords (only the host that owns it can see it), and fully end-to-end secured deployment via gpg signed and verified commits.

And, I've just gotten support for Docker to build. It probably doesn't quite work yet, but it should only be a few bugs away at this point.

Here's how to deploy a dockerized webserver with propellor:

host hostname@"clam.kitenet.net" = Just
    [ Docker.configured
    , File.dirExists "/var/www"
    , Docker.hasContainer hostname "webserver" container
    ]

container _ "webserver" = Just $ Docker.containerFromImage "joeyh/debian-unstable"
        [ Docker.publish "80:80"
        , Docker.volume "/var/www:/var/www"
        , Docker.inside
            [ serviceRunning "apache2"
                `requires` Apt.installed ["apache2"]
            ]
        ]

Docker containers are set up using Properties too, just like regular hosts, but their Properties are run inside the container.

That means that, if I change the web server port above, Propellor will notice the container config is out of date, and stop the container, commit an image based on it, and quickly use that to bring up a new container with the new configuration.

If I change the web server to say, lighttpd, Propellor will run inside the container, and notice that it needs to install lighttpd to satisfy the new property, and so will update the container without needing to take it down.

Adding all this behavior took only 253 lines of code, and none of it impacts the core of Propellor at all; it's all in Propellor.Property.Docker. (Well, I did need another hundred lines to write a daemon that runs inside the container and reads commands to run over a named pipe... Docker makes running ad-hoc commands inside a container a PITA.)

So, I think that this vindicates the approach of making the configuration of Propellor be a list of Properties, which can be constructed by arbitrarily interesting Haskell code. I didn't design Propellor to support containers, but it was easy to find a way to express them as shown above.

Compare that with how Puppet supports Docker: http://docs.docker.io/en/latest/use/puppet/

docker::run { 'helloworld':
  image        => 'ubuntu',
  command      => '/bin/sh -c "while true; do echo hello world; sleep 1; done"',
  ports        => ['4444', '4555'],
...

All puppet manages is running the image and a simple static command inside it. All the complexities that puppet provides for configuring servers cannot easily be brought to bear inside the container, and a large reason for that is, I think, that its configuration file is just not expressive enough.

introducing propellor

Whups, I seem to have built a configuration management system this evening!

Propellor has similar goals to chef or puppet or ansible, but with an approach much more like slaughter. Except it's configured by writing Haskell code.

The name is because propellor ensures that a system is configured with the desired PROPerties, and also because it kind of pulls system configuration along after it. And you may not want to stand too close.

Disclaimer: I'm not really a sysadmin, except for on the scale of "diffuse administration of every Debian machine on planet earth or nearby", and so I don't really understand configuration management. (Well, I did write debconf, which claims to be the "Debian Configuration Management system".. But I didn't understand configuration management back then either.)

So, propellor makes some perhaps wacky choices. The least of these is that it's built from a git repository that any (theoretical) other users will fork and modify; a cron job can re-make it from time to time and pull down configuration changes, or something can be run to push changes.

A really simple configuration for a Tor bridge server using propellor looks something like this:

main = ensureProperties
    [ Apt.stdSourcesList Apt.Stable `onChange` Apt.upgrade
    , Apt.removed ["exim4"] `onChange` Apt.autoRemove
    , Hostname.set "bridget"
    , Ssh.uniqueHostKeys
    , Tor.isBridge
    ]

Since it's just haskell code, it's "easy" to refactor out common configurations for classes of servers, etc. Or perhaps integrate reclass? I don't know. I'm happy with just pure functions and type-safe refactorings of my configs, I think.

Properties are also written in Haskell of course. This one ensures that all the packages in a list are installed.

installed :: [Package] -> Property
installed ps = check (isInstallable ps) go
  where
        go = runApt $ [Param "-y", Param "install"] ++ map Param ps

Here's one that ensures the hostname is set to the desired value, which shows how to specify content for a file, and also how to run another action if a change needed to be made to satisfy a property.

set :: HostName -> Property
set hostname = "/etc/hostname" `File.hasContent` [hostname]
        `onChange` cmdProperty "hostname" [Param hostname]

Here's part of a custom one that I use to check out a user's home directory from git. Shows how to make a property require that some other property is satisfied first, and how to test if a property has already been satisfied.

installedFor :: UserName -> Property
installedFor user = check (not <$> hasGitDir user) $
        Property ("githome " ++ user) (go =<< homedir user)
                    `requires` Apt.installed ["git", "myrepos"]
  where
    go ... -- 12 lines elided

I'm about 37% happy with the overall approach to listing properties and combining properties into larger properties etc. I think that some unifying insight is missing -- perhaps there should be a Property monad? But as long as it yields a list of properties, any smarter thing should be able to be built on top of this.

Propellor is 564 lines of code, including 25 or so built-in properties like the examples above. It took around 4 hours to build.

I'm pretty sure it was easier to write it than it would have been to look into ansible and salt and slaughter (and also liw's human-readable configuration language whose name I've forgotten) in enough detail to pick one, and learn how its configuration worked, and warp it into something close to how I wanted this to work.

I think that's interesting.. It's partly about NIH and I-want-everything-in-Haskell, but it's also about a complicated system that is a lot of things to a lot of people -- of the kind I see when I look at ansible -- vs the tools and experience to build just the thing you want without the cruft. Nice to have the latter!

