twenty years of free software -- part 5 pristine-tar

I've written retrospectively about pristine-tar before, when I stopped maintaining it. So, I'll quote part of that here:

[...] a little bit about the reason I wrote pristine-tar in the
first place. There were two reasons:

1. I was once in a talk where someone mentioned that Ubuntu had/was
   developing something that involved regenerating orig tarballs
   from version control.
   I asked the obvious question: How could that possibly be done
   technically? 
   The (slightly hung over) presenter did not have a satesfactory
   response, so my curiosity was piqued to find a way to do it.
   (I later heard that Ubuntu has been using pristine-tar..)

2. Sometimes code can be subversive. It can change people's perspective
   on a topic, nudging discourse in a different direction. It can even
   point out absurdities in the way things are done. I may or may not
   have accomplished the subversive part of my goals with pristine-tar.

Code can also escape its original intention. Many current uses of
pristine-tar fall into that category. So it seems likely that some
people will want it to continue to work even if it's met the two goals
above already.

For me, the best part of building pristine-tar was finding an answer to the question "How could that possibly be done technically?" It was also pretty cool to be able to use every tarball in Debian as the test suite for pristine-tar.

I'm afraid I kind of left Debian in the lurch when I stopped maintaining pristine-tar.

"Debian has probably hundreds, if not thousands of git repositories using pristine-tar. We all rely now on an unmaintained, orphaned, and buggy piece of software." -- Norbert Preining

So I was relieved when it finally got a new maintainer just recently.

Still, I don't expect I'll ever use pristine-tar again. It's the only software I've built in the past ten years that I can say that about.

Next: ?twenty years of free software -- part 6 moreutils

Posted
twenty years of free software -- part 4 ikiwiki-hosting

ikiwiki-hosting is a spin-off from ikiwiki. I wrote it to manage many ikiwiki instances for Branchable, and made it free software out of principle.

While Branchable has not reached the point of providing much income, it's still running after 6 years. Ikiwiki-hosting makes it pretty easy to maintain it, and I host all of my websites there.

A couple of other people have also found ikiwiki-hosting useful, which is not only nice, but led to some big improvements to it. Mostly though, releasing the software behind the business as free software caused us to avoid shortcuts and build things well.

Next: twenty years of free software -- part 5 pristine-tar

Posted
twenty years of free software -- part 3 myrepos

myrepos is kind of just an elaborated foreach (@myrepos) loop, but its configuration and extension in a sort of hybrid between an .ini file and shell script is quite nice and plenty of other people have found it useful.

I had to write myrepos when I switched from subversion to git, because git's submodules are too limited to meet my needs, and I needed a tool to check out and update many repositories, not necessarily all using the same version control system.

It was called "mr" originally, but I renamed the package because it's impossible to google for "mr". This is the only software I've ever renamed.

Next: twenty years of free software -- part 4 ikiwiki-hosting

Posted
twenty years of free software -- part 2 etckeeper

etckeeper was a sleeper success for me. I created it, wrote one blog post about it, installed it on all my computers, and mostly forgot about it, except when I needed to look something up in the git history of /etc it helpfully maintains. It's a minor project.

Then I started getting patches porting it to many other version control systems, and other linux distributions, and fixing problems, and adding features. Mountains and mountains of patches over time.

And then I started hearing about distributions that install it by default. (Though Debian for some reason never did so I keep having to install it everywhere by hand.)

Writing this blog post, I noticed etckeeper had accumulated enough patches from other people to warrant a new release. That happens pretty regularly.

So it's still a minor project as far as I'm concerned, but quite a few people seem to be finding it useful. So it goes with free software.

Next: twenty years of free software -- part 3 myrepos

Posted
twenty years of free software -- part 1 ikiwiki

I'm forty years old. I've been developing free software for twenty years.

A decade ago, I wrote a series of posts about my first ten years of free software, looking back over projects I'd developed. These retrospectives seem even more valuable in retrospect; there are things in the old posts that jog my memory, and other details I've forgotten by now.

So, I'm doing it again. Over the next two weeks (with a week off in the middle for summer vacation), I'll be writing one post each day about a free software project I've developed in the past decade.


We begin with Ikiwiki. I started it 10 years ago, and still use it to this day; it's the engine that builds this website, and nearly all my other websites, as well as wikis and websites belonging to tons and tons of other projects, like NetBSD, X.org, Freedesktop.org, FreedomBox and many other users.

Indeed I'm often reading a website and find myself wondering "hey.. is this using Ikiwiki?", and glance at the html and find that yes, it is. So, Ikiwiki is a reasonably successful and widely used peice of software, at least in its niche.

More important to me, it was a static site generator before we knew what those were. It wasn't the first, but it broke plenty of new ground. I'm particularly proud of the way it combines a wiki with blogging support seamlessly, and the incremental updating of the static pages including updating wikilinks and backlinks. Some of these and other features are still pretty unique to Ikiwiki despite the glut of other static site generators available now.

Ikiwiki is written in Perl, which was great for getting lots of other contributions (including many of its 113 plugins), but has also held it back some lately. There are less Perl programmers these days. And over the past decade, concurrency has become very important, but Ikiwiki's implementation is stubbornly single threaded, and multithreading such a Perl program is a losing propoisition. I occasionally threaten to rewrite it in Haskell, but I doubt I will.

Ikiwiki has several developers now, and I'm the least active of them. I stepped back because I can't write Perl very well anymore, and am mostly very happy with how Ikiwiki works, so only pop up now and then when something annoys me.

Next: twenty years of free software -- part 2 etckeeper

second system

Five years ago I built this, and it's worked well, but is old and falling down now.

mark I outdoor shower

The replacement is more minimalist and like any second system tries to improve on the design of the first. No wood to rot away, fully adjustable height. It's basically a shower swing, suspended from a tree branch.

mark II outdoor shower

Probably will turn out to have its own new problems, as second systems do.

electrum ssl vulnerabilities

The electrum bitcoin wallet seems to use SSL insecurely. Here are two bug reports I filed about it:

ignores invalid ssl certs with CERT_NONE, allowing MITM

fails to verify ssl cert hostname for cached certs
(update: Seems I was wrong about this bug)

One full month after I filed these, there's been no activity, so I thought I'd make this a little more widely known. It's too hard to get CVEs assigned, and resgistering a snarky domain name is passe.

I'm not actually using electrum myself currently, as I own no bitcoins. I only noticed these vulnerabilities when idly perusing the code. I have not tried to actually exploit them, and some of the higher levels of the SPV blockchain verification make them difficult to exploit. Or perhaps there are open wifi networks where all electrum connections get intercepted by a rogue server that successfully uses these security holes to pretend to be the entire electrum server network.

I'm not particularly surprised to find code that's supposed to be securing a connection with SSL but actually fails to verify the SSL certificate. That's the common failure mode of SSL libraries.

It is a bit surprising to find such mistakes in a bitcoin wallet though. And, electrum seems to go out of its way to complicate its SSL certificate handling, which directly led to these security holes. Kind of makes me wonder about the security of the rest of it.

Posted
my Shuttleworth Foundation flash grant

Six months ago I received a small grant from the Shuttleworth Foundation with no strings attached other than I should write this blog post about it. That was a nice surprise.

The main thing that ended up being supported by the grant was work on Propellor, my configuration management system that is configured by writing Haskell code. I made 11 releases of Propellor in the grant period, with some improvements from me, and lots more from other contributors. The biggest feature that I added to Propellor was LetsEncrypt support.

More important than features is making Propellor prevent more classes of mistakes, by creative use of the type system. The biggest improvement in this area was type checking the OSes of Propellor properties, so Propellor can reject host configurations that combine eg, Linux-only and FreeBSD-only properties.

Turns out that the same groundwork needed for that is also what's needed to get Propellor to do type-level port conflict detection. I have a branch underway that does that, although it's not quite done yet.

The grant also funded some of my work on git-annex. My main funding for git-annex doesn't cover development of the git-annex assistant, so the grant filled in that gap, particularly in updating the assistant to support the git-annex v6 repo format.

I've very happy to have received this grant, and with the things it enabled me to work on.

Posted
type safe multi-OS Propellor

Propellor was recently ported to FreeBSD, by Evan Cofsky. This new feature led me down a two week long rabbit hole to make it type safe. In particular, Propellor needed to be taught that some properties work on Debian, others on FreeBSD, and others on both.

The user shouldn't need to worry about making a mistake like this; the type checker should tell them they're asking for something that can't fly.

-- Is this a Debian or a FreeBSD host? I can't remember, let's use both package managers!
host "example.com" $ props
    & aptUpgraded
    & pkgUpgraded

As of propellor 3.0.0 (in git now; to be released soon), the type checker will catch such mistakes.

Also, it's really easy to combine two OS-specific properties into a property that supports both OS's:

upgraded = aptUpgraded `pickOS` pkgUpgraded

type level lists and functions

The magick making this work is type-level lists. A property has a metatypes list as part of its type. (So called because it's additional types describing the type, and I couldn't find a better name.) This list can contain one or more OS's targeted by the property:

aptUpgraded :: Property (MetaTypes '[ 'Targeting 'OSDebian, 'Targeting 'OSBuntish ])

pkgUpgraded :: Property (MetaTypes '[ 'Targeting 'OSFreeBSD ])

In Haskell type-level lists and other DataKinds are indicated by the ' if you have not seen that before. There are some convenience aliases and type operators, which let the same types be expressed more cleanly:

aptUpgraded :: Property (Debian + Buntish)

pkgUpgraded :: Property FreeBSD

Whenever two properties are combined, their metatypes are combined using a type-level function. Combining aptUpgraded and pkgUpgraded will yield a metatypes that targets no OS's, since they have none in common. So will fail to type check.

My implementation of the metatypes lists is hundreds of lines of code, consisting entirely of types and type families. It includes a basic implementation of singletons, and is portable back to ghc 7.6 to support Debian stable. While it takes some contortions to support such an old version of ghc, it's pretty awesome that the ghc in Debian stable supports this stuff.

extending beyond targeted OS's

Before this change, Propellor's Property type had already been slightly refined, tagging them with HasInfo or NoInfo, as described in making propellor safer with GADTs and type families. I needed to keep that HasInfo in the type of properties.

But, it seemed unnecessary verbose to have types like Property NoInfo Debian. Especially if I want to add even more information to Property types later. Property NoInfo Debian NoPortsOpen would be a real mouthful to need to write for every property.

Luckily I now have this handy type-level list. So, I can shove more types into it, so Property (HasInfo + Debian) is used where necessary, and Property Debian can be used everywhere else.

Since I can add more types to the type-level list, without affecting other properties, I expect to be able to implement type-level port conflict detection next. Should be fairly easy to do without changing the API except for properties that use ports.

singletons

As shown here, pickOS makes a property that decides which of two properties to use based on the host's OS.

aptUpgraded :: Property DebianLike
aptUpgraded = property "apt upgraded" (apt "upgrade" `requires` apt "update")

pkgUpgraded :: Property FreeBSD
pkgUpgraded = property "pkg upgraded" (pkg "upgrade")
    
upgraded :: Property UnixLike
upgraded = (aptUpgraded `pickOS` pkgUpgraded)
    `describe` "OS upgraded"

Any number of OS's can be chained this way, to build a property that is super-portable out of simple little non-portable properties. This is a sweet combinator!

Singletons are types that are inhabited by a single value. This lets the value be inferred from the type, which came in handy in building the pickOS property combinator.

Its implementation needs to be able to look at each of the properties at runtime, to compare the OS's they target with the actial OS of the host. That's done by stashing a target list value inside a property. The target list value is inferred from the type of the property, thanks to singletons, and so does not need to be passed in to property. That saves keyboard time and avoids mistakes.

is it worth it?

It's important to consider whether more complicated types are a net benefit. Of course, opinions vary widely on that question in general! But let's consider it in light of my main goals for Propellor:

  1. Help save the user from pushing a broken configuration to their machines at a time when they're down in the trenches dealing with some urgent problem at 3 am.
  2. Advance the state of the art in configuration management by taking advantage of the state of the art in strongly typed haskell.

This change definitely meets both criteria. But there is a tradeoff; it got a little bit harder to write new propellor properties. Not only do new properties need to have their type set to target appropriate systems, but the more polymorphic code is, the more likely the type checker can't figure out all the types without some help.

A simple example of this problem is as follows.

foo :: Property UnixLike
foo = p `requires` bar
  where
    p = property "foo" $ do
        ...

The type checker will complain that "The type variable ‘metatypes1’ is ambiguous". Problem is that it can't infer the type of p because many different types could be combined with the bar property and all would yield a Property UnixLike. The solution is simply to add a type signature like p :: Property UnixLike

Since this only affects creating new properties, and not combining existing properties (which have known types), it seems like a reasonable tradeoff.

things to improve later

There are a few warts that I'm willing to live with for now...

Currently, Property (HasInfo + Debian) is different than Property (Debian + HasInfo), but they should really be considered to be the same type. That is, I need type-level sets, not lists. While there's a type level sets library for hackage, it still seems to require a specific order of the set items when writing down a type signature.

Also, using ensureProperty, which runs one property inside the action of another property, got complicated by the need to pass it a type witness.

foo = Property Debian
foo = property' $ \witness -> do
    ensureProperty witness (aptInstall "foo")

That witness is used to type check that the inner property targets every OS that the outer property targets. I think it might be possible to store the witness in the monad, and have ensureProperty read it, but it might complicate the type of the monad too much, since it would have to be parameterized on the type of the witness.

Oh no, I mentioned monads. While type level lists and type functions and generally bending the type checker to my will is all well and good, I know most readers stop reading at "monad". So, I'll stop writing. ;)

thanks

Thanks to David Miani who answered my first tentative question with a big hunk of example code that got me on the right track.

Also to many other people who answered increasingly esoteric Haskell type system questions.

Also thanks to the Shuttleworth foundation, which funded this work by way of a Flash Grant.

documentation first

I write documentation first and code second. I've mentioned this from time to time (previously, previously) but a reader pointed out that I've never really explained why I work that way.

It's a way to make my thinking more concrete without diving all the way into the complexities of the code right away. So sometimes, what I write down is design documentation, and sometimes it's notes on a bug report[1], but if what I'm working on is user-visible, I start by writing down the end user documentation.

Writing things down lets me interact with them as words on a page, which are more concrete than muddled thoughts in the head, and much easier to edit and reason about. Code constrains to existing structures; a blank page frees you to explore and build up new ideas. It's the essay writing process, applied to software development, with a side effect of making sure everything is documented.

Also, end-user documentation is best when it doesn't assume that the user has any prior knowledge. The point in time when I'm closest to perfect lack of knowledge about something is before I've built it[2]. So, that's the best time to document it.

I understand what I'm trying to tell you better now that I've written it down than I did when I started. Hopefully you do too.


[1] I'll often write a bug report down even if I have found the bug myself and am going to fix it myself on the same day. (example) This is one place where it's nice to have bug reports as files in the same repository as the code, so that the bug report can be included in the commit fixing it. Often the bug report has lots of details that don't need to go into the commit message, but explain more about my evolving thinking about a problem.

[2] Technically I'm even more clueless ten years later when I've totally forgotten whatever, but it's not practical to wait. ;-)

Posted