Six months ago I received a small grant from the Shuttleworth Foundation with no strings attached other than I should write this blog post about it. That was a nice surprise.
The main thing that ended up being supported by the grant was work on Propellor, my configuration management system that is configured by writing Haskell code. I made 11 releases of Propellor in the grant period, with some improvements from me, and lots more from other contributors. The biggest feature that I added to Propellor was LetsEncrypt support.
More important than features is making Propellor prevent more classes of mistakes, by creative use of the type system. The biggest improvement in this area was type checking the OSes of Propellor properties, so Propellor can reject host configurations that combine eg, Linux-only and FreeBSD-only properties.
Turns out that the same groundwork needed for that is also what's needed to get Propellor to do type-level port conflict detection. I have a branch underway that does that, although it's not quite done yet.
The grant also funded some of my work on git-annex. My main funding for git-annex doesn't cover development of the git-annex assistant, so the grant filled in that gap, particularly in updating the assistant to support the git-annex v6 repo format.
I'm very happy to have received this grant, and with the things it enabled me to work on.
Propellor was recently ported to FreeBSD, by Evan Cofsky. This new feature led me down a two week long rabbit hole to make it type safe. In particular, Propellor needed to be taught that some properties work on Debian, others on FreeBSD, and others on both.
The user shouldn't need to worry about making a mistake like this; the type checker should tell them they're asking for something that can't fly.
```haskell
-- Is this a Debian or a FreeBSD host? I can't remember,
-- let's use both package managers!
host "example.com" $ props
    & aptUpgraded
    & pkgUpgraded
```
As of propellor 3.0.0 (in git now; to be released soon), the type checker will catch such mistakes.
Also, it's really easy to combine two OS-specific properties into a property that supports both OS's:
```haskell
upgraded = aptUpgraded `pickOS` pkgUpgraded
```
type level lists and functions
The magic making this work is type-level lists. A property has a metatypes list as part of its type. (So called because it's additional types describing the type, and I couldn't find a better name.) This list can contain one or more OS's targeted by the property:
```haskell
aptUpgraded :: Property (MetaTypes '[ 'Targeting 'OSDebian, 'Targeting 'OSBuntish ])

pkgUpgraded :: Property (MetaTypes '[ 'Targeting 'OSFreeBSD ])
```
In Haskell, type-level lists and other DataKinds are indicated by the ' character, if you have not seen that before. There are some convenience aliases and type operators, which let the same types be expressed more briefly:

```haskell
aptUpgraded :: Property (Debian + Buntish)

pkgUpgraded :: Property FreeBSD
```
Whenever two properties are combined, their metatypes are combined using a type-level function. Combining aptUpgraded with pkgUpgraded will yield a metatypes list that targets no OS's, since they have none in common, and so it will fail to type check.
My implementation of the metatypes lists is hundreds of lines of code, consisting entirely of types and type families. It includes a basic implementation of singletons, and is portable back to ghc 7.6 to support Debian stable. While it takes some contortions to support such an old version of ghc, it's pretty awesome that the ghc in Debian stable supports this stuff.
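To give a flavor of what that code does, here is a standalone, much-simplified sketch (not Propellor's actual implementation; the names `Intersect`, `Elem`, and `propDesc` are made up for illustration) of intersecting two type-level target lists with closed type families:

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, TypeOperators, KindSignatures #-}
module Main where

data OS = OSDebian | OSBuntish | OSFreeBSD

-- Type-level list membership test.
type family Elem (a :: OS) (bs :: [OS]) :: Bool where
    Elem a '[]       = 'False
    Elem a (a ': bs) = 'True
    Elem a (b ': bs) = Elem a bs

type family If (c :: Bool) (t :: [OS]) (e :: [OS]) :: [OS] where
    If 'True  t e = t
    If 'False t e = e

-- Type-level intersection of two target lists.
type family Intersect (as :: [OS]) (bs :: [OS]) :: [OS] where
    Intersect '[] bs       = '[]
    Intersect (a ': as) bs = If (Elem a bs) (a ': Intersect as bs) (Intersect as bs)

-- A property tagged with the OS's it targets.
newtype Property (targets :: [OS]) = Property String

propDesc :: Property targets -> String
propDesc (Property d) = d

-- Combining two properties intersects their target lists.
combine :: Property a -> Property b -> Property (Intersect a b)
combine (Property a) (Property b) = Property (a ++ "; " ++ b)

aptUpgraded :: Property '[ 'OSDebian, 'OSBuntish ]
aptUpgraded = Property "apt upgraded"

debianFoo :: Property '[ 'OSDebian ]
debianFoo = Property "foo installed"

-- The intersection of the two target lists is '[ 'OSDebian ],
-- so this type signature is accepted by the compiler.
both :: Property '[ 'OSDebian ]
both = combine aptUpgraded debianFoo

main :: IO ()
main = putStrLn (propDesc both)
```

Propellor's real MetaTypes machinery tracks more than OS targets, but the combining step boils down to a type-level intersection like this one, and a combination with an empty intersection is rejected by a type signature that expects a non-empty one.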
extending beyond targeted OS's
Before this change, Propellor's Property type had already been slightly refined, tagging properties with HasInfo or NoInfo, as described in making propellor safer with GADTs and type families. I needed to keep that information in the type of properties. But it seemed unnecessarily verbose to have types like Property NoInfo Debian, especially if I want to add even more information to Property types later; Property NoInfo Debian NoPortsOpen would be a real mouthful to need to write for every property.
Luckily I now have this handy type-level list, so I can shove more types into it: Property (HasInfo + Debian) is used where necessary, and Property Debian can be used everywhere else.
Since I can add more types to the type-level list, without affecting other properties, I expect to be able to implement type-level port conflict detection next. Should be fairly easy to do without changing the API except for properties that use ports.
As shown here, pickOS makes a property that decides which of two properties to use based on the host's OS:

```haskell
aptUpgraded :: Property DebianLike
aptUpgraded = property "apt upgraded"
    (apt "upgrade" `requires` apt "update")

pkgUpgraded :: Property FreeBSD
pkgUpgraded = property "pkg upgraded"
    (pkg "upgrade")

upgraded :: Property UnixLike
upgraded = (aptUpgraded `pickOS` pkgUpgraded)
    `describe` "OS upgraded"
```
Any number of OS's can be chained this way, to build a property that is super-portable out of simple little non-portable properties. This is a sweet combinator!
singletons

Singletons are types that are inhabited by a single value. This lets the value be inferred from the type, which came in handy in building the pickOS property combinator.
Its implementation needs to be able to look at each of the properties at runtime, to compare the OS's they target with the actual OS of the host. That's done by stashing a target list value inside a property. The target list value is inferred from the type of the property, thanks to singletons, and so does not need to be passed in to property. That saves keyboard time and avoids mistakes.
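For illustration, here is a minimal standalone sketch of that idea, using hypothetical KnownOS and KnownOSList classes rather than a general singletons library: a type class walks the promoted target list and rebuilds it as a runtime value, so a pickOS-style combinator can compare it against the host's actual OS.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeOperators,
             ScopedTypeVariables, FlexibleInstances #-}
module Main where

import Data.Proxy

data OS = OSDebian | OSFreeBSD
    deriving (Show, Eq)

-- Reify one promoted OS constructor back to a value.
class KnownOS (o :: OS) where
    osVal :: Proxy o -> OS

instance KnownOS 'OSDebian  where osVal _ = OSDebian
instance KnownOS 'OSFreeBSD where osVal _ = OSFreeBSD

-- Reify a whole type-level target list to a runtime list.
class KnownOSList (l :: [OS]) where
    osListVal :: Proxy l -> [OS]

instance KnownOSList '[] where
    osListVal _ = []

instance (KnownOS o, KnownOSList os) => KnownOSList (o ': os) where
    osListVal _ = osVal (Proxy :: Proxy o) : osListVal (Proxy :: Proxy os)

-- A pickOS-like runtime check: does a property's inferred target
-- list include the host's OS?
supportsHost :: forall l. KnownOSList l => Proxy l -> OS -> Bool
supportsHost p hostOS = hostOS `elem` osListVal p

main :: IO ()
main = do
    print (osListVal (Proxy :: Proxy '[ 'OSDebian, 'OSFreeBSD ]))
    print (supportsHost (Proxy :: Proxy '[ 'OSDebian ]) OSFreeBSD)
```

The point is that `osListVal` takes no value argument at all; the list is recovered purely from the type, which is why properties never need their target list passed in explicitly.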
is it worth it?
It's important to consider whether more complicated types are a net benefit. Of course, opinions vary widely on that question in general! But let's consider it in light of my main goals for Propellor:
- Help save the user from pushing a broken configuration to their machines at a time when they're down in the trenches dealing with some urgent problem at 3 am.
- Advance the state of the art in configuration management by taking advantage of the state of the art in strongly typed haskell.
This change definitely meets both criteria. But there is a tradeoff; it got a little bit harder to write new propellor properties. Not only do new properties need to have their type set to target appropriate systems, but the more polymorphic code is, the more likely the type checker can't figure out all the types without some help.
A simple example of this problem:

```haskell
foo :: Property UnixLike
foo = p `requires` bar
  where
    p = property "foo" $ do
        ...
```

The type checker will complain that "The type variable ‘metatypes1’ is ambiguous". The problem is that it can't infer the type of p, because many different types could be combined with the bar property and all would yield a Property UnixLike. The solution is simply to add a type signature like p :: Property UnixLike.
Since this only affects creating new properties, and not combining existing properties (which have known types), it seems like a reasonable tradeoff.
things to improve later
There are a few warts that I'm willing to live with for now...
Property (HasInfo + Debian) is a different type than Property (Debian + HasInfo), but they should really be considered the same type. That is, I need type-level sets, not lists. While there's a type-level sets library on hackage, it still seems to require a specific order of the set items when writing down a type signature.
ensureProperty, which runs one property inside the action of another property, got complicated by the need to pass it a type witness.

```haskell
foo :: Property Debian
foo = property' $ \witness -> do
    ensureProperty witness (aptInstall "foo")
```
That witness is used to type check that the inner property targets every OS that the outer property targets. I think it might be possible to store the witness in the monad, and have ensureProperty read it, but it might complicate the type of the monad too much, since it would have to be parameterized on the type of the witness.
Oh no, I mentioned monads. While type level lists and type functions and generally bending the type checker to my will is all well and good, I know most readers stop reading at "monad". So, I'll stop writing. ;)
Thanks to David Miani who answered my first tentative question with a big hunk of example code that got me on the right track.
Also to many other people who answered increasingly esoteric Haskell type system questions.
Also thanks to the Shuttleworth foundation, which funded this work by way of a Flash Grant.
It's a way to make my thinking more concrete without diving all the way into the complexities of the code right away. So sometimes, what I write down is design documentation, and sometimes it's notes on a bug report, but if what I'm working on is user-visible, I start by writing down the end user documentation.
Writing things down lets me interact with them as words on a page, which are more concrete than muddled thoughts in the head, and much easier to edit and reason about. Code constrains you to existing structures; a blank page frees you to explore and build up new ideas. It's the essay-writing process, applied to software development, with a side effect of making sure everything is documented.
Also, end-user documentation is best when it doesn't assume that the user has any prior knowledge. The point in time when I'm closest to perfect lack of knowledge about something is before I've built it. So, that's the best time to document it.
I understand what I'm trying to tell you better now that I've written it down than I did when I started. Hopefully you do too.
 I'll often write a bug report down even if I have found the bug myself and am going to fix it myself on the same day. (example) This is one place where it's nice to have bug reports as files in the same repository as the code, so that the bug report can be included in the commit fixing it. Often the bug report has lots of details that don't need to go into the commit message, but explain more about my evolving thinking about a problem.
 Technically I'm even more clueless ten years later when I've totally forgotten whatever, but it's not practical to wait. ;-)
Canonical appear to require that you remove all trademarks entirely even if using them wouldn't be a violation of trademark law.
Each time Matthew brings this up, and as evidence continues to mount that Canonical either actually intends their IP policy to be read that way, or is intentionally keeping the situation unclear to FUD derivatives, I start wondering about references to Ubuntu in my software.
Should such references be removed, or obscured, like "U*NIX" in software of old, to prevent exposing users to this trademark nonsense?
```
joey@darkstar:~/src/git-annex>git grep -i ubuntu |wc -l
457
joey@darkstar:~/src/ikiwiki>git grep -i ubuntu |wc -l
80
joey@darkstar:~/src/etckeeper>git grep -i ubuntu |wc -l
14
```
Most of the code in git-annex, ikiwiki, and etckeeper is licensed under the GPL or AGPL, and so Canonical's IP policy probably does not require that anyone basing a distribution on Ubuntu strip all references to "Ubuntu" from them. But then, there's Propellor:
```
joey@darkstar:~/src/propellor>git grep -i ubuntu |wc -l
10
```
Propellor is BSD licensed. It's in Ubuntu universe. It not only references Ubuntu in documentation, but contains code that uses that trademark:
```haskell
data Distribution
    = Debian DebianSuite
    | Ubuntu Release
```
So, if an Ubuntu-derived distribution has to remove "Ubuntu" from Propellor, they'd end up with a Propellor that either differs from upstream, or that can't be used to manage Ubuntu systems. Neither choice is good for users. Probably most small derived distributions would not have expertise to patch data types in a Haskell program and would have to skip including Propellor. That's not good for Propellor getting wide distribution either.
I think I've convinced myself it would be for the best to remove all references to "Ubuntu" from Propellor.
Similarly, Debconf is BSD licensed. I originally wrote it, but it's now maintained by Colin Watson, who works for Canonical. If I were still maintaining Debconf, I'd be looking at removing all instances of "Ubuntu" from it and preventing that and other Canonical trademarks from slipping back in later. Alternatively, I'd be happy to re-license all Debconf code that I wrote under the AGPL-3+.
Update: Another package that comes to mind is Debootstrap, which is also BSD licensed. Of course it contains "Ubuntu" in lots of places, since it is how Ubuntu systems are built. I'm no longer an active developer of Debootstrap, but I hope its current developers carefully consider how this trademark nonsense affects it.
PS: Shall we use "*buntu" as the, erm, canonical trademark-free spelling of "Ubuntu"? Seems most reasonable, unless Canonical has trademarked that too.
I've integrated letsencrypt into propellor today.
I'm using the reference letsencrypt client. While I've seen complaints that it has a lot of dependencies and is too complicated, it seemed to only need to pull in a few packages, use only a few megabytes of disk space, and it has fewer options than ls does. So it seems fine. (Although it would be nice to have some alternatives packaged in Debian.)
I ended up implementing this:
```haskell
letsEncrypt :: AgreeTOS -> Domain -> WebRoot -> Property NoInfo
```
This property just makes the certificate available, it does not configure the web server to use it. This avoids relying on the letsencrypt client's apache config munging, which is probably useful for many people, but not those of us using configuration management systems. And so avoids most of the complicated magic that the letsencrypt client has a reputation for.
Instead, any property that wants to use the certificate can just use letsEncrypt to get it, and set up the server when it makes a change to the certificate:
```haskell
letsEncrypt (LetsEncrypt.AgreeTOS (Just "firstname.lastname@example.org"))
    "example.com" "/var/www"
    `onChange` setupthewebserver
```
(Took me a while to notice I could use onChange like that, and so divorce the cert generation/renewal from the server setup. onChange is awesome! This blog post has been updated accordingly.)
In practice, the http site has to be brought up first, and then letsencrypt run, and then the cert installed and the https site brought up using it. That dance is automated by this property:
```haskell
Apache.httpsVirtualHost "example.com" "/var/www"
    (LetsEncrypt.AgreeTOS (Just "email@example.com"))
```
That's about as simple a configuration as I can imagine for such a website!
The two parts of letsencrypt that are complicated are not the fault of the client really. Those are renewal and rate limiting.
I'm currently rate limited for the next week because I asked letsencrypt for several certificates for a domain, as I was learning how to use it and integrating it into propellor. So I've not quite managed to fully test everything. That's annoying. I also worry that rate limiting could hit at an inopportune time once I'm relying on letsencrypt. It's especially problematic that it only allows 5 certs for subdomains of a given domain per week. What if I use a lot of subdomains?
Renewal is complicated mostly because there's no good way to test it. You set up your cron job, or whatever, and wait three months, and hopefully it worked. Just as likely, you got something wrong, and your website breaks. Maybe letsencrypt could offer certificates that will only last an hour, or a day, for use when testing renewal.
Also, what if something goes wrong with renewal? Perhaps letsencrypt.org is not available when your certificate needs to be renewed.
What I've done in propellor to handle renewal is, it runs letsencrypt every time, with the --keep-until-expiring option. If this fails, propellor will report a failure. As long as propellor is run periodically by a cron job, this should result in multiple failure reports being sent (for 30 days I think) before a cert expires without getting renewed. But, I have not been able to test this.
Version 6 of git-annex, released last week, adds a major new feature; support for unlocked large files that can be edited as usual and committed using regular git commands.
```
git init
git annex init --version=6
mv ~/foo.iso .
git add foo.iso
git commit -m "added hundreds of megabytes to git annex (not git)"
git remote add origin ssh://server/dir
git annex sync origin --content # uploads foo.iso
```
Compare that with how git-annex has worked from the beginning, where git annex add is used to add a file, and then the file is locked, preventing further modifications of it. That is still a very useful way to use git-annex for many kinds of files, and is still supported of course. Indeed, you can easily switch files back and forth between being locked and unlocked.
This new unlocked file mode uses git's smudge/clean filters, and I was busy developing it all through December. It started out playing catch-up with git-lfs somewhat, but has significantly surpassed it now in several ways.
So, if you had tried git-annex before, but found it didn't meet your needs, you may want to give it another look now.
Now a few thoughts on git-annex vs git-lfs, and different tradeoffs made by them.
After trying it out, my feeling is that git-lfs brings an admirable simplicity to using git with large files. File contents are automatically uploaded to the server when a git branch is pushed, and downloaded when a branch is merged, and after setting it up, the user may not need to change their git workflow at all to use git-lfs.
But there are some serious costs to that simplicity. git-lfs is a centralized system. This is especially problematic when dealing with large files. Being a decentralized system, git-annex has a lot more flexibility, like transferring large file contents peer-to-peer over a LAN, and being able to choose where large quantities of data are stored (maybe in S3, maybe on a local archive disk, etc).
The price git-annex pays for this flexibility is that you have to configure it, and run some additional commands. And, it has to keep track of what content is located where, since it can't assume the answer is "in the central server".
The simplicity of git-lfs also means that the user doesn't have much control over what files are present in their checkout of a repository. git-lfs downloads all the files in the work tree. It doesn't have facilities for dropping the content of some files to free up space, or for configuring a repository to only want to get a subset of files in the first place. On the other hand, git-annex has excellent support for all those things, and this comes largely for free from its decentralized design.
If git has showed us anything, it's perhaps that a little added complexity to support a fully distributed system won't prevent people using it. Even if many of them end up using it in a mostly centralized way. And that being decentralized can have benefits beyond the obvious ones.
Oh yeah, one other advantage of git-annex over git-lfs. It can use half as much disk space!
A clone of a git-lfs repository contains one copy of each file in the work tree. Since the user can edit that file at any time, or checking out a different branch can delete the file, it also stashes a copy inside .git/lfs.
One of the main reasons git-annex used locked files, from the very beginning, was to avoid that second copy. A second local copy of a large file can be too expensive to put up with. When I added unlocked files in git-annex v6, I found it needed a second copy of them, same as git-lfs does. That's the default behavior. But, I decided to complicate git-annex with a config setting:
```
git config annex.thin true
git annex fix
```
Run those two commands, and now only one copy is needed for unlocked files! How's it work? Well, it comes down to hard links. But there is a tradeoff here, which is why this is not the default: When you edit a file, no local backup is preserved of its old content. So you have to make sure to let git-annex upload files to another repository before editing them or the old version could get lost. So it's a tradeoff, and maybe it could be improved. (Only thin out a file after a copy has been uploaded?)
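For anyone unfamiliar with hard links, here is a small standalone illustration (using the unix package, so POSIX systems only; the file names are made up) of why a hard-linked work tree file adds no disk usage:

```haskell
import System.Posix.Files (createLink, getFileStatus, linkCount)

main :: IO ()
main = do
    writeFile "annexed-content" "pretend this is a huge file"
    -- A hard link adds a second name for the same inode; the data
    -- blocks are not duplicated. This is the mechanism behind
    -- annex.thin, and also why editing through one name changes
    -- the content seen through the other: there is only one copy.
    createLink "annexed-content" "worktree-file"
    st <- getFileStatus "annexed-content"
    print (linkCount st)  -- 2: two names, one set of disk blocks
```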
This adds a small amount of complexity to git-annex, but I feel it's well worth it to let unlocked files use half the disk space. If the git-lfs developers are reading this, that would probably be my first suggestion for a feature to consider adding to git-lfs. I hope for more opportunities to catch up to git-lfs in turn.
concurrent-output released yesterday got a lot of fun features. It now does full curses-style minimization of the output, to redraw updated lines with optimal efficiency. And supports multiline regions/wrapping too long lines. And allows the user to embed ANSI colors in a region. 3 features that are in some tension and were fun to implement all together.
But I have a more interesting feature to blog about... I've added the ability for the content of a Region to be determined by an STM transaction.
Here, for example, is a region that's a clock:
```haskell
timeDisplay :: TVar UTCTime -> STM Text
timeDisplay tv = T.pack . show <$> readTVar tv

clockRegion :: IO ConsoleRegionHandle
clockRegion = do
    tv <- atomically . newTVar =<< getCurrentTime
    r <- openConsoleRegion Linear
    setConsoleRegion r (timeDisplay tv)
    async $ forever $ do
        threadDelay 1000000 -- 1 sec
        atomically . (writeTVar tv) =<< getCurrentTime
    return r
```
There's something magical about this. Whenever a new value is written into the TVar, concurrent-output automatically knows that this region needs to be updated. How does it know how to do that?
Magic of STM. Basically, concurrent-output composes all the STM transactions of Regions, and asks STM to wait until there's something new to display. STM keeps track of whatever TVars might be looked at, and so can put the display thread to sleep until there's a change to display.
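A standalone model of that waiting loop, stripped of everything concurrent-output specific (the name `waitChange` is made up for this sketch): render the STM display action, and retry until it yields something different. STM's retry is what puts the thread to sleep until one of the TVars read by the transaction changes.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM

-- Block until the rendered value differs from the one last displayed.
waitChange :: Eq a => STM a -> a -> STM a
waitChange render old = do
    new <- render
    if new == old
        then retry  -- sleeps until a TVar read above is written
        else return new

main :: IO ()
main = do
    tv <- newTVarIO (0 :: Int)
    first <- atomically (readTVar tv)
    -- Another thread updates the "region content" a bit later.
    _ <- forkIO $ threadDelay 100000 >> atomically (writeTVar tv 42)
    -- The display thread wakes up exactly when the value changes.
    new <- atomically (waitChange (readTVar tv) first)
    print new  -- 42
```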
Using STM I've gotten extensibility for free, due to the nice ways that STM transactions compose.
A few other obvious things to do with this: Compose 2 regions with padding so they display on the same line, left and right aligned. Trim a region's content to the display width (which is handily exported by concurrent-output in a TVar for this kind of thing).
I'm tempted to write a console spreadsheet using this. Each visible cell of the spreadsheet would have its own region, that uses a STM transaction to display. Plain data Cells would just display their current value. Cells that contain a function would read the current values of other Cells, and use that to calculate what to display. Which means that a Cell containing a function would automatically update whenever any of the Cells that it depends on were updated!
Do you think that a simple interactive spreadsheet built this way would be more than 100 lines of code?
Building on top of concurrent-output, and some related work Joachim Breitner did earlier, I now have a kind of equivalent to a tiling window manager, except it's managing regions of the console for different parts of a single program.
Here's a really silly demo, in an animated gif:
Not bad for 23 lines of code, is it? Seems much less tedious to do things this way than using ncurses. Even with its panels, ncurses requires you to think about layout of various things on the screen, and many low-level details. This, by contrast, is compositional: just add another region and a thread to update it, and away it goes.
So, here's an apt-like download progress display, in 30 lines of code.
Not only does it have regions which are individual lines of the screen, but those can have sub-regions within them as seen here (and so on).
And, log-type messages automatically scroll up above the regions.
External programs run by
createProcessConcurrent will automatically
get their output/errors displayed there, too.
What I'm working on now is support for multiline regions, which automatically grow/shrink to fit what's placed in them. The hard part, which I'm putting the finishing touches on, is to accurately work out how large a region is before displaying it, in order to lay it out. Requires parsing ANSI codes among other things.
There's so much concurrency, with complicated interrelated data being updated by different threads, that I couldn't have possibly built this without Software Transactional Memory.
Rather than a nightmare of locks behind locks behind locks, the result is so well behaved that I'm confident that anyone who needs more control over the region layout, or wants to do funky things can dive into to the STM interface and update the data structures, and nothing will ever deadlock or be inconsistent, and as soon as an update completes, it'll display on-screen.
An example of how powerful and beautiful STM is: here's how the main display thread determines when it needs to refresh the display.
```haskell
data DisplayChange
    = BufferChange [(StdHandle, OutputBuffer)]
    | RegionChange RegionSnapshot
    | TerminalResize (Maybe Width)
    | EndSignal ()

...

    change <- atomically $
        (RegionChange <$> regionWaiter origsnapshot)
            `orElse`
        (RegionChange <$> regionListWaiter origsnapshot)
            `orElse`
        (BufferChange <$> outputBufferWaiterSTM waitCompleteLines)
            `orElse`
        (TerminalResize <$> waitwidthchange)
            `orElse`
        (EndSignal <$> waitTSem endsignal)
    case change of
        RegionChange snapshot -> do
            ...
        BufferChange buffers -> do
            ...
        TerminalResize width -> do
            ...
```
So, it composes all these STM actions that can wait on various kinds of changes, to get one big action, that waits for all of the above, and builds up a nice sum type to represent what's changed.
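The same pattern in miniature, with hypothetical change sources standing in for concurrent-output's waiters: each arm of the orElse blocks until its own source fires, and the composed transaction returns whichever fires first, wrapped in a sum type.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM

-- A toy version of DisplayChange, with made-up change sources.
data Change = RegionUpdated Int | Resized String
    deriving Show

main :: IO ()
main = do
    regionChanges <- newEmptyTMVarIO
    resizes <- newEmptyTMVarIO
    -- Some other thread eventually reports a terminal resize.
    _ <- forkIO $ threadDelay 100000 >> atomically (putTMVar resizes "80x24")
    -- Compose both waiters into one blocking transaction.
    change <- atomically $
        (RegionUpdated <$> takeTMVar regionChanges)
            `orElse`
        (Resized <$> takeTMVar resizes)
    print change  -- Resized "80x24"
```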
Another example is that the whole support for sub-regions only involved adding 30 lines of code, all of it using STM, and it worked 100% the first time.
Available in concurrent-output 1.1.0.
concurrent-output is a Haskell library I've developed this week, to make it easier to write console programs that do a lot of different things concurrently, and want to serialize concurrent outputs sanely.
It's increasingly easy to write concurrent programs, but all their status reporting has to feed back through the good old console, which is still obstinately serial.
Haskell illustrates this problem well, with this equivalent of "Linus's first kernel" interleaving the output of 2 threads:

```
> import System.IO
> import Control.Concurrent.Async
> putStrLn (repeat 'A') `concurrently` putStrLn (repeat 'B')
BABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABA
BABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABA
...
```
That's fun, but also horrible if you wanted to display some messages to the user:
```
> putStrLn "washed the car" `concurrently` putStrLn "walked the dog"
walwkaesdh etdh et hdeo gc ar
```
To add to the problem, we often want to run separate programs concurrently, which have output of their own to display. And, just to keep things interesting, sometimes a unix program will behave differently when stdout is not connected to a terminal (eg, ls | cat).
To tame simple concurrent programs like these so they generate readable output involves a lot of plumbing. Something like, run the actions concurrently, taking care to capture the output of any commands, and then feed the output that the user should see though some sort of serializing channel to the display. Dealing with that when you just wanted a simple concurrent program risks ending up with a not-so-simple program.
So, I wanted an library with basically 2 functions:
```haskell
outputConcurrent :: String -> IO ()

createProcessConcurrent :: CreateProcess -> IO whatever
```
The idea is, you make your program use outputConcurrent to display all its output, and each String you pass to that will be displayed serially, without getting mixed up with any other concurrent output.
And, you make your program use createProcessConcurrent everywhere it starts a process that might output to stdout or stderr, and it'll likewise make sure its output is displayed serially.
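A stripped-down model of the serializing part (the real library also buffers output and captures subprocess output; this only shows the core idea of funneling whole writes through one lock, and `out` is a made-up name):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar

main :: IO ()
main = do
    lock <- newMVar ()
    done <- newEmptyMVar
    -- Each write takes the lock, so strings are emitted whole,
    -- never interleaved character by character.
    let out s = withMVar lock $ \() -> putStr s
    _ <- forkIO (out "washed the car\n" >> putMVar done ())
    out "walked the dog\n"
    takeMVar done
```

The two lines can come out in either order, but each one arrives intact, which is the essential guarantee outputConcurrent provides.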
createProcessConcurrent should avoid redirecting stdout and stderr away from the console when no other concurrent output is happening. So, if programs are mostly run sequentially, they behave as they normally would at the console; any behavior changes should only occur when there is concurrency. (It might also be nice for it to allocate ttys and run programs there to avoid any behavior changes at all, although I have not tried to do that.)
And that should be pretty much the whole API, although it's ok if it needs some function called by main to set it up:
```haskell
import Control.Concurrent.Async
import System.Console.Concurrent
import System.Process

main = withConcurrentOutput $
    outputConcurrent "washed the car\n"
        `concurrently`
    createProcessConcurrent (proc "ls" [])
        `concurrently`
    outputConcurrent "walked the dog\n"
```
```
$ ./demo
washed the car
walked the dog
Maildir/ bin/ doc/ html/ lib/ mail/ mnt/ src/ tmp/
```
I think that's a pretty good API to deal with this concurrent output problem. Anyone know of any other attempts at this I could learn from?
I implemented this over the past 3 days, in 320 lines of code. It got rather hairy:
- It has to do buffering of the output.
- There can be any quantity of output, but program memory use should be reasonably small. Solved by buffering up to 1 mb of output in RAM, and writing excess buffer to temp files.
- Falling off the end of the program is complicated; there can be buffered output to flush and it may have to wait for some processes to finish running etc.
- The locking was tough to get right! I could not have managed to write it correctly without STM.
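As a sketch of the first two bullet points, here is a toy buffer that spills to a temp file past a size limit (a hypothetical simplification; the real library tracks buffers per output handle and does much more, and the names `Buffer`, `addOutput`, and `flushBuffer` are made up):

```haskell
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- Output is kept in memory until it crosses a limit, then spilled.
data Buffer = InMemory String | InTempFile FilePath

addOutput :: Int -> Buffer -> String -> IO Buffer
addOutput limit (InMemory s) new
    | length s + length new <= limit =
        return (InMemory (s ++ new))
    | otherwise = do
        -- Too big for RAM: write everything accumulated so far
        -- (plus the new chunk) out to a temp file.
        tmpdir <- getTemporaryDirectory
        (f, h) <- openTempFile tmpdir "output.buf"
        hPutStr h (s ++ new)
        hClose h
        return (InTempFile f)
addOutput _ (InTempFile f) new = do
    appendFile f new
    return (InTempFile f)

-- Flushing at program end reads back whatever was spilled.
flushBuffer :: Buffer -> IO String
flushBuffer (InMemory s) = return s
flushBuffer (InTempFile f) = do
    s <- readFile f
    length s `seq` removeFile f
    return s

main :: IO ()
main = do
    -- A tiny limit forces the spill path on the second write.
    b  <- addOutput 10 (InMemory "") "hello, "
    b' <- addOutput 10 b "world\n"
    putStr =<< flushBuffer b'
```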
It seems to work pretty great though. I got Propellor using it, and Propellor can now run actions concurrently!
Following up on Then and Now ...
In quiet moments at ICFP last August, I finished teaching Propellor to generate disk images. With an emphasis on doing a whole lot with very little new code and extreme amount of code reuse.
For example, let's make a disk image with nethack on it. First, we need to define a chroot. Disk image creation reuses propellor's chroot support, described back in propelling containers. Any propellor properties can be assigned to the chroot, so it's easy to describe the system we want.
```haskell
nethackChroot :: FilePath -> Chroot
nethackChroot d = Chroot.debootstrapped (System (Debian Stable) "amd64") mempty d
    & Apt.installed ["linux-image-amd64"]
    & Apt.installed ["nethack-console"]
    & accountFor gamer
    & gamer `hasInsecurePassword` "hello"
    & gamer `hasLoginShell` "/usr/games/nethack"
  where
    gamer = User "gamer"
```
Now to make an image from that chroot, we just have to tell propellor where to put the image file, some partitioning information, and to make it boot using grub.
```haskell
nethackImage :: RevertableProperty
nethackImage = imageBuilt "/srv/images/nethack.img" nethackChroot
    MSDOS (grubBooted PC)
    [ partition EXT2 `mountedAt` "/boot"
        `setFlag` BootFlag
    , partition EXT4 `mountedAt` "/"
        `addFreeSpace` MegaBytes 100
    , swapPartition (MegaBytes 256)
    ]
```
The disk image partitions default to being sized to fit exactly the files from the chroot that go into each partition, so the disk image is as small as possible by default. There's a little DSL to configure the partitions. To give control over the partition size, it has some functions, like setSize. Other functions like extended can further adjust the partitions. I think that worked out rather well; the partition specification is compact and avoids unnecessary hardcoded sizes, while providing plenty of control.
By the end of ICFP, I had Propellor building complete disk images, but no boot loader installed on them.
Fast forward to today. After struggling with some strange grub behavior, I found a working method to install grub onto a disk image.
The whole disk image feature weighs in at:

- 203 lines to interface with parted
- 88 lines to format and mount partitions
- 90 lines for the partition table specification DSL and partition sizing
- 196 lines to generate disk images
- 75 lines to install grub on a disk image
- 652 lines of code total
Which is about half the size of vmdebootstrap, 1/4th the size of partman-base (probably 1/100th the size of partman as a whole), and 1/13th the size of live-build. All of which do similar things, in ways that seem to me to be much less flexible than Propellor.
One thing I'm considering doing is extending this so Propellor can use qemu-user-static to create disk images for eg, arm. Add some u-boot setup, and this could create bootable images for arm boards. A library of configs for various arm boards could then be included in Propellor. This would be a lot easier than running the Debian Installer on an arm board.
Oh! I only just now realized that if you have a propellor host configured, like this example for my dialup gateway:

```haskell
leech = host "leech.kitenet.net"
    & os (System (Debian (Stable "jessie")) "armel")
    & Apt.installed ["linux-image-kirkwood", "ppp", "screen", "iftop"]
    & privContent "/etc/ppp/peers/provider"
    & privContent "/etc/ppp/pap-secrets"
    & Ppp.onBoot
    & hasPassword (User "root")
    & Ssh.installed
```

The host's properties can be extracted from it, using eg hostProperties leech, and reused to create a disk image with the same properties as the host!
So, when my dialup gateway gets struck by lightning again, I could use this to build a disk image for its replacement:
```haskell
import qualified Propellor.Property.Hardware.SheevaPlug as SheevaPlug

laptop = host "darkstar.kitenet.net"
    & SheevaPlug.diskImage "/srv/images/leech.img" (MegaBytes 2000)
        (& propertyList "has all of leech's properties"
            (hostProperties leech))
```
This also means you can start with a manually built system, write down the properties it has, and iteratively run Propellor against it until you think you have a full specification of it, and then use that to generate a new, clean disk image. Nice way to transition from sysadmin days of yore to a clean declaratively specified system.