it's a bird, it's a plane, it's a super monoid for propellor

I've been doing a little bit of dynamically typed programming in Haskell, to improve Propellor's Info type. The result is kind of interesting in a scary way.

Info started out as a big record type, containing all the different sorts of metadata that Propellor needed to keep track of. Host IP addresses, DNS entries, ssh public keys, docker image configuration parameters... This got quite out of hand. Info needed to have its hands in everything, even types that should have been private to their module.

To fix that, recent versions of Propellor let a single Info contain many different types of values. Look at it one way and it contains DNS entries; look at it another way and it contains ssh public keys, etc.

As an émigré from lands where you can never know what type of value is in a $foo until you look, this was a scary prospect at first, but I found it's possible to have the benefits of dynamic types and the safety of static types too.

The key to doing it is Data.Dynamic. Thanks to Joachim Breitner for suggesting I could use it here. What I arrived at is this type (slightly simplified):

newtype Info = Info [Dynamic]
    deriving (Monoid)

So Info is a monoid, and it holds of a bunch of dynamic values, which could each be of any type at all. Eep!

So far, this is utterly scary to me. To tame it, the Info constructor is not exported, and so the only way to create an Info is to start with mempty and use this function:

addInfo :: (IsInfo v, Monoid v) => Info -> v -> Info
addInfo (Info l) v = Info (toDyn v : l)

The important part of that is that only allows adding values that are in the IsInfo type class. That prevents the foot shooting associated with dynamic types, by only allowing use of types that make sense as Info. Otherwise arbitrary Strings etc could be passed to addInfo by accident, and all get concated together, and that would be a total dynamic programming mess.

Anything you can add into an Info, you can get back out:

getInfo :: (IsInfo v, Monoid v) => Info -> v
getInfo (Info l) = mconcat (mapMaybe fromDynamic (reverse l))

Only monoids can be stored in Info, so if you ask for a type that an Info doesn't contain, you'll get back mempty.

Crucially, IsInfo is an open type class. Any module in Propellor can make a new data type and make it an instance of IsInfo, and then that new data type can be stored in the Info of a Property, and any Host that uses the Property will have that added to its Info, available for later introspection.


For example, this weekend I'm extending Propellor to have controllers: Hosts that are responsible for running Propellor on some other hosts. Useful if you want to run propellor once and have it update the configuration of an entire network of hosts.

There can be whole chains of controllers controlling other controllers etc. The problem is, what if host foo has the property controllerFor bar and host bar has the property controllerFor foo? I want to avoid a loop of foo running Propellor on bar, running Propellor on foo, ...

To detect such loops, each Host's Info should contain a list of the Hosts it's controlling. Which is not hard to accomplish:

newtype Controlling = Controlled [Host]
    deriving (Typeable, Monoid)

isControlledBy :: Host -> Controlling -> Bool
h `isControlledBy` (Controlled hs) = any (== hostName h) (map hostName hs)

instance IsInfo Controlling where
    propigateInfo _ = True

mkControllingInfo :: Host -> Info
mkControllingInfo controlled = addInfo mempty (Controlled [controlled])

getControlledBy :: Host -> Controlling
getControlledBy = getInfo . hostInfo

isControllerLoop :: Host -> Host -> Bool
isControllerLoop controller controlled = go S.empty controlled
  where
    go checked h
        | controller `isControlledBy` c = True
        -- avoid checking loops that have been checked before
        | hostName h `S.member` checked = False
        | otherwise = any (go (S.insert (hostName h) checked)) l
      where
        c@(Controlled l) = getControlledBy h

This is all internal to the module that needs it; the rest of propellor doesn't need to know that the Info is using used for this. And yet, the necessary information about Hosts is gathered as propellor runs.


So, that's a useful technique. I do wonder if I could somehow make addInfo combine together values in the list that have the same type; as it is the list can get long. And, to show Info, the best I could do was this:

 instance Show Info where
            show (Info l) = "Info " ++ show (map dynTypeRep l)

The resulting long list of the types of vales stored in a host's info is not a useful as it could be. Of course, getInfo can be used to get any particular type of value:

*Main> hostInfo kite
Info [InfoVal System,PrivInfo,PrivInfo,Controlling,DnsInfo,DnsInfo,DnsInfo,AliasesInfo, ...
*Main> getInfo (hostInfo kite) :: AliasesInfo
AliasesInfo (fromList ["downloads.kitenet.net","git.joeyh.name","imap.kitenet.net","nntp.olduse.net" ...

And finally, I keep trying to think of a better name than "Info".

propellor orchestration

With the disclamer that I don't really know much about orchestration, I have added support for something resembling it to Propellor.

Until now, when using propellor to manage a bunch of hosts, you updated them one at a time by running propellor --spin $somehost, or maybe you set up a central git repository, and a cron job to run propellor on each host, pulling changes from git.

I like both of these ways to use propellor, but they only go so far...

  • Perhaps you have a lot of hosts, and would like to run propellor on them all concurrently.

      master = host "master.example.com"
          & concurrently conducts alotofhosts
    
  • Perhaps you want to run propellor on your dns server last, so when you add a new webserver host, it gets set up and working before the dns is updated to point to it.

      master = host "master.example.com"
          & conducts webservers
              `before` conducts dnsserver
    
  • Perhaps you have something more complex, with multiple subnets that propellor can run in concurrently, finishing up by updating that dnsserver.

      master = host "master.example.com"
          & concurrently conducts [sub1, sub2]
              `before` conducts dnsserver
    
      sub1 = "master.subnet1.example.com"
          & concurrently conducts webservers
          & conducts loadbalancers
    
      sub2 = "master.subnet2.example.com"
          & conducts dockerservers
    
  • Perhaps you need to first run some command that creates a VPS host, and then want to run propellor on that host to set it up.

      vpscreate h = cmdProperty "vpscreate" [hostName h]
          `before` conducts h
    

All those scenarios are supported by propellor now!

Well, I haven't actually implemented concurrently yet, but the point is that the conducts property can be used with any of propellor's property combinators, like before etc, to express all kinds of scenarios.

The conducts property works in combination with an orchestrate function to set up all the necessary stuff to let one host ssh into another and run propellor there.

main = defaultMain (orchestrate hosts)

hosts = 
    [ master
    , webservers 
    , ...
    ]

The orchestrate function does a bunch of stuff:

  • Builds up a graph of what conducts what.
  • Removes any cycles that might have snuck in by accident, before they cause foot shooting.
  • Arranges for the ssh keys to be accepted as necessary.
    Note that you you need to add ssh key properties to all relevant hosts so it knows what keys to trust.
  • Arranges for the private data of a host to be provided to the hosts that conduct it, so they can pass it along.

I've very pleased that I was able to add the Propellor.Property.Conductor module implementing this with only a tiny change to the rest of propellor. Almost everything needed to implement it was there in propellor's infrastructure already.

Also kind of cool that it only needed 13 lines of imperative code, the other several hundred lines of the implementation being all pure code.

propelling disk images

Following up on Then and Now ...

In quiet moments at ICFP last August, I finished teaching Propellor to generate disk images. With an emphasis on doing a whole lot with very little new code and extreme amount of code reuse.

For example, let's make a disk image with nethack on it. First, we need to define a chroot. Disk image creation reuses propellor's chroot support, described back in propelling containers. Any propellor properties can be assigned to the chroot, so it's easy to describe the system we want.

 nethackChroot :: FilePath -> Chroot
    nethackChroot d = Chroot.debootstrapped (System (Debian Stable) "amd64") mempty d
        & Apt.installed ["linux-image-amd64"]
        & Apt.installed ["nethack-console"]
        & accountFor gamer
        & gamer `hasInsecurePassword` "hello"
        & gamer `hasLoginShell` "/usr/games/nethack"
      where gamer = User "gamer"

Now to make an image from that chroot, we just have to tell propellor where to put the image file, some partitioning information, and to make it boot using grub.

 nethackImage :: RevertableProperty
    nethackImage = imageBuilt "/srv/images/nethack.img" nethackChroot
        MSDOS (grubBooted PC)
        [ partition EXT2 `mountedAt` "/boot"
            `setFlag` BootFlag
        , partition EXT4 `mountedAt` "/"
            `addFreeSpace` MegaBytes 100
        , swapPartition (MegaBytes 256)
        ]

The disk image partitions default to being sized to fit exactly the files from the chroot that go into each partition, so, the disk image is as small as possible by default. There's a little DSL to configure the partitions. To give control over the partition size, it has some functions, like addFreeSpace and setSize. Other functions like setFlag and extended can further adjust the partitions. I think that worked out rather well; the partition specification is compact and avoids unecessary hardcoded sizes, while providing plenty of control.

By the end of ICFP, I had Propellor building complete disk images, but no boot loader installed on them.


Fast forward to today. After stuggling with some strange grub behavior, I found a working method to install grub onto a disk image.

The whole disk image feature weighs in at:

203 lines to interface with parted
88 lines to format and mount partitions
90 lines for the partition table specification DSL and partition sizing
196 lines to generate disk images
75 lines to install grub on a disk image
652 lines of code total

Which is about half the size of vmdebootstrap 1/4th the size of partman-base (probably 1/100th the size of total partman), and 1/13th the size of live-build. All of which do similar things, in ways that seem to me to be much less flexible than Propellor.


One thing I'm considering doing is extending this so Propellor can use qemu-user-static to create disk images for eg, arm. Add some u-boot setup, and this could create bootable images for arm boards. A library of configs for various arm boards could then be included in Propellor. This would be a lot easier than running the Debian Installer on an arm board.

Oh! I only just now realized that if you have a propellor host configured, like this example for my dialup gateway, leech --

 leech = host "leech.kitenet.net"
        & os (System (Debian (Stable "jessie")) "armel")
        & Apt.installed ["linux-image-kirkwood", "ppp", "screen", "iftop"]
        & privContent "/etc/ppp/peers/provider"
        & privContent "/etc/ppp/pap-secrets"
        & Ppp.onBoot
        & hasPassword (User "root")
        & Ssh.installed

-- The host's properties can be extracted from it, using eg hostProperties leech and reused to create a disk image with the same properties as the host!

So, when my dialup gateway gets struck by lightning again, I could use this to build a disk image for its replacement:

 import qualified Propellor.Property.Hardware.SheevaPlug as SheevaPlug

    laptop = host "darkstar.kitenet.net"
        & SheevaPlug.diskImage "/srv/images/leech.img" (MegaBytes 2000)
            (& propertyList "has all of leech's properties"
                (hostProperties leech))

This also means you can start with a manually built system, write down the properties it has, and iteratively run Propellor against it until you think you have a full specification of it, and then use that to generate a new, clean disk image. Nice way to transition from sysadmin days of yore to a clean declaratively specified system.

concurrent output library

concurrent-output is a Haskell library I've developed this week, to make it easier to write console programs that do a lot of different things concurrently, and want to serialize concurrent outputs sanely.

It's increasingly easy to write concurrent programs, but all their status reporting has to feed back through the good old console, which is still obstinately serial.

Haskell illustrates problem this well with this "Linus's first kernel" equivilant interleaving the output of 2 threads:

> import System.IO
> import Control.Concurrent.Async
> putStrLn (repeat 'A') `concurrently` putStrLn (repeat 'B')
BABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABA
BABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABA
...

That's fun, but also horrible if you wanted to display some messages to the user:

> putStrLn "washed the car" `concurrently` putStrLn "walked the dog"
walwkaesdh etdh et hdeo gc
ar

To add to the problem, we often want to run separate programs concurrently, which have output of their own to display. And, just to keep things interesting, sometimes a unix program will behave differently when stdout is not connected to a terminal (eg, ls | cat).

To tame simple concurrent programs like these so they generate readable output involves a lot of plumbing. Something like, run the actions concurrently, taking care to capture the output of any commands, and then feed the output that the user should see though some sort of serializing channel to the display. Dealing with that when you just wanted a simple concurrent program risks ending up with a not-so-simple program.

So, I wanted an library with basically 2 functions:

outputConcurrent :: String -> IO ()
    
createProcessConcurrent :: CreateProcess -> IO whatever

The idea is, you make your program use outputConcurrent to display all its output, and each String you pass to that will be displayed serially, without getting mixed up with any other concurrent output.

And, you make your program use createProcessConcurrent everywhere it starts a process that might output to stdout or stderr, and it'll likewise make sure its output is displayed serially.

Oh, and createProcessConcurrent should avoid redirecting stdout and stderr away from the console, when no other concurrent output is happening. So, if programs are mostly run sequentially, they behave as they normally would at the console; any behavior changes should only occur when there is concurrency. (It might also be nice for it to allocate ttys and run programs there to avoid any behavior changes at all, although I have not tried to do that.)

And that should be pretty much the whole API, although it's ok if it needs some function called by main to set it up:

import Control.Concurrent.Async
import System.Console.Concurrent
import System.Process

main = withConcurrentOutput $
    outputConcurrent "washed the car\n"
        `concurrently`
    createProcessConcurrent (proc "ls" [])
        `concurrently`
    outputConcurrent "walked the dog\n"
$ ./demo
washed the car
walked the dog
Maildir/  bin/  doc/  html/  lib/  mail/  mnt/  src/  tmp/

I think that's a pretty good API to deal with this concurrent output problem. Anyone know of any other attempts at this I could learn from?

I implemented this over the past 3 days and 320 lines of code. It got rather hairy:

  • It has to do buffering of the output.
  • There can be any quantity of output, but program memory use should be reasonably small. Solved by buffering up to 1 mb of output in RAM, and writing excess buffer to temp files.
  • Falling off the end of the program is complicated; there can be buffered output to flush and it may have to wait for some processes to finish running etc.
  • The locking was tough to get right! I could not have managed to write it correctly without STM.

It seems to work pretty great though. I got Propellor using it, and Propellor can now run actions concurrently!

a tiling region manager for the console

Building on top of concurrent-output, and some related work Joachim Breitner did earlier, I now have a kind of equivilant to a tiling window manager, except it's managing regions of the console for different parts of a single program.

Here's a really silly demo, in an animated gif:

demo2.gif

Not bad for 23 lines of code, is that? Seems much less tedious to do things this way than using ncurses. Even with its panels, ncurses requires you to think about layout of various things on the screen, and many low-level details. This, by contrast, is compositional, just add another region and a thread to update it, and away it goes.

So, here's an apt-like download progress display, in 30 lines of code.

aptdemo.gif

Not only does it have regions which are individual lines of the screen, but those can have sub-regions within them as seen here (and so on).

And, log-type messages automatically scroll up above the regions. External programs run by createProcessConcurrent will automatically get their output/errors displayed there, too.

What I'm working on now is support for multiline regions, which automatically grow/shrink to fit what's placed in them. The hard part, which I'm putting the finishing touches on, is to accurately work out how large a region is before displaying it, in order to lay it out. Requires parsing ANSI codes amoung other things.

STM rules

There's so much concurrency, with complicated interrelated data being updated by different threads, that I couldn't have possibly built this without Software Transactional Memory.

Rather than a nightmare of locks behind locks behind locks, the result is so well behaved that I'm confident that anyone who needs more control over the region layout, or wants to do funky things can dive into to the STM interface and update the data structures, and nothing will ever deadlock or be inconsistent, and as soon as an update completes, it'll display on-screen.

An example of how powerful and beuatiful STM is, here's how the main display thread determines when it needs to refresh the display:

data DisplayChange
        = BufferChange [(StdHandle, OutputBuffer)]
        | RegionChange RegionSnapshot
        | TerminalResize (Maybe Width)
        | EndSignal ()

    ...
                change <- atomically $
                        (RegionChange <$> regionWaiter origsnapshot)
                                `orElse`
                        (RegionChange <$> regionListWaiter origsnapshot)
                                `orElse`
                        (BufferChange <$> outputBufferWaiterSTM waitCompleteLines)
                                `orElse`
                        (TerminalResize <$> waitwidthchange)
                                `orElse`
                        (EndSignal <$> waitTSem endsignal)
                case change of
                        RegionChange snapshot -> do
                ...
                        BufferChange buffers -> do
                ...
                        TerminalResize width -> do
                ...

So, it composes all these STM actions that can wait on various kinds of changes, to get one big action, that waits for all of the above, and builds up a nice sum type to represent what's changed.

Another example is that the whole support for sub-regions only involved adding 30 lines of code, all of it using STM, and it worked 100% the first time.


Available in concurrent-output 1.1.0.