Haskell-related posts to Joey's blog, oldest first.

on haskell

I saw the old house on Haskell St. last week for the first time in forver. I could barely pick it out of the other houses on the block, it's so different now.

I also, conincidentially, studied the haskell programming language a bit. I watched some talks that John Goerzen linked to, and I read several chapters of the "Gentle Introduction to Haskell". Which wasn't all that gentle. :-) I should have put off reading it until after the talks and after reading some other introductory examples.

At least I have some motivation to learn haskell. I could never see any reason to bother learning python or ruby, since there's so few compelling differences between them and perl. I had begun to worry that I wasn't learning any new programming languages, so it's nice to have a new dot on this graph of languages I've learned since 1990: graph

discussion

Posted
haskell baby steps

Since early August, I had forgotten about my plan to learn haskell. Today I refreshed my memory from the docs I read back then, and started writing some very simple programs. Reading a lot of docs and letting it gel for a few months seems to have worked fairly well, I was able to throw this program together in short order, and I think it's not too horrible.

putStrs = putStr.unlines
x s n = unwords $ take n $ repeat s
plot x s = (take (x-1) s) ++ "  " ++ (drop (x+1) s)
f x = truncate(40 + 38 * sin(x / 9))
line = "JOEY HESS" `x` 8 
main = putStrs [ plot x line | x <- map f [1..]]

If you're wondering what it does, the equivilant perl is:

#!/usr/bin/perl -lisubstr($_,39+38*sin++$y/9,2)=$s
for($s='  '||McQ;$_='JOEY HESS 'x8;print){eval$^I}

I suppose both demonstrate features of their language to some extent.


At first I assumed haskell would automatically convert the floating point value coming out of sin and I got a nasty suprise:

    No instance for (Floating Int)
      arising from use of `f' at foo.hs:17:21-26
    Possible fix: add an instance declaration for (Floating Int)

Adding the truncate fixed that, but figuring that out from the above error message and google took longer than figuring out how to write the rest of the program. :-(

discussion

Posted
a year of haskell (not really)

Guess it's really been longer than a year that haskell has been on my mind, though not much lately. Things are aligning again. I'll shortly be visiting the Bay Area again -- last time I piled up haskell documentation for the plane trip. This time I'm looking forward to the Real World Haskell book waiting in the mailbox when I get back.

I read and commented on the first several draft chapters, hoping my ignorance would be useful, and I know one of the authors (though if I'm not mistaken I've never met him). So I'm looking forward to reading it, but also feeling guilty that I haven't managed to do anything serious with haskell yet. No new project that I dared, or had the patience, to attempt in it. But that's what the book's supposed to solve. Getting over the gap from a basic understanding to being able to add the language as another tool in the kit.

And hey, it's better than seriously learning javascript would be, right?


Right now I can't think about haskell without thinking about Buddhism. I won't bore you with why they're connected in my head.

Posted
haskell and xmonad

I came back from Thanksgiving in California to find my copy of Real World Haskell had arrived. I've spent the last several days making my way through it up through chapter 10.

I'd read a few of the first chapters last year when I was struggling to learn Haskell, and posted enough comments that I'm amoung the (many many) people credited in the Preface. Amusing, since I still don't consider myself proficient in the language.

Reading back through those and on through Chapter Nine has swapped back in the basics I learned last year, and built on them. I was able to get all the way to Chapter Ten before hitting real brain-melting stumbling blocks. That's progess!

I can't help but feel that Chapter Ten should come later, perhaps after monads are covered. Only other quibbles so far are some typographical problems introduced in typesetting the book, and a few bugs and problems that were pointed out in comments on the web site, but not fixed in time for the first edition.


Anyway, I know more than enough Haskell now to configure xmonad. While I'm still perfectly happy with awesome, xmonad has some cool things going for it. Playing with it this evening, I especially like:

  • Being able to compile my xmonad.hs and rely on Haskell's type checking to make sure I wrote it correctly and it will work.
  • Xmonad's integration with gnome. This was a nice suprise, and being able to just use the gnome panel in xmonad, rather than building my own panel, has made it a lot easier to get started with.
    gnome panel showing xmonad workspaces
  • The power and beauty of xmonad's layouts. Start with the usual simple sorts of layouts like tiled, full screen and floating. These can be transformed in a myrid of ways (ie, rotated, decorations added), and can even be combined together to create more complex layouts. So I was quite easily able to configure it to alternate between a tabbed desktop layout that plays nicely with panels, and absolute unadorned full-screen mode:
    myLayout = desktop ||| noBorders Full
      where
        desktop = tabbed $ ewmhDesktopsLayout $ avoidStruts $ twoPane
        tabbed = addTabsAlways shrinkText myTheme
        twoPane = Tall 1 2/3 3/100
    
    (And it wasn't too hard to enhance this with a special-purpose grid layout for the workspace I run pidgin on.)
  • That my xmonad.hs is small, easy to understand, and doesn't contain any boilerplate beyond main = xmonad $ gnomeConfig.
  • It lets me muck about with doing something real with Haskell without having to commit to using it in some project of my own.
  • It supports cycling all windows on screen so that each in turn moves into the main viewport, via XMonad.Actions.RotSlaves.rotAllUp. An action that has been on my window manager wish-list for ages. Implemented in an easy 3 lines of code.

My current set of gripes with xmonad is small:

  • mod-shift-space is documented somewhere, but not where you need it.
  • I applied addTabsAlways to a Tall layout, and got title-bar like "tabs". But how do I add multiple windows as tabs to a given frame, like in ion? Can't figure out how, or even if it's possible.
  • Moving the mouse cursor over one of the above title-bar-like things doesn't focus its window. Very annoying!
  • Mouse support is very lacking. I'd like to be able to drag tabs around the screen, and maybe drag a border between frames to resize, couldn't find modules implementing any of that.
  • Theme configuration is only partly factored out, some of it is scattered in several different places.
  • Reloading the config file takes ages. 1.9 mb executable has to be built and run.
  • The lack of a dynamic language shows; there's no way to open a prompt and type Haskell into the window manager. I rarely speak lua to awesome or ion3, but it's nice to have the ability there. I think it should be possible to do this with Haskell (see ghci).
Posted
xmonad.hs for the Palm Pre

I've developed an xmonad configuration file for using this window manager with the Palm Pre smart phone. (I described how I run X on the Pre in VNC here.)

The tricky part is that the Pre has absolutely no keys or key combinations that can be stolen by xmonad and used as the mod key.

So at first I looked at using xmonad's support for gestures. That would be a perfect fit with the Pre. But, if taps are bound to gestures with no mod key, xmonad eats them all. Applications receive no taps! Not good.

I also tried using a submap. But, the key that triggers a submap never gets sent through to the application, even if none of the actions in the submap are triggered. Xmonad could handle these things better..

Instead, I had to use PalmVNC's support for sending special keys via a menu. So the mod "key" consists of tapping the Menu icon in the emulator (on the left hand side) and then pressing "a".

Thus, to start a new terminal, use "Menu a x"; to close a window use "Menu a c"; to change layout, use "Menu a space"; and you can read the rest of the bindings in the config file.

Posted
xmonad layouts for netbooks

How a window manager with tiling layouts makes a 1024x600 resolution livable.

I spend a lot of time in front of a screen, and for the past year I have used only my netbook. No desktops, no laptops. Just a cheap $200 computer that is nearly a kid's toy. The most challenging thing about this is dealing with the 1024x600 screen resolution. When I moved to this screen from something with many more pixels, it at first seemed very cramped.

After a year, I think that the most important part of coping with the small screen is the flexability and customizability of layouts provided by the Xmonad window manager. In this post I will explain a few layouts I have developed for fitting specific tasks onto the netbook screen.

(I assume that you know maybe 0.1% of Haskell (about what I do), and can read Haskell code without having a brain aneurysm.)

web browsing

It's important that the web browser have as few toolbars and other cruft as possible, as horizontal space is especially at a premium. I've configured both Epiphany and Firefox to put everything in one tool bar. But now I use Chromium, which comes pre-configured that way.

What the netbook is designed for. You just want a web browser, taking up the full screen, and with its own tabs. So far, so easy: that's Xmonad's Full layout.

But, you sometimes want to see two websites side-by-side. This layout accomplishes that, allowing the sizes to be adjusted as needed. It also uses Xmonad's Magnifier to zoom the smaller window when it's focused, which is useful if you briefly need to see more of a web site.

mySplit = magnifiercz' 1.4 $ Tall nmaster delta ratio
    where
        -- The default number of windows in the master pane
        nmaster = 1
        -- Percent of screen to increment by when resizing panes
        delta   = 3/100
        -- Default proportion of screen occupied by master pane
        ratio   = 60/100

screenshot     screenshot

web development

If you're doing web development, you'll want probably one big web browser window, but also with a nice wide terminal on the same screen, in order to see web server logs. A simple way to do this is to mirror Xmonad's Tall layout by 90 degrees.

myWide = Mirror $ Tall nmaster delta ratio
    where
        -- The default number of windows in the master pane
        nmaster = 1
        -- Percent of screen to increment by when resizing panes
        delta   = 3/100
        -- Default proportion of screen occupied by master pane
        ratio   = 80/100

The myWide layout uses the full screen width for a single terminal, and splits the width when there are more. This is sufficient for viewing logs and doing minor things at the shell prompt, in between testing the result in the web browser. Of course a terminal can be temporarily moved to the master area by pressing mod-return, if you need it to be larger.

screenshot     screenshot

chat

Tips on configuring Pidgin for a netbook: Set it up to use vertical tabs to save horizontal space. Configure the input area to only 1 line tall, and turn off as much other cruft as you can. The menu bars, sadly, cannot be disabled, nor can the excessively large borders. In my screenshots, you can see that stuff wasting space that could be used to show four more lines of text.

For chat, you want to put the buddy list on the side of the screen, and use the rest for conversation windows. Xmonad's IM module takes care of allocating a sidebar on the screen for the buddy window. The rest of the screen can be occupied by any layout you choose.

I like to have the conversation windows be as wide as possible, and typically only want to see two conversations at a time. But sometimes I might have a dozen visible. A good layout to cope with those needs is Grid.

myChat' l = withIM size roster l
    where
        -- Ratio of screen roster will occupy
        size = 1%5
        -- Match roster window
        roster = Title "Buddy List"
myChat = myChat' Grid

The withIM layout puts the buddy list on the left; I prefer it on the right for some reason, so I tweaked my layout to do that. All it took was using the reflectHoriz layout modifier to get a mirror image of the layout. Then I reflect the inner layout back to its normal orientation. Being able to throw in a few function calls and mutate a layout like that is where Xmonad shines.

myChat' l = reflectHoriz $ withIM size roster $
            reflectHoriz $ l

screenshot     screenshot

logs

Ever notice that tail -f wastes the last line of the terminal? On a netbook this matters. shorttail lets the last line be used.

I have a dedidated workspace that I use to tail logs, and as a place to send long-running tasks (such as compiles). The layout for this needs to keep windows wide, to see whole logged lines, but they need only be five or so lines tall. And it's convenient to have one bigger window with the small ones below. Xmonad has a perfect layout for this, called Dishes, because it's sorta like a stack of plates.

myDish = limitWindows 5 $ Dishes nmaster ratio
    where
        -- The default number of windows in the master pane
        nmaster = 1
        -- Default proportion of screen occupied by other panes
        ratio = 1/5

screenshot     screenshot

In the second screenshot above, I have 6 windows open, but only 5 are visible. The limitWindows 5 accomplishes this. It's handy mostly because gnome-terminal has a bad habit of crashing when resized to 0x0. (That's gotta be a bug in something!)

Notice that the screenshots above also have Procmeter in a sidebar on the right. I (ab)used the withIM layout to do that:

myLogs' l = reflectHoriz $ withIM procmeterSize procmeter $
              reflectHoriz $ l
    where
        -- Ratio of screen procmeter will occupy
        procmeterSize = 1%7
        -- Match procmeter
        procmeter = ClassName "ProcMeter3"
myLogs = myLogs' myDish

coding

Everything above was easy. Doing coding (or writing) on a small screen is where it gets hard. When I'm coding I want to have one window that is exactly 80 columns wide, and as tall as possible, where I do the main development. Then I need a minimum of two other windows also visible: one action window for running tests and the like, and one to view references (emails, man pages, other files, etc). I may need to view any of these windows fullscreen at any time, of course.

As an example of the three window rule: While I'm writing this post, I have a reference window open with my .xmonad/xmonad.hs, and an action window open where I am managing screenshots.

Often, deep into something, I will accumulate many other reference and action windows. But three seems to be the magic, minimum number for me; use of screen or anything else doesn't reduce it; if I can't see all three at once, I will waste time and concentration flipping back and forth.

The Xmonad layout I've developed to handle this is based on the handy FixedColumn layout, which automatically keeps the master window 80 columns wide. I modify this by adding the Magnifier, so that my action and reference windows "pop out" over the master window when focused.

myCode = limitWindows 3 $ magnifiercz' 1.4 $ FixedColumn 1 20 80 10

The 1.4 was carefully tuned, to magnify a window to be big enough to be usable for editing, reading, or command-line, while not obscuring too much of the other two windows. And again I use limitWindows, to only show three windows, in order to keep them all as big as possible. Other windows are hidden offscreen, but I can rotate to them with alt-tab as needed.

screenshot     screenshot

reading ebooks

When I'm reading an ebook, I want to have the reader in the middle of the screen. But a widescreen netbook like mine is too wide to comfortably read text all the way across, so it's nice to have margins. (In the margins, I put terminals, or notes.)

The ThreeColumns module has a layout that does just that.

myBook = ThreeColMid nmaster delta ratio
    where
        nmaster = 1
        -- Percent of screen to increment by when resizing panes
        delta   = 3/100
        -- Default proportion of screen occupied by master pane
        ratio   = 2/3

screenshot

putting it all together

I join all my layouts together, so Xmonad will allow switching between them. These three combinations all have the same layouts, only ordered differently, so that different workspaces can have different default layouts.

codeFirst  = myCode  ||| myWide  ||| myGrid  ||| myDish
dishFirst  = myDish  ||| myCode  ||| myWide  ||| myGrid
gridFirst  = myGrid  ||| myDish  ||| myCode  ||| myWide

Xmonad layouts can be decorated with things like title bars. To save horizontal space, I leave off the title bars on layouts where they are not needed to disambiguate windows. Two helper functions can be applied to add or remove titles (and also cause the layout to avoid the Gnome panel.)

withTitles l = noFrillsDeco shrinkText myTheme (desktopLayoutModifiers l)
noTitles l = desktopLayoutModifiers l

I have semi-dedicated workspaces; one for each of the above activities. To assign a layout to a workspace, I use the PerWorkspace module.

Notice that the "web" workspace has only two available layouts. Meanwhile, the "book" workspace always uses the specialized myBook layout; I don't use that layout elsewhere.

perWS = onWorkspace "logs" (withTitles $ myLogs' dishFirst) $
        onWorkspace "web"  (noTitles   $ (mySplit ||| myWide)) $
        onWorkspace "chat" (noTitles   $ myChat' gridFirst) $
        onWorkspace "book" (noTitles   $ myBook) $
                           (withTitles $ codeFirst)

Finally, I allow toggling between the currently selected layout and fullscreen mode, and use smartBorders to avoid displaying borders when there is only one window onscreen.

myLayout = smartBorders $ toggleLayouts Full perWS

Xmonad layout optimised for the small screen of a netbook: Done! (For now...)


  screenshots   screenshots   screenshots   screenshots   screenshots  


PS: Thanks to #xmonad regulars for always having the answer I need up their sleeves.

Posted
first real Haskell program

I've finally started writing my first real program in Haskell. I've been learning Haskell since 2007. That is to say, I've been reading about it, over and over.

Progress learning has been annoyingly slow because I have not been able to commit to using it in a real program. Because when I'm writing a real program I want the program now, and I can code in other languages ten to one hundred times faster than I can currently code in Haskell. Nasty viscious cycle. Probably best to avoid getting really good at specific programming languages to avoid it, in hindsight.

Writing a real program in Haskell is important because I mostly learn by doing real things. Exercises don't work well for me. Up till now the only real thing I did in Haskell was write a 250 line xmonad config file in it. Which really helped a lot, but mostly only with beating some of its syntax into my head.

I have a few hundred lines of git-annex done. I don't need this urgently so I am willing to spend a week writing it instead of dashing off a perl version tonight.

I seem to have had bad luck on the parts I tackled first, and run into some swampier parts of the Haskell platform. I hope these problems are behind me, they sure made the first 8 hours fun.

time

First thing I needed was a data structure with a time stamp. There are multiple modules to handle time, mostly seemingly incompatible, deprecated, and/or possibly broken.

Data.DateTime seems to be the current choice, but I don't entirely trust it. Check this weird thing out, it thinks that 0 seconds and 1 second after the Unix epoch are the same second.

Data.DateTime> map (toSeconds . fromSeconds) [-1,0,1,2,100,1000,100000,100000,1000000,100000000,100000000,1000000000,10000000000]
[-2,0,0,1,99,999,99999,99999,999999,99999999,99999999,1000000000,10000000000]

Rounding error?

(Now it's not all bad... it does support dates right up until the heat death of the universe, on 32 bit even. No Y2038 bug here.)

locking

Huh, I read Real_World_Haskell twice and never noticed that it seems to omit POSIX file locking. Which in my Real World is a necessity.

Probably because it's so hard in Haskell, with multiple gotchas.

First there's the lazy IO problem, which means that if you explicitly close a file, code that read from it may not have really run yet. I ended up needing this scariness from System.IO.Strict, which is sadly not packaged in Debian or part of the Haskell Platform:

hGetContentsStrict h  = hGetContents h >>= \s -> length s `seq` return s

Then there's the Haskell's interface for fcntl locking, which exposes rather more of fcntl locking than I like to think about, being spoiled by perl's 2-parameter interface to it.

waitToSetLock lockfd (ReadLock, AbsoluteSeek, 0, 0)

And then there's that lockfd, which is not a regalar file handle, but a fd number, which has to be obtained by calling handleToFd.

And then there's the crazy interface that has handleToFd close the input handle! Which is problimatic if you were going to use the file after you locked it..

announcing git-annex

First there was mr, then etckeeper. Now git-annex completes my triptych of software tools that use git in ways that were never intended.

Git's original tagline was "the stupid content tracker". Git-annex inverts that, by managing files with git, without checking their contents into git. While that may seem paradoxical, it is useful when dealing with files larger than git can currently easily handle, whether due to limitations in memory, checksumming time, or disk space.

I've been working on git-annex for two weeks, and have been trusting my data to it for a week. Not a little data, but terabytes spread amoung multiple archival drives, file servers, laptops, and portable USB drives. In situations too complex for unison or rsync to be of much use. Applying git to this problem in an unconventional way has simplified things enormously.

At this point it should be very usable, and solid. A package will be uploaded to Debian shortly; or git clone it.

Posted
three thousand lines of Haskell

Three thousand lines of code is not a huge program, but it is enough to get a pretty good feel for a language. Now that I've completed my first real Haskell program I feel that I've gotten over several of the humps in the learning curve and am starting to get a good feel for it.

Actually, I've written closer to five thousand lines, since there were several big refactorings. One was when I stopped manually threading my program state around and added a StateT monad. I did know from the beginning I would need one, but it seemed easier and a better learning exercise to let the program start out with a vesigial tail and gills before growing up into a modern Haskell program. (I suppose it's still written in baby-Haskell, really..)

Another refactoring came when I realized I needed to use a custom data type, not String, to represent keys. That was a great experience in type-based refactoring. Being able to keep typing ':make' and landing on the next bit of code that needed fixing was great, and simply adding that one type exposed several non-obvious bugs.

I found myself writing code that is much more solid and reusable than normally comes easily. And yet it's also very malleable. Actually, pulling out better data types and abstractions can get a bit addictive.

When I realized that I had a similar three-stage control flow being used for each of git-annex's subcommands, and factored that control flow out into a function that used the 3 data types below, I felt I'd gone down that rabbit hole perhaps far enough for now.

type SubCmdStart = String -> Annex (Maybe SubCmdPerform)
type SubCmdPerform = Annex (Maybe SubCmdCleanup)
type SubCmdCleanup = Annex Bool

(That will allow for some nice parallelism later though, and removed dozens of lines of code, so was worth it.)

Since git-annex is a very Real World Haskell type program, there is a lot of impure code in it. I could probably do better at factoring out more pure code. I count 117 impure functions, and only 37 pure.

Anyhow, from my perspective of a long-time perl programmer, some other random impressions..

  • ghc --make is handy, but every time it spits out a new 13 mb executable I can feel my laptop's SSD groan!
  • It was surpisingly easy to get into nasty situations with recursive dependencies between the 19 haskell modules I wrote. Sometimes solving them was really messy. I lost hours to this. More time than I've lost to the problem in all other languages combined over 15 years. It's not clear to me if it was due to the overall design of my program, or if Haskell's types tend to encourage this problem. Or if there's some simple "please let me have recursive dependencies" switch to ghc that I missed..
  • I'm used to being able to use man to get at mutiple books worth of detailed documentation for perl, and work easily offline or with limited bandwidth. With Haskell, I spend much more time searching online for documentation than I an comfortable with (although Hoogle is pretty neat). And the haddock-produced documentation is often pretty sketchy. The saving grace is that the source to any library function is a click away, and tends to be very readable.
  • I'm used to being able to use pretty much any Unix syscall by name from perl: mkdir, chmod, rename, etc. In Haskell, there is a Windows smell to the names, like createDirectoryIfMissing and setPermissions. And there are pointless distinctions like renameFile vs renameDirectory. These long names are not memorable and I have to look them up every time. Most of POSIX is available, but it's scattered amoung many disparate libraries, and I can't find an interface for sysconf(3) at all. There is a certian temptation, that I am so far resisting, to make a library for C/perl refugees that exports the sane Unix names for everything.
  • Anything involving the IO monad, or probably most monads, has a certian level of syntactic clumsiness about it. Compare:
      if ($flag{foo} && length $l = <>) {
    
    vs
      foo <- getFlag "foo"
      l <- getLine
      if (foo && not $ null l)
          then do
    
    When writing lots of impure code, that got old, and while I could use ifM, or make up some other similar thing, its syntax would also be somewhat clumsy.
  • The fixity levels for a lot of stuff seems a bit off. I too often found myself writing error $ "foo: " ++ (show bar) or return $ Just $ ... (Still a lot better than Scheme thanks to $!)
  • I've leveled up a couple times now, but this particular video game seems to have more levels going up and up, forever. Can't even see the top from here!
Posted
debugging musings

Debugging a computer program is such an interesting activity because it's not really a matter of fixing a program. It's a matter of fixing your own understanding to the point that the cause of the bug becomes obvious. So debugging means constantly challenging your assumptions, constantly looking for the overlooked insignificant thing that turns out to be crucial.

I mostly debug by print statements. I don't like it much, but it works. I add some print statements around the problem area and see if what's going on matches what the model in my head thinks should be going on. If it doesn't, I try to fix the model, and if it does, I focus the print statements in tighter and tighter, deeper and deeper, until I finally reach the bug. It's nasty and ah-hoc, but it works really well, mostly. And I know I'm in good company doing it.

But that method can fail; I myself can end up caught in a loop. Like today, tracking down an obscure bug in Branchable's auditing subsystem, when I found myself at what was apparantly the point of a bug: Function A was supposed to call function B, but B never seemed to be called. So surely A was buggy, and I instrumented it to bits with print statements and stuff, and started wondering if the libraries it relied on were somehow breaking it in non-obvious ways before it could call B, or maybe there was a bug in the runtime language that was breaking the call to B.

Going deeper and deeper didn't help, because the thing I was overlooking was up at the top level -- one of the first print statements I inserted showed that all this code was running twice. Why? Didn't seem relevant to the bug so I ignored it. Of course what I was ignoring was a symptom of the real problem: The first run was by another subsystem, that happened to redefine function B temporarily..

Actually, one of the nice things about Haskell is that print statements are not very helpful in debugging pure code. Pure code tends to get loaded up in ghci and tested interactively, and the bug is then clear. Or a quickcheck test case gets written with an invariant that the pure code should satisfy. I spent a whole day a few weeks ago writing an inverse form of a buggy function in git-annex just so I could feed them both into quickcheck and automate debugging. (Most of that time was spent wrestling with unicode decomposition .. ugh.) Being used to getting down in the buggy code with print statements, that at the time felt rather like a waste of time unrelated to getting on with making my program work, but it was very much worth it since I got my program working 100% in even crazy edge cases, and got a test suite for free, too.

Most of my debugging of Haskell code so far has not involved bugs in pure code. The most memorable bug was actually a bug in ghc's IO manager, which I'd have never tracked down without Josh's help. The problem with finding bugs in the runtime environment or compiler is that after you've found enough over the years, you're more likely to go down false paths distrusting runtime environments, like I did with the bug today. The beginner's assumption is that the runtime environment just works, but I by now may have over-challenged that assumption. :)

Posted
watch me program for half an hour

In this screencast, I implement a new feature in git-annex. I spend around 10 minutes writing haskell code, 10 minutes staring at type errors, and 10 minutes writing documentation. A normal coding session for me. I give a play-by-play, and some thoughts of what programming is like for me these days.

git-annex coding in haskell.ogg (38 MB) | on vimeo

Not shown is the hour I spent the next day changing the "optimize" subcommand implemented here into "--auto" options that can be passed to git-annex's get and drop commands.

watched it all, liked it (54%)


watched some, boring (9%)


too long for me (4%)


too haskell for me (17%)


not interested (13%)


Total votes: 86
Posted
happy haskell hacker

There are certian things haskell is very good at, and I had the pleasure of using it for one such thing yesterday. I wanted to support expressions like find(1) does, in git-annex. Something like:

git-annex drop --not --exclude '*.mp3' --and \
    -\( --in usbdrive --or --in archive -\) --and \
    --not --copies 3

So, parens and booleans and some kind of domain-specific operations. It's easy to build a data structure in haskell that can contain this sort of expression.

{- A Token can either be a single word, or an Operation of an arbitrary type. -}
data Token op = Token String | Operation op
        deriving (Show, Eq)

data Matcher op = Any
        | And (Matcher op) (Matcher op)
        | Or (Matcher op) (Matcher op)
        | Not (Matcher op)
        | Op op
        deriving (Show, Eq)

(The op could just be a String, but is parameterized for reasons we'll see later.)

As command-line options, the expression is already tokenised, so all I needed to do to parse it is consume a list of the Tokens. The only mildly tricky thing is handling the parens right -- I chose to not make it worry if there were too many, or too few closing parens.

generate :: [Token op] -> Matcher op
generate ts = generate' Any ts
generate' :: Matcher op -> [Token op] -> Matcher op
generate' m [] = m
generate' m ts = uncurry generate' $ consume m ts

consume :: Matcher op -> [Token op] -> (Matcher op, [Token op])
consume m [] = (m, [])
consume m ((Operation o):ts) = (m `And` Op o, ts)
consume m ((Token t):ts)
        | t == "and" = cont $ m `And` next
        | t == "or" = cont $ m `Or` next
        | t == "not" = cont $ m `And` (Not next)
        | t == "(" = let (n, r) = consume next rest in (m `And` n, r)
        | t == ")" = (m, ts)
        | otherwise = error $ "unknown token " ++ t
        where
                (next, rest) = consume Any ts
                cont v = (v, rest)

Once a Matcher is built, it can be used to check if things match the expression the user supplied. This next bit of code almost writes itself.

{- Checks if a Matcher matches, using a supplied function to check
 - the value of Operations. -}
match :: (op -> v -> Bool) -> Matcher op -> v -> Bool
match a m v = go m
        where
                go Any = True
                go (And m1 m2) = go m1 && go m2
                go (Or m1 m2) = go m1 || go m2
                go (Not m1) = not (go m1)
                go (Op o) = a o v

And that's it! This is all nearly completly generic and could be used for a great many things that need support for this sort of expression, as long as they can be checked in pure code.

A trivial example:

*Utility.Matcher> let m = generate [Operation True, Token "and", Token "(", Operation False, Token "or", Token, "not", Operation False, Token ")"]
*Utility.Matcher> match (const . id) m undefined 
True

For my case though, I needed to run some IO actions to check if expressions about files were true. This is where I was very pleased to see a monadic version of match could easily be built.

{- Runs a monadic Matcher, where Operations are actions in the monad. -}
matchM :: Monad m => Matcher (v -> m Bool) -> v -> m Bool
matchM m v = go m
        where
                go Any = return True
                go (And m1 m2) = liftM2 (&&) (go m1) (go m2)
                go (Or m1 m2) =  liftM2 (||) (go m1) (go m2)
                go (Not m1) = liftM not (go m1)
                go (Op o) = o v

With this and about 100 lines of code to implement specific tests like --copies and --in, git-annex now supports the example at the top.

Just for comparison, find(1) has thousands of lines of C code to build a similar parse tree from the command line parameters and run it. Although I was surprised to see that it optimises expressions by eg, reordering cheaper tests first.

The last time I wrote this kind of thing was in perl, and there the natural way was to carefully translate the expression into perl code, which was then evaled. Meaning the code was susceptible to security holes.

Anyway, this is nothing that has not been done a hundred times in haskell before, but it's very nice that it makes it so clean, easy, and generic.

Posted
announcing github-backup

Partly as a followup to a Github survey, and partly because I had a free evening and the need to write more haskell code, any haskell code, I present to you, github-backup.

github-backup is a simple tool you run in a git repository you cloned from Github. It backs up everything Github knows about the repository, including other forks, issues, comments, milestones, pull requests, and watchers.

This is all stored in the repository, as regular files, on a "github" branch.

Available in Cabal now, in Debian maybe if someone packages haskell-github.

unicode ate my homework

I've just spent several days trying to adapt git-annex to changes in ghc 7.4's handling of unicode in filenames. And by spent, I mean, time withdrawn from the bank, and frittered away.

In kindergarten, the top of the classrom wall was encircled by the aA bB cC of the alphabet. I'll bet they still put that up on the walls. And all the kids who grow up to become involved with computers learn that was a lie. The alphabet doesn't stop at zZ. It wouldn't all fit on a wall anymore.

So we're in a transition period, where we've all learnt deeply the alphabet, but the reality is much more complicated. And the collision between that intuitive sense of the world and the real world makes things more complicated still. And so, until we get much farther along in this transition period, you have to be very lucky indeed to not have wasted time dealing with that complexity, or at least having encountered Mojibake.

Most of the pain centers around programming languages, and libraries, which are all at different stages of the transition from ascii and other legacy encodings to unicode.

  • If you're using C, you likely deal with all characters as raw bytes, and rely on the backwards compatability built into UTF-8, or you go to long lengths to manually deal with wide characters, so you can intelligently manipulate strings. The transition has barely begin, and will, apparently, never end.
  • If you're using perl (at least like I do in ikiwiki), everything is (probably) unicode internally, but every time you call a library or do IO you have to manually deal with conversions, that are generally not even documented. You constantly find new encoding bugs. (If you're lucky, you don't find outright language bugs... I have.) You're at a very uncomfortable midpoint of the transition.
  • If you're using haskell, or probably lots of other languages like python and ruby, everything is unicode all the time.. except for when it's not.
  • If you're using javascript, the transition is basically complete.

My most recent pain is because the haskell GHC compiler is moving along in the transition, getting closer to the end. Or at least finishing the second 80% and moving into the third 80%. (This is not a quick transition..)

The change involves filename encodings, a situation that, at least on unix systems, is a vast mess of its own. Any filename, anywhere, can be in any encoding, and there's no way to know what's the right one, if you dislike guessing.

Haskell folk like strongly typed stuff, so this ambiguity about what type of data is contained in a FilePath type was surely anathama. So GHC is changing to always use UTF-8 for operations on FilePath. (Or whatever the system encoding is set to, but let's just assume it's UTF-8.)

Which is great and all, unless you need to write a Haskell program that can deal with arbitrary files. Let's say you want to delete a file. Just a simple rm. Now there are two problems:

  1. The input filename is assumed to be in the system encoding aka unicode. What if it cannot be validly interpreted in that encoding? Probably your rm throws an exception.
  2. Once the FilePath is loaded, it's been decoded to unicode characters. In order to call unlink, these have to be re-encoded to get a filename. Will that be the same bytes as the input filename and the filename on disk? Possibly not, and then the rm will delete the wrong thing, or fail.

But haskell people are smart, so they thought of this problem, and provided a separate type that can deal with it. RawFilePath hearks back to kindergarten; the filename is simply a series of bytes with no encoding. Which means it cannot be converted to a FilePath without encountering the above problems. But does let you write a safe rm in ghc 7.4.

So I set out to make something more complicated than a rm, that still needs to deal with arbitrary filename encodings. And I soon saw it would be problimatic. Because the things ghc can do with RawFilePaths are limited. It can't even split the directory from the filename. We often do need to manipulate filenames in such ways, even if we don't know their encoding, when we're doing something more complicated than rm.

If you use a library that does anything useful with FilePath, it's not available for RawFilePath. If you used standard haskell stuff like readFile and writeFile, it's not available for RawFilePath either. Enjoy your low-level POSIX interface!

So, I went lowlevel, and wrote my own RawFilePath versions of pretty much all of System.FilePath, and System.Directory, and parts of MissingH and other libraries. (And noticed that I can understand all this Haskell code.. yay!) And I got it close enough to working that, I'm sure, if I wanted to chase type errors for a week, I could get git-annex, with ghc 7.4, to fully work on any encoding of filenames.

But, now I'm left wondering what to do, because all this work is regressive; it's swimming against the tide of the transition. GHC's change is certainly the right change to make for most programs, that are not like rm. And so most programs and libraries won't use RawFilePath. This risks leaving a program that does a fish out of water.

At this point, I'm inclined to make git-annex support only unicode (or the system encoding). That's easy. And maybe have a branch that uses RawFilePath, in a hackish and type-unsafe way, with no guarantees of correctness, for those who really need it.


Previously: unicode eye chart wanted on a bumper sticker abc boxes unpacking boxes

Posted
more on ghc filename encodings

My last post missed an important thing about GHC 7.4's handling of encodings for FileName. It can in fact be safe to use FilePath to write a command like rm. This is because GHC internally uses a special encoding for FilePath data, that is documented to allow "arbitrary undecodable bytes to be round-tripped through it". (It seems to do this by encoding the undecodable bytes as very high unicode code points.) So, when presented with a filename that cannot be decoded using utf-8 (or whatever the system encoding is), it still handles it, and using the resulting FilePath will in fact operate on the right file. Whew!

Moral of the story is that if you're going to be using GHC 7.4 to read or write filenames from a pipe, or a file, you need to arrange for the Handle you're reading or writing to use this special encoding too. I use this to set up my Handles:

import System.IO
import GHC.IO.Encoding
import GHC.IO.Handle

fileEncoding :: Handle -> IO ()
fileEncoding h = hSetEncoding h =<< getFileSystemEncoding

Even if you're only going to write a FilePath to stdout, you need to do this. Otherwise, your program will crash on some filenames! This doesn't seem quite right to me, but I hesitate to file a bug report. (And this is not a new problem in GHC anyway.) If I did, it would have this testcase:

# touch "me¡"
# LANG=C ghc
Prelude> :m System.Directory
Prelude System.Directory> mapM_ putStrLn =<< getDirectoryContents "."
me*** Exception: <stdout>: hPutChar: invalid argument (invalid character)

Since git-annex reads lots of filenames from git commands and other places, I had to deal with this extensively. Unfortunatly I have not found a way to read Text from a Handle using the fileSystemEncoding. So I'm stuck with slow Strings. But, it does seem to work now.


PS: I found a bug in GHC 7.4 today where one of those famous Haskell immutable values seems to get well, mutated. Specifically a [FilePath] that is non-empty at the top of a function ends up empty at the bottom. Unless IO is done involving it at the top. Really. Hope to develop a test case soon. Happily, the code that triggered it did so while working around a bug in GHC that is fixed in 7.4. Language bugs.. gotta love em.

Posted
case study: adding box.com support to git-annex

git-annex has special remotes that allow large files checked into git to be stored in arbitrary places, that are not proper git remotes. One key use of the special remotes is to store files in The Cloud.

Until now the flagship special remote used Amazon S3, although a few other things like Archive.org, rsync.net, and Tahoe-Laffs can be made to work too. One of my goals is to add as many cloud storage options to git-annex as possible.

Box.com came to my attention because they currently have a promotion that provides 50 gigabytes of free "lifetime" service. Which is a nice amount of cloud storage to have for free. I decided that I didn't want to spend more than 4 hours of my time to make git-annex use it though. (I probably have spent a week on the S3 support by contrast.)

So, this is a case study in quickly adding support for one cloud storage provider to git-annex.

  • First, I had to sign up to box.com. Their promotion requires an android phone be used to get the 50 gigabytres. This wasted about an hour getting my unused phone dusted off etc. This also includes time spent researching ways to access box.com's storage, including reading their API documentation. I found it has a WebDAV interface.
  • Sadly, there is not yet a native WebDAV library for haskell. This is a shame, because it would make the implementation better. But, I'm confident someone will eventually write one. My experience with haskell libraries for other web APIs (S3, GitHub) is that it's an excellent language to write them in, the code tends to be very simple, concise and clear. But I can't do it in 4 hours. So for now, the workaround is to use a WebDAV mounting tool. I picked davfs2 as it was the first one I got to work with box.com's slightly broken WebDAV. 2 hours spent now.
  • With box.com mounted, I was neary done; git-annex's directory special remote can use the mount point. But there was a catch: box.com only allows up to 100 mb large files. I spent 1 hour or so adding support to the directory special remote for chunking files into a user-specified size.
    This was a fairly complex problem -- the existing code had a ByteString that when accessed lazily read the whole large file (from disk or from gpg, depending), and just called writeFile on it.
    I needed to still consume it lazily to avoid reading the whole file into memory, but write out chunks. This gets a bit into haskell's ByteString internals, but they're very well suited to this kind of thing, and so after 15 minutes familiarizing myself with the data structures, it was actually fairly easy to write the code. patch
  • I spent my last hour testing and tuning the box.com special remote. Using davfs2 as a quick fix caused some technical debt that I had to make up for. In particular, the chunked filename retrieval code had to make sure not to open every chunk at once, because that makes davfs2 try to cache them all, instead of streaming one at a time. patch
  • Not counted toward my 4 hour limit is the ... er ... 4 hours I spent last night adding a progress bar to the directory special remote. A progress display while transferring the files makes using box.com as a special remote much nicer, but also makes using my phone's SD card as a special remote much nicer! This is why I'm a poor consultant -- when faced with something generic and generally useful like this, I have difficulty billing for it.

The end result is that there are detailed instructions for using box.com as a special remote.

And it seems to work quite well now. I just set up my production box.com special remote. All content written to it is gpg encrypted, and various of my computers have access to it, each using their own gpg key to decrypt the files uploaded by the others. (git-annex's encryption feature makes this work really well!)

So..
There is a DropBox API for haskell. But as I'm not a customer, the 2 gb free account hardly makes it worth my while to make git-annex use it. Would someone like to fund my time to add a dropbox special remote to git-annex?

Posted
are there any gotchas with enabling the -threaded GHC runtime?

That's a seemingly simple question I started asking various people two weeks ago. I didn't get many useful answers, but now I have experience doing it myself, and so here's a blog post brain dump.

I have been trying to convert git-annex to use GHC's threaded runtime, for a variety of reasons. Naively adding the -threaded option resulted in a git-annex command that seemed to randomly freeze, but only sometimes (and, infuriatingly, never when I straced it), and a test suite that froze at a random point almost every time I ran it. Not a good result, and lacking any knowledge about gotchas with using the threaded runtime, I was at a loss for a long time (most of DebConf) about how to fix it.

I now know of at least three classes of problems that enabling the threaded runtime can turn up in programs that have been developed using the non-threaded runtime.

accessing a MVar after forkprocess can hang

MissingH has some code similar to this, which works ok with the non-threaded runtime:

forkProcess $ do
    debugM "about to run some command"
    executeFile ...

In the above example, debugM accesses a MVar. Doing that after forkProcess can result in a MVar deadlock, as it tries to access a MVar value, that is, apparently, not accessible to the forked process. (Bug report with test case)

So, using System.Cmd.Utils from MissingH is asking for trouble. I switched all my code to the newer and, apparently, threaded runtime safe System.Process.

forkProcess is a massively bad idea

Even when not accessing a MVar after forkProcess, it's very unsafe to use. It's easy to miss the warning attached to forkProcess, when the code seems to work. But with the threaded runtime, I've found that most any call to forkProcess will sometimes succeed, and sometimes freeze the whole program. This might only happen around 1 time in 1000. Then you'll find this warning and do a lot of head-scratching about what it really means:

forkProcess comes with a giant warning: since any other running threads are not copied into the child process, it's easy to go wrong: e.g. by accessing some shared resource that was held by another thread in the parent.

The hangs I saw could be due to laziness issues deferring code to run after the forkProcess that you'd expect to have run before it ... or who knows what else.

It's not clear to me that it's possible to use forkProcess safely in Haskell code. I think it's notable that System.Process runs the whole fork/exec in C code instead.

unsafe FFI calls block

According to most of the documentation you'll find in eg, the Haskell wiki, Real World Haskell, etc, the only difference between the safe and unsafe imports in the FFI is that unsafe is faster, and shouldn't be used for C code that calls back into Haskell code.

But the documentation is out of date. Actually, if you're using the FFI, and the foreign function can block, you need to use safe. When using unsafe, a blocking foreign function can block all threads of the program.

In my case, I was using kqueue to wait for changes to files, and this indeed blocked my whole program when linked with -threaded. Marking it safe fixed this.

The details are well described in this paper: http://community.haskell.org/~simonmar/papers/conc-ffi.pdf

Somewhat surprisingly, this blocking only happens when using the threaded runtime. If you're using the non-threaded runtime with unsafe blocking FFI functions, your other pseudo-threads won't be blocked. This is because the non-threaded runtime has an SIGALARM timer that interrupts (most) blocking system calls. This leads to other troubles of its own (like needing to restart interrupted FFI functions, or blocking the other pseudo-threads from running if the C code ignores SIGALARM), but that's offtopic for this post.

summary

Converting a large Haskell code base from the default, non-threaded runtime to the threaded runtime can be quite tricky. None of the problems are the sort of thing that Haskell helps to manage either. I will be developing new programs using the threaded runtime from the beginning from now on.

By the way, don't take this post to say that threading in Haskell sucks. I've really been enjoying writing threaded Haskell code. The control Haskell gives over isolation of state to threads, and the excellent and wide-ranging suite of thread communications data types (MVar, Chan, QSemN, TMVar, SampleVar, etc) have made developing a complex threaded program something I feel comfortable doing for the first time, in any language.

Posted
build a dynamic webapp in Yesod in just three days

This screencast shows what I built. Scroll down for my Yesod braindump.

video link

I've been astonished how quickly this went together. This is my first time using any sort of web framework, and I used a still unusual one, Yesod. It's my first time using Bootstrap too. It's also the first time I've done any AJAX programming!

Bootstrap was something I'd heard of and seen some suspiciously similar looking sites built with, but it was really a pleasant surprise. Being able to make things that work and look good on the web without struggling with CSS is such a nice change. For the first time, it makes the web feel like a UI toolkit to me.

As I've learned Yesod, I kept seeing problems the web framework solves that I either solved in an ad-hoc basis in Ikiwiki, or didn't realize were problems. In the former group are things like having a consistent way to store a URL to redirect to after an action (such as logging in) finishes, and generating urls programmatically in ways that avoid broken links. In the latter group are things like exploiting types to guarantee mistakes can't be made. And also the abstraction of widgets, which combine html, css, and javascript, in a way that can be composed with other widgets.

Overall, I'm really enjoying Yesod, and it's making me productive in new ways.

I also see a lot of potential in Yesod to improve from where it is.

  • Probably the biggest improvement would be generating javascript from haskell code, so the type safety extends to the javascript side.

    There have been many tries at doing this, the first one I'm really tempted to use is Fay. Rather than build a heavy haskell runtime on top of javascript, it essentially uses javascript as its runtime, and offloads type checking to GHC. This lets it generate javascript code that is short and often comprehensible enough to give insights into what the original haskell code does.

    I'm betting this will be integrated into Yesod eventually. They have an active wiki page about it.

  • There's a WAI library for building local webapps with Yesod, but it was not suitable for my needs (for one thing, it lacks security; for another it kills the haskell program when the web page is closed); so I built my own webapp library. A problem with my current pace of development is that I'm building lots of reusable libraries, but I don't have the time to stabalize them and make them generally available. That one goes in the pile of 2k+ lines of such code.

  • Yesod needs a version of the Hamlet markup that can be edited by people who only understand html. That means it should allow closing tags, and tabs, and not have nasty whitespace gotchas. I think this would be easy to build, starting from Hamlet. It could be called "Hecate".. I don't have time right now.

  • The compile time error messages are often beyond atrocious. Seriously, I'm tempted to write a filter to filter out common patterns where there's one line about a syntax error in a Hamlet file sandwitched in between 150 lines of type error gobbly-gook and generated code.

  • Some really nice things could be done integrating Yesod with Bootstrap. Like the ability to build entire sites by just composing together Bootstrap components, with no need to write a single line of html or css. I'm very tempted to try to build this library.

 webpage = bootstrap Dynamic $ do
        setTitle "FooCorp"
        login <- getLogin
        navbar [FixedTop] $ do
            brand "FooCorp"
            link AboutR
            link BlogR
            nav [PullRight] $
                link . maybe LoginR ProfileR login
        div [ContainerFluid] $ content login
        where
            content Nothing = heroUnit $ do
                para $ "We're the FooCorp for you."
                button "Register Today" [Primary, Large] SignUpR
                carousel
                    [ amazingFeatures
                    , aboutFooCorp
                    , pricing
                    ]
            content (Just user)  = do
                para $ "Welcome back " ++ name user ++ "!"
                showProfile user

Posted
hledger

Apologies in advance for writing a long blog post about the dull and specialised subject of double-entry accounting from the Unix tools perspective, that ends up talking about Monads to boot. I can't believe I'm about to write such a thing, but I'm finding it an oddly interesting subject.

double-entry accounting

I've known of, though probably not really understood double entry accounting for a while, thanks to GnuCash. I think GnuCash did something rather special in making such a subject approachable to the layman, and I've been happy to recommend GnuCash to friends. I was stoked to find a chapter in my sister Anna's new book that happily and plausibly suggests readers use GnuCash.

But for my personal use, GnuCash is clunky. I mean, I'm a programmer, but I can't bring any of my abilities to bear on it, without a deep dive into the code. The file format is opaque (and a real pain to keep checked into git with all the log files); the user interface is often confusing, but there's no benefit to its learning curve, it never seems to get better than manually entering data into its ledger, or perhaps importing data from a bank. I've never found the reports very useful.

I've got perhaps a year of data in GnuCash, but it's fragmented and incomplete and not something I've been able to motivate myself to keep up with. So I have less financial data than I'd like. I'm hoping ledger will change that.

ledger

I've known about ledger for a while, at least since This Linux Weekly News article. It's a quintessential Unix tool, that simply processes text files. The genius of it is the simplicity of the file format, that gets the essence and full power of double entry bookeeping down to something that approaches a meme. Once you get the file format stuck in your head, you're done for.

2004/05/27 Book Store
      Expenses:Books                 $20.00
      Liabilities:Visa

starting to use hledger

Now as a Haskell guy, I was immediately drawn to the Haskell clone, hledger. It's nice that there are two (mostly) compatable implementations of ledger too. So from here on, I'll be talking about hledger.

I sat down and built a hledger setup for myself the other day. I started by converting the GnuCash books I have been keeping up-to-date, for a small side business (a rental property). It quickly turned into something like programming, in the best way, as I used features like:

  • Include directives, so I can keep my business data in its own file, while pulling it into my main one.
  • Simple refactorings, like putting "Y 2012" at the top, so I don't have to write the year in each transaction.
  • Account aliases, so I can just type "rent", rather than "income:rental" and "repairs:contractor" rather than "expenses:home repair:contractor"
  • All the power of my favorite editor.

a modern unix program

While I've been diving into hledger, I've been drawing all kinds of parallels between it and other modern Unix-friendly programs I use lately. I think we've gone over a bit of a watershed recently. Unix tools used to be either very simple and crude (though often quite useful), or really baroque and complex with far too many options (often due to 10-30 years of evolution). Or they were a graphical user interface, like GnuCash, and completely divorced from Unix traditions.

The new unix programs have some commonalities...

  • They're a single command, with subcommands. This keeps the complexity of doing any one thing down, and provides many opportunities for traditional unix tools philosophy, without locking the entire program into being a single-purpose unix tool.

    hledger's subcommands range from querying and reports, to pipable print, to a interactive interface.

  • They have a simple but powerful idea at their core, that can be expressed with a phrase like "double entry accounting is a simple file format" (ledger), or "files, trees, commits" (git).

    Building a tool on a central idea is something I strive to do myself. So the way ledger manages it is particularly interesting to me.

  • They are not afraid to target users who have a rather special kind of literacy (probably the literacy you need to have gotten to here in this post). And reward them with a lot of power.

    Ledger avoids a lot of the often confusing terminology around accounting, and assumes a mind primed with the Unix tools philosophy.

  • If there's a GUI, it's probably web based. There's little trust in traditional toolkits having staying power, and the web is where the momentum is. The GUI is not the main focus, but does offer special features of its own.

    hledger's web UI completely parallels what I've doing with the git-annex webapp, right down to being implemented using Yesod -- which really needs to be improved to use some methods I've developed to make it easier to make webapps that integrate with the desktop and are more secure, if there are going to be a lot more programs like this using it.

importing data

After manually converting my GnuCash data, I imported all my PayPal history into hledger. And happily it calculates the same balance Paypal does. It also tells me I've paid PayPal $180 in transaction fees over the years, which is something PayPal certianly doesn't tell you on their website. (However, my current import from PayPal's CSV files is a hackish, and only handles USD currency, so I miss some currency conversion fees.)

I also imported my Amazon Payments history, which includes all the Kickstarter transactions. I almost got this to balance, but hledger and Amazon disagree about how many hundreths of a cent remain in my account. Still, pretty good, and I know how much I paid Amazon in fees for my kickstarter, and how much was kicked back to Kickstarter as well. (Look for a cost breakdown in some future blog post.)

At this point, hledger stats says I have 3700 transactions on file, which is not bad for what was just a few hours work yesterday.

One problem is hledger's gotten a little slow with this many transactions. It takes 5 seconds to get a balance. The ledger program, written in C, is reportedly much faster. hledger recently had a O(n^2) slowdown fixed, which makes me think it's probably only starting to be optimised. With Haskell code, you can get lucky and get near C (language, not speed of light) performace without doing much, or less lucky and get not much better than python performance until you dive into optimising. So there's hope.

harnessing haskell

If there's one place hledger misses out on being a state of the art modern Unix program, it's in the rules files that are used to drive CSV imports. I found these really hard to use; the manual notes that "rules file parse errors are not the greatest"; and it's just really inflexible. I think the state of the art would be to use a Domain Specific Language here.

For both my Amazon and PayPal imports I had CVS data something like:

date, description, amount, fees, gross

I want to take the feeds into account, and make a split transaction, like this:

date description
    assets:accounts:PayPal             $9.90
    expenses:transaction fees:PayPal   $0.10
    income:misc:PayPal                 -$10.00

This does not seem possible with the rules file. I also wanted to combine multiple CVS lines, to do with currency conversions, into a single transaction, and couldn't.

The problem is that the rules file is an ad-hoc format, not a fully programmable one. If instead, hledger's rules files were compiled into standalone haskell programs that performed the import, arbitrarily complicated conversions like the above could be done.

So, I'm thinking about something like this for a DSL.. I doubt I'll get much further than this, since I have a hacked together multi-pass importer that meets my basic needs. Still, this would be nice, and being able to think about adding thing kind of thing to a program cleanly is one of the reasons I reach for the Haskell version when possible these days.

First, here's part of one of my two paypal import rules files (the other one extracts the transaction fees):

amount-field 7
date-field 0
description-field %(3) - %(4) - %(11)
base-account assets:accounts:PayPal

Bank Account
assets:accounts:checking

.*
expenses:misc:PayPal

That fills out the basic fields, and makes things with "Bank Account" in their description be categorised as bank transfers.

Here's how it'd look as Haskell, carefully written to avoid the $ operator that's more than a little confusing in this context. :)

main :: IO ()
main = convert paypalConveter

paypalConverter :: [CSVLine] -> [Maybe Transaction]
paypalConverter = map convert
  where
    convert = do
        setAmount =<< field 7
        setDate =<< field 0
        setDescription =<< combine " - " [field 3, field 4, field 11]
        defaultAccounts
            "assets:accounts:PayPal" ==> "expenses:misc:PayPal"
        inDescription "Bank Account" ?
            "assets:accounts:PayPal" ==> "assets:accounts:checking"

That seems like something non-haskell people could get their heads around, especially if they didn't need to write the boilerplate function definitions and types at the top. But I may be biased. :)

Internally, this seems to be using a combination Reader and Writer monad that can get at fields from a CSV line and build up a Transaction. But I really just made up a simple DSL as I went along and thew in enough syntax to make it seem practical to implement. :)

Of course, a Haskell programmer can skip the monads entirely, or use others they prefer. And could do arbitrarily complicated stuff during imports, including building split transactions, and combining together multiple related CVS lines into a single transaction.

Posted
no longer a perl programmer

This year, I've gradually realized that I no longer identify as a perl programmer. For a decade and a half, perl was the language I reached for to solve any problem that didn't have a good reason to be solved in some other language. Now I only reach for it in the occasional one-liner -- and even then I'm more likely to find myself in ghci and end up with a small haskell program.

I still maintain plenty of perl code, but even when I do, I'm not thinking in perl, but traslating from some internal lambda calculus. There's quite a few new-ish perl features that I have not bothered to learn, and I can feel some of the trivia that perl encourages be kept in mind slipping gradually away. Although the evil gotchas remain fresh in my mind!

More importantly, my brain's own evaluation of code has changed; it doesn't evaluate it imperatively (unless forced to by an appropriate monad), but sees the gesalt, sees the data flow, and operates lazily and sometimes, I think in parallel. The closest I can come to explaining the feeling is how you might feel when thinking about a shell pipeline, rather than a for loop.

Revisiting some of my older haskell code, I could see the perl thinking that led to it. And rewriting it into pure, type-driven, code that took advantage of laziness for automatic memoization, I saw, conclusively that the way I think about code has changed. (See the difference for yourself: before after )

I hear of many people who enjoy learning lots of programming languages, one after the other. A new one every month, or year. I suspect this is a fairly shallow learning. I like to dive deep. It took me probably 6 years to fully explore every depth of perl. And I never saw a reason to do the same with python or ruby or their ilk; they're too similar to perl for it to seem worth the bother. Though they have less arcania in their learning curves and are probably better, there's not enough value to redo that process. I'm glad haskell came along as a language that is significantly different enough that it was worth learning. The deep dive for haskell goes deep indeed. I'm already 5 years in, and have more to learn now than I ever did before.

I'm glad I didn't get stuck on perl. But I may be stuck on haskell now instead, for the foreseeable future. I'd sort of like to get really fluent in javascript, but only as a means to an end -- and haskell to javascript compilers are getting sufficiently good that I may avoid it. Other than that, I sense adga and coq beckoning with their promises of proof. Perhaps one of these years.

Of course if Bradley Kuhn is right and perl is the new cobol, I know what I'll be doing come the unix rollover in 2038. ;)

Posted
Template Haskell on impossible architectures

Imagine you had an excellent successful Kickstarter campaign, and during it a lot of people asked for an Android port to be made of the software. Which is written in Haskell. No problem, you'd think -- the user interface can be written as a local webapp, which will be nicely platform agnostic and so make it easy to port. Also, it's easy to promise a lot of stuff during a Kickstarter campaign. Keeps the graph going up. What could go wrong?

So, rather later you realize there is no Haskell compiler for Android. At all. But surely there will be eventually. And so you go off and build the webapp. Since Yesod seems to be the pinnacle of type-safe Haskell web frameworks, you use it. Hmm, there's this Template Haskell stuff that it uses a lot, but it only makes compiles a little slow, and the result is cool, so why not.

Then, about half-way through the project, it seems time to get around to this Android port. And, amazingly, a Haskell compiler for Android has appeared in the meantime. Like the Haskell community has your back. (Which they generally seem to.) It's early days and rough, lots of libraries need to be hacked to work, but it only takes around 2 weeks to get a port of your program that basically works.

But, no webapp. Cause nobody seems to know how to make a cross-compiling Haskell compiler do Template Haskell. (Even building a fully native compiler on Android doesn't do the trick. Perhaps you missed something though.)

At this point you can give up and write a separate Android UI (perhaps using these new Android JNI bindings for Haskell that have also appeared in the meantime). Or you can procrastinate for a while, and mull it over; consider rewriting the webapp to not use Yesod but some other framework that doesn't need Template Haskell.

Eventually you might think this: If I run ghc -ddump-splices when I'm building my Yesod code, I can see all the thousands of lines of delicious machine generated Haskell code. I just have to paste that in, in place of the Template Haskell that generated it, and I'll get a program I can build on Android! What could go wrong?

And you even try it, and yeah, it seems to work. For small amounts of code that you paste in and carefully modify and get working. Not a whole big, constantly improving webapp where every single line of html gets converted to types and lambdas that are somehow screamingly fast.

So then, let's automate this pasting. And so the EvilSplicer is born!

That's a fairly general-purpose Template Haskell splicer. First do a native build with -ddump-splices output redirected to a log file. Run the EvilSplicer to fix up the code. Then run an Android cross-compile.

But oh, the caveats. There are so many ways this can go wrong..

  • The first and most annoying problem you'll encounter is that often Template Haskell splices refer to hidden symbols that are not exported from the modules that define the splices. This lets the splices use those symbols, but prevents them being used in your code.

    This does not seem like a good part of the Template Haskell design, to be honest. It would be better if it required all symbols used in splices to be exported.

    But it can be worked around. Just use trial and error to find every Haskell library that does this, and then modify them to export all the symbols they use. And after each one, rebuild all libraries that depend on it.

    You're very unlikely to end up with more than 9 thousand lines of patches. Because that's all it took me..

  • The next problem (and the next one, and the next ...) is that while GHC's code output by -dump-splices (and indeed, by GHC error messages, etc) looks like valid Haskell code to the casual viewer, it's often not.

    To start with, it often has symbols qualified with the package and module name. ghc-prim:GHC.Types.: does not work well where code originally contained :.

    And then there's fun with multi-line strings, which sometimes cannot be parsed back in by GHC in the form it outputs them.

    And then there's the strange way GHC outputs case expressions, which is not valid Haskell at all. (It's missing some semicolons.)

    Oh, and there's the lambda expressions that GHC outputs with insufficient parentheses, leading to type errors at compile time.

    And so much more fun. Enough fun to give one the idea that this GHC output has never really been treated as code that could be run again. Because that would be a dumb thing to need to do.

  • Just to keep things interesting, the Haskell libraries used by your native GHC and your Android GHC need to be pretty much identical versions. Maybe a little wiggle room, but any version skew could cause unwanted results. Probably, most of the time, unwanted results in the form of a 3 screen long type error message.

    (My longest GHC error message seen on this odyessy was actually a full 500+ kilobytes in size. It included the complete text of Jquery and Bootstrap. At times like these you notice that GHC outputs its error messages o.n.e . c.h.a.r.a.c.t.e.r . a.t . a . t.i.m.e.)

Anyway, if you struggle with it, or pay me vast quantities of money, your program will, eventually, link. And that's all I can promise for now.


PS, I hope nobody will ever find this blog post useful in their work.

PPS, Also, if you let this put you off Haskell in any way .. well, don't. You just might want to wait a year or so before doing Haskell on Android.

Posted
please build a haskell to perl compiler

The Fay compiler is a simple way to build fairly comprehensible javascript code from Haskell source.

It occurs to me that it should be rather easy to modify Fay to emit perl code rather than javascript. This would allow contributing things like plugins to various perl programs, without writing perl.

Of course, the same idea could probably be used to compile Haskell to other languages like python, but perl seems particularly well suited as a second Fay target, since javascript and it have quite similar syntax and similar support for features like closures which Fay relies on.

I do not have time to work on this idea myself. It would be a good project for a beginning Haskell programmer. You probably don't even need to fully understand monads to do it! Essentially, look at Fay output examples, translate them from javascript to perl, and then much of the code changes in Fay would probably be in simple string generation code.


I will forward any bitcoins sent to the address 149eBtWS6i8cwQdPQJJ8hAGpDuEqNidyTj to whoever makes this. If it doesn't happen in 1 year, any donations will be forwarded to the EFF instead.

Posted
introducing propellor

Whups, I seem to have built a configuration management system this evening!

Propellor has similar goals to chef or puppet or ansible, but with an approach much more like slaughter. Except it's configured by writing Haskell code.

The name is because propellor ensures that a system is configured with the desired PROPerties, and also because it kind of pulls system configuration along after it. And you may not want to stand too close.

Disclaimer: I'm not really a sysadmin, except for on the scale of "diffuse administration of every Debian machine on planet earth or nearby", and so I don't really understand configuration management. (Well, I did write debconf, which claims to be the "Debian Configuration Management system".. But I didn't understand configuration management back then either.)

So, propellor makes some perhaps wacky choices. The least of these is that it's built from a git repository that any (theoretical) other users will fork and modify; a cron job can re-make it from time to time and pull down configuration changes, or something can be run to push changes.

A really simple configuration for a Tor bridge server using propellor looks something like this:

main = ensureProperties
    [ Apt.stdSourcesList Apt.Stable `onChange` Apt.upgrade
    , Apt.removed ["exim4"] `onChange` Apt.autoRemove
    , Hostname.set "bridget"
    , Ssh.uniqueHostKeys
    , Tor.isBridge
    ]

Since it's just haskell code, it's "easy" to refactor out common configurations for classes of servers, etc. Or perhaps integrate reclass? I don't know. I'm happy with just pure functions and type-safe refactorings of my configs, I think.

Properties are also written in Haskell of course. This one ensures that all the packages in a list are installed.

installed :: [Package] -> Property
installed ps = check (isInstallable ps) go
  where
        go = runApt $ [Param "-y", Param "install"] ++ map Param ps

Here's one that ensures the hostname is set to the desired value, which shows how to specify content for a file, and also how to run another action if a change needed to be made to satisfy a property.

set :: HostName -> Property
set hostname = "/etc/hostname" `File.hasContent` [hostname]
        `onChange` cmdProperty "hostname" [Param hostname]

Here's part of a custom one that I use to check out a user's home directory from git. Shows how to make a property require that some other property is satisfied first, and how to test if a property has already been satisfied.

installedFor :: UserName -> Property
installedFor user = check (not <$> hasGitDir user) $
        Property ("githome " ++ user) (go =<< homedir user)
                    `requires` Apt.installed ["git", "myrepos"]
  where
    go ... -- 12 lines elided

I'm about 37% happy with the overall approach to listing properties and combining properties into larger properties etc. I think that some unifying insight is missing -- perhaps there should be a Property monad? But as long as it yields a list of properties, any smarter thing should be able to be built on top of this.

Propellor is 564 lines of code, including 25 or so built-in properties like the examples above. It took around 4 hours to build.

I'm pretty sure it was easier to write it than it would have been to look into ansible and salt and slaughter (and also liw's human-readable configuration language whose name I've forgotten) in enough detail to pick one, and learn how its configuration worked, and warp it into something close to how I wanted this to work.

I think that's interesting.. It's partly about NIH and I-want-everything-in-Haskell, but it's also about a complicated system that is a lot of things to a lot of people -- of the kind I see when I look at ansible -- vs the tools and experience to build just the thing you want without the cruft. Nice to have the latter!

Posted
adding docker support to propellor

Propellor development is churning away! (And leaving no few puns in its wake..)

Now it supports secure handling of private data like passwords (only the host that owns it can see it), and fully end-to-end secured deployment via gpg signed and verified commits.

And, I've just gotten support for Docker to build. Probably not quite work, but it should only be a few bugs away at this point.

Here's how to deploy a dockerized webserver with propellor:

host hostname@"clam.kitenet.net" = Just
    [ Docker.configured
    , File.dirExists "/var/www"
    , Docker.hasContainer hostname "webserver" container
    ]

container _ "webserver" = Just $ Docker.containerFromImage "joeyh/debian-unstable"
        [ Docker.publish "80:80"
        , Docker.volume "/var/www:/var/www"
        , Docker.inside
            [ serviceRunning "apache2"
                `requires` Apt.installed ["apache2"]
            ]
        ]

Docker containers are set up using Properties too, just like regular hosts, but their Properties are run inside the container.

That means that, if I change the web server port above, Propellor will notice the container config is out of date, and stop the container, commit an image based on it, and quickly use that to bring up a new container with the new configuration.

If I change the web server to say, lighttpd, Propellor will run inside the container, and notice that it needs to install lighttpd to satisfy the new property, and so will update the container without needing to take it down.

Adding all this behavior took only 253 lines of code, and none of it impacts the core of Propellor at all; it's all in Propellor.Property.Docker. (Well, I did need another hundred lines to write a daemon that runs inside the container and reads commands to run over a named pipe... Docker makes running ad-hoc commands inside a container a PITA.)

So, I think that this vindicates the approach of making the configuration of Propellor be a list of Properties, which can be constructed by abitrarily interesting Haskell code. I didn't design Propellor to support containers, but it was easy to find a way to express them as shown above.

Compare that with how Puppet supports Docker: http://docs.docker.io/en/latest/use/puppet/

docker::run { 'helloworld':
  image        => 'ubuntu',
  command      => '/bin/sh -c "while true; do echo hello world; sleep 1; done"',
  ports        => ['4444', '4555'],
...

All puppet manages is running the image and a simple static command inside it. All the complexities that puppet provides for configuring servers cannot easily be brought to bear inside the container, and a large reason for that is, I think, that its configuration file is just not expressive enough.

Posted
propellor type-safe reversions

Propellor ensures that a list of properties about a system are satisfied. But requirements change, and so you might want to revert a property that had been set up before.

For example, I had a system with a webserver container:

Docker.docked container hostname "webserver"

I don't want a web server there any more. Rather than having a separate property to stop it, wouldn't it be nice to be able to say:

revert (Docker.docked container hostname "webserver")

I've now gotten this working. The really fun part is, some properies support reversion, but other properties certianly do not. Maybe the code to revert them is not worth writing, or maybe the property does something that cannot be reverted.

For example, Docker.garbageCollected is a property that makes sure there are no unused docker images wasting disk space. It can't be reverted. Nor can my personal standardSystem Unstable property, which amoung other things upgrades the system to unstable and sets up my home directory..

I found a way to make Propellor statically check if a property can be reverted at compile time. So revert Docker.garbageCollected will fail to type check!

The tricky part about implementing this is that the user configures Propellor with a list of properties. But now there are two distinct types of properties, revertable ones and non-revertable ones. And Haskell does not support heterogeneous lists..

My solution to this is a typeclass and some syntactic sugar operators. To build a list of properties, with individual elements that might be revertable, and others not:

 props
        & standardSystem Unstable
        & revert (Docker.docked container hostname "webserver")
        & Docker.docked container hostname "amd64-git-annex-builder"
        & Docker.garbageCollected
Posted
propellor introspection for DNS

In just released Propellor 0.3.0, I've improved improved Propellor's config file DSL significantly. Now properties can set attributes of a host, that can be looked up by its other properties, using a Reader monad.

This saves needing to repeat yourself:

hosts = [ host "orca.kitenet.net"
        & stdSourcesList Unstable
        & Hostname.sane -- uses hostname from above

And it simplifies docker setup, with no longer a need to differentiate between properties that configure docker vs properties of the container:

 -- A generic webserver in a Docker container.
    , Docker.container "webserver" "joeyh/debian-unstable"
        & Docker.publish "80:80"
        & Docker.volume "/var/www:/var/www"
        & Apt.serviceInstalledRunning "apache2"

But the really useful thing is, it allows automating DNS zone file creation, using attributes of hosts that are set and used alongside their other properties:

hosts =
    [ host "clam.kitenet.net"
        & ipv4 "10.1.1.1"

        & cname "openid.kitenet.net"
        & Docker.docked hosts "openid-provider"

        & cname "ancient.kitenet.net"
        & Docker.docked hosts "ancient-kitenet"
    , host "diatom.kitenet.net"
        & Dns.primary "kitenet.net" hosts
    ]

Notice that hosts is passed into Dns.primary, inside the definition of hosts! Tying the knot like this is a fun haskell laziness trick. :)

Now I just need to write a little function to look over the hosts and generate a zone file from their hostname, cname, and address attributes:

extractZoneFile :: Domain -> [Host] -> ZoneFile
extractZoneFile = gen . map hostAttr
  where gen = -- TODO

The eventual plan is that the cname property won't be defined as a property of the host, but of the container running inside it. Then I'll be able to cut-n-paste move docker containers between hosts, or duplicate the same container onto several hosts to deal with load, and propellor will provision them, and update the zone file appropriately.


Also, Chris Webber had suggested that Propellor be able to separate values from properties, so that eg, a web wizard could configure the values easily. I think this gets it much of the way there. All that's left to do is two easy functions:

overrideAttrsFromJSON :: Host -> JSON -> Host

exportJSONAttrs :: Host -> JSON

With these, propellor's configuration could be adjusted at run time using JSON from a file or other source. For example, here's a containerized webserver that publishes a directory from the external host, as configured by JSON that it exports:

demo :: Host
demo = Docker.container "webserver" "joeyh/debian-unstable"
    & Docker.publish "80:80"
    & dir_to_publish "/home/mywebsite" -- dummy default
    & Docker.volume (getAttr dir_to_publish ++":/var/www")
    & Apt.serviceInstalledRunning "apache2"

main = do
    json <- readJSON "my.json"
    let demo' = overrideAttrsFromJSON demo
    writeJSON "my.json" (exportJSONAttrs demo')
    defaultMain [demo']
Posted
propellor-driven DNS and backups

Took a while to get here, but Propellor 0.4.0 can deploy DNS servers and I just had it deploy mine. Including generating DNS zone files.

Configuration is dead simple, as far as DNS goes:

     & alias "ns1.example.com"
        & Dns.secondary hosts "joeyh.name"
                & Dns.primary hosts "example.com"
                        (Dns.mkSOA "ns1.example.com" 100)
                        [ (RootDomain, NS $ AbsDomain "ns1.example.com")
            , (RootDomain, NS $ AbsDomain "ns2.example.com")
                        ]

The awesome thing is that propellor fills in all the other information in the zone file by looking at the properties of the hosts it knows about.

 , host "blue.example.com"
        & ipv4 "192.168.1.1"
        & ipv6 "fe80::26fd:52ff:feea:2294"

        & alias "example.com"
        & alias "www.example.com"
        & alias "example.museum"
        & Docker.docked hosts "webserver"
            `requres` backedup "/var/www"
        
        & alias "ns2.example.com"
        & Dns.secondary hosts "example.com"

When it sees this host, Propellor adds its IP addresses to the example.com DNS zone file, for both its main hostname ("blue.example.com"), and also its relevant aliases. (The .museum alias would go into a different zone file.)

Multiple hosts can define the same alias, and then you automaticlly get round-robin DNS.

The web server part of of the blue.example.com config can be cut and pasted to another host in order to move its web server to the other host, including updating the DNS. That's really all there is to is, just cut, paste, and commit!

I'm quite happy with how that worked out. And curious if Puppet etc have anything similar.


One tricky part of this was how to ensure that the serial number automtically updates when changes are made. The way this is handled is Propellor starts with a base serial number (100 in the example above), and then it adds to it the number of commits in its git repository. The zone file is only updated when something in it besides the serial number needs to change.

The result is nice small serial numbers that don't risk overflowing the (so 90's) 32 bit limit, and will be consistent even if the configuration had Propellor setting up multiple independent master DNS servers for the same domain.


Another recent feature in Propellor is that it can use Obnam to back up a directory. With the awesome feature that if the backed up directory is empty/missing, Propellor will automcatically restore it from the backup.

Here's how the backedup property used in the example above might be implemented:

backedup :: FilePath -> Property
backedup dir = Obnam.backup dir daily
    [ "--repository=sftp://rsync.example.com/~/webserver.obnam"
    ] Obnam.OnlyClient
    `requires` Ssh.keyImported SshRsa "root"
    `requires` Ssh.knownHost hosts "rsync.example.com" "root"
    `requires` Gpg.keyImported "1B169BE1" "root"

Notice that the Ssh.knownHost makes root trust the ssh host key belonging to rsync.example.com. So Propellor needs to be told what that host key is, like so:

 , host "rsync.example.com"
        & ipv4 "192.168.1.4"
        & sshPubKey "ssh-rsa blahblahblah"

Which of course ties back into the DNS and gets this hostname set in it. But also, the ssh public key is available for this host and visible to the DNS zone file generator, and that could also be set in the DNS, in a SSHFP record. I haven't gotten around to implementing that, but hope at some point to make Propellor support DNSSEC, and then this will all combine even more nicely.


By the way, Propellor is now up to 3 thousand lines of code (not including Utility library). In 20 days, as a 10% time side project.

Posted
how I wrote init by accident

I wrote my own init. I didn't mean to, and in the end, it took 2 lines of code. Here's how.

Propellor has the nice feature of supporting provisioning of Docker containers. Since Docker normally runs just one command inside the container, I made the command that docker runs be propellor, which runs inside the container and takes care of provisioning it according to its configuration.

For example, here's a real live configuration of a container:

        -- Exhibit: kite's 90's website.
        , standardContainer "ancient-kitenet" Stable "amd64"
                & Docker.publish "1994:80"
                & Apt.serviceInstalledRunning "apache2"
                & Git.cloned "root" "git://kitenet-net.branchable.com/" "/var/www"
                        (Just "remotes/origin/old-kitenet.net")

When propellor is run inside this container, it takes care of installing apache, and since the property states apache should be running, it also starts the daemon if necessary.

At boot, docker remembers the command that was used to start the container last time, and runs it again. This time, apache is already installed, so propellor simply starts the daemon.

This was surprising, but it was just what I wanted too! The only missing bit to make this otherwise entirely free implementation of init work properly was two lines of code:

                void $ async $ job reapzombies
  where
        reapzombies = void $ getAnyProcessStatus True False

Propellor-as-init also starts up a simple equalivilant of rsh on a named pipe (for communication between the propellor inside and outside the container), and also runs a root login shell (so the user can attach to the container and administer it). Also, running a compiled program from the host system inside a container, which might use a different distribution or architecture was an interesting challenge (solved using the method described in completely linux distribution-independent packaging). So it wasn't entirely trivial, but as far as init goes, it's probably one of the simpler implementations out there.

I know that there are various other solutions on the space of an init for Docker -- personally I'd rather the host's systemd integrated with it so I could see the status of the container's daemons in systemctl status. If that does happen, perhaps I'll eventually be able to remove 2 lines of code from propellor.

Posted
propellor is d-i 2.0

I think I've been writing the second system to replace d-i with in my spare time for a couple months, and never noticed.

I'm as suprised as you are, but consider this design:

  • Installation system consists of debian live + haskell + propellor + web browser.

  • Entire installation UI consists of a web-based (and entirely pictographic and prompt based, so does not need to be translated) selection of the installation target.

  • Installation target can be local disk, remote system via ssh (wiping out crufty hacked-up pre-installed debian), local VM, live ISO, etc.

  • Really, no other questions. Not even user name/password! The installed system will only allow login via the same method that was used to install it. So a locally installed system will accept console/X login with no password and then a forced password change. Or a system installed via ssh will only allow login using the same ssh key that was used to install it.

  • The entire installation process consists of a disk format, followed by debootstrap, followed by running propellor in the target system. This also means that the installed system includes a propellor config file which now describes the properties of the system as installed (so can be edited to tweak the installation, or reused as starting point for next installation).

  • Users who want to configure installation in any way write down properties of system using a simple propellor config file. I suppose some people still use more than one partiton or gnome or some such customization, so they'd use:

main :: IO
main = Installer.main
    & Installer.partition First "/boot" Ext3 (MiB 256)
    & Installer.partition Next "/" Ext4 (GiB 5)
    & Installer.partition Next "/home" Ext4 FreeSpace
    & Installer.grubBoots "hd0"
    & os (System (Debian Stable) "amd64")
    & Apt.stdSourcesList
    & Apt.installed ["task-gnome-desktop"]
  • The installation system is itself built using propellor. A free feature given the above design, so basically all it will take to build an installation iso is this code:
main :: IO
main = Installer.main
    & Installer.target CdImage "installer.iso"
    & os (System (Debian Stable) "amd64")
    & Apt.stdSourcesList
    & Apt.installed ["task-xfce-desktop", "ghc", "propellor"]
    & User.autoLogin "root"
    & User.loginStarts "propellor --installer"
  • Propellor has a nice display of what it's doing so there is no freaking progress bar.

Well, now I know where propellor might end up if I felt like spending a month and adding a few thousand lines of code to it.

Posted
propelling containers

Propellor has supported docker containers for a "long" time, and it works great. This week I've worked on adding more container support.

docker containers (revisited)

The syntax for docker containers has changed slightly. Here's how it looks now:

example :: Host
example = host "example.com"
    & Docker.docked webserverContainer

webserverContainer :: Docker.Container
webserverContainer = Docker.container "webserver" (Docker.latestImage "joeyh/debian-stable")
    & os (System (Debian (Stable "wheezy")) "amd64")
    & Docker.publish "80:80"
    & Apt.serviceInstalledRunning "apache2"
    & alias "www.example.com"

That makes example.com have a web server in a docker container, as you'd expect, and when propellor is used to deploy the DNS server it'll automatically make www.example.com point to the host (or hosts!) where this container is docked.

I use docker a lot, but I have drank little of the Docker KoolAid. I'm not keen on using random blobs created by random third parties using either unreproducible methods, or the weirdly underpowered dockerfiles. (As for vast complicated collections of containers that each run one program and talk to one another etc ... I'll wait and see.)

That's why propellor runs inside the docker container and deploys whatever configuration I tell it to, in a way that's both replicatable later and lets me use the full power of Haskell.

Which turns out to be useful when moving on from docker containers to something else...

systemd-nspawn containers

Propellor now supports containers using systemd-nspawn. It looks a lot like the docker example.

example :: Host
example = host "example.com"
    & Systemd.persistentJournal
    & Systemd.nspawned webserverContainer

webserverContainer :: Systemd.Container
webserverContainer = Systemd.container "webserver" chroot
    & Apt.serviceInstalledRunning "apache2"
    & alias "www.example.com"
  where
    chroot = Chroot.debootstrapped (System (Debian Unstable) "amd64") Debootstrap.MinBase

Notice how I specified the Debian Unstable chroot that forms the basis of this container. Propellor sets up the container by running debootstrap, boots it up using systemd-nspawn, and then runs inside the container to provision it.

Unlike docker containers, systemd-nspawn containers use systemd as their init, and it all integrates rather beautifully. You can see the container listed in systemctl status, including the services running inside it, use journalctl to examine its logs, etc.

But no, systemd is the devil, and docker is too trendy...

chroots

Propellor now also supports deploying good old chroots. It looks a lot like the other containers. Rather than repeat myself a third time, and because we don't really run webservers inside chroots much, here's a slightly different example.

example :: Host
example = host "mylaptop"
    & Chroot.provisioned (buildDepChroot "git-annex")

buildDepChroot :: Apt.Package -> Chroot.Chroot
buildDepChroot pkg = Chroot.debootstrapped system Debootstrap.BuildD dir
    & Apt.buildDep pkg
  where
    dir = /srv/chroot/builddep/"++pkg
   system = System (Debian Unstable) "amd64"

Again this uses debootstrap to build the chroot, and then it runs propellor inside the chroot to provision it (btw without bothering to install propellor there, thanks to the magic of bind mounts and completely linux distribution-independent packaging).

In fact, the systemd-nspawn container code reuses the chroot code, and so turns out to be really rather simple. 132 lines for the chroot support, and 167 lines for the systemd support (which goes somewhat beyond the nspawn containers shown above).

Which leads to the hardest part of all this...

debootstrap

Making a propellor property for debootstrap should be easy. And it was, for Debian systems. However, I have crazy plans that involve running propellor on non-Debian systems, to debootstrap something, and installing debootstrap on an arbitrary linux system is ... too hard.

In the end, I needed 253 lines of code to do it, which is barely one magnitude less code than the size of debootstrap itself. I won't go into the ugly details, but this could be made a lot easier if debootstrap catered more to being used outside of Debian.

closing

Docker and systemd-nspawn have different strengths and weaknesses, and there are sure to be more container systems to come. I'm pleased that Propellor can add support for a new container system in a few hundred lines of code, and that it abstracts away all the unimportant differences between these systems.

PS

Seems likely that systemd-nspawn containers can be nested to any depth. So, here's a new kind of fork bomb!

infinitelyNestedContainer :: Systemd.Container
infinitelyNestedContainer = Systemd.container "evil-systemd"
    (Chroot.debootstrapped (System (Debian Unstable) "amd64") Debootstrap.MinBase)
    & Systemd.nspawned infinitelyNestedContainer

Strongly typed purely functional container deployment can only protect us against a certian subset of all badly thought out systems. ;)

Posted
clean OS reinstalls with propellor

You have a machine someplace, probably in The Cloud, and it has Linux installed, but not to your liking. You want to do a clean reinstall, maybe switching the distribution, or getting rid of the cruft. But this requires running an installer, and it's too difficult to run d-i on remote machines.

Wouldn't it be nice if you could point a program at that machine and have it do a reinstall, on the fly, while the machine was running?

This is what I've now taught propellor to do! Here's a working configuration which will make propellor convert a system running Fedora (or probably many other Linux distros) to Debian:

testvm :: Host
testvm = host "testvm.kitenet.net"
        & os (System (Debian Unstable) "amd64")
        & OS.cleanInstallOnce (OS.Confirmed "testvm.kitenet.net")
                `onChange` propertyList "fixing up after clean install"
                        [ User.shadowConfig True
                        , OS.preserveRootSshAuthorized
                        , OS.preserveResolvConf
                        , Apt.update
                        , Grub.boots "/dev/sda"
                                `requires` Grub.installed Grub.PC
                        ]
        & Hostname.sane
        & Hostname.searchDomain
        & Apt.installed ["linux-image-amd64"]
        & Apt.installed ["ssh"]
        & User.hasSomePassword "root"

And here's a video of it in action.


It was surprisingly easy to build this. Propellor already knew how to create a chroot, so from there it basically just has to move files around until the chroot takes over from the old OS.

After the cleanInstallOnce property does its thing, propellor is running inside a freshly debootstrapped Debian system. Then we just need a few more Propertites to get from there to a bootable, usable system: Install grub and the kernel, turn on shadow passwords, preserve a few config files from the old OS, etc.

It's really astounding to me how much easier this was to build than it was to build d-i. It took years to get d-i to the point of being able to install a working system. It took me a few part days to add this capability to propellor (It's 200 lines of code), and I've probably spent a total of less than 30 days total developing propellor in its entirity.

So, what gives? Why is this so much easier? There are a lot of reasons:

  • Technology is so much better now. I can spin up cloud VMs for testing in seconds; I use VirtualBox to restore a system from a snapshot. So testing is much much easier. The first work on d-i was done by booting real machines, and for a while I was booting them using floppies.

  • Propellor doesn't have a user interface. The best part of d-i is preseeding, but that was mostly an accident; when I started developing d-i the first thing I wrote was main-menu (which is invisible 99.9% of the time) and we had to develop cdebconf, and tons of other UI. Probably 90% of d-i work involves the UI. Jettisoning the UI entirely thus speeds up development enormously. And propellor's configuration file blows d-i preseeding out of the water in expressiveness and flexability.

  • Propellor has a much more principled design and implementation. Separating things into Properties, which are composable and reusable gives enormous leverage. Strong type checking and a powerful programming language make it much easier to develop than d-i's mess of shell scripts calling underpowered busybox commands etc. Properties often Just Work the first time they're tested.

  • No separate runtime. d-i runs in its own environment, which is really a little custom linux distribution. Developing linux distributions is hard. Propellor drops into a live system and runs there. So I don't need to worry about booting up the system, getting it on the network, etc etc. This probably removes another order of magnitude of complexity from propellor as compared with d-i.

This seems like the opposite of the Second System effect to me. So perhaps d-i was the second system all along?

I don't know if I'm going to take this all the way to propellor is d-i 2.0. But in theory, all that's needed now is:

  • Teaching propellor how to build a bootable image, containing a live Debian system and propellor. (Yes, this would mean reimplementing debian-live, but I estimate 100 lines of code to do it in propellor; most of the Properties needed already exist.) That image would then be booted up and perform the installation.
  • Some kind of UI that generates the propellor config file.
  • Adding Properties to partition the disk.

cleanInstallOnce and associated Properties will be included in propellor's upcoming 1.1.0 release, and are available in git now.

Oh BTW, you could parameterize a few Properties by OS, and Propellor could be used to install not just Debian or Ubuntu, but whatever Linux distribution you want. Patches welcomed...

Posted
a brainfuck monad

Inspired by "An ASM Monad", I've built a Haskell monad that produces brainfuck programs. The code for this monad is available on hackage, so cabal install brainfuck-monad.

Here's a simple program written using this monad. See if you can guess what it might do:

import Control.Monad.BrainFuck

demo :: String
demo = brainfuckConstants $ \constants -> do
        add 31
        forever constants $ do
                add 1
                output

Here's the brainfuck code that demo generates: >+>++>+++>++++>+++++>++++++>+++++++>++++++++>++++++++++++++++++++++++++++++++<<<<<<<<[>>>>>>>>+.<<<<<<<<]

If you feed that into a brainfuck interpreter (I'm using hsbrainfuck for my testing), you'll find that it loops forever and prints out each character, starting with space (32), in ASCIIbetical order.

The implementation is quite similar to the ASM monad. The main differences are that it builds a String, and that the BrainFuck monad keeps track of the current position of the data pointer (as brainfuck lacks any sane way to manipulate its instruction pointer).

newtype BrainFuck a = BrainFuck (DataPointer -> ([Char], DataPointer, a))

type DataPointer = Integer

-- Gets the current address of the data pointer.
addr :: BrainFuck DataPointer
addr = BrainFuck $ \loc -> ([], loc, loc)

Having the data pointer address available allows writing some useful utility functions like this one, which uses the next (brainfuck opcode >) and prev (brainfuck opcode <) instructions.

-- Moves the data pointer to a specific address.
setAddr :: Integer -> BrainFuck ()
setAddr n = do
        a <- addr
        if a > n
                then prev >> setAddr n
                else if a < n
                        then next >> setAddr n
                        else return ()

Of course, brainfuck is a horrible language, designed to be nearly impossible to use. Here's the code to run a loop, but it's really hard to use this to build anything useful..

-- The loop is only entered if the byte at the data pointer is not zero.
-- On entry, the loop body is run, and then it loops when
-- the byte at the data pointer is not zero.
loopUnless0 :: BrainFuck () -> BrainFuck ()
loopUnless0 a = do
        open
        a
        close

To tame brainfuck a bit, I decided to treat data addresses 0-8 as constants, which will contain the numbers 0-8. Otherwise, it's very hard to ensure that the data pointer is pointing at a nonzero number when you want to start a loop. (After all, brainfuck doesn't let you set data to some fixed value like 0 or 1!)

I wrote a little brainfuckConstants that runs a BrainFuck program with these constants set up at the beginning. It just generates the brainfuck code for a series of ASCII art fishes: >+>++>+++>++++>+++++>++++++>+++++++>++++++++>

With the fishes^Wconstants in place, it's possible to write a more useful loop. Notice how the data pointer location is saved at the beginning, and restored inside the loop body. This ensures that the provided BrainFuck action doesn't stomp on our constants.

-- Run an action in a loop, until it sets its data pointer to 0.
loop :: BrainFuck () -> BrainFuck ()
loop a = do
    here <- addr
    setAddr 1
    loopUnless0 $ do
        setAddr here
        a

I haven't bothered to make sure that the constants are really constant, but that could be done. It would just need a Control.Monad.BrainFuck.Safe module, that uses a different monad, in which incr and decr and input don't do anything when the data pointer is pointing at a constant. Or, perhaps this could be statically checked at the type level, with type level naturals. It's Haskell, we can make it safer if we want to. ;)

So, not only does this BrainFuck monad allow writing brainfuck code using crazy haskell syntax, instead of crazy brainfuck syntax, but it allows doing some higher-level programming, building up a useful(!?) library of BrainFuck combinators and using them to generate brainfuck code you'd not want to try to write by hand.

Of course, the real point is that "monad" and "brainfuck" so obviously belonged together that it would have been a crime not to write this.

Posted
generating shell scripts from haskell using a shell monad

Shell script is the lingua franca of Unix, it's available everywhere and often the only reasonable choice to Get Stuff Done. But it's also clumsy and it's easy to write unsafe shell scripts, that forget to quote variables, typo names of functions, etc.

Wouldn't it be nice if we could write code in some better language, that generated nicely formed shell scripts and avoided such gotchas? Today, I've built a Haskell monad that can generate shell code.

Here's a fairly involved example. This demonstrates several features, including the variadic cmd, the ability to define shell functions, to bind and use shell variables, to build pipes (with the -:- operator), and to factor out generally useful haskell functions like pipeLess and promptFor ...

santa = script $ do
    hohoho <- func $
        cmd "echo" "Ho, ho, ho!" "Merry xmas!"
    hohoho

    promptFor "What's your name?" $ \name -> pipeLess $ do
        cmd "echo" "Let's see what's in" (val name <> quote "'s") "stocking!"
        forCmd (cmd "ls" "-1" (quote "/home/" <> val name)) $ \f -> do
            cmd "echo" "a shiny new" f
            hohoho

    cmd "rm" "/table/cookies" "/table/milk"
    hohoho

pipeLess :: Script () -> Script ()
pipeLess c = c -|- cmd "less"

promptFor :: T.Text -> (Var -> Script ()) -> Script ()
promptFor prompt cont = do
    cmd "printf" (prompt <> " ")
    var <- newVar "prompt"
    readVar var
    cont var

When run, that haskell program generates this shell code. Which, while machine-generated, has nice indentation, and is generally pretty readable.

#!/bin/sh
f1 () { :
    echo 'Ho, ho, ho!' 'Merry xmas!'
}
f1
printf 'What'"'"'s your name?  '
read '_prompt1'
(
    echo 'Let'"'"'s see what'"'"'s in' "$_prompt1"''"'"'s' 'stocking!'
    for _x1 in $(ls '-1' '/home/'"$_prompt1")
    do :
        echo 'a shiny new' "$_x1"
        f1
    done
) | (
    less
)
rm '/table/cookies' '/table/milk'
f1

Santa has already uploaded shell-monad to hackage and git.

There's a lot of things that could be added to this library (if, while, redirection, etc), but I can already see using it in various parts of propellor and git-annex that need to generate shell code.

Posted
shell monad day 3

I have been hard at work on the shell-monad ever since it was born on Christmas Eve. It's now up to 820 lines of code, and has nearly comprehensive coverage of all shell features.

Time to use it for something interesting! Let's make a shell script and a haskell program that both speak a simple protocol. This kind of thing could be used by propellor when it's deploying itself to a new host. The haskell program can ssh to a remote host and run the shell program, and talk back and forth over stdio with it, using the protocol they both speak.

abstract beginnings

First, we'll write a data type for the commands in the protocol.

data Proto
    = Foo String
    | Bar
    | Baz Integer
    deriving (Show)

Now, let's go type class crazy!

class Monad t => OutputsProto t where
    output :: Proto -> t ()

instance OutputsProto IO where
    output = putStrLn . fromProto

So far, nothing interesting; this makes the IO monad an instance of the OutputsProto type class, and gives a simple implementation to output a line of the protocol.

instance OutputsProto Script where
    output = cmd "echo" . fromProto

Now it gets interesting. The Script monad is now also a member of the OutputsProto. To output a line of the protocol, it just uses echo. Yeah -- shell code is a member of a haskell type class. Awesome -- most abstract shell code evar!

Similarly, we can add another type class for monads that can input the protocol:

class Monad t => InputsProto t p where
    input :: t p

instance InputsProto IO Proto where
    input = toProto <$> readLn

instance InputsProto Script Var where
    input = do
        v <- newVar ()
        readVar v
        return v

While the IO version reads and deserializes a line back to a Proto, the shell script version of this returns a Var, which has the newly read line in it, not yet deserialized. Why the difference? Well, Haskell has data types, and shell does not ...

speaking the protocol

Now we have enough groundwork to write haskell code in the IO monad that speaks the protocol in arbitrary ways. For example:

protoExchangeIO :: Proto -> IO Proto
protoExchangeIO p = do
    output p
    input

fooIO :: IO ()
fooIO = do
    resp <- protoExchangeIO (Foo "starting up")
    -- etc

But that's trivial and uninteresting. Anyone who has read to here certianly knows how to write haskell code in the IO monad. The interesting part is making the shell program speak the protocol, including doing various things when it receives the commands.

foo :: Script ()
foo = do
    stopOnFailure True
    handler <- func (NamedLike "handler") $
        handleProto =<< input
    output (Foo "starting up")
    handler
    output Bar
    handler

handleFoo :: Var -> Script ()
handleFoo v = toStderr $ cmd "echo" "yay, I got a Foo" v

handleBar :: Script ()
handleBar = toStderr $ cmd "echo" "yay, I got a Bar"

handleBaz :: Var -> Script ()
handleBaz num = forCmd (cmd "seq" (Val (1 :: Int)) num) $
    toStderr . cmd "echo" "yay, I got a Baz"

serialization

I've left out a few serialization functions. fromProto is used in both instances of OutputsProto. The haskell program and the script will both use this to serialize Proto.

fromProto :: Proto -> String
fromProto (Foo s) = pFOO ++ " " ++ s
fromProto Bar = pBAR ++ " "
fromProto (Baz i) = pBAZ ++ " " ++ show i

pFOO, pBAR, pBAZ :: String
(pFOO, pBAR, pBAZ) = ("FOO", "BAR", "BAZ")

And here's the haskell function to convert the other direction, which was also used earlier.

toProto :: String -> Proto
toProto s = case break (== ' ') s of
    (w, ' ':rest)
        | w == pFOO -> Foo rest
        | w == pBAR && null rest -> Bar
        | w == pBAZ -> Baz (read rest)
        | otherwise -> error $ "unknown protocol command: " ++ w
    (_, _) -> error "protocol splitting error"

We also need a version of that written in the Script monad. Here it is. Compare and contrast the function below with the one above. They're really quite similar. (Sadly, not so similar to allow refactoring out a common function..)

handleProto :: Var -> Script ()
handleProto v = do
    w <- getProtoCommand v
    let rest = getProtoRest v
    caseOf w
        [ (quote (T.pack pFOO), handleFoo =<< rest)
        , (quote (T.pack pBAR), handleBar)
        , (quote (T.pack pBAZ), handleBaz =<< rest)
        , (glob "*", do
            toStderr $ cmd "echo" "unknown protocol command" w
            cmd "false"
          )
        ]

Both toProto and handleProto split the incoming line apart into the first word, and the rest of the line, then match the first word against the commands in the protocol, and dispatches to appropriate actions. So, how do we split a variable apart like that in the Shell monad? Like this...

getProtoCommand :: Var -> Script Var
getProtoCommand v = trimVar LongestMatch FromEnd v (glob " *")

getProtoRest :: Var -> Script Var
getProtoRest v = trimVar ShortestMatch FromBeginning v (glob "[! ]*[ ]")

(This could probably be improved by using a DSL to generate the globs too..)

conclusion

And finally, here's a main to generate the shell script!

main :: IO ()
main = T.writeFile "protocol.sh" $ script foo

The pretty-printed shell script that produces is not very interesting, but I'll include it at the end for completeness. More interestingly for the purposes of sshing to a host and running the command there, we can use linearScript to generate a version of the script that's all contained on a single line. Also included below.

I could easily have written the pretty-printed version of the shell script in twice the time that it took to write the haskell program that generates it and also speaks the protocol itself.

I would certianly have had to test the hand-written shell script repeatedly. Code like for _x in $(seq 1 "${_v#[!\ ]*[\ ]}") doesn't just write and debug itself. (Until now!)

But, the generated scrpt worked 100% on the first try! Well, it worked as soon as I got the Haskell program to compile...

But the best part is that the Haskell program and the shell script don't just speak the same protocol. They both rely on the same definition of Proto. So this is fairly close to the kind of type-safe protocol serialization that Fay provides, when compiling Haskell to javascript.

I'm getting the feeling that I won't be writing too many nontrivial shell scripts by hand anymore! :)

the complete haskell program

Is here, all 99 lines of it.

the pretty-printed shell program

#!/bin/sh
set -x
_handler () { :
    _v=
    read _v
    case "${_v%%\ *}" in FOO) :
        echo 'yay, I got a Foo' "${_v#[!\ ]*[\ ]}" >&2
    : ;; BAR) :
        echo 'yay, I got a Bar' >&2
    : ;; BAZ) :
        for _x in $(seq 1 "${_v#[!\ ]*[\ ]}")
        do :
            echo 'yay, I got a Baz' "$_x" >&2
        done
    : ;; *) :
        echo 'unknown protocol command' "${_v%%\ *}" >&2
        false
    : ;; esac
}
echo 'FOO starting up'
_handler
echo 'BAR '
_handler

the one-liner shell program

set -p; _handler () { :;    _v=;    read _v;    case "${_v%%\ *}" in FOO) :;        echo 'yay, I got a Foo' "${_v#[!\ ]*[\ ]}" >&2;     : ;; BAR) :;        echo 'yay, I got a Bar' >&2;    : ;; BAZ) :;        for _x in $(seq 1 "${_v#[!\ ]*[\ ]}");      do :;           echo 'yay, I got a Baz' "$_x" >&2;      done;   : ;; *) :;      echo 'unknown protocol command' "${_v%%\ *}" >&2;       false;  : ;; esac; }; echo 'FOO starting up'; _handler; echo 'BAR '; _handler
making propellor safer with GADTs and type families

Since July, I have been aware of an ugly problem with propellor. Certain propellor configurations could have a bug. I've tried to solve the problem at least a half-dozen times without success; it's eaten several weekends.

Today I finally managed to fix propellor so it's impossible to write code that has the bug, bending the Haskell type checker to my will with the power of GADTs and type-level functions.

the bug

Code with the bug looked innocuous enough. Something like this:

foo :: Property
foo = property "foo" $
    unlessM (liftIO $ doesFileExist "/etc/foo") $ do
        bar <- liftIO $ readFile "/etc/foo.template"
        ensureProperty $ setupFoo bar

The problem comes about because some properties in propellor have Info associated with them. This is used by propellor to introspect over the properties of a host, and do things like set up DNS, or decrypt private data used by the property.

At the same time, it's useful to let a Property internally decide to run some other Property. In the example above, that's the ensureProperty line, and the setupFoo Property is run only sometimes, and is passed data that is read from the filesystem.

This makes it very hard, indeed probably impossible for Propellor to look inside the monad, realize that setupFoo is being used, and add its Info to the host.

Probably, setupFoo doesn't have Info associated with it -- most properties do not. But, it's hard to tell, when writing such a Property if it's safe to use ensureProperty. And worse, setupFoo could later be changed to have Info.

Now, in most languages, once this problem was noticed, the solution would probably be to make ensureProperty notice when it's called on a Property that has Info, and print a warning message. That's Good Enough in a sense.

But it also really stinks as a solution. It means that building propellor isn't good enough to know you have a working system; you have to let it run on each host, and watch out for warnings. Ugh, no!

the solution

This screams for GADTs. (Well, it did once I learned how what GADTs are and what they can do.)

With GADTs, Property NoInfo and Property HasInfo can be separate data types. Most functions will work on either type (Property i) but ensureProperty can be limited to only accept a Property NoInfo.

data Property i where
    IProperty :: Desc -> ... -> Info -> Property HasInfo
    SProperty :: Desc -> ... -> Property NoInfo

data HasInfo
data NoInfo

ensureProperty :: Property NoInfo -> Propellor Result

Then the type checker can detect the bug, and refuse to compile it.

Yay!

Except ...

Property combinators

There are a lot of Property combinators in propellor. These combine two or more properties in various ways. The most basic one is requires, which only runs the first Property after the second one has successfully been met.

So, what's it's type when used with GADT Property?

requires :: Property i1 -> Property i2 -> Property ???

It seemed I needed some kind of type class, to vary the return type.

class Combine x y r where
    requires :: x -> y -> r

Now I was able to write 4 instances of Combines, for each combination of 2 Properties with HasInfo or NoInfo.

It type checked. But, type inference was busted. A simple expression like

foo `requires` bar

blew up:

   No instance for (Requires (Property HasInfo) (Property HasInfo) r0)
      arising from a use of `requires'
    The type variable `r0' is ambiguous
    Possible fix: add a type signature that fixes these type variable(s)
    Note: there is a potential instance available:
      instance Requires
                 (Property HasInfo) (Property HasInfo) (Property HasInfo)
        -- Defined at Propellor/Types.hs:167:10

To avoid that, it needed ":: Property HasInfo" appended -- I didn't want the user to need to write that.

I got stuck here for an long time, well over a month.

type level programming

Finally today I realized that I could fix this with a little type-level programming.

class Combine x y where
    requires :: x -> y -> CombinedType x y

Here CombinedType is a type-level function, that calculates the type that should be used for a combination of types x and y. This turns out to be really easy to do, once you get your head around type level functions.

type family CInfo x y
type instance CInfo HasInfo HasInfo = HasInfo
type instance CInfo HasInfo NoInfo = HasInfo
type instance CInfo NoInfo HasInfo = HasInfo
type instance CInfo NoInfo NoInfo = NoInfo
type family CombinedType x y
type instance CombinedType (Property x) (Property y) = Property (CInfo x y)

And, with that change, type inference worked again! \o/

(Bonus: I added some more intances of CombinedType for combining things like RevertableProperties, so propellor's property combinators got more powerful too.)

Then I just had to make a massive pass over all of Propellor, fixing the types of each Property to be Property NoInfo or Property HasInfo. I frequently picked the wrong one, but the type checker was able to detect and tell me when I did.

A few of the type signatures got slightly complicated, to provide the type checker with sufficient proof to do its thing...

before :: (IsProp x, Combines y x, IsProp (CombinedType y x)) => x -> y -> CombinedType y x
before x y = (y `requires` x) `describe` (propertyDesc x)

onChange
    :: (Combines (Property x) (Property y))
    => Property x
    => Property y
    => CombinedType (Property x) (Property y)
onChange = -- 6 lines of code omitted

fallback :: (Combines (Property p1) (Property p2)) => Property p1 -> Property p2 -> Property (CInfo p1 p2)
fallback = -- 4 lines of code omitted

.. This mostly happened in property combinators, which is an acceptable tradeoff, when you consider that the type checker is now being used to prove that propellor can't have this bug.

Mostly, things went just fine. The only other annoying thing was that some things use a [Property], and since a haskell list can only contain a single type, while Property Info and Property NoInfo are two different types, that needed to be dealt with. Happily, I was able to extend propellor's existing (&) and (!) operators to work in this situation, so a list can be constructed of properties of several different types:

propertyList "foos" $ props
    & foo
    & foobar
    ! oldfoo    

conclusion

The resulting 4000 lines of changes will be in the next release of propellor. Just as soon as I test that it always generates the same Info as before, and perhaps works when I run it. (eep)

These uses of GADTs and type families are not new; this is merely the first time I used them. It's another Haskell leveling up for me.

Anytime you can identify a class of bugs that can impact a complicated code base, and rework the code base to completely avoid that class of bugs, is a time to celebrate!

Posted
7drl 2015 day 1 groundwork

Scroll is a roguelike, with a twist, which I won't reveal until I've finished building it. I'll just say: A playable roguelike pun, set in a filesystem near you.

I'm creating Scroll as part of the 7DRL Challange. If all goes well, I'll have a usable roguelike game finished in 7 days.

This is my first time developing a roguelike, and my first time writing a game in Haskell, and my first time writing a game to a time limit. Wow!


First, some groundwork. I'm writing Scroll in Haskell, so let's get the core data types and monads and IO squared away. Then I can spend days 2-7 writing entirely pure functional code, in the Haskell happy place.

To represent the current level, I'm using a Vector of Vectors of Chars. Actually, MVectors, which can be mutated safely by pure code running inside the ST monad, so it's fast and easy to read or write any particular location on the level.

-- Writes a Char to a position in the world.
writeWorld :: Pos -> Char -> M ()
writeWorld (x, y) c = modWorld $ \yv -> do
    xv <- V.read yv y
    V.write xv x c

showPlayer :: M ()
showPlayer = writeWorld (5,8) '@'

(I wish these Vectors had their size as part of their types. There are vector libraries on hackage that do, but not the standard vector library, which has mutable vectors. As it is, if I try to access outside the bounds of the world, it'll crash at runtime.)

Since the game will need some other state, I'm using the state monad. The overall monad stack is type M = StateT S (ST RealWorld). (It could be forall s. StateT S (ST s), but I had some trouble getting that to type check, so I fixed s to RealWorld, which is ok since it'll be run using stToIO.

Next, a concept of time, and the main event loop. I decided to use a continutation passing style, so the main loop takes the current continuation, and runs it to get a snapshot of the state to display, and a new continutation. The advantage of using continuations this way is that all the game logic can be handled in the pure code.

I should probably be using the Cont monad in my monad stack, but I've not learned it and lack time. For now I'm handling the continuations by hand, which seems ok.

updateWorld :: Step
updateWorld (Just 'Q') = do
        addMessage "Are you sure you want to quit? [yn]"
        next $ \i -> case i of
                Just 'y' -> quit
                _ -> continue
updateWorld input = do
        addMessage ("pressed " ++ show input)
        continue

Finally, I wrote some ncurses display code, which is almost working.


Start time: After midnight last night. Will end by midnight next Friday.

Lines of code written today: 368

Craziest type signature today: writeS :: forall a. ((Vec2 a -> ST RealWorld ()) -> M ()) -> Pos -> a -> M ()


By the way, there's a whole LambdaHack library for Haskell, targeted at just this kind of roguelike construction. It looks excellent. I'm not using it for two reasons:

  1. Scroll is going to be unusual in a lot of ways, and LambdaHack probably makes some assumptions that don't fit.
  2. mainSer :: (MonadAtomic m, MonadServerReadRequest m) => [String] -> COps -> (m () -> IO ()) -> (COps -> DebugModeCli -> ((FactionId -> ChanServer ResponseUI RequestUI -> IO ()) -> (FactionId -> ChanServer ResponseAI RequestAI -> IO ()) -> IO ()) -> IO ()) -> IO ()
    That's a lot of stuff to figure out! I only have a week, so it's probably easier to build my own framework, and this gives me an opportunity to learn more generally useful stuff, like how to use mutable Vectors.
Posted
7drl 2015 day 3 movement at last

Got the player moving in the map! And, got the map to be deadly in its own special way.

        HeadCrush -> do
                showMessage "You die."
                endThread

Even winning the game is implemented. The game has a beginning, a middle, and an end.

I left the player movement mostly unconstrained, today, while I was working on things to do with the end of the game, since that makes it easier to play through and test them. Tomorrow, I will turn on fully constrained movement (an easy change), implement inventory (which is very connected to movement constraints in Scroll), and hope to start on the spell system too.


At this point, Scroll is 622 lines of code, including content. Of which, I notice, fully 119 are types and type classes.

Only 4 days left! Eep! I'm very glad that scroll's central antagonist is already written. I don't plan to add other creatures, which will save some time.


Last night as I was drifting off to sleep, it came to me a way to implement my own threading system for my roguelike. Since time in a roguelike happens in discrete ticks, as the player takes each action, normal OS threads are not suitable. And in my case, I'm doing everything in pure code anyway and certianly cannot fork off a thread for some background job.

But, since I'm using continuation passing style, I can just write my own fork, that takes two continuations and combines them, causing both to be run on each tick, and recursing to handle combining the resulting continuations.

It was really quite simple to implement. Typechecked on the first try even!

fork :: M NextStep -> M NextStep -> M NextStep
fork job rest = do
        jn <- job
        rn <- rest
        runthread jn rn
  where
        runthread (NextStep _ (Just contjob)) (NextStep v (Just contr)) =
                return $ NextStep v $ Just $ \i -> do
                        jn <- contjob i
                        rn <- contr i
                        runthread jn rn
        runthread (NextStep _ Nothing) (NextStep v (Just contr)) =
                return $ NextStep v (Just contr)
        runthread _ (NextStep v Nothing) =
                return $ NextStep v Nothing

endThread :: M NextStep
endThread = nextStep Nothing

background :: M NextStep -> M NextStep
background job = fork job continue

demo :: M NextStep
demo = do
    showMessage "foo"
    background $ next $ const $
        clearMessage >> endThread

That has some warts, but it's good enough for my purposes, and pretty awesome for a threading system in 66 LOC.

Posted
7drl 2015 day 5 type directed spell system development

I want my 7drl game Scroll to have lots of interesting spells. So, as I'm designing its spell system, I've been looking at the types, and considering the whole universe of possible spells that fit within the constraints of the types.

My first throught was that a spell would be a function from World -> World. That allows any kind of spell that manipulates the game map. Like, for instance a "whiteout" that projects a stream of whitespace from the player's mouth.

Since Scroll has a state monad, I quickly generalized that; making spell actions a state monad M (), which lets spells reuse other monadic actions, and affect the whole game state, including the player. Now I could write a spell like "teleport", or "grow".

But it quickly became apparent this was too limiting: While spells could change the World map, the player, and even change the list of supported spells, they had no way to prompting for input.

I tried a few types of the Event -> M () variety, but they were all too limiting. Finally, I settled on this type for spell actions: M NextStep -> M NextStep.

And then I spent 3 hours exploring the universe of spells that type allows! To understand them, it helps to see what a NextStep is:

type Step = Event -> M NextStep
data NextStep = NextStep View (Maybe Step)

Since NextStep is a continuation, spells take the original continuation, and can not only modify the game state, but can return an altered continuation. Such as one that prompts for input before performing the spell, and then calls the original continuation to get on with the game.

That let me write "new", a most interesting spell, that lets the player add a new way to cast an existing spell. Spells are cast using ingredients, and so this prompts for a new ingredient to cast a spell. (I hope that "new farming" will be one mode of play to possibly win Scroll.)

And, it lets me write spells that fail in game-ending ways. (Ie, "genocide @"). A spell can cause the game to end by returning a continuation that has Nothing as its next step.

Even better, I could write spells that return a continuation that contains a forked background task, using the 66 line contiuation based threading system I built in day 3. This allows writing lots of fun spells that have an effect that lasts for a while. Things like letting the player quickly digest letters they eat, or slow down the speed of events.

And then I thought of "dream". This spell stores the input continuation and game state, and returns a modified continuation that lets the game continue until it ends, and then restores from the point it saved. So, the player dreams they're playing, and wakes back up where they cast the spell. A wild spell, which can have different variants, like precognitive dreams where the same random numbers are used as will be used upon awaking, or dreams where knowledge carries back over to the real world in different ways. (Supports Inception too..)

Look how easy it was to implement dreaming, in this game that didn't have any notion of "save" or "restore"!

runDream :: M NextStep -> M NextStep -> (S -> S) -> M NextStep
runDream sleepcont wakecont wakeupstate = go =<< sleepcont
   where
         go (NextStep v ms) = return $ NextStep v $ Just $
        maybe wake (go <=<) ms
         wake _evt = do
                 modify wakeupstate
                 wakecont

I imagine that, if I were not using Haskell, I'd have just made the spell be an action, that can do IO in arbitrary ways. Such a spell system can of course do everything I described above and more. But, I think that using a general IO action is so broad that it hides the interesting possibilities like "dream".

By starting with a limited type for spells, and exploring toward more featureful types, I was able to think about the range of possibilities of spells that each type allowed, be inspired with interesting ideas, and implement them quickly.

Just what I need when writing a roguelike in just 7 days!


Fun comment thread on reddit

Posted
it's a bird, it's a plane, it's a super monoid for propellor

I've been doing a little bit of dynamically typed programming in Haskell, to improve Propellor's Info type. The result is kind of interesting in a scary way.

Info started out as a big record type, containing all the different sorts of metadata that Propellor needed to keep track of. Host IP addresses, DNS entries, ssh public keys, docker image configuration parameters... This got quite out of hand. Info needed to have its hands in everything, even types that should have been private to their module.

To fix that, recent versions of Propellor let a single Info contain many different types of values. Look at it one way and it contains DNS entries; look at it another way and it contains ssh public keys, etc.

As an émigré from lands where you can never know what type of value is in a $foo until you look, this was a scary prospect at first, but I found it's possible to have the benefits of dynamic types and the safety of static types too.

The key to doing it is Data.Dynamic. Thanks to Joachim Breitner for suggesting I could use it here. What I arrived at is this type (slightly simplified):

newtype Info = Info [Dynamic]
    deriving (Monoid)

So Info is a monoid, and it holds of a bunch of dynamic values, which could each be of any type at all. Eep!

So far, this is utterly scary to me. To tame it, the Info constructor is not exported, and so the only way to create an Info is to start with mempty and use this function:

addInfo :: (IsInfo v, Monoid v) => Info -> v -> Info
addInfo (Info l) v = Info (toDyn v : l)

The important part of that is that only allows adding values that are in the IsInfo type class. That prevents the foot shooting associated with dynamic types, by only allowing use of types that make sense as Info. Otherwise arbitrary Strings etc could be passed to addInfo by accident, and all get concated together, and that would be a total dynamic programming mess.

Anything you can add into an Info, you can get back out:

getInfo :: (IsInfo v, Monoid v) => Info -> v
getInfo (Info l) = mconcat (mapMaybe fromDynamic (reverse l))

Only monoids can be stored in Info, so if you ask for a type that an Info doesn't contain, you'll get back mempty.

Crucially, IsInfo is an open type class. Any module in Propellor can make a new data type and make it an instance of IsInfo, and then that new data type can be stored in the Info of a Property, and any Host that uses the Property will have that added to its Info, available for later introspection.


For example, this weekend I'm extending Propellor to have controllers: Hosts that are responsible for running Propellor on some other hosts. Useful if you want to run propellor once and have it update the configuration of an entire network of hosts.

There can be whole chains of controllers controlling other controllers etc. The problem is, what if host foo has the property controllerFor bar and host bar has the property controllerFor foo? I want to avoid a loop of foo running Propellor on bar, running Propellor on foo, ...

To detect such loops, each Host's Info should contain a list of the Hosts it's controlling. Which is not hard to accomplish:

newtype Controlling = Controlled [Host]
    deriving (Typeable, Monoid)

isControlledBy :: Host -> Controlling -> Bool
h `isControlledBy` (Controlled hs) = any (== hostName h) (map hostName hs)

instance IsInfo Controlling where
    propigateInfo _ = True

mkControllingInfo :: Host -> Info
mkControllingInfo controlled = addInfo mempty (Controlled [controlled])

getControlledBy :: Host -> Controlling
getControlledBy = getInfo . hostInfo

isControllerLoop :: Host -> Host -> Bool
isControllerLoop controller controlled = go S.empty controlled
  where
    go checked h
        | controller `isControlledBy` c = True
        -- avoid checking loops that have been checked before
        | hostName h `S.member` checked = False
        | otherwise = any (go (S.insert (hostName h) checked)) l
      where
        c@(Controlled l) = getControlledBy h

This is all internal to the module that needs it; the rest of propellor doesn't need to know that the Info is using used for this. And yet, the necessary information about Hosts is gathered as propellor runs.


So, that's a useful technique. I do wonder if I could somehow make addInfo combine together values in the list that have the same type; as it is the list can get long. And, to show Info, the best I could do was this:

 instance Show Info where
            show (Info l) = "Info " ++ show (map dynTypeRep l)

The resulting long list of the types of vales stored in a host's info is not a useful as it could be. Of course, getInfo can be used to get any particular type of value:

*Main> hostInfo kite
Info [InfoVal System,PrivInfo,PrivInfo,Controlling,DnsInfo,DnsInfo,DnsInfo,AliasesInfo, ...
*Main> getInfo (hostInfo kite) :: AliasesInfo
AliasesInfo (fromList ["downloads.kitenet.net","git.joeyh.name","imap.kitenet.net","nntp.olduse.net" ...

And finally, I keep trying to think of a better name than "Info".

Posted
concurrent output library

concurrent-output is a Haskell library I've developed this week, to make it easier to write console programs that do a lot of different things concurrently, and want to serialize concurrent outputs sanely.

It's increasingly easy to write concurrent programs, but all their status reporting has to feed back through the good old console, which is still obstinately serial.

Haskell illustrates problem this well with this "Linus's first kernel" equivilant interleaving the output of 2 threads:

> import System.IO
> import Control.Concurrent.Async
> putStrLn (repeat 'A') `concurrently` putStrLn (repeat 'B')
BABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABA
BABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABA
...

That's fun, but also horrible if you wanted to display some messages to the user:

> putStrLn "washed the car" `concurrently` putStrLn "walked the dog"
walwkaesdh etdh et hdeo gc
ar

To add to the problem, we often want to run separate programs concurrently, which have output of their own to display. And, just to keep things interesting, sometimes a unix program will behave differently when stdout is not connected to a terminal (eg, ls | cat).

To tame simple concurrent programs like these so they generate readable output involves a lot of plumbing. Something like, run the actions concurrently, taking care to capture the output of any commands, and then feed the output that the user should see though some sort of serializing channel to the display. Dealing with that when you just wanted a simple concurrent program risks ending up with a not-so-simple program.

So, I wanted an library with basically 2 functions:

outputConcurrent :: String -> IO ()
    
createProcessConcurrent :: CreateProcess -> IO whatever

The idea is, you make your program use outputConcurrent to display all its output, and each String you pass to that will be displayed serially, without getting mixed up with any other concurrent output.

And, you make your program use createProcessConcurrent everywhere it starts a process that might output to stdout or stderr, and it'll likewise make sure its output is displayed serially.

Oh, and createProcessConcurrent should avoid redirecting stdout and stderr away from the console, when no other concurrent output is happening. So, if programs are mostly run sequentially, they behave as they normally would at the console; any behavior changes should only occur when there is concurrency. (It might also be nice for it to allocate ttys and run programs there to avoid any behavior changes at all, although I have not tried to do that.)

And that should be pretty much the whole API, although it's ok if it needs some function called by main to set it up:

import Control.Concurrent.Async
import System.Console.Concurrent
import System.Process

main = withConcurrentOutput $
    outputConcurrent "washed the car\n"
        `concurrently`
    createProcessConcurrent (proc "ls" [])
        `concurrently`
    outputConcurrent "walked the dog\n"
$ ./demo
washed the car
walked the dog
Maildir/  bin/  doc/  html/  lib/  mail/  mnt/  src/  tmp/

I think that's a pretty good API to deal with this concurrent output problem. Anyone know of any other attempts at this I could learn from?

I implemented this over the past 3 days and 320 lines of code. It got rather hairy:

  • It has to do buffering of the output.
  • There can be any quantity of output, but program memory use should be reasonably small. Solved by buffering up to 1 mb of output in RAM, and writing excess buffer to temp files.
  • Falling off the end of the program is complicated; there can be buffered output to flush and it may have to wait for some processes to finish running etc.
  • The locking was tough to get right! I could not have managed to write it correctly without STM.

It seems to work pretty great though. I got Propellor using it, and Propellor can now run actions concurrently!

a tiling region manager for the console

Building on top of concurrent-output, and some related work Joachim Breitner did earlier, I now have a kind of equivilant to a tiling window manager, except it's managing regions of the console for different parts of a single program.

Here's a really silly demo, in an animated gif:

demo2.gif

Not bad for 23 lines of code, is that? Seems much less tedious to do things this way than using ncurses. Even with its panels, ncurses requires you to think about layout of various things on the screen, and many low-level details. This, by contrast, is compositional, just add another region and a thread to update it, and away it goes.

So, here's an apt-like download progress display, in 30 lines of code.

aptdemo.gif

Not only does it have regions which are individual lines of the screen, but those can have sub-regions within them as seen here (and so on).

And, log-type messages automatically scroll up above the regions. External programs run by createProcessConcurrent will automatically get their output/errors displayed there, too.

What I'm working on now is support for multiline regions, which automatically grow/shrink to fit what's placed in them. The hard part, which I'm putting the finishing touches on, is to accurately work out how large a region is before displaying it, in order to lay it out. Requires parsing ANSI codes amoung other things.

STM rules

There's so much concurrency, with complicated interrelated data being updated by different threads, that I couldn't have possibly built this without Software Transactional Memory.

Rather than a nightmare of locks behind locks behind locks, the result is so well behaved that I'm confident that anyone who needs more control over the region layout, or wants to do funky things can dive into to the STM interface and update the data structures, and nothing will ever deadlock or be inconsistent, and as soon as an update completes, it'll display on-screen.

An example of how powerful and beuatiful STM is, here's how the main display thread determines when it needs to refresh the display:

data DisplayChange
        = BufferChange [(StdHandle, OutputBuffer)]
        | RegionChange RegionSnapshot
        | TerminalResize (Maybe Width)
        | EndSignal ()

    ...
                change <- atomically $
                        (RegionChange <$> regionWaiter origsnapshot)
                                `orElse`
                        (RegionChange <$> regionListWaiter origsnapshot)
                                `orElse`
                        (BufferChange <$> outputBufferWaiterSTM waitCompleteLines)
                                `orElse`
                        (TerminalResize <$> waitwidthchange)
                                `orElse`
                        (EndSignal <$> waitTSem endsignal)
                case change of
                        RegionChange snapshot -> do
                ...
                        BufferChange buffers -> do
                ...
                        TerminalResize width -> do
                ...

So, it composes all these STM actions that can wait on various kinds of changes, to get one big action, that waits for all of the above, and builds up a nice sum type to represent what's changed.

Another example is that the whole support for sub-regions only involved adding 30 lines of code, all of it using STM, and it worked 100% the first time.


Available in concurrent-output 1.1.0.

STM Region contents

concurrent-output released yesterday got a lot of fun features. It now does full curses-style minimization of the output, to redraw updated lines with optimal efficiency. And supports multiline regions/wrapping too long lines. And allows the user to embed ANSI colors in a region. 3 features that are in some tension and were fun to implement all together.

But I have a more interesting feature to blog about... I've added the ability for the content of a Region to be determined by a (STM transaction).

Here, for example, is a region that's a clock:

timeDisplay :: TVar UTCTime -> STM Text
timeDisplay tv = T.pack . show <$> readTVar tv

clockRegion :: IO ConsoleRegionHandle
clockRegion = do
    tv <- atomically . newTVar =<< getCurrentTime
    r <- openConsoleRegion Linear
    setConsoleRegion r (timeDisplay tv)
    async $ forever $ do
        threadDelay 1000000 -- 1 sec
        atomically . (writeTVar tv) =<< getCurrentTime
    return r

There's something magical about this. Whenever a new value is written into the TVar, concurrent-output automatically knows that this region needs to be updated. How does it know how to do that?

Magic of STM. Basically, concurrent-output composes all the STM transactions of Regions, and asks STM to wait until there's something new to display. STM keeps track of whatever TVars might be looked at, and so can put the display thread to sleep until there's a change to display.

Using STM I've gotten extensability for free, due to the nice ways that STM transactions compose.

A few other obvious things to do with this: Compose 2 regions with padding so they display on the same line, left and right aligned. Trim a region's content to the display width. (Handily exported by concurrent-output in a TVar for this kind of thing.)


I'm tempted to write a console spreadsheet using this. Each visible cell of the spreadsheet would have its own region, that uses a STM transaction to display. Plain data Cells would just display their current value. Cells that contain a function would read the current values of other Cells, and use that to calculate what to display. Which means that a Cell containing a function would automatically update whenever any of the Cells that it depends on were updated!

Do you think that a simple interactive spreadsheet built this way would be more than 100 lines of code?

Posted
type safe multi-OS Propellor

Propellor was recently ported to FreeBSD, by Evan Cofsky. This new feature led me down a two week long rabbit hole to make it type safe. In particular, Propellor needed to be taught that some properties work on Debian, others on FreeBSD, and others on both.

The user shouldn't need to worry about making a mistake like this; the type checker should tell them they're asking for something that can't fly.

-- Is this a Debian or a FreeBSD host? I can't remember, let's use both package managers!
host "example.com" $ props
    & aptUpgraded
    & pkgUpgraded

As of propellor 3.0.0 (in git now; to be released soon), the type checker will catch such mistakes.

Also, it's really easy to combine two OS-specific properties into a property that supports both OS's:

upgraded = aptUpgraded `pickOS` pkgUpgraded

type level lists and functions

The magick making this work is type-level lists. A property has a metatypes list as part of its type. (So called because it's additional types describing the type, and I couldn't find a better name.) This list can contain one or more OS's targeted by the property:

aptUpgraded :: Property (MetaTypes '[ 'Targeting 'OSDebian, 'Targeting 'OSBuntish ])

pkgUpgraded :: Property (MetaTypes '[ 'Targeting 'OSFreeBSD ])

In Haskell type-level lists and other DataKinds are indicated by the ' if you have not seen that before. There are some convenience aliases and type operators, which let the same types be expressed more cleanly:

aptUpgraded :: Property (Debian + Buntish)

pkgUpgraded :: Property FreeBSD

Whenever two properties are combined, their metatypes are combined using a type-level function. Combining aptUpgraded and pkgUpgraded will yield a metatypes that targets no OS's, since they have none in common. So will fail to type check.

My implementation of the metatypes lists is hundreds of lines of code, consisting entirely of types and type families. It includes a basic implementation of singletons, and is portable back to ghc 7.6 to support Debian stable. While it takes some contortions to support such an old version of ghc, it's pretty awesome that the ghc in Debian stable supports this stuff.

extending beyond targeted OS's

Before this change, Propellor's Property type had already been slightly refined, tagging them with HasInfo or NoInfo, as described in making propellor safer with GADTs and type families. I needed to keep that HasInfo in the type of properties.

But, it seemed unnecessary verbose to have types like Property NoInfo Debian. Especially if I want to add even more information to Property types later. Property NoInfo Debian NoPortsOpen would be a real mouthful to need to write for every property.

Luckily I now have this handy type-level list. So, I can shove more types into it, so Property (HasInfo + Debian) is used where necessary, and Property Debian can be used everywhere else.

Since I can add more types to the type-level list, without affecting other properties, I expect to be able to implement type-level port conflict detection next. Should be fairly easy to do without changing the API except for properties that use ports.

singletons

As shown here, pickOS makes a property that decides which of two properties to use based on the host's OS.

aptUpgraded :: Property DebianLike
aptUpgraded = property "apt upgraded" (apt "upgrade" `requires` apt "update")

pkgUpgraded :: Property FreeBSD
pkgUpgraded = property "pkg upgraded" (pkg "upgrade")
    
upgraded :: Property UnixLike
upgraded = (aptUpgraded `pickOS` pkgUpgraded)
    `describe` "OS upgraded"

Any number of OS's can be chained this way, to build a property that is super-portable out of simple little non-portable properties. This is a sweet combinator!

Singletons are types that are inhabited by a single value. This lets the value be inferred from the type, which came in handy in building the pickOS property combinator.

Its implementation needs to be able to look at each of the properties at runtime, to compare the OS's they target with the actial OS of the host. That's done by stashing a target list value inside a property. The target list value is inferred from the type of the property, thanks to singletons, and so does not need to be passed in to property. That saves keyboard time and avoids mistakes.

is it worth it?

It's important to consider whether more complicated types are a net benefit. Of course, opinions vary widely on that question in general! But let's consider it in light of my main goals for Propellor:

  1. Help save the user from pushing a broken configuration to their machines at a time when they're down in the trenches dealing with some urgent problem at 3 am.
  2. Advance the state of the art in configuration management by taking advantage of the state of the art in strongly typed haskell.

This change definitely meets both criteria. But there is a tradeoff; it got a little bit harder to write new propellor properties. Not only do new properties need to have their type set to target appropriate systems, but the more polymorphic code is, the more likely the type checker can't figure out all the types without some help.

A simple example of this problem is as follows.

foo :: Property UnixLike
foo = p `requires` bar
  where
    p = property "foo" $ do
        ...

The type checker will complain that "The type variable ‘metatypes1’ is ambiguous". Problem is that it can't infer the type of p because many different types could be combined with the bar property and all would yield a Property UnixLike. The solution is simply to add a type signature like p :: Property UnixLike

Since this only affects creating new properties, and not combining existing properties (which have known types), it seems like a reasonable tradeoff.

things to improve later

There are a few warts that I'm willing to live with for now...

Currently, Property (HasInfo + Debian) is different than Property (Debian + HasInfo), but they should really be considered to be the same type. That is, I need type-level sets, not lists. While there's a type level sets library for hackage, it still seems to require a specific order of the set items when writing down a type signature.

Also, using ensureProperty, which runs one property inside the action of another property, got complicated by the need to pass it a type witness.

foo = Property Debian
foo = property' $ \witness -> do
    ensureProperty witness (aptInstall "foo")

That witness is used to type check that the inner property targets every OS that the outer property targets. I think it might be possible to store the witness in the monad, and have ensureProperty read it, but it might complicate the type of the monad too much, since it would have to be parameterized on the type of the witness.

Oh no, I mentioned monads. While type level lists and type functions and generally bending the type checker to my will is all well and good, I know most readers stop reading at "monad". So, I'll stop writing. ;)

thanks

Thanks to David Miani who answered my first tentative question with a big hunk of example code that got me on the right track.

Also to many other people who answered increasingly esoteric Haskell type system questions.

Also thanks to the Shuttleworth foundation, which funded this work by way of a Flash Grant.

Posted
twenty years of free software -- part 12 propellor

Propellor is my second big Haskell program. I recently described the motivation for it like this, in a proposal for a Linux.Conf.Au talk:

The configuration of Linux hosts has become increasingly declarative, managed by tools like puppet and ansible, or by the composition of containers. But if a server is a collection of declarative properties, how do you make sure that changes to that configuration make sense? You can test them, but eventually it's 3 AM and you have an emergency fix that needs to go live immediately.

Data types to the rescue! While data types are usually used to prevent eg, combining an Int and a Bool, they can be used at a much more abstract level, for example to prevent combining a property that needs a Debian system with a property that needs a Red Hat system.

Propellor leverages Haskell's type system to prove the consistency of the properties it will apply to a host.

The real origin story though, is that I wanted to finally start using configuration management, but the tools for it all seemed very complicated and built on shaky foundations (like piles of yaml), and it seemed it would be easier to write my own than deal with that. Meanwhile, I had Haskell burning a hole in my pocket, ready to be used in a second large project after git-annex.

Propellor has averaged around 2.5 contributions per month from users since it got started, but increasing numbers recently. That's despite having many fewer users than git-annex, which remember gets perhaps 1 patch per month.

Of course, I've "cheated" by making sure that propellor's users know Haskell, or are willing to learn some. And, propellor is very compositional; adding a new property to it is not likely to be complicated by any of the existing code. So it's easy to extend, if you're able to use it.

At this point propellor has a small community of regular contributors, and I spend some pleasant weekend afternoons reviewing and merging their work.

Much of my best work on propellor has involved keeping the behavior of the program the same while making its types better, to prevent mistakes. Propellor's core data types have evolved much more than in any program I worked on before. That's exciting!

Next: twenty years of free software -- part 13 past and future

Posted
twenty years of free software -- part 13 past and future

This series has focused on new projects. I could have said more about significant work that didn't involve starting new projects. A big example was when I added dh to debhelper, which led to changes in a large percentage of debian/rules files. And I've contributed to many other free software projects.

I guess I've focused on new projects becuase it's easier to remember things I've started myself. And because a new project is more wide open, with more scope for interesting ideas, especially when it's free software being created just because I want to, with no expectations of success.

But starting lots of your own projects also makes you responsible for maintaining a lot of stuff. Ten years ago I had dozens of projects that I'd started and was still maintaining. Over time I started pulling away from Debian, with active projects increasingly not connected with it. By the end, I'd left and stopped working on the old projects. Nearly everything from my first decade in free software was passed on to new maintainers. It's good to get out from under old projects and concentrate on new things.

I saved propellor for last in this series, because I think it may point toward the future of my software. While git-annex was the project that taught me Haskell, propellor's design is much more deeply influenced by the Haskell viewpoint.

Absorbing that viewpoint has itself been a big undertaking for me this decade. It's like I was coasting along feeling at the top of my game and wham got hit by the type theory bus. And now I see that I was stuck in a rut before, and begin to get a feeling of many new possibilities.

That's a good feeling to have, twenty years in.

Posted
keysafe alpha release

Keysafe securely backs up a gpg secret key or other short secret to the cloud. But not yet. Today's alpha release only supports storing the data locally, and I still need to finish tuning the argon2 hash difficulties with modern hardware. Other than that, I'm fairly happy with how it's turned out.

Keysafe is written in Haskell, and many of the data types in it keep track of the estimated CPU time needed to create, decrypt, and brute-force them. Running that through a AWS SPOT pricing cost model lets keysafe estimate how much an attacker would need to spend to crack your password.

4.png
(Above is for the password "makesad spindle stick")

If you'd like to be an early adopter, install it like this:

sudo apt-get install haskell-stack libreadline-dev libargon2-0-dev zenity
stack install keysafe

Run ~/.local/bin/keysafe --backup --store-local to back up a gpg key to ~/.keysafe/objects/local/

I still need to tune the argon2 hash difficulty, and I need benchmark data to do so. If you have a top of the line laptop or server class machine that's less than a year old, send me a benchmark:

~/.local/bin/keysafe --benchmark | mail keysafe@joeyh.name -s benchmark

Bonus announcement: http://hackage.haskell.org/package/zxcvbn-c/ is my quick Haskell interface to the C version of the zxcvbn password strength estimation library.

PS: Past 50% of my goal on Patreon!

Posted
PoW bucket bloom: throttling anonymous clients with proof of work, token buckets, and bloom filters

An interesting side problem in keysafe's design is that keysafe servers, which run as tor hidden services, allow anonymous data storage and retrieval. While each object is limited to 64 kb, what's to stop someone from making many requests and using it to store some big files?

The last thing I want is a git-annex keysafe special remote. ;-)

I've done a mash-up of three technologies to solve this, that I think is perhaps somewhat novel. Although it could be entirely old hat, or even entirely broken. (All I know so far is that the code compiles.) It uses proof of work, token buckets, and bloom filters.


Each request can have a proof of work attached to it, which is just a value that, when hashed with a salt, starts with a certain number of 0's. The salt includes the ID of the object being stored or retrieved.

The server maintains a list of token buckets. The first can be accessed without any proof of work, and subsequent ones need progressively more proof of work to be accessed.

Clients will start by making a request without a PoW, and that will often succeed, but when the first token bucket is being drained too fast by other load, the server will reject the request and demand enough proof of work to allow access to the second token bucket. And so on down the line if necessary. At the worst, a client may have to do 8-16 minutes of work to access a keysafe server that is under heavy load, which would not be ideal, but is acceptible for keysafe since it's not run very often.

If the client provides a PoW good enough to allow accessing the last token bucket, the request will be accepted even when that bucket is drained. The client has done plenty of work at this point, so it would be annoying to reject it. To prevent an attacker that is willing to burn CPU from abusing this loophole to flood the server with object stores, the server delays until the last token bucket fills back up.


So far so simple really, but this has a big problem: What prevents a proof of work from being reused? An attacker could generate a single PoW good enough to access all the token buckets, and flood the server with requests using it, and so force everyone else to do excessive amounts of work to use the server.

Guarding against that DOS is where the bloom filters come in. The server generates a random request ID, which has to be included in the PoW salt and sent back by the client along with the PoW. The request ID is added to a bloom filter, which the server can use to check if the client is providing a request ID that it knows about. And a second bloom filter is used to check if a request ID has been used by a client before, which prevents the DOS.

Of course, when dealing with bloom filters, it's important to consider what happens when there's a rare false positive match. This is not a problem with the first bloom filter, because a false positive only lets some made-up request ID be used. A false positive in the second bloom filter will cause the server to reject the client's proof of work. But the server can just request more work, or send a new request ID, and the client will follow along.

The other gotcha with bloom filters is that filling them up too far sets too many bits, and so false positive rates go up. To deal with this, keysafe just keeps count of how many request IDs it has generated, and once it gets to be too many to fit in a bloom filter, it makes a new, empty bloom filter and starts storing request IDs in it. The old bloom filter is still checked too, providing a grace period for old request IDs to be used. Using bloom filters that occupy around 32 mb of RAM, this rotation only has to be done every million requests of so.

But, that rotation opens up another DOS! An attacker could cause lots of request IDs to be generated, and so force the server to rotate its bloom filters too quickly, which would prevent any requests from being accepted. To solve this DOS, just use one more token bucket, to limit the rate that request IDs can be generated, so that the time it would take an attacker to force a bloom filter rotation is long enough that any client will have plenty of time to complete its proof of work.


This sounds complicated, and probably it is, but the implementation only took 333 lines of code. About the same number of lines that it took to implement the entire keysafe HTTP client and server using the amazing servant library.

There are a number of knobs that may need to be tuned to dial it in, including the size of the token buckets, their refill rate, the size of the bloom filters, and the number of argon2 iterations in the proof of work. Servers may eventually need to adjust those on the fly, so that if someone decides it's worth burning large quantities of CPU to abuse keysafe for general data storage, the server throttles down to a rate that will take a very long time to fill up its disk.

This protects against DOS attacks that fill up the keysafe server storage. It does not prevent a determined attacker, who has lots of CPU to burn, from flooding so many requests that legitimate clients are forced to do an expensive proof of work and then time out waiting for the server. But that's an expensive attack to keep running, and the proof of work can be adjusted to make it increasingly expensive.

Posted