size of the git sha1 collision attack surface

The compromise has people talking about the security of git's use of sha1. Talking about this is a good thing, I think, but there's a lot of smug "we're cryptographically secure" in the air that does not seem warranted coming from non-cryptographers like me.

Two years ago I had a discussion on my blog about git and sha1, that reached similar conclusions to what I'm seeing here: It seems that current known sha1 attacks require somehow getting an ugly colliding binary file accepted into the repository in the first place. Hard to manage for peer reviewed source code. We all hate firmware in the kernel, so perhaps this is another reason it's bad. ;-) Etc.

Well, not so fast. Git's exposure to sha1 collisions is broader than just files. Git also stores data for commits, and directory trees.

Git's tree objects are interesting because they're a bag of bytes that is rarely if ever manually examined. If there was a way to exploit git such that it ignored some trailing garbage at the end of a tree object, then here's an attack injection vector that would be unlikely to be caught by peer review.

If you can change the content of a tree without changing its sha1, you can simply make it link to an older version of a file that had an exploitable problem. Or you can assemble a combination of files that results in an new exploitable problem. (For example, suppose a buffer size was hardcoded in two files in the kernel, and then the size was changed in both -- make a tree that contains one change and not the other.)

Now, git's tree-walk code, until 2008, mishandled malformed data by accessing memory outside the tree buffer. Was this an exploitable bug in git? I don't know. It is interesting that the fix, in 64cc1c0909949fa2866ad71ad2d1ab7ccaa673d9 relied on the parser stopping at a NULL -- great if you want to put some garbage after the tree's filename. With that said, the particular exploit I describe above probably won't work -- I tried! Here's all the code that stands between us and this exploit:

        if (size < 24 || buf[size - 21])
                die("corrupt tree file");
        path = get_mode(buf, &mode);
        if (!path || !*path)
                die("corrupt tree file");

Any good C programmer would recognise that this magic-constant-laden code needs to be careful about the size of the buffer. It's not as clear though, that it needs to be careful about consuming the entire contents of the buffer. And C programmers involved with git have gotten this code wrong before.

tldr: If git is a castle, it was built just after cannons were invented, and we've had our fingers in our ears for several years as their power improved. Now the outer wall of sha1 is looking increasingly like one of straw, and we're down to a rather thin inner wall of C code.


This was the second most rainy day of the year, so far. 28 hours of solid rain and counting. Good time to take stock of the water situation here at the cabin after a year's experience.

The springs were as seasonal as I'd feared. After running strongly all Spring, the main spring ebbed and died in the Summer months. By mid-July there was no remaining source of potable water. If I had not been away so much over the summer it would have been worse. As it was, I needed only twenty gallons of water hauled from neighbor Lee's well.

The large cistern was filled in a single night during the most rainy period this Spring, and provided wash water for two months of the summer, and that only lowered it by a quarter. So I could have been less sparing with it, and not relied so much on rain water catchment for supplimental wash water.

Today I replaced the pump, so I can use the pressure tank and have running water in the house. I could have done this earlier, but what can I say; I enjoy hauling buckets of water, when it doesn't involve breaking ice. In the winter when there's firewood to haul instead, I'll enjoy the indoor plumbing.

Also today, I may have finally fixed the pipe to the large cistern properly, so it won't only fill at the heaviest rain times. Along with the ability to cross-pump water from the small to the large cistern, it should be easier to keep it filled, and I might be able to reduce the turbidity enough that it's potable eventually. Will see.

I don't live in a dry region, but this is a relatively dry place for the area. It's been interesting to get a taste of what it'd be like to live somewhere where water is actually scarce.

watch me program for half an hour

In this screencast, I implement a new feature in git-annex. I spend around 10 minutes writing haskell code, 10 minutes staring at type errors, and 10 minutes writing documentation. A normal coding session for me. I give a play-by-play, and some thoughts of what programming is like for me these days.

git-annex coding in haskell.ogg (38 MB) | on vimeo

Not shown is the hour I spent the next day changing the "optimize" subcommand implemented here into "--auto" options that can be passed to git-annex's get and drop commands.

watched it all, liked it (54%)

watched some, boring (9%)

too long for me (4%)

too haskell for me (17%)

not interested (13%)

Total votes: 86
destructive impulse

I used to know guys who would take old computer hardware out in the desert and shoot it up. I never went, and I've never gratuitously destroyed a computer in 2 decades of working with the things. Until yesterday when I picked one up by the monitor and introduced it to the floor.

My regret isn't that I violently destroyed a computer, but that it was my Mom's computer, and I did it right in front of her, and without even a story-worthy reason, just out of garden variety frustration. And perhaps there's some regret that I was actually unsucessful in destroying anything other than the monitor and some speakers. The computer was salvaged, and my mom has a new monitor and a working system again.

Sorry Mom & Maggie. To make up for it, I'll humiliate myself with this recording of "broke mom's computer blues", which will only be available for a very limited time: click to listen (warning: contains harmonica).

Anyway, don't worry; any remaining frustration will be taken out the usual way: Splitting firewood.

happy haskell hacker

There are certian things haskell is very good at, and I had the pleasure of using it for one such thing yesterday. I wanted to support expressions like find(1) does, in git-annex. Something like:

git-annex drop --not --exclude '*.mp3' --and \
    -\( --in usbdrive --or --in archive -\) --and \
    --not --copies 3

So, parens and booleans and some kind of domain-specific operations. It's easy to build a data structure in haskell that can contain this sort of expression.

{- A Token can either be a single word, or an Operation of an arbitrary type. -}
data Token op = Token String | Operation op
        deriving (Show, Eq)

data Matcher op = Any
        | And (Matcher op) (Matcher op)
        | Or (Matcher op) (Matcher op)
        | Not (Matcher op)
        | Op op
        deriving (Show, Eq)

(The op could just be a String, but is parameterized for reasons we'll see later.)

As command-line options, the expression is already tokenised, so all I needed to do to parse it is consume a list of the Tokens. The only mildly tricky thing is handling the parens right -- I chose to not make it worry if there were too many, or too few closing parens.

generate :: [Token op] -> Matcher op
generate ts = generate' Any ts
generate' :: Matcher op -> [Token op] -> Matcher op
generate' m [] = m
generate' m ts = uncurry generate' $ consume m ts

consume :: Matcher op -> [Token op] -> (Matcher op, [Token op])
consume m [] = (m, [])
consume m ((Operation o):ts) = (m `And` Op o, ts)
consume m ((Token t):ts)
        | t == "and" = cont $ m `And` next
        | t == "or" = cont $ m `Or` next
        | t == "not" = cont $ m `And` (Not next)
        | t == "(" = let (n, r) = consume next rest in (m `And` n, r)
        | t == ")" = (m, ts)
        | otherwise = error $ "unknown token " ++ t
                (next, rest) = consume Any ts
                cont v = (v, rest)

Once a Matcher is built, it can be used to check if things match the expression the user supplied. This next bit of code almost writes itself.

{- Checks if a Matcher matches, using a supplied function to check
 - the value of Operations. -}
match :: (op -> v -> Bool) -> Matcher op -> v -> Bool
match a m v = go m
                go Any = True
                go (And m1 m2) = go m1 && go m2
                go (Or m1 m2) = go m1 || go m2
                go (Not m1) = not (go m1)
                go (Op o) = a o v

And that's it! This is all nearly completly generic and could be used for a great many things that need support for this sort of expression, as long as they can be checked in pure code.

A trivial example:

*Utility.Matcher> let m = generate [Operation True, Token "and", Token "(", Operation False, Token "or", Token, "not", Operation False, Token ")"]
*Utility.Matcher> match (const . id) m undefined 

For my case though, I needed to run some IO actions to check if expressions about files were true. This is where I was very pleased to see a monadic version of match could easily be built.

{- Runs a monadic Matcher, where Operations are actions in the monad. -}
matchM :: Monad m => Matcher (v -> m Bool) -> v -> m Bool
matchM m v = go m
                go Any = return True
                go (And m1 m2) = liftM2 (&&) (go m1) (go m2)
                go (Or m1 m2) =  liftM2 (||) (go m1) (go m2)
                go (Not m1) = liftM not (go m1)
                go (Op o) = o v

With this and about 100 lines of code to implement specific tests like --copies and --in, git-annex now supports the example at the top.

Just for comparison, find(1) has thousands of lines of C code to build a similar parse tree from the command line parameters and run it. Although I was surprised to see that it optimises expressions by eg, reordering cheaper tests first.

The last time I wrote this kind of thing was in perl, and there the natural way was to carefully translate the expression into perl code, which was then evaled. Meaning the code was susceptible to security holes.

Anyway, this is nothing that has not been done a hundred times in haskell before, but it's very nice that it makes it so clean, easy, and generic.

borrowed dogs

I've never had a dog of my own since I grew up, but there have always been dogs in my life.

Calypso and Percy

Calypso was dropped off at Wortroot soon after I moved in, and was my borrowed dog for years. And she remained out there, spending a good decade with run of the woods, fields and streams, a good doggy life. She got old and feeble, spent winters by the fireplace, and finally it was too much for her. I'll miss her, the best dog I've known.

Recently I've caught glimpses of a dog lurking in the distance here at the Hollow. When I noticed it was sleeping on the roof of the battery box, I realized it was probably one of the dogs that used to live here but were given away last year. Exchanged email with the likely owners, now in Sudan, and they tell me her name is Domino, and she must have run away home.

So I've been putting out food for Domino this week, and yesterday she came close enough to be petted. Medium sized and white, her name is for a black mask extending from eyes to ears. Although currently skittish, she seems basically a good, calm dog.