The kernel.org compromise has people talking about the security of git's use of sha1. Talking about this is a good thing, I think, but there's a lot of smug "we're cryptographically secure" in the air that does not seem warranted coming from non-cryptographers like me.

Two years ago I had a discussion on my blog about git and sha1, that reached similar conclusions to what I'm seeing here: It seems that current known sha1 attacks require somehow getting an ugly colliding binary file accepted into the repository in the first place. Hard to manage for peer reviewed source code. We all hate firmware in the kernel, so perhaps this is another reason it's bad. ;-) Etc.

Well, not so fast. Git's exposure to sha1 collisions is broader than just files. Git also stores data for commits, and directory trees.

Git's tree objects are interesting because they're a bag of bytes that is rarely if ever manually examined. If there was a way to exploit git such that it ignored some trailing garbage at the end of a tree object, then here's an attack injection vector that would be unlikely to be caught by peer review.

If you can change the content of a tree without changing its sha1, you can simply make it link to an older version of a file that had an exploitable problem. Or you can assemble a combination of files that results in an new exploitable problem. (For example, suppose a buffer size was hardcoded in two files in the kernel, and then the size was changed in both -- make a tree that contains one change and not the other.)

Now, git's tree-walk code, until 2008, mishandled malformed data by accessing memory outside the tree buffer. Was this an exploitable bug in git? I don't know. It is interesting that the fix, in 64cc1c0909949fa2866ad71ad2d1ab7ccaa673d9 relied on the parser stopping at a NULL -- great if you want to put some garbage after the tree's filename. With that said, the particular exploit I describe above probably won't work -- I tried! Here's all the code that stands between us and this exploit:

        if (size < 24 || buf[size - 21])
                die("corrupt tree file");
        
        path = get_mode(buf, &mode);
        if (!path || !*path)
                die("corrupt tree file");

Any good C programmer would recognise that this magic-constant-laden code needs to be careful about the size of the buffer. It's not as clear though, that it needs to be careful about consuming the entire contents of the buffer. And C programmers involved with git have gotten this code wrong before.

tldr: If git is a castle, it was built just after cannons were invented, and we've had our fingers in our ears for several years as their power improved. Now the outer wall of sha1 is looking increasingly like one of straw, and we're down to a rather thin inner wall of C code.