The kernel.org compromise has people talking about the security of git's use of sha1. Talking about this is a good thing, I think, but there's a lot of smug "we're cryptographically secure" in the air that does not seem warranted coming from non-cryptographers like me.

Two years ago I had a discussion on my blog about git and sha1, that reached similar conclusions to what I'm seeing here: It seems that current known sha1 attacks require somehow getting an ugly colliding binary file accepted into the repository in the first place. Hard to manage for peer reviewed source code. We all hate firmware in the kernel, so perhaps this is another reason it's bad. ;-) Etc.

Well, not so fast. Git's exposure to sha1 collisions is broader than just files. Git also stores data for commits, and directory trees.

Git's tree objects are interesting because they're a bag of bytes that is rarely if ever manually examined. If there was a way to exploit git such that it ignored some trailing garbage at the end of a tree object, then here's an attack injection vector that would be unlikely to be caught by peer review.

If you can change the content of a tree without changing its sha1, you can simply make it link to an older version of a file that had an exploitable problem. Or you can assemble a combination of files that results in a new exploitable problem. (For example, suppose a buffer size was hardcoded in two files in the kernel, and then the size was changed in both -- make a tree that contains one change and not the other.)

Now, git's tree-walk code, until 2008, mishandled malformed data by accessing memory outside the tree buffer. Was this an exploitable bug in git? I don't know. It is interesting that the fix, in 64cc1c0909949fa2866ad71ad2d1ab7ccaa673d9, relied on the parser stopping at a NUL -- great if you want to put some garbage after the tree's filename. With that said, the particular exploit I describe above probably won't work -- I tried! Here's all the code that stands between us and this exploit:

        if (size < 24 || buf[size - 21])
                die("corrupt tree file");
        
        path = get_mode(buf, &mode);
        if (!path || !*path)
                die("corrupt tree file");

Any good C programmer would recognise that this magic-constant-laden code needs to be careful about the size of the buffer. It's not as clear though, that it needs to be careful about consuming the entire contents of the buffer. And C programmers involved with git have gotten this code wrong before.
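To make "consuming the entire contents of the buffer" concrete, here is a minimal sketch of a stricter check. It is not git's code: tree_is_fully_consumed and RAW_SHA1_LEN are names I made up for illustration. It assumes the usual sha1 tree layout, where each entry is an octal mode, a space, a name, a NUL, and 20 raw bytes of object id, and it accepts a buffer only when such entries account for every byte.

```c
#include <stddef.h>
#include <string.h>

#define RAW_SHA1_LEN 20 /* assumed: raw sha1 object ids */

/*
 * Hypothetical helper, not part of git. Returns 1 only if buf[0..size)
 * consists entirely of well-formed "<mode> <name>\0<20 bytes>" entries,
 * and 0 if anything is malformed or left over. An empty buffer counts
 * as a valid (empty) tree.
 */
int tree_is_fully_consumed(const unsigned char *buf, size_t size)
{
    size_t pos = 0;

    while (pos < size) {
        /* Mode: one or more octal digits followed by a single space. */
        size_t start = pos;
        while (pos < size && buf[pos] >= '0' && buf[pos] <= '7')
            pos++;
        if (pos == start || pos == size || buf[pos] != ' ')
            return 0;
        pos++;

        /* Name: non-empty, terminated by a NUL inside the buffer. */
        const unsigned char *nul = memchr(buf + pos, 0, size - pos);
        if (!nul || nul == buf + pos)
            return 0;
        pos = (size_t)(nul - buf) + 1;

        /* Object id: this entry needs exactly 20 more raw bytes. */
        if (size - pos < RAW_SHA1_LEN)
            return 0;
        pos += RAW_SHA1_LEN;
    }
    return 1; /* the loop only ends cleanly when pos == size */
}
```

With a check of this shape, a tree with collision material tacked on after its last entry is rejected outright instead of being silently ignored. It only looks at the framing of entries, not at whether modes or names are sensible.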

tldr: If git is a castle, it was built just after cannons were invented, and we've had our fingers in our ears for several years as their power improved. Now the outer wall of sha1 is looking increasingly like one of straw, and we're down to a rather thin inner wall of C code.

commit messages can hide data

The commit message can in theory contain arbitrary binary bytes, and git is aware of the total size. However, in practice, the code makes sure all objects are NUL-terminated, and many pieces of code will treat the commit message as a string.

So you can do something like:

 git init repo &&
 cd repo &&
 >file && git add file && git commit -m foo &&
 new_commit=`git cat-file commit HEAD |
             perl -pe 's/foo\n/$&\x00secret content/' |
             git hash-object -w --stdin -t commit` &&
 git update-ref refs/heads/master $new_commit

The resulting commit will appear in "git log" to have commit message "foo". But if you use "git cat-file", you will see that it carries the secret content.
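The gap between the two views can be sketched in a few lines of C. This is an illustration under my own assumptions, not code from git: visible_len and hidden_len are hypothetical names. One function measures what string-based code would treat as the message, the other measures what sits behind the first NUL when the real size is known.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: number of bytes that code treating the message
 * as a C string would see, i.e. everything before the first NUL.
 */
size_t visible_len(const char *msg, size_t size)
{
    const char *nul = memchr(msg, '\0', size);
    return nul ? (size_t)(nul - msg) : size;
}

/*
 * Hypothetical helper: number of bytes after the first NUL, which a
 * string view never shows but which are still part of the hashed object.
 */
size_t hidden_len(const char *msg, size_t size)
{
    size_t seen = visible_len(msg, size);
    return seen == size ? 0 : size - seen - 1;
}
```

For the message built above, "foo\n" followed by a NUL and "secret content", visible_len reports 4 bytes and hidden_len reports 14. A receiving side that wanted to be suspicious of such commits could refuse any message where hidden_len is nonzero.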

So you could find a collision between a decoy and a malicious commit, ask somebody to pull the decoy, and then substitute the malicious one (which presumably points to a different tree with evil code in it).

Comment by Jeff
ah, colliding commits

I had skipped over commits because it seemed harder to exploit a collision in a commit and because I was sure git just displayed the whole thing. More fool I. Jeff's approach works.

So, how can a colliding commit be exploited? Well, you just need to change it to point to an exploitable tree. Any git repository probably already has such a tree in it somewhere. Annoyingly, the tree probably claims to be some ancient version of the program with a version number like 2.6.xx, which is clearly not 3.0, and so this attack would be pretty obvious in effect.

Still, the walls crumble further..

Comment by joey
comment 3

Ah, I have a better exploit for colliding commits

  1. Find security hole in kernel.
  2. Send Linus a pull request for a fix, quietly.
  3. At a later date, replace the commit with the colliding one you generated at the same time you generated the original fix. The colliding commit points at the unfixed tree. This essentially rewrites history so that the hole was never fixed.

This still has its problems. You have to have a 0-day now, to get a 0-day later, and it's the same 0-day. Not sure what benefit such a 0-day bank would have.

This would also work better against a project other than Linux. One that did not add Signed-Off-By lines to commits (which could be worked around by generating the pair of colliding commits with Linus's signoff already in them). Also one where patches were not typically sent via email, but were pulled from your git repository.

Comment by joey
colliding commit attack take 2

Here's a way. Make a patch that adds a new and improved security check, in a commit that includes a NUL and sha1 collision material tacked on. In another, entirely legitimate commit, a patsy can remove old security checks that your new code obsoletes. Now you have the ability to replace your commit with the colliding one, which simply points at the tree before you added the security check, thus leaving the kernel vulnerable.

Comment by joey
comment 5

The problem with the above attack is that it can change a historic tree to one containing your exploit, but it does not change the content of the head of the tree. (Even when a fresh checkout is made of it.)

This attack would have to be used in a way that trusts the old content of the tree -- ie, a diff, a merge, a bisection, and "infects" the head with the bad code. That comes back to social engineering.

Comment by joey