So, sha-1 is looking increasingly insecure in applications where birthday attacks are possible. ("Birthday attacks" ... what a phrase ... I hope my non-technical readers stopped at "sha-1".)

Two things about that:


First, I wanted to mention that I've released jetring 0.15 today. It adds support for arbitrary hashes in the index file and deprecates sha-1, switching to sha-256 by default. The jetring-checksum -u utility can be used to upgrade the sha-1 hashes in existing jetring index files.

If you're using jetring in an application where changesets are provided by third parties, then a birthday attack could be possible (though not easy?), and you should upgrade your index. debian-maintainer is a good example of such a jetring user.
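Roughly, an upgrade like this has to re-hash every changeset the index covers. The Python below is a minimal sketch, not jetring's code (jetring is written in Perl). The index is modelled as a hypothetical dict from changeset name to an (algorithm, hex digest) pair, which is an assumption and not jetring's real file format. Storing the algorithm name next to each digest is one way to support arbitrary hashes.

```python
import hashlib

# Hypothetical index: changeset name -> (hash algorithm, hex digest).
# This is NOT jetring's on-disk format, only an illustration.
Index = dict[str, tuple[str, str]]


def digest(algorithm: str, data: bytes) -> str:
    """Hash data with any algorithm hashlib knows by name ("sha1", "sha256", ...)."""
    return hashlib.new(algorithm, data).hexdigest()


def upgrade_index(index: Index, changesets: dict[str, bytes],
                  new_algorithm: str = "sha256") -> Index:
    """Return a copy of index re-hashed with new_algorithm.

    Each old digest is verified first, so a changeset that was already
    tampered with is rejected instead of getting a fresh, valid checksum.
    """
    upgraded: Index = {}
    for name, (algorithm, old_hex) in index.items():
        data = changesets[name]
        if digest(algorithm, data) != old_hex:
            raise ValueError(f"{name}: {algorithm} checksum mismatch")
        upgraded[name] = (new_algorithm, digest(new_algorithm, data))
    return upgraded


changesets = {"add-alice": b"hypothetical changeset contents\n"}
index = {"add-alice": ("sha1", digest("sha1", changesets["add-alice"]))}
print(upgrade_index(index, changesets))
```

In this sketch, verifying the old sha-1 before writing the new sha-256 is a deliberate choice. Otherwise an upgrade run could hand a strong checksum to a file that had already been swapped.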


Secondly, our beloved git uses sha-1, and this seems unlikely to change soon or without significant pain. So, what kinds of collision attacks would you need to watch out for when using git? Here is a real-world example I've been pondering. Is it accurate?

  • Alice creates a legitimate new version of a file in the linux kernel.
  • Alice uses the new 2^52 attack to generate two variants of the file that sha-1 the same. One is suitable for public consumption, and one does something nasty. (Note that this is still gonna be very hard to accomplish for peer-reviewed source code; maybe the best file to patch would be one containing firmware? Also note that the collision actually needs to occur on the data that git-hash-object(1) will hash for the file.)
  • From the two variants of the file, Alice can generate two patches, a good and a bad. The good version is sent to Linus. Note that the sha-1's of the patches will not be the same, but when applied to a git repo, both patches will generate versions of the original file that sha-1 the same, despite being different.
  • Linus accepts the patch and publishes it in his git repo, and tags a new release. His repo now contains the good variant of the file.
  • Alice sets up her own git repo, a clone of Linus's, and tweaks it to contain the bad version of the file.
  • Alice lets the world know about her git repo, and encourages people to pull from it before pulling from Linus, to save him bandwidth, or for some other plausible reason. (This may seem unlikely, but people actually do this in many scenarios in the git world.)
  • Bob pulls from Alice's git repo, then pulls from Linus, and then builds the kernel from Linus's tag. Git already has the bad version of Alice's file from her repo, and since its sha-1 matches the one in Linus's tree, git has no reason to fetch the good version. Alice has succeeded in deploying her evil code.
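The pull-order trick comes down to "the first object with a given id wins". Below is a toy Python model of that, not git's real transfer protocol. The Repo class, weak_id and find_collision are hypothetical. Blob ids are cut to 16 bits of SHA-1 so that a birthday search for a colliding good/bad pair finishes almost instantly.

```python
import hashlib
import itertools


def weak_id(data: bytes) -> str:
    """Git-style blob id, cut to 16 bits so a toy collision is cheap to find.

    Real git keeps all 160 bits of SHA-1; the truncation is only for this demo.
    """
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()[:4]


def find_collision(good: bytes, bad: bytes) -> tuple[bytes, bytes]:
    """Vary a trailing comment in both files until their weak ids match."""
    good_seen: dict[str, bytes] = {}
    bad_seen: dict[str, bytes] = {}
    for i in itertools.count():
        g, b = good + b"/* %d */\n" % i, bad + b"/* %d */\n" % i
        gid, bid = weak_id(g), weak_id(b)
        good_seen.setdefault(gid, g)
        bad_seen.setdefault(bid, b)
        if gid in bad_seen:
            return g, bad_seen[gid]
        if bid in good_seen:
            return good_seen[bid], b
    raise AssertionError("unreachable")


class Repo:
    """Toy object store: objects keyed by id, the first copy of an id wins."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        oid = weak_id(data)
        self.objects.setdefault(oid, data)
        return oid

    def fetch_from(self, other: "Repo") -> None:
        # Only copy objects whose ids we don't have yet.
        for oid, data in other.objects.items():
            self.objects.setdefault(oid, data)


good, bad = find_collision(b"good driver\n", b"evil driver\n")
linus, alice, bob = Repo(), Repo(), Repo()
oid = linus.add(good)       # Linus merges the good patch
alice.add(bad)              # Alice's clone carries the bad twin
bob.fetch_from(alice)       # Bob pulls from Alice first...
bob.fetch_from(linus)       # ...then from Linus: the id is already present
print(bob.objects[oid] == bad)  # True: Bob builds Alice's variant
```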

Here's a different scenario:

  • Say that I have commit access to the firmware-nonfree git repository. Let's also suppose that releases of firmware-nonfree are built by a build server that clones the git repository, and builds from it.
  • I take a new firmware file, and use birthday attacks to generate two variants of it, one good and one bad, that git-hash-object will generate the same sha-1 for.
  • I commit the good one to git, ensuring that the commit appears to have been made by a contributor who is on vacation, not me. I push it to the master git repository, and wait for my co-developers to pull it, test it out, etc.
  • When a release of firmware-nonfree is imminent, I ssh in and modify the object in the master git repo, replacing it with the bad version of the firmware. Since all the developers have already pulled the good version, they are unlikely to notice this change.
  • The build server is kicked off, clones the repository, including the bad firmware, checks the gpg signature on the release tag, and deploys my evil code.
  • I ssh back in and cover my traces, changing the object back to the good version.
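The signed tag doesn't save the build server because the signature covers a commit id, the commit names a tree id, and the tree names blob ids. Checking the signature only proves that the right *ids* are present. In the toy Python model below, all names are hypothetical. HMAC stands in for the gpg signature, and blob ids are truncated to 16 bits of SHA-1 so the collision can actually be found. Real git objects and signatures are more involved.

```python
import hashlib
import hmac
import itertools

# Stand-in for the release manager's gpg key; HMAC replaces gpg signing here.
SIGNING_KEY = b"hypothetical release key"


def blob_id(data: bytes) -> str:
    """Git-style blob id cut to 16 bits, so a colliding pair is cheap to find."""
    return hashlib.sha1((b"blob %d\0" % len(data)) + data).hexdigest()[:4]


def full_id(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def sign(commit_id: str) -> str:
    return hmac.new(SIGNING_KEY, commit_id.encode(), hashlib.sha256).hexdigest()


def find_collision(good: bytes, bad: bytes) -> tuple[bytes, bytes]:
    """Birthday search over a trailing counter until the blob ids match."""
    good_seen: dict[str, bytes] = {}
    bad_seen: dict[str, bytes] = {}
    for i in itertools.count():
        g, b = good + b"%d" % i, bad + b"%d" % i
        gid, bid = blob_id(g), blob_id(b)
        good_seen.setdefault(gid, g)
        bad_seen.setdefault(bid, b)
        if gid in bad_seen:
            return g, bad_seen[gid]
        if bid in good_seen:
            return good_seen[bid], b
    raise AssertionError("unreachable")


def commit_release(store: dict[str, bytes], firmware: bytes) -> tuple[str, str]:
    """Store blob -> tree -> commit, and return (commit id, signed tag)."""
    bid = blob_id(firmware)
    store[bid] = firmware
    tree = b"100644 firmware.bin " + bid.encode()
    store[full_id(tree)] = tree
    commit = b"tree " + full_id(tree).encode() + b"\n\nnew firmware\n"
    store[full_id(commit)] = commit
    return full_id(commit), sign(full_id(commit))


def build_server_checkout(store: dict[str, bytes], commit_id: str, tag: str) -> bytes:
    """Check the signature, then walk commit -> tree -> blob verifying every id."""
    if not hmac.compare_digest(sign(commit_id), tag):
        raise ValueError("bad signature on release tag")
    commit = store[commit_id]
    tree_id = commit.split(b"\n", 1)[0].split(b" ")[1].decode()
    tree = store[tree_id]
    bid = tree.rsplit(b" ", 1)[1].decode()
    firmware = store[bid]
    if full_id(commit) != commit_id or full_id(tree) != tree_id or blob_id(firmware) != bid:
        raise ValueError("object does not match its id")
    return firmware


good, bad = find_collision(b"good firmware ", b"evil firmware ")
server: dict[str, bytes] = {}
commit_id, tag = commit_release(server, good)  # everyone pulls and tests this
server[blob_id(bad)] = bad                      # attacker swaps the object in place
shipped = build_server_checkout(server, commit_id, tag)
print(shipped == bad)  # True: signature and every id check still pass
```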

This seems more plausible. This sort of attack is easy to accomplish with a subversion repository, and was one of the reasons I was glad to switch to git, since its checksums and signed tags seemed to prevent this kind of mischief. So this is worrying, especially if your project uses such a build server.

Update: Thanks to the commenters for helping me correct my example. (I hope!)

Git, files with same sha-1
The first thing that strikes my mind when thinking about git and sha-1 is not security (that's only #2), but whether git will break when the user wants to keep two files with the same checksum under version control. If I were a security researcher studying sha-1 birthday attacks, I'd want to keep the files under version control...
Comment by liw.fi
comment 2

It shouldn't matter if your research files have the same sha-1, because you'd commit file A and B with different commit messages, parents, and commit times. Since git checksums all of those, in addition to the file content, it should not care.

There might potentially be problems with anything in the working copy/index code that looks at sha1's of files getting confused if you replace file A with B, and thinking you haven't modified it. But once they're committed, the checksums will differ.

Comment by joey [kitenet.net]
comment 3
An attacker doesn't need to worry about commit objects. Each blob is identified by a SHA1 of its contents only, not even including its filename or permissions - the latter are part of the tree object. The security researcher case is a classic example of how pure hash-based content addressing schemes fail, including git's.
Comment by fanf [livejournal.com]
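fanf's point can be checked with a few lines of Python. This sketch reimplements git's object ids for blobs and for a tree that holds only regular files (mode 100644). It is a simplification with no subdirectories, executables or symlinks, written from git's documented object format rather than taken from git itself. The blob id covers only the file's bytes, while the file name lives in the tree.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over git's object encoding: "<kind> <length>\\0" followed by body."""
    return hashlib.sha1(f"{kind} {len(body)}\0".encode() + body).hexdigest()


def blob_id(content: bytes) -> str:
    """A blob id depends on the file's bytes only: no name, no mode."""
    return git_object_id("blob", content)


def tree_id(files: dict[str, bytes]) -> str:
    """Tree of regular files only (mode 100644), entries sorted by name."""
    body = b"".join(
        b"100644 " + name.encode() + b"\0" + bytes.fromhex(blob_id(content))
        for name, content in sorted(files.items())
    )
    return git_object_id("tree", body)


print(blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
print(tree_id({"README": b"data\n"}) != tree_id({"NOTES": b"data\n"}))  # True
```

The first line matches what `echo hello | git hash-object --stdin` reports. Renaming a file changes the tree id but never the blob id, so two colliding files collide as blobs wherever they are stored.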
@c1 and c2
Git first stores the raw file contents as blobs, addressed only by the content sha-1 (which is something like SHA1(TYPE+LENGTH+BYTESTRING)). While the colliding files may collide in the bytestrings, they possibly don't collide once the git prefix (which might be only 2-4 bytes long) is added, so the files are probably safe for storage with git. However, if the blob ids DID collide, git would conclude the files were identical, and only one version of the blob, and thus of the file contents, would be stored. ulrik
Comment by kaizer.se
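ulrik's point, and the note in the post, is that a collision in a file's raw bytes does not automatically carry over to git. Git hashes "blob <length>\0" followed by the contents. This toy sketch uses SHA-1 truncated to 24 bits, an assumption made purely for speed. It finds two same-length files whose raw hashes match, then shows that prefixing the identical git header changes SHA-1's internal state, so the blob ids (almost certainly) no longer match. An attacker would have to run the search over the header-prefixed data instead.

```python
import hashlib
import itertools


def raw_id(data: bytes) -> str:
    """First 24 bits of plain SHA-1 over the file bytes (toy-sized)."""
    return hashlib.sha1(data).hexdigest()[:6]


def blob_id(data: bytes) -> str:
    """Same truncation, but over git's blob encoding: b"blob <len>\\0" + data."""
    return hashlib.sha1((b"blob %d\0" % len(data)) + data).hexdigest()[:6]


def raw_collision(good: bytes, bad: bytes) -> tuple[bytes, bytes]:
    """Birthday search on raw_id; fixed-width counters keep the files equal length."""
    if len(good) != len(bad):
        raise ValueError("start from equal-length files")
    good_seen: dict[str, bytes] = {}
    bad_seen: dict[str, bytes] = {}
    for i in itertools.count():
        g, b = good + b"%08d" % i, bad + b"%08d" % i
        gid, bid = raw_id(g), raw_id(b)
        good_seen.setdefault(gid, g)
        bad_seen.setdefault(bid, b)
        if gid in bad_seen:
            return g, bad_seen[gid]
        if bid in good_seen:
            return good_seen[bid], b
    raise AssertionError("unreachable")


good, bad = raw_collision(b"good firmware\n", b"evil firmware\n")
print(raw_id(good) == raw_id(bad))    # True: the raw files collide
print(len(good) == len(bad))          # True: so the blob headers are identical
print(blob_id(good) == blob_id(bad))  # almost certainly False
```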
@c2
Basically, the commit ID (commit sha-1) does not matter, since commits only reference trees, which in turn reference blobs by id. Collisions in git are possible for blobs, but only if the files collide as hashed by git-hash-object, which is what I meant to say in the last comment.
Comment by kaizer.se
birthday attack?

hi, i am not familiar with 'the birthday attack on sha-1'.

if i understand the nature of the birthday attack correctly, it is based on the probability of two random pieces of data hashing to the same value.

how could you use that to produce a piece of data whose hash value matches that of a given piece of data? isn't that a different problem?

Comment by dmk [getopenid.com]
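dmk's question marks the real distinction. A birthday attack only finds *some* pair of inputs with equal hashes, which is far easier than matching a hash value fixed in advance (a second preimage). For an ideal n-bit hash, the standard estimates after hashing k random inputs are:

```latex
\begin{aligned}
P_{\text{collision}}(k) &\approx 1 - e^{-k(k-1)/(2N)}, \qquad N = 2^{n},\\
k_{50\%} &\approx \sqrt{2N\ln 2} \approx 1.18\cdot 2^{n/2},\\
P_{\text{preimage}}(k) &= 1 - \left(1 - 2^{-n}\right)^{k} \approx k\cdot 2^{-n} \quad (k \ll 2^{n}).
\end{aligned}
```

For SHA-1 (n = 160), that is roughly 2^80 work for a generic collision versus 2^160 for a second preimage. The 2^52 figure discussed here is a claimed shortcut for the collision case only. This is why both scenarios in the post need the attacker to craft *both* variants.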
birthday attack!

ah! i just got it, rereading your blogentry.

you have the good file and a bad file. and then you modify both files randomly (via a comment or somesuch) until you get the same sha-1.

i can imagine smth like this working. but of course with 2^52, sha-1 is probably still only theoretically broken? i don't know what effort such an attack would take (something like "1000 dollars' worth of hardware running for one day"?).

Comment by dmk [getopenid.com]
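dmk's description (keep tweaking both files until their hashes match) is exactly the birthday search. A toy version using SHA-1 truncated to a few bits shows the work growing on the order of 2^(n/2) rather than 2^n. The truncation is an assumption so that it runs in well under a second. Real attacks on full SHA-1 use cryptanalysis, not this brute force, and birthday_pair and truncated_sha1 are hypothetical names.

```python
import hashlib
import itertools


def truncated_sha1(data: bytes, bits: int) -> int:
    """Top `bits` bits of SHA-1(data): a stand-in for a weakened hash."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") >> (160 - bits)


def birthday_pair(good: bytes, bad: bytes, bits: int) -> tuple[bytes, bytes, int]:
    """Tweak a comment in both files until a good and a bad variant collide.

    Returns (good_variant, bad_variant, number_of_hashes_computed).
    """
    good_seen: dict[int, bytes] = {}
    bad_seen: dict[int, bytes] = {}
    for i in itertools.count():
        g, b = good + b"# %d\n" % i, bad + b"# %d\n" % i
        hg, hb = truncated_sha1(g, bits), truncated_sha1(b, bits)
        good_seen.setdefault(hg, g)
        bad_seen.setdefault(hb, b)
        if hg in bad_seen:
            return g, bad_seen[hg], 2 * (i + 1)
        if hb in good_seen:
            return good_seen[hb], b, 2 * (i + 1)
    raise AssertionError("unreachable")


for bits in (16, 20, 24):
    g, b, work = birthday_pair(b"good\n", b"evil\n", bits)
    print(f"{bits}-bit hash: collision after {work} hashes (2^{bits // 2} = {2 ** (bits // 2)})")
```

Each extra bit of hash output only multiplies the work by about √2. That is why a 160-bit hash gives 80 bits of collision resistance, and why a cryptanalytic cut to 2^52 matters.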
2^52

This was posted about the 2^52 result on ietf-openpgp:

Just to give you some perspective what WFO means at this day and age: my cryptography lab at the University has just built and tested a DES cracker that cost us less than €20000 EUR. It iterates through the 56-bit key space in about one week. We are considering using it for finding a SHA1 collision using these new results. But, as noted above, this would be a collision where both pre-images are carefully chosen by the attacker.
Comment by joey [kitenet.net]
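The quoted numbers can be put side by side. The cracker covers the 2^56 DES key space in about a week (168 hours). Suppose, as a big assumption, that one step of the SHA-1 attack cost about as much as one DES trial on comparable hardware. Then 2^52 work would take well under a day, while the generic 2^80 birthday bound would stay far out of reach:

```latex
\begin{aligned}
\frac{2^{56}}{2^{52}} &= 2^{4} = 16
  &&\Rightarrow\quad \frac{168\ \text{h}}{16} = 10.5\ \text{h},\\
\frac{2^{80}}{2^{56}} &= 2^{24} \approx 1.68\times 10^{7}\ \text{weeks}
  &&\approx 3.2\times 10^{5}\ \text{years}.
\end{aligned}
```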