So, sha-1 is looking increasingly insecure in applications where birthday attacks are possible. ("Birthday attacks" ... what a phrase ... I hope my non-technical readers stopped at "sha-1".)
Two things about that:
First, I wanted to mention that today I released jetring 0.15, which adds support for arbitrary hashes in the index file and deprecates use of sha-1, switching to sha-256 by default. There is a jetring-checksum -u utility that can be used to upgrade sha-1 hashes in existing jetring index files.
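The idea behind an upgrade step like jetring-checksum -u can be sketched in Python. This is not jetring's code: the `algo:hexdigest` entry format and the `upgrade_entry` function are hypothetical, chosen only to show why it helps to check the old sha-1 before writing a sha-256 in its place.

```python
import hashlib

def upgrade_entry(entry: str, data: bytes, new_algo: str = "sha256") -> str:
    """Rewrite a hypothetical "algo:hexdigest" entry to use new_algo.

    The recorded checksum is verified first, so a changeset that no longer
    matches its entry is rejected rather than silently re-blessed.
    """
    algo, _, recorded = entry.partition(":")
    # hashlib.new accepts any algorithm name hashlib knows, e.g. "sha1".
    if hashlib.new(algo, data).hexdigest() != recorded:
        raise ValueError(f"{algo} checksum mismatch; refusing to upgrade")
    return f"{new_algo}:{hashlib.new(new_algo, data).hexdigest()}"


changeset = b"example changeset contents"
old_entry = "sha1:" + hashlib.sha1(changeset).hexdigest()
print(upgrade_entry(old_entry, changeset))
```

Upgrading only protects against collisions made after the upgrade; an entry whose sha-1 was already attacked would simply be re-hashed as-is.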
If you're using jetring in an application where changesets are provided by third parties, then a birthday attack could be possible (though not easy?), and you should upgrade your index. debian-maintainer is a good example of such a jetring user.
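A toy birthday attack makes this concrete. The sketch below truncates sha-1 to 24 bits (the `BITS` constant and all function names are illustrative assumptions) so that a collision turns up after a few thousand tries. It only shows the shape of the attack: the attacker varies harmless padding in both a good and a bad input until two hashes match. A collision on full sha-1 needs vastly more work and real cryptanalysis, not this brute-force loop.

```python
import hashlib
from itertools import count

BITS = 24  # toy setting: truncate sha-1 so a collision takes ~2**12 tries

def short_hash(data: bytes) -> int:
    """First BITS bits of sha-1, standing in for a full-size hash."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") >> (160 - BITS)

def variant(base: bytes, i: int) -> bytes:
    # Vary an innocuous trailing comment; the payload itself is unchanged.
    return base + b"\n# padding %d\n" % i

def birthday_collision(good: bytes, bad: bytes) -> tuple[bytes, bytes]:
    """Return a (good variant, bad variant) pair with equal short hashes."""
    seen_good: dict[int, bytes] = {}
    seen_bad: dict[int, bytes] = {}
    for i in count():
        g, b = variant(good, i), variant(bad, i)
        hg, hb = short_hash(g), short_hash(b)
        seen_good[hg] = g
        seen_bad[hb] = b
        # Any good variant matching any bad variant is a usable collision.
        if hg in seen_bad:
            return g, seen_bad[hg]
        if hb in seen_good:
            return seen_good[hb], b


good_file, bad_file = birthday_collision(b"firmware (good)", b"firmware (evil)")
print(hex(short_hash(good_file)), hex(short_hash(bad_file)))
```

Because the attacker controls both inputs, the work grows with the square root of the hash space (about 2^12 tries for 24 bits) rather than the whole space, which is why accepting third-party changesets is the risky case.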
Secondly, our beloved git uses sha-1, and this seems unlikely to change soon or without significant pain. So, what kinds of collision attacks would you need to watch out for when using git? Here is a real-world example I've been pondering. Is it accurate?
- Alice creates a legitimate new version of a file in the linux kernel.
- Alice uses the new 2^52 attack to generate two variants of the file that sha-1 the same. One is suitable for public consumption, and one does something nasty. (Note that this is still going to be very hard to accomplish for peer-reviewed source code; maybe the best file to patch would be one containing firmware? Also note that the collision actually needs to occur on the data that git-hash-object(1) will hash for the file.)
- From the two variants of the file, Alice can generate two patches, a good one and a bad one. The good version is sent to Linus. Note that the sha-1s of the patches will not be the same, but when applied to a git repo, both patches will generate versions of the original file that sha-1 the same, despite being different.
- Linus accepts the patch and publishes it in his git repo, and tags a new release. His repo now contains the good variant of the file.
- Alice sets up her own git repo, a clone of Linus's, and tweaks it to contain the bad version of the file.
- Alice lets the world know about her git repo, and encourages people to pull from it before pulling from Linus, to save him bandwidth, or for some other plausible reason. (This may seem unlikely, but people actually do this in many scenarios in the git world.)
- Bob pulls from Alice's git repo, then pulls from Linus, and then builds the kernel from Linus's tag. Git gets the bad version of Alice's file from her repo, and its sha-1 is ok. Alice has succeeded in deploying her evil code.
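The note about git-hash-object(1) matters because git does not hash a file's raw bytes. Here is a minimal Python sketch of the blob id, assuming git's standard object format of a `blob <length>` header followed by a NUL byte; `git_blob_id` is an illustrative name, not a git API.

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Object id that `git hash-object` assigns to a file's contents.

    Git hashes a header ("blob", a space, the decimal length, a NUL byte)
    followed by the raw bytes, so a usable collision must hold for this
    framed data, not just for the file itself.
    """
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

In Alice's scenario this id is all git compares: once Bob's repository holds an object with a given id, git assumes any other object with that id has the same contents.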
Here's a different scenario:
- Say that I have commit access to the firmware-nonfree git repository. Let's also suppose that releases of firmware-nonfree are built by a build server that clones the git repository and builds from it.
- I take a new firmware file and use a birthday attack to generate two variants of it, one good and one bad, for which git-hash-object will generate the same sha-1.
- I commit the good one to git, ensuring that the commit appears to have been made by a contributor who is on vacation, not me. I push it to the master git repository, and wait for my co-developers to pull it, test it out, etc.
- When a release of firmware-nonfree is imminent, I ssh in and modify the object in the master git repo, replacing it with the bad version of the firmware. Since all the developers have already pulled the good version, they are unlikely to notice this change.
- The build server is kicked off, clones the repository, including the bad firmware, checks the gpg signature on the release tag, and deploys my evil code.
- I ssh back in and cover my traces, changing the object back to the good version.
This seems more plausible. This sort of attack is easy to accomplish with a subversion repository, and was one of the reasons I was glad to switch to git, since its checksums and signed tags seemed to prevent this kind of mischief. So this is worrying, especially if your project uses such a build server.
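Why the signed tag does not catch the swap can be shown with a toy model. Everything here is a hypothetical stand-in: `weak_id` replaces sha-1 with a deliberately broken hash so that a "collision" is trivial, `tree_id` mimics a git tree that names files only by object id, and an HMAC replaces the gpg signature on the release tag.

```python
import hashlib
import hmac

def weak_id(data: bytes) -> str:
    # Deliberately broken stand-in for sha-1: only the first 8 bytes are
    # hashed, so any two files sharing that prefix "collide".
    return hashlib.sha1(data[:8]).hexdigest()

def tree_id(files: dict[str, bytes]) -> str:
    # Like a git tree, this lists each file by name and object id only.
    listing = "".join(f"{name} {weak_id(data)}\n"
                      for name, data in sorted(files.items()))
    return hashlib.sha1(listing.encode()).hexdigest()

def sign(key: bytes, object_id: str) -> str:
    # HMAC stands in for the gpg signature on a release tag.
    return hmac.new(key, object_id.encode(), hashlib.sha256).hexdigest()

def verify(key: bytes, files: dict[str, bytes], signature: str) -> bool:
    # What the build server does: recompute ids from the checkout, check the tag.
    return hmac.compare_digest(sign(key, tree_id(files)), signature)


key = b"release-key"
good = {"fw.bin": b"FIRMWARE: good payload"}
tag = sign(key, tree_id(good))

evil = {"fw.bin": b"FIRMWARE: evil payload"}  # same first 8 bytes as good
print(verify(key, evil, tag))  # True: the swap is invisible to the tag
```

The signature only ever covers ids, so it is exactly as strong as the hash that produces them; with real sha-1 the same swap needs a real collision.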
Update: Thanks to the commenters for helping me correct my example. (I hope!)
It shouldn't matter if your research files have the same sha-1, because you'd commit files A and B with different commit messages, parents, and commit times. Since git checksums all of those, in addition to the file content, it should not care.
There might potentially be problems with anything in the working copy/index code that looks at the sha-1s of files: it could get confused if you replace file A with B, and think you haven't modified it. But once they're committed, the checksums will differ.
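The point that commits also cover messages, parents, and times can be sketched in Python, assuming git's usual `<type> <length>` header plus a NUL byte and a minimal commit layout (real commits can carry extra headers, such as a gpg signature). The identity, timestamps and parent id are made up. Note that the blob ids referenced by the two commits would still be computed from file content alone.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """sha-1 over git's "<kind> <length>" header, a NUL byte, and the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def commit_body(tree: str, parent: str, ident: str, when: int, message: str) -> bytes:
    # Minimal commit layout: tree, parent, author, committer, blank line, message.
    stamp = f"{ident} {when} +0000"
    return (f"tree {tree}\nparent {parent}\nauthor {stamp}\n"
            f"committer {stamp}\n\n{message}\n").encode()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's empty tree id
PARENT = "1" * 40                        # hypothetical parent commit id
IDENT = "A U Thor <author@example.com>"  # hypothetical identity

a = git_object_id("commit", commit_body(EMPTY_TREE, PARENT, IDENT, 1240000000, "Add file A"))
b = git_object_id("commit", commit_body(EMPTY_TREE, PARENT, IDENT, 1240000100, "Add file B"))
print(a != b)  # True: same tree, different message and time
```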
SHA1(TYPE+LENGTH+BYTESTRING). Since the colliding files may collide in their bytestrings but probably don't collide once git's short type-and-length prefix is added, the files are probably safe for storage with git. However, if the blob ids DID collide, git would conclude the files were identical, and only one version of the blob, and thus of the file contents, would be stored.
ulrik
Hi, I am not familiar with 'the birthday attack on sha-1'.
If I understand the birthday attack correctly, it is based on the probability of two random pieces of data hashing to the same value.
How could you use that to produce a piece of data whose hash value matches that of a given piece of data? Isn't that a different problem?
Ah! I just got it, rereading your blog entry.
You have the good file and a bad file, and then you modify both files randomly (via a comment or some such) until you get the same sha-1.
I can imagine something like this working. But of course, even with the 2^52 attack, sha-1 is probably still only theoretically broken? I don't know what effort such an attack would take (something like "1000 dollars' worth of hardware running for one day"?).
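Some back-of-the-envelope arithmetic helps with both questions in this comment. For an n-bit hash, a generic birthday attack needs about 2^(n/2) hash computations, while matching a fixed target (a second-preimage attack, the "different problem" asked about above) needs about 2^n. The rate below is a made-up assumption for illustration, not a measured figure, and treating each unit of the 2^52 work as one sha-1 computation is itself a simplification.

```python
SHA1_BITS = 160
SECONDS_PER_DAY = 86_400

def days_for(work: int, hashes_per_second: float) -> float:
    """Wall-clock days to perform `work` hash computations at a fixed rate."""
    return work / hashes_per_second / SECONDS_PER_DAY

generic_collision = 2 ** (SHA1_BITS // 2)  # birthday bound, about 2^80
second_preimage = 2 ** SHA1_BITS           # fixed target, about 2^160
reported = 2 ** 52                         # the new collision attack

RATE = 1e8  # hypothetical: sha-1 computations per second on one machine

for name, work in [("2^52 attack", reported),
                   ("generic collision", generic_collision),
                   ("second preimage", second_preimage)]:
    print(f"{name}: {days_for(work, RATE):.3g} days")
```

At that assumed rate the 2^52 figure comes to roughly 521 machine-days, work that could be split across many machines, while the generic 2^80 bound is 2^28 times larger.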
This was posted about the 2^52 attack on ietf-openpgp