the single most important criteria when replacing Github

I could write a lot of things about the Github acquisition by Microsoft. About Github's embrace and extend of git, and how it passed unnoticed by people who now fear the same thing now that Microsoft is in the picture. About the stultifying effects of Github's centralization, and its retardant effect on general innovation in spaces around git and software development infrastructure.

Instead I'd rather highlight one simple criteria you can consider when you are evaluating any git hosting service, whether it's Gitlab or something self-hosted, or federated, or P2P[1], or whatever:

Consider all the data that's used to provide the value-added features on top of git. Issue tracking, wikis, notes in commits, lists of forks, pull requests, access controls, hooks, other configuration, etc.
Is that data stored in a git repository?

Github avoids doing that and there's a good reason why: By keeping this data in their own database, they lock you into the service. Consider if Github issues had been stored in a git repository next to the code. Anyone could quickly and easily clone the issue data, consume it, write alternative issue tracking interfaces, which then start accepting git pushes of issue updates and syncing all around. That would have quickly became the de-facto distributed issue tracking data format.

Instead, Github stuck it in a database, with a rate-limited API, and while this probably had as much to do with expediency, and a certain centralized mindset, as intentional lock-in at first, it's now become such good lock-in that Microsoft felt Github was worth $7 billion.

So, if whatever thing you're looking at instead of Github doesn't do this, it's at worst hoping to emulate that, or at best it's neglecting an opportunity to get us out of the trap we now find ourselves in.


[1] Although in the case of a P2P system which uses a distributed data structure, that can have many of the same benefits as using git. So, git-ssb, which stores issues etc as ssb messages, is just as good, for example.

Posted
two security holes and a new library

For the past week and a half, I've been working on embargoed security holes. The embargo is over, and git-annex 6.20180626 has been released, fixing those holes. I'm also announcing a new Haskell library, http-client-restricted, which could be used to avoid similar problems in other programs.

Working in secret under a security embargo is mostly new to me, and I mostly don't like it, but it seems to have been the right call in this case. The first security hole I found in git-annex turned out to have a wider impact, affecting code in git-annex plugins (aka external special remotes) that uses HTTP. And quite likely beyond git-annex to unrelated programs, but I'll let their developers talk about that. So quite a lot of people were involved in this behind the scenes.

See also: The RESTLESS Vulnerability: Non-Browser Based Cross-Domain HTTP Request Attacks

And then there was the second security hole in git-annex, which took several days to notice, in collaboration with Daniel Dent. That one's potentially very nasty, allowing decryption of arbitrary gpg-encrypted files, although exploiting it would be hard. It logically followed from the first security hole, so it's good that the first security hole was under embagro long enough for us to think it all though.

These security holes involved HTTP servers doing things to exploit clients that connect to them. For example, a HTTP server that a client asks for the content of a file stored on it can redirect to a file:// on the client's disk, or to http://localhost/ or a private web server on the client's internal network. Once the client is tricked into downloading such private data, the confusion can result in private data being exposed. See the_advisory for details.

Fixing this kind of security hole is not necessarily easy, because we use HTTP libraries, often via an API library, which may not give much control over following redirects. DNS rebinding attacks can be used to defeat security checks, if the HTTP library doesn't expose the IP address it's connecting to.

I faced this problem in git-annex's use of the Haskell http-client library. So I had to write a new library, http-client-restricted. Thanks to the good design of the http-client library, particularly its Manager abstraction, my library extends it rather than needing to replace it, and can be used with any API library built on top of http-client.

I get the impression that a lot of other language's HTTP libraries need to have similar things developed. Much like web browsers need to enforce same-origin policies, HTTP clients need to be able to reject certain redirects according to the security needs of the program using them.

I kept a private journal while working on these security holes, and am publishing it now:

Posted