case study: adding box.com support to git-annex

git-annex has special remotes that allow large files checked into git to be stored in arbitrary places, that are not proper git remotes. One key use of the special remotes is to store files in The Cloud.

Until now the flagship special remote used Amazon S3, although a few other things like Archive.org, rsync.net, and Tahoe-LAFS can be made to work too. One of my goals is to add as many cloud storage options to git-annex as possible.

Box.com came to my attention because they currently have a promotion that provides 50 gigabytes of free "lifetime" service. Which is a nice amount of cloud storage to have for free. I decided that I didn't want to spend more than 4 hours of my time to make git-annex use it though. (I probably have spent a week on the S3 support by contrast.)

So, this is a case study in quickly adding support for one cloud storage provider to git-annex.

  • First, I had to sign up to box.com. Their promotion requires that an Android phone be used to get the 50 gigabytes, so I wasted about an hour getting my unused phone dusted off etc. That hour also includes time spent researching ways to access box.com's storage, including reading their API documentation. I found it has a WebDAV interface.
  • Sadly, there is not yet a native WebDAV library for Haskell. This is a shame, because it would make the implementation better. But I'm confident someone will eventually write one. My experience with Haskell libraries for other web APIs (S3, GitHub) is that it's an excellent language to write them in; the code tends to be very simple, concise and clear. But I can't do it in 4 hours. So for now, the workaround is to use a WebDAV mounting tool. I picked davfs2 as it was the first one I got to work with box.com's slightly broken WebDAV. 2 hours spent now.
  • With box.com mounted, I was nearly done; git-annex's directory special remote can use the mount point. But there was a catch: box.com only allows files up to 100 MB in size. I spent 1 hour or so adding support to the directory special remote for chunking files into a user-specified size.
    This was a fairly complex problem -- the existing code had a ByteString that when accessed lazily read the whole large file (from disk or from gpg, depending), and just called writeFile on it.
    I needed to still consume it lazily to avoid reading the whole file into memory, but write out chunks. This gets a bit into Haskell's ByteString internals, but they're very well suited to this kind of thing, and so after 15 minutes familiarizing myself with the data structures, it was actually fairly easy to write the code. patch
  • I spent my last hour testing and tuning the box.com special remote. Using davfs2 as a quick fix caused some technical debt that I had to make up for. In particular, the chunked filename retrieval code had to make sure not to open every chunk at once, because that makes davfs2 try to cache them all, instead of streaming one at a time. patch
  • Not counted toward my 4 hour limit is the ... er ... 4 hours I spent last night adding a progress bar to the directory special remote. A progress display while transferring the files makes using box.com as a special remote much nicer, but also makes using my phone's SD card as a special remote much nicer! This is why I'm a poor consultant -- when faced with something generic and generally useful like this, I have difficulty billing for it.
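
The chunking and one-chunk-at-a-time retrieval described in the list above could be sketched roughly like this. This is not the actual git-annex patch, just a minimal illustration using lazy ByteStrings; the names splitChunks, chunkNames, writeChunked and copyChunks, and the ".chunkN" naming scheme, are all made up for the example.

```haskell
import qualified Data.ByteString.Lazy as L
import Control.Monad (forM, forM_)
import Data.Int (Int64)
import System.IO (IOMode (WriteMode), withBinaryFile)

-- Split a lazy ByteString into pieces of at most chunkSize bytes.
-- L.splitAt is lazy, so the input is only read as pieces are consumed.
splitChunks :: Int64 -> L.ByteString -> [L.ByteString]
splitChunks chunkSize b
    | chunkSize <= 0 = error "chunk size must be positive"
    | L.null b = []
    | otherwise =
        let (c, rest) = L.splitAt chunkSize b
        in c : splitChunks chunkSize rest

-- Hypothetical naming scheme: base.chunk1, base.chunk2, ...
chunkNames :: FilePath -> [FilePath]
chunkNames base = [base ++ ".chunk" ++ show n | n <- [1 :: Integer ..]]

-- Write each piece to its own file, one after another, and
-- return the names of the files that were written.
writeChunked :: FilePath -> Int64 -> L.ByteString -> IO [FilePath]
writeChunked base chunkSize b =
    forM (zip (chunkNames base) (splitChunks chunkSize b)) $ \(f, c) -> do
        L.writeFile f c
        return f

-- Reassemble chunks into dest. Each chunk is opened and fully copied
-- before the next one is opened, so only one chunk file is open at a time.
copyChunks :: [FilePath] -> FilePath -> IO ()
copyChunks chunks dest = withBinaryFile dest WriteMode $ \h ->
    forM_ chunks $ \f -> L.readFile f >>= L.hPut h
```

Something like `L.readFile "bigfile" >>= writeChunked "/media/box.com/bigfile" (10 * 1024 * 1024)` would then write 10 MB pieces without holding the whole file in memory (the mount point path is only an example). The point of copyChunks is the ordering: a version that opened all the chunk files up front is the kind of thing that made davfs2 try to cache them all.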

The end result is that there are detailed instructions for using box.com as a special remote.

And it seems to work quite well now. I just set up my production box.com special remote. All content written to it is gpg encrypted, and various of my computers have access to it, each using their own gpg key to decrypt the files uploaded by the others. (git-annex's encryption feature makes this work really well!)

So..
There is a Dropbox API for Haskell. But as I'm not a customer, the 2 GB free account hardly makes it worth my while to make git-annex use it. Would someone like to fund my time to add a Dropbox special remote to git-annex?

podcasts that don't suck

My public radio station is engaged in a most obnoxious spring pledge drive. Good time to listen to podcasts. Here are the ones I'm currently liking.

  • Free As In Freedom: The best informed podcast on software licensing issues, and highly idealistic. What keeps me coming back, though, is that Karen and Bradley never quite agree on things, and always end up in some lawyerly minutiae cul-de-sac that is somehow interesting to listen to. They once did a whole show about a particular IRS tax form, and I listened to it all. (Granted, I often listen to this while cleaning house, but as Bradley would say, at least I'm not listening to it while driving.)

  • This Developer's Life: At least the early episodes, before it got popular, are an unashamed imitation of This American Life, and I have quite enjoyed them. Although I often roll my eyes at the proprietary developer mindsets on display in the show. For example, often they'll have a bug and not root cause it, because well, they don't have the source code for the Windows layers. Still, beneath that it's mostly about the parts of software development that are common to all our lives. A particular episode I can recommend is #10 "Disconnecting" -- the first 20 minutes is a perfect story.

  • Off the Hook: This is actually a live radio show, quite well done, with call-ins and everything. So much more polished than your typical podcast. It's hosted by Emmanuel Goldstein! And it's been going on for over 20 years, so why did I never hear about it before? Probably I'm not quite in the right hacker circles. Since it's out of NYC and very anti-authoritarian, I've mostly been enjoying it as a view into the Occupy protests.

  • StarShipSofa: The best science fiction podcast around. Probably not news to anyone who ever looked for such a podcast. Long, and tends to be frontloaded with a lot of administrivia, which I fast-forward to get to the stories.

  • Spider on the Web: The best music and science fiction podcast around. Mostly on hiatus since Jeanne died, but I hope Spider picks it back up. A good exemplar is "Bianca's Hands".

  • Long Now Seminars: Consistently interesting. I visited their space last time I was in SF only to learn they'd had a talk the night before, which would have been a bummer, except they ran the bits of the Clock for us.

  • Linux Outlaws: After 18 years using Linux, I find the level of discourse in most Linux podcasts typically rather annoying. Including this one, but when Fab gets on a rant, it's all worth it. Sometimes some interesting guests.

  • This Week In Debian: Sadly no new episodes lately, and I've been too lame to respond to repeated interview requests. Probably it needs to move away from being an interview show if it is to continue; there are only so many DDs who can give excellent interviews like liw did.
