git-annex has special remotes that allow large files checked into git to be stored in arbitrary places, that are not proper git remotes. One key use of the special remotes is to store files in The Cloud.
Until now the flagship special remote used Amazon S3, although a few other things like Archive.org, rsync.net, and Tahoe-LAFS can be made to work too. One of my goals is to add as many cloud storage options to git-annex as possible.
Box.com came to my attention because they currently have a promotion that provides 50 gigabytes of free "lifetime" service. Which is a nice amount of cloud storage to have for free. I decided that I didn't want to spend more than 4 hours of my time to make git-annex use it though. (I probably have spent a week on the S3 support by contrast.)
So, this is a case study in quickly adding support for one cloud storage provider to git-annex.
- First, I had to sign up to box.com. Their promotion requires that an Android phone be used to get the 50 gigabytes. This wasted about an hour getting my unused phone dusted off etc. This also includes time spent researching ways to access box.com's storage, including reading their API documentation. I found it has a WebDAV interface.
- Sadly, there is not yet a native WebDAV library for haskell. This is a shame, because it would make the implementation better. But, I'm confident someone will eventually write one. My experience with haskell libraries for other web APIs (S3, GitHub) is that it's an excellent language to write them in, the code tends to be very simple, concise and clear. But I can't do it in 4 hours. So for now, the workaround is to use a WebDAV mounting tool. I picked davfs2 as it was the first one I got to work with box.com's slightly broken WebDAV. 2 hours spent now.
- With box.com mounted, I was nearly done; git-annex's directory special remote can use the mount point. But there was a catch: box.com only allows files up to 100 MB in size. I spent 1 hour or so adding support to the directory special remote for chunking files into a user-specified size. This was a fairly complex problem -- the existing code had a ByteString that, when accessed, lazily read the whole large file (from disk or from gpg, depending), and just called writeFile on it. I needed to still consume it lazily to avoid reading the whole file into memory, but write out chunks. This gets a bit into haskell's ByteString internals, but they're very well suited to this kind of thing, and so after 15 minutes familiarizing myself with the data structures, it was actually fairly easy to write the code. (patch)
- I spent my last hour testing and tuning the box.com special remote. Using davfs2 as a quick fix caused some technical debt that I had to make up for. In particular, the chunked filename retrieval code had to make sure not to open every chunk at once, because that makes davfs2 try to cache them all, instead of streaming one at a time. (patch)
- Not counted toward my 4 hour limit is the ... er ... 4 hours I spent last night adding a progress bar to the directory special remote. A progress display while transferring the files makes using box.com as a special remote much nicer, but also makes using my phone's SD card as a special remote much nicer! This is why I'm a poor consultant -- when faced with something generic and generally useful like this, I have difficulty billing for it.
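To make the chunking step above concrete, here is a minimal sketch of the idea. It is not the actual patch. The names splitChunks, chunkName, storeChunked and retrieveChunked, as well as the ".chunkN" file naming, are made up for illustration. It assumes only the standard Data.ByteString.Lazy API and leaves out encryption, error handling and progress display.

```haskell
import qualified Data.ByteString.Lazy as L
import Data.Int (Int64)

-- | Split a lazy ByteString into pieces of at most chunkSize bytes.
-- The result list is produced lazily, so each piece is only read
-- from the underlying source when it is consumed.
splitChunks :: Int64 -> L.ByteString -> [L.ByteString]
splitChunks chunkSize b
    | chunkSize <= 0 = error "chunk size must be positive"
    | L.null b = []
    | otherwise =
        let (chunk, rest) = L.splitAt chunkSize b
        in chunk : splitChunks chunkSize rest

-- | Hypothetical naming scheme for chunk files (an assumption of
-- this sketch, not necessarily what git-annex uses).
chunkName :: FilePath -> Int -> FilePath
chunkName base n = base ++ ".chunk" ++ show n

-- | Write the content out as numbered chunk files, one after another,
-- and return the names of the files written.
storeChunked :: Int64 -> FilePath -> L.ByteString -> IO [FilePath]
storeChunked chunkSize base b =
    mapM write (zip [1 ..] (splitChunks chunkSize b))
  where
    write (n, chunk) = do
        let f = chunkName base n
        L.writeFile f chunk
        return f

-- | Reassemble chunk files into dest. Each chunk is read to the end
-- and appended before the next one is opened, so at most one chunk
-- file is open at any time.
retrieveChunked :: [FilePath] -> FilePath -> IO ()
retrieveChunked chunks dest = do
    L.writeFile dest L.empty
    mapM_ (\f -> L.readFile f >>= L.appendFile dest) chunks
```

With something like this, `storeChunked (100 * 1024 * 1024) "bigfile" content` would write bigfile.chunk1, bigfile.chunk2 and so on, each at most 100 MB. Passing the returned list to retrieveChunked puts the file back together one chunk at a time, which is the access pattern that keeps davfs2 from trying to cache every chunk at once.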
The end result is that there are detailed instructions for using box.com as a special remote.
And it seems to work quite well now. I just set up my production box.com special remote. All content written to it is gpg encrypted, and various of my computers have access to it, each using their own gpg key to decrypt the files uploaded by the others. (git-annex's encryption feature makes this work really well!)
So.. There is a DropBox API for haskell. But as I'm not a customer, the 2 GB free account hardly makes it worth my while to make git-annex use it.
Would someone like to fund my time to add a dropbox special remote to git-annex?
Thank you, those steps for mounting box.com are very useful. I'd signed up for the 50 GB of space with Android, but hadn't done anything with it. The lack of automatic desktop sync (compared to dropbox) seemed a problem, but if it can be mounted, that's pretty useful.
By the way, any idea why davfs2 shows a strange number for the disk space used & available? I'd expected something like 1 MB used, 50 GB free, since I haven't put anything there yet, but got this result:
And if it helps, you can get to 7 or 8 GB of free dropbox space following the steps here: http://www.ozbargain.com.au/node/62910?page=1#comment-750513 , BUT it requires a Windows machine, and a bit of stuffing around. For some reason it wouldn't work in a VirtualBox Windows VM either, but did work on a real Windows machine. Dropbox tend to have little quests and challenges and stuff, each one of which adds a little to your free space, but only a little as they want people to pay them for more space.
Would be nice if Ubuntu would pull this version of git-annex into their upcoming release, because I'd quite like to give it a try. Looks like they tend to take a version from a few months before they release though: http://packages.ubuntu.com/search?keywords=git-annex - so probably out of luck - oh well, I'll give it a go in the 12.10 release.
I have the same df output. I have not yet tried uploading more than 13G to see what happens.
I've been disappointed at the versions of git-annex that end up in Ubuntu. A tiny bit of coordination could improve things a lot. You can probably just use http://backports.debian.org/ with Ubuntu; it always has the same version that's in Debian testing.
I only got my box.com account a couple of weeks ago and asked myself the very question "Can I connect to it without the web UI and preferably using command line?". The same goes for dropbox, but they don't seem to be that bothered. In their "votebox" "WebDAV access" has over 40000 votes and it's the 7th most requested feature (by vote anyway)! It's just after "Windows mobile support" and "Email files to Dropbox", both of which (and many more which have fewer votes) can be replaced by WebDAV access so if you sum them up it would be somewhere in the first 3-4!
Anyway, I diverged slightly. If not with git-annex, I'll definitely use it as a "cloud" - still can't get used to that word ;^) - backup.
Thanks for a good write-up (as usual) Joey.
Hi Joey. A bit late to the party, but I'm really enthused about what you've done with git-annex!
I've migrated my main machine to use only open-source software, but am still locked into Dropbox because that's what everyone in my company uses. Since Dropbox insists on installing proprietary software in its client, that's a no-go for me. (My experiments with OwnCloud and a proxy machine running OC and DB failed because Dropbox was too unreliable at handling conflicts, and OC was a CPU hog.)
I bet you've seen https://github.com/TobiasTheViking/dropboxannex by now. (Surprised it wasn't listed in the walkthrough.) I'm about to give it a spin.
If your request for a Dropbox account or sponsorship still stands, please let me know. I think it'd be a great stepping stone for a lot of locked-in Dropbox users to be able to leave the Dark Side.