CVS homedir, or keeping your life in CVS

This article is provided for historical reasons, but I use subversion (or newer stuff, like git) now. Svnhome explains how.

I keep my life in a CVS repository. For the past two years, every file I've created and worked on, every email I've sent or received, and every config file I've tweaked have all been checked into my CVS archive. When I tell people about this, they invariably respond, "you're crazy!".

After all, CVS is meant for managing discrete bodies of code, like free software programs that are worked on and available to a lot of people, or in-house projects that are collaboratively developed by several employees. CVS has a reputation of being a pain to deal with, and it has a lot of crufty bits that regularly drive users up the wall, like its mistreatment of directories. Why inflict the pain of CVS on yourself if you don't have to? Why do it on such a scale that it affects nearly everything you do with your computer?

I get three major benefits from keeping my whole home directory in CVS:

home directory replication
history
distributed backups

The first of these is what originally drove me to CVS for my whole home directory. At the time, I had a home desktop machine, two laptops, and a desktop machine at work. Rounding this out were perhaps twenty remote accounts on various systems around the world, and many systems around the workplace that I might randomly find myself logging in to. I used all these accounts for working on the same projects, and was already using CVS for those projects.

I'm a conservative guy when it comes to my computing environment (I've used the same wallpaper image for the past 5 years..), and at the same time, I'm always making lots of little tweaks to improve things. And whenever I got into work and something wasn't just like I had tweaked it the night before, I'd feel a jarring disconnect, and annoyedly copy over whatever the change was. When I sat down at some other system at work, to burn a CD, perhaps, and found a bare bash shell instead of the heavily customized environment I've built up over the past ten years, it was even worse. The plethora of environments, each imperfectly customized to my needs to varying degrees, was really getting on my nerves. And so one day I cracked, and sat down and began to feed my whole home directory into CVS.

And it worked astonishingly well. After a few weeks of tweaking and importing I had everything working, and began developing some new habits. Every morning (erm, afternoon) when I came into work, I'd 'cvs up' while I read the morning mail. In the evening, I'd 'cvs commit' and then update my laptop for the trip home. When I got home, I synced up again, and I could dive right back in to whatever I'd been doing at work, and keep on rolling until late at night, when I committed, went to bed, and began the cycle all over again. As for the systems I used less frequently, like the CD burner machine, I'd just update when I got annoyed at them for being a trifle out of date.

It only took a few more weeks before the advantage of having a history of everything I'd done began to show up. It wasn't a real surprise, since having a history of past versions of a project is one of the reasons to use CVS in the first place, but it's very cool to have it suddenly apply to every file you own. When I broke my .zshrc, or .procmailrc, I could roll back to the previous day's, or look back and see when I made the change and why. It's very handy to be able to run "cvs diff" on your kernel config file and see how "make xconfig" changed it. It's great to be able to recover files you deleted, or delete files because they're not relevant and know you've not really lost them at all. For the amateur historians among us, it's very cool to be able to check out one's system as it looked one full year ago and poke around and discover how everything has evolved over time.
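
As a rough sketch, the history tricks above map onto standard CVS options (diff -D, update -p -D, checkout -D -d). The function names and the CVS variable below are made up for illustration, and the module name passed to the last helper is whatever your own modules file defines.

```shell
# Hypothetical helpers; the names are illustrative, not part of CVS.
# CVS can be pointed at another command to preview what would be run.
CVS=${CVS:-cvs}

# Show what changed in a file since the version committed as of yesterday.
changed_since_yesterday() {
    "$CVS" diff -u -D yesterday "$1"
}

# Print a file as it was on a given date without touching the
# working copy (-p writes the old contents to standard output).
file_as_of() {
    "$CVS" update -p -D "$1" "$2"
}

# Check out a module as it looked on a given date, into a separate
# directory so the live checkout is left alone.
checkout_as_of() {
    "$CVS" checkout -D "$1" -d "$2" "$3"
}
```

For example, file_as_of yesterday .zshrc > .zshrc.old would recover yesterday's copy for comparison, and checkout_as_of "1 year ago" old-home joey would populate old-home with the tree as of a year back.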

The final major benefit took some time to become clear. Linus Torvalds once said, "Only wimps use tape backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it". I'm not a real enough man to upload my confidential documents to ftp.kernel.org though, so I'd been wimping along with backups to tape and CD and so on. But then it hit me. Take one crucial file, like my .zshrc, or sent-mail archive. I had a copy of that file on my work machine, and on my home machine, and on my laptop, and several other copies on other accounts. There was another copy encoded in my cvs repository too.

I'm told that the best backups are done without effort -- so you actually do them -- and are widely scattered among many machines and a lot of area -- so a local disaster doesn't knock them out -- and are tested on a regular basis -- to make sure the backup works. I was doing all of these things, as a mere side effect of keeping it all in CVS. Then I sobered up, and remembered that a dead CVS repository would be a really, really bad thing, and kept those wimpy backups to CD going. But the automatic distributed backups are what keep me sleeping quietly at night. Later, when I left that job, the last thing I did on my work desktop machine was: "cvs commit ; sudo rm -rf /". And I didn't worry a bit; my life was still there, secure in CVS.

A full checkout of my home directory with all the trimmings often runs about 4 GB in size. A lot of that will be temporary trees in tmp/, and rsynced ogg vorbis files (so far, I have not found the disk space to check all of them into cvs). My cvs repository currently uses less than 1 GB of space, though it is steadily growing in size. I keep some 13 thousand files in CVS, and so a full cvs update of my home directory is a sight to see, and takes a while.

These days I'm often stuck behind a dialup connection, and I mostly just use one laptop, and so I might go days between cvs updates. Other better-connected systems have automatic cvs updates done via cron each day. I cvs commit whenever I want to make a backup of where I am in a file, or when I am at the point of releasing something. And I still also do a full commit of my home directory every day or so. I confess that some of my cvs commit messages are less than informative -- "foo" has been used far too many times on some classes of files. I even do some automatic cvs commits; for example, my mailbox archives are committed by a daily cron job.
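
A cron-driven update or commit can be very small. The sketch below is an illustration under assumptions: the function names, the script path in the comment, and the commit message are all hypothetical, and only the cvs options themselves (-q, update -d -P, commit -m) are standard.

```shell
# Hypothetical cron helpers; names and paths are illustrative.
# An assumed crontab entry might look like:
#   0 5 * * * $HOME/bin/cvs-daily
CVS=${CVS:-cvs}

# Bring a checked-out tree up to date: -d picks up new directories,
# -P prunes empty ones, -q keeps the mail from cron short.
auto_update() {
    ( cd "$1" && "$CVS" -q update -d -P )
}

# Commit whatever changed under a directory with a canned log message.
auto_commit() {
    ( cd "$1" && "$CVS" -q commit -m "automatic daily commit" )
}
```

Running each step in a subshell keeps the cd from leaking into the rest of the script, and a missing directory makes the helper fail instead of committing from the wrong place.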

There are other benefits of course. I attend many trade shows, and other events that require that I sit down at some computer just out of the box, use it for an hour or a day, and never see it again. I can check out the core of my cvs home directory in about 5 minutes, and after that it is just as comfortable as if I'd ssh'd home and was doing everything there, and much less laggy. I even get my whole desktop set up in that 5 minutes. In a chaotic tradeshow environment, there is nothing more reassuring than having your familiar computer setup at your fingertips as you demo things to the hordes of visitors.

Keeping your home directory in CVS is not all fun though. Anyone who's used CVS in a large project has probably had to resolve conflicts engendered by two people modifying the same file. At least you can curse the other guy who committed the changes first while you deal with this annoying task. Most of you have probably not had to resolve conflicts between the file you modified at home and at work, and cursing at yourself is less satisfying. This happened more often than you'd think, too.

And then there are CVS's famous annoyances: Poor handling of directories and binary files. Nearly nonexistent handling of permissions, which is not a big deal in most projects, but becomes important when you have a home directory with some public and some private files and directories in it. Slow, bloated protocol, slowed even more by the necessity of piping it all over ssh. The pain of trying to move a file that is already in CVS, or much worse, a whole directory tree, again hits you especially hard when you're using CVS for the whole home directory. Those damn "CVS" directories cluttering up everything. I've developed means of coping with all of these, to varying degrees, but like many of us, I'm hoping for a better replacement one day (and dreading the transition to it!).

Perhaps it's time that I get down to the details of how I organize my home directory in CVS. I've always managed my home directory with an iron hand, and CVS has just exacerbated this tendency. Let's look at the top level:

joey@silk:~>ls
CVS/  bin/  debian/  doc/  html/  lib/  mail/  src/  tmp/

Yes, that's it. Well, except for 100+ dot-files. Most people use their home directory as a scratch space for files they're working on, but instead I have a dedicated scratch directory, the tmp directory, which I clean out irregularly. In general, when I start a new file or project, I will be checking it into CVS soon, so I begin working on it in the appropriate directory. This document, for example, is starting its life in the html directory, and will be checked into CVS soon and live there forevermore. Of course, sometimes I goof up and then I have to resort to the usual tricks to move files in CVS. And so the first rule of cvs home directories: It pays to think before starting and get the right filename and location the first time. Don't be too impatient to check the file in.

CVS is a great way to ensure that you have a nice clean well-managed home directory. Every time I CVS update, it will helpfully complain to me about any files it doesn't know about. Of course I make heavy use of .cvsignore files in some directories (like tmp/).
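
For a scratch directory, a .cvsignore holding a single wildcard is one way to quiet those complaints. This is only a sketch: the helper name is hypothetical, and the pattern is an assumption rather than the contents of any file described here.

```shell
# Hypothetical helper: write a .cvsignore that matches every name in
# the given directory, so "cvs update" stops listing its unknown files.
# The patterns in a .cvsignore apply to the directory that holds it.
ignore_everything_in() {
    printf '%s\n' '*' > "$1/.cvsignore"
}
```

Calling ignore_everything_in "$HOME/tmp" would cover a scratch directory like the one above; other directories usually want narrower patterns so that genuinely forgotten files still get reported.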

If I go to another machine, the home directory looks pretty much the same, though various things might be missing.

joeyh@auric:~>ls
CVS/  bin/  tmp/

I use this machine for occasional specific shell purposes. I don't admin the system, so I don't want to put private files there. The result is a much truncated version of my home directory. It's perfectly usable for everything I normally do on that machine, and if I want to, say, work on this document there at some point, I can just type 'cvs co html' and a password and be on my way.

The way I make this partial checkouts system work is by using CVS modules and aliases. I have modules defined for each of the top-level directories, and for the home directory (dotfiles) itself. For example, the entry in my CVSROOT/modules file for the stripped down version of my home directory looks like this:

joeyh -u cvsfix -o cvsfix joey-cvs/home &bin

For more complete home directories, I use this instead:

joey -u cvsfix -o cvsfix joey-cvs/home &src &doc &debian &html &lib &.hide &bin &mail

Notice the .hide module -- that results in a ~/.hide directory, when I check it out. That directory is where I put the occasional private file that I don't want to appear in home directories -- like the one on auric -- that are on systems not administered by me. The files in .hide get hard-linked to their proper locations if .hide is checked out, so I can put confidential dot-files in there and only check those dot-files out on trusted systems. I also have, for example, my mozilla cookies file in .hide.
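
The hard-linking step could be sketched roughly as follows. This is not the real cvsfix: it assumes that .hide mirrors the layout of the home directory (so .hide/.netrc belongs at ~/.netrc), that both live on the same filesystem, and the function name is made up.

```shell
# Hypothetical sketch: hard-link every file under the .hide directory
# to the same relative path under the home directory.
# Usage: link_hidden /path/to/.hide /path/to/home
link_hidden() {
    hide_dir=$1
    home_dir=$2
    # Nothing to do on machines where .hide was never checked out.
    [ -d "$hide_dir" ] || return 0
    # List regular files, skipping CVS's own administrative directories.
    ( cd "$hide_dir" || exit 1
      find . -name CVS -prune -o -type f -print ) |
    while IFS= read -r rel; do
        rel=${rel#./}
        mkdir -p "$home_dir/$(dirname "$rel")"
        # -f replaces a stale link or file already at the destination.
        ln -f "$hide_dir/$rel" "$home_dir/$rel"
    done
}
```

Because these are hard links, an edit made through either name changes the same file, so a change to the dot-file in place shows up as a modification inside .hide at the next commit (as long as the editor writes in place rather than replacing the file).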

It's important to distinguish between such files which I need to put in .hide and entire private directories, like my mail directory. Yes, I keep my mail in CVS (except for just-arrived spooled mail, which I keep synced up with a neat little program called isync that is smarter about mail than CVS can be). But it's all in its own mail/ directory, so I can just omit checking that directory out to systems that I don't trust with my mail, or that I don't want to burden with hundreds of megabytes of mail archives.

While I'm discussing privacy issues, I should mention that I make some bits of my home directory completely open to the public. This includes a lot of free software in debian/ and src/, and some handy little programs in bin/. This is accomplished by permissions. I have to make sure that most directories in the repository (or at least the top-level directories like mail/) are mode 700, so only I can access them. Other top-level directories, like bin/, are opened up to mode 755, and this allows anonymous cvs access and browsing.
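
In shell terms the permission split amounts to a couple of chmod calls on the repository's top-level directories. The helpers below are a sketch with hypothetical names; the directory names are supplied by the caller, and nothing here covers how the anonymous CVS server itself is set up.

```shell
# Hypothetical helpers; directory names come from the caller.
# Only the named top-level directories are changed: mode 700 on a
# directory keeps other (non-root) users from reaching what is under it.
# Usage: make_private /path/to/repository mail doc
make_private() {
    repo=$1
    shift
    for dir in "$@"; do
        chmod 700 "$repo/$dir"
    done
}

# Usage: make_public /path/to/repository bin
make_public() {
    repo=$1
    shift
    for dir in "$@"; do
        chmod 755 "$repo/$dir"
    done
}
```

Listing the public directories explicitly, rather than opening everything and closing exceptions, means a newly imported directory stays private until someone decides otherwise.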

And this leads to a second rule of cvs home directories: Don't import $HOME in one big chunk; break it up into multiple modules. The structure of your repository need not mirror the structure of your actual home directory; modules can be checked out in different locations to move things around and control access on a per-module level. There's a layer of indirection there, and such layers always make things more flexible and more complex too.

Some of the projects I work on have their own cvs repositories that are unconnected to my big home directory repository. That's fine too; I simply check them out into logical places in my home directory tree as needed, and CVS can even be tweaked to recurse into those directories when updating or committing.

Another thing to notice in those lines from my modules file is the use of -u cvsfix to make the cvsfix program be run after cvs updates. That program does a lot of little things, including ensuring that permissions are correct, setting up the hardlinks to files in .hide, etc.

One last thing to mention is the issue of heterogeneous environments and CVS. Most of my accounts are on systems running varying versions of Debian Linux on a host of different architectures, but there are accounts on other distributions, on Solaris, and so forth. Trying to make the same dot-files work on everything can be interesting. My .zshrc file, for example, goes to great pains to detect things like GNU ls, deal with varying zsh versions, set up aliases to the "best" available editor and other commands, and so on. Other files like .xinitrc check the host they're running on and behave slightly (or completely) differently. I've even at one point had a .procmailrc that filtered mail differently conditional on hostname, though the trick to doing that is lost somewhere in one of the innumerable versions stored in my repository. I've even resorted in a few places to files with names of the form filename.hostname -- cvsfix finds one matching the current host and links it to the filename. Branches are also a possibility, of course, but despite my heavy use of CVS, I still find some corners of it a black art..
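
The filename.hostname trick might be sketched like this. It is an illustration, not the real cvsfix: the function name is hypothetical, it uses symbolic links (the text above only says "links"), and it assumes the suffix matches exactly what the hostname command prints on that machine.

```shell
# Hypothetical sketch: for every "name.HOST" in a directory, make "name"
# a symlink to it. HOST defaults to the output of hostname.
# Usage: link_host_files /path/to/dir [host]
link_host_files() {
    dir=$1
    host=${2:-$(hostname)}
    # Cover both ordinary files and dot-files such as .xinitrc.HOST.
    for src in "$dir"/*."$host" "$dir"/.*."$host"; do
        # An unmatched pattern is left as a literal string; skip it.
        [ -f "$src" ] || continue
        dest=${src%."$host"}
        # A relative link target keeps working if the tree is moved.
        ln -sf "$(basename "$src")" "$dest"
    done
}
```

With .xinitrc.silk and .xinitrc.auric both checked in, running link_host_files "$HOME" on silk would leave ~/.xinitrc pointing at .xinitrc.silk, and the generated .xinitrc itself would be a candidate for .cvsignore.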

Well I guess that's it. Thank you for reading this un-edited brain-dump. I'd like to hear from anyone else who keeps their home directory in CVS, especially if you have some tricks to share. And if you keep /etc in CVS, I'd love to talk to you. Now I'm off to commit this file..

Update: I've switched from cvs to subversion. Most of the above still applies to my new setup, and will be left unchanged until I write an update on living in subversion. I have corrected some links that pointed to my no longer present cvs repository to point to similar files and directories in subversion.

Joey Hess joey@kitenet.net

This article appeared in the September 2002 edition of Linux Journal.