Today wiki.debian.org was briefly running on a full disk, and this MoinMoin-based wiki failed pretty spectacularly:

  • Every page view failed with a huge python backtrace.
  • Saving a page whose edit was started before the disk filled up lost the edit, and possibly corrupted the page as well. Not 100% sure about the corruption, as I wasn't the one who experienced it.

We're used to some unix programs not dealing with being out of disk space very well. But IMHO wikis should be in the same class as package management systems, revision control systems, or databases: running out of disk should be something they handle without data loss, degrading gracefully when possible.

Course I wrote a wiki engine too, and it didn't fare much better. Possibly worse, although I tested it much harder than MoinMoin:

  • Viewing pages? No problem! (They're static..)
  • Its internal state files could be truncated if the disk filled up at exactly the wrong time. This would need a rebuild of the wiki to fix.
  • It could write truncated or empty html files if changes were made while the disk was full. Again, a wiki rebuild would fix this.
  • It could save truncated source files if the user edited a page, and it might be possible (though unlikely) for those to be committed to the backend revision control system. If so, you'd have to revert to an old version during recovery; no data should be lost unless the backend revision control system stinks.
  • The user database could delete itself if a user tried to get an account while the disk was full. (I'd like to thank perl's Storable module for this admirable behavior.)

Of course I've fixed all these problems, and it seems quite robust now on randomly failing disks. If a page edit fails, it will go as far as to bring you back to the edit form and let you try again.
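In case anyone is wondering, the general pattern that avoids this whole class of problem is to write to a temporary file, check every step (including the close, which is where buffered writes tend to actually fail), and only rename over the old file once everything has succeeded. Here's a minimal Perl sketch of the idea; the function name is illustrative, not code from my wiki:

    #!/usr/bin/perl
    # Sketch of a disk-full-safe file write: write to a temp file, check
    # every step (print AND close), and only rename over the old file if
    # everything succeeded.
    use strict;
    use warnings;

    sub writefile_robust {
        my ($file, $content) = @_;
        my $tmp = "$file.new";

        open(my $fh, '>', $tmp) or die "open $tmp: $!";
        print $fh $content      or die "write $tmp: $!";
        # ENOSPC from buffered output often only shows up here, when
        # the buffer is flushed, so close's return value matters too.
        close($fh)              or die "close $tmp: $!";
        # rename() is atomic on POSIX filesystems: readers see either
        # the old file or the complete new one, never a truncated page.
        rename($tmp, $file)     or die "rename $tmp -> $file: $!";
    }

    writefile_robust("index.html", "<html>...</html>\n");

The same temp-file-plus-rename trick covers the Storable-backed user database: store to a temp file, and only rename it into place if the store succeeded.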

I'm interested in how well other wiki engines stack up.. Also, if anyone has hints for good ways to write regression tests involving full disks, I'd love to hear them; I haven't found an approach I like yet.
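One partial trick I know of on Linux is /dev/full, a device that rejects every write with ENOSPC. It only exercises plain file writes (not failed renames, mkdirs, or the revision control backend), which is part of why I'm not satisfied with it. A rough sketch of a test built on it; the details are illustrative:

    #!/usr/bin/perl
    # Rough full-disk regression test: on Linux, /dev/full rejects every
    # write with ENOSPC, which makes it easy to check that file-writing
    # code notices the error instead of silently truncating its output.
    use strict;
    use warnings;
    use Errno qw(ENOSPC);
    use Test::More;

    plan skip_all => 'needs /dev/full (Linux only)' unless -c '/dev/full';

    # A small print usually just fills the stdio buffer and "succeeds";
    # the error only appears when the buffer is flushed at close time.
    # That is exactly the case naive code misses.
    open(my $fh, '>', '/dev/full') or die "open /dev/full: $!";
    print $fh "some page content\n";
    ok(!close($fh), 'close reports the failed flush on a full disk');

    # A write larger than the buffer fails at print time instead.
    open($fh, '>', '/dev/full') or die "open /dev/full: $!";
    my $printed = print $fh 'x' x (1024 * 1024);
    my $err     = $printed ? 0 : 0 + $!;
    close($fh);
    ok(!$printed, 'an oversized write fails immediately');
    is($err, ENOSPC, 'and the error is ENOSPC');

    done_testing();

For the trickier cases (partial writes, failed renames) a tiny loopback or tmpfs filesystem filled up before the test would be more realistic, but mounting one needs root, which is the part I don't have a nice answer for.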
