ikiwiki dependency types

Like make(1), ikiwiki knows about dependencies between the files that it is building, and will rebuild a file if its dependencies change.

I've always known that it was possible to greatly reduce the amount of work ikiwiki does, if it got smarter about handling different types of dependencies. It would be as if make knew that changes to comments inside header files did not change the resulting binary, and avoided rebuilding it for such a change -- but did rebuild the doxygen documentation. Except that, in ikiwiki, this can be done sanely, since it figures out the dependencies by examining the wiki's files, not by using a Makefile.

Also, there were some nasty bugs with certain complex dependencies not being handled right. Fixing one of them, the transitive dependency problem, would require ikiwiki to do significantly more work. So fixing all this has been on my todo list for a while.

Though when I mentioned "transitive dependencies" at the ikiwiki BOF @ DebConf, eyes glazed over. :) And I expect I've lost most of my readers already here too.. Which is a pity, because I've just gotten to the fun stuff.

So, a few (how many?!) days ago I started my current insane hackfest, and began adding dependency types into ikiwiki. The first special type was a presence dependency -- a dependency that only cares if a page is created, or deleted, but does not care about changes to the page. Next type was a links dependency -- a dependency that only cares if the page changes what pages it links to.
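
To make the two new types concrete, here is a rough sketch of the decision they allow. It is written in Python rather than ikiwiki's Perl, and the names DepType, Change and needs_rebuild are made up for illustration; they are not ikiwiki's internals.

```python
from enum import Flag, auto


class DepType(Flag):
    """Kinds of dependency one page can have on another (hypothetical names)."""
    CONTENT = auto()   # ordinary dependency: any change matters
    PRESENCE = auto()  # only creation or deletion matters
    LINKS = auto()     # only a change in the set of links matters


class Change(Flag):
    """What happened to the depended-on page since the last build."""
    NONE = 0
    CREATED = auto()
    DELETED = auto()
    CONTENT_EDITED = auto()
    LINKS_CHANGED = auto()


def needs_rebuild(dep_type, change):
    """Decide whether a dependency of the given type is triggered by a change."""
    if not change:
        return False
    if DepType.CONTENT in dep_type:
        return True
    if DepType.PRESENCE in dep_type and change & (Change.CREATED | Change.DELETED):
        return True
    if DepType.LINKS in dep_type and change & Change.LINKS_CHANGED:
        return True
    return False
```

With this, a page holding only a presence dependency is left alone when the other page is merely edited, which is where the saved work comes from.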

That was great, except it often didn't work. For features like gathering "posts/*" together into a blog, or displaying a tag cloud of "tags/*", ikiwiki uses a PageSpec. This is a small, domain-specific language that can specify a set of pages with lots of control. For example, "(index or backlink(index)) and created_after(oldpage)".

The problem is that when using a PageSpec like the one above in a presence dependency, it doesn't only matter if pages are added/removed. Certain changes to "oldpage" or "index" can also change what it matches. Call those pages the influences of the PageSpec.

So now, I had this PageSpec DSL that I needed to somehow analyze to find these influences. While thankfully not Turing-complete, parsing it to extract the influences would still be annoying.

Except: Operator overloading to the rescue! The solution was this perl:

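package IkiWiki::FailReason;

# A failed match. $_[0] is this object, $_[1] the other operand.
use overload (
        '""'    => sub { $_[0][0] },    # as a string: the reason
        '0+'    => sub { 0 },           # as a number/boolean: false
        '!'     => sub { bless $_[0], 'IkiWiki::SuccessReason'},
        # fail & x stays a failure; keep this object, absorb x's influences
        '&'     => sub { $_[0]->merge_influences($_[1]); $_[0] },
        # fail | x is decided by x; return x, with this object's influences
        '|'     => sub { $_[1]->merge_influences($_[0]); $_[1] },
        fallback => 1,
);

package IkiWiki::SuccessReason;

# A successful match; mirror image of the above.
use overload (
        '""'    => sub { $_[0][0] },    # as a string: the reason
        '0+'    => sub { 1 },           # as a number/boolean: true
        '!'     => sub { bless $_[0], 'IkiWiki::FailReason'},
        # success & x is decided by x; return x, with this object's influences
        '&'     => sub { $_[1]->merge_influences($_[0]); $_[1] },
        # success | x stays a success; keep this object, absorb x's influences
        '|'     => sub { $_[0]->merge_influences($_[1]); $_[0] },
        fallback => 1,
);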

Ikiwiki already dealt with pagespecs by translating them to perl (which is then compiled (yes, perl can), and happily memoized). A pagespec is translated to something like this:

(match_glob("index") | match_backlink("index")) & match_created_after("oldpage")

The match_* functions return IkiWiki::SuccessReason or IkiWiki::FailReason objects. These objects are special in a lot of ways; they act like boolean values most of the time, but if viewed as a string, you get a user-understandable message saying why the pagespec did or didn't match. And when ANDed or ORed together, they build up a list of influences that can be accessed at the end to get all the influences for the whole pagespec.

(Why am I using what's supposed to be bitwise operations & and |, rather than "and" and "or"? Because perl doesn't let the latter be overloaded; they short-circuit, so it doesn't seem to treat them as overloadable operators. Odd.)
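
For readers who don't speak Perl, the same trick can be sketched in Python, which likewise cannot overload "and" and "or" but can overload & and |. This is an analogue under assumptions, not ikiwiki's code: the classes, the match_* signatures, and the links and ctimes tables are all invented for the example.

```python
from fnmatch import fnmatchcase


class Reason:
    """A match result that carries a message and a set of influences."""

    def __init__(self, message, influences=()):
        self.message = message
        self.influences = set(influences)

    def __str__(self):
        return self.message

    def _absorb(self, other):
        # Merge the other operand's influences into this result.
        self.influences |= other.influences
        return self


class Success(Reason):
    def __bool__(self):
        return True

    def __and__(self, other):   # true & x is decided by x
        return other._absorb(self)

    def __or__(self, other):    # true | x stays true
        return self._absorb(other)


class Fail(Reason):
    def __bool__(self):
        return False

    def __and__(self, other):   # false & x stays false
        return self._absorb(other)

    def __or__(self, other):    # false | x is decided by x
        return other._absorb(self)


# Hypothetical wiki state: who links to what, and creation times.
links = {"index": {"blog"}}
ctimes = {"index": 5, "oldpage": 10, "blog": 20}


def match_glob(page, glob):
    ok = fnmatchcase(page, glob)
    cls = Success if ok else Fail
    return cls(f"{page} {'matches' if ok else 'does not match'} {glob}")


def match_backlink(page, target):
    # Matches pages that target links to; target's links influence the result.
    ok = page in links.get(target, set())
    cls = Success if ok else Fail
    return cls(f"{target} {'links' if ok else 'does not link'} to {page}", {target})


def match_created_after(page, other):
    ok = ctimes[page] > ctimes[other]
    cls = Success if ok else Fail
    return cls(f"{page} {'created' if ok else 'not created'} after {other}", {other})


def evaluate(page):
    """Evaluate the example pagespec from the post against one page."""
    return (match_glob(page, "index") | match_backlink(page, "index")) \
        & match_created_after(page, "oldpage")
```

Whether evaluate("blog") succeeds or evaluate("index") fails, the returned object's influences come out as both "index" and "oldpage", without ever parsing the pagespec.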

Anyway, this stuff should be available in an ikiwiki release just as soon as I've finished making it smoking fast.

couchdb

Couchdb came onto my radar since distributed stuff is interesting to me these days. But most of what was being written about it put me off, since it seemed to be very web-oriented, with javascript and html and stuff stored in the database, served right out of it to web browsers in an AJAXy mess.

Also, it's a database. I decided a long, long time ago not to mess with traditional databases. (They're great, they're just not great for me. Said the guy leaving after 5 years in the coal mines.)

Then I saw Damien Katz's talk about how he gave up everything to go off and create couchdb. Was very inspirational. Seemed it must be worth another look, with that story behind it.

Now I'm reading the draft O'Reilly book. I like some things, as expected don't like others[1], and am not sure what to think overall (plus I still have half the book to get through), but it has spurred some early thoughts:

... vs DVCS

Couchdb is very unlike a distributed VCS, and yet it's moved from traditional database country much closer to VCS land. It's document oriented, not normalized; the data stored in it has significant structure, but is also in a sense freeform. It doesn't necessarily preserve all history, but it does support multiple branches, merging, and conflict resolution.

Oddly, the thing I dislike most about it is possibly its biggest strength compared to a VCS, and that is that code is stored in the database alongside the data. That means that changes to the data can trigger processing, so it is mapped, reduced, views are updated, etc, on demand. This is done using code that is included in the database, and so is always available, and runs in an environment couchdb provides -- so replicating the database automatically deploys it.
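
A toy model of that idea, to show why replication deploys the code for free. This is plain Python with an in-memory dict, not CouchDB's API (real couchdb views are JavaScript in design documents); the layout of the dict and the names query, replicate, map_doc and reduce_rows are all assumptions made for the sketch.

```python
import copy

# The view's code is stored in the database as data, next to the documents.
TAG_COUNT_VIEW = '''
def map_doc(doc):
    # Emit one (tag, 1) row per tag on the document.
    for tag in doc.get("tags", []):
        yield tag, 1

def reduce_rows(rows):
    # Sum the emitted values per key.
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals
'''

db = {
    "docs": {
        "a": {"tags": ["perl", "wiki"]},
        "b": {"tags": ["wiki"]},
    },
    "views": {"posts_per_tag": TAG_COUNT_VIEW},
}


def query(database, view_name):
    """Run a stored view on demand against the current documents."""
    env = {}
    exec(database["views"][view_name], env)  # load the code kept in the database
    rows = [row for doc in database["docs"].values() for row in env["map_doc"](doc)]
    return env["reduce_rows"](rows)


def replicate(database):
    """Copy everything: documents and the code that processes them."""
    return copy.deepcopy(database)
```

The replica needs no setup step: add a document to it and query(replica, "posts_per_tag") already works, because the view came along with the data. That is exactly the part a cloned VCS repository lacks.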

Compare with a VCS, where anything that is triggered by changes to the data is tacked onto the side in hooks, has to be manually set up, and so is poorly integrated overall.

Basically, what I've been doing with ikiwiki is adding some smarts about handling a particular kind of data, on top of the VCS. But this is done via a few narrow hooks; cloning the VCS repository does not get you a wiki set up and ready to go.

There are good reasons why cloning a VCS repository does not clone the hooks associated with it. The idea of doing so seems insane; how could you trust those hooks? How could they work when cloned to another environment? And so that's Never Been Done[2]. But with couchdb's example, this is looking to me like a blind spot, that has probably stunted the range of things VCSs are used for.

If you feel, like I do, that it's great we have these amazing distributed VCSs, with so many advanced capabilities, but a shame that they're only used by software developers, then that is an exciting thought.


[1] Javascript? Mixed all in a database with data it runs on? Imperative code that's supposed to be side-effect free? (I assume the Haskell guys have already been all over that.) Code stored without real version control? Still having a hard time with this. :)

[2] I hope someone will give a counterexample of a VCS that does so in the comments?
