debugging musings

Debugging a computer program is such an interesting activity because it's not really a matter of fixing a program. It's a matter of fixing your own understanding to the point that the cause of the bug becomes obvious. So debugging means constantly challenging your assumptions, constantly looking for the overlooked, seemingly insignificant thing that turns out to be crucial.

I mostly debug by print statements. I don't like it much, but it works. I add some print statements around the problem area and see if what's going on matches what the model in my head thinks should be going on. If it doesn't, I try to fix the model, and if it does, I focus the print statements in tighter and tighter, deeper and deeper, until I finally reach the bug. It's nasty and ad hoc, but it works really well, mostly. And I know I'm in good company doing it.
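For concreteness, here is a minimal sketch of that style in Haskell, using `trace` from `Debug.Trace` in the base library. The `probe` helper and the `total`, `count`, and `mean` functions are made up for illustration and are not from any real program. Each probe prints a labelled intermediate value to stderr when the value is evaluated, so it can be compared against what the model in my head expects.

```haskell
import Debug.Trace (trace)

-- Hypothetical helper: label a value and print it to stderr
-- at the moment the value is evaluated, then return it unchanged.
probe :: Show a => String -> a -> a
probe label x = trace (label ++ " = " ++ show x) x

-- Hypothetical functions under suspicion, each wrapped in a probe.
total :: [Int] -> Int
total xs = probe "total" (sum xs)

count :: [Int] -> Int
count xs = probe "count" (length xs)

mean :: [Int] -> Int
mean xs = probe "mean" (total xs `div` count xs)

main :: IO ()
main = print (mean [2, 4, 6])
```

Because of lazy evaluation, the order in which the probes fire follows evaluation order rather than source order, so it's the labels that matter, not the sequence of lines.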

But that method can fail; I can end up caught in a loop. Like today, tracking down an obscure bug in Branchable's auditing subsystem, when I found myself at what was apparently the site of the bug: function A was supposed to call function B, but B never seemed to be called. So surely A was buggy, and I instrumented it to bits with print statements and stuff, and started wondering if the libraries it relied on were somehow breaking it in non-obvious ways before it could call B, or maybe there was a bug in the language runtime that was breaking the call to B.

Going deeper and deeper didn't help, because the thing I was overlooking was up at the top level -- one of the first print statements I inserted showed that all this code was running twice. Why? It didn't seem relevant to the bug, so I ignored it. Of course what I was ignoring was a symptom of the real problem: the first run was by another subsystem, which happened to redefine function B temporarily.

Actually, one of the nice things about Haskell is that print statements are not very helpful in debugging pure code. Pure code tends to get loaded up in ghci and tested interactively, and the bug is then clear. Or a QuickCheck test case gets written with an invariant that the pure code should satisfy. I spent a whole day a few weeks ago writing an inverse form of a buggy function in git-annex just so I could feed them both into QuickCheck and automate debugging. (Most of that time was spent wrestling with Unicode decomposition .. ugh.) I'm used to getting down in the buggy code with print statements, so at the time that felt rather like a waste of time unrelated to getting on with making my program work. But it was very much worth it, since I got my program working 100% in even crazy edge cases, and got a test suite for free, too.
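The inverse-function trick can be sketched roughly like this. This is not the git-annex code. `encode` and `decode` are a hypothetical escaping scheme invented for illustration, and the only real API used is `quickCheck` from the QuickCheck package (which has to be installed). The invariant is that decoding an encoded string gives back the original.

```haskell
import Test.QuickCheck (quickCheck)

-- Hypothetical escaping scheme (an assumption for this example):
-- '/' becomes "%s" and '%' becomes "%%"; everything else is unchanged.
encode :: String -> String
encode = concatMap esc
  where
    esc '/' = "%s"
    esc '%' = "%%"
    esc c   = [c]

-- The hand-written inverse of encode.
decode :: String -> String
decode ('%' : 's' : rest) = '/' : decode rest
decode ('%' : '%' : rest) = '%' : decode rest
decode (c : rest)         = c : decode rest
decode []                 = []

-- The invariant: a round trip through encode and decode is the identity.
prop_roundTrip :: String -> Bool
prop_roundTrip s = decode (encode s) == s

main :: IO ()
main = quickCheck prop_roundTrip
```

QuickCheck generates random strings, including plenty of odd ones, and reports a small counterexample if either function is wrong. That's the automated part: instead of me narrowing in on the bug, the failing input does.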

Most of my debugging of Haskell code so far has not involved bugs in pure code. The most memorable bug was actually a bug in GHC's IO manager, which I'd have never tracked down without Josh's help. The problem with finding bugs in the runtime environment or compiler is that after you've found enough over the years, you're more likely to go down false paths distrusting runtime environments, like I did with the bug today. The beginner's assumption is that the runtime environment just works, but by now I may have over-challenged that assumption. :)