Apologies in advance for writing a long blog post about the dull and specialised subject of double-entry accounting from the Unix tools perspective, that ends up talking about Monads to boot. I can't believe I'm about to write such a thing, but I'm finding it an oddly interesting subject.

double-entry accounting

I've known of, though probably not really understood double entry accounting for a while, thanks to GnuCash. I think GnuCash did something rather special in making such a subject approachable to the layman, and I've been happy to recommend GnuCash to friends. I was stoked to find a chapter in my sister Anna's new book that happily and plausibly suggests readers use GnuCash.

But for my personal use, GnuCash is clunky. I mean, I'm a programmer, but I can't bring any of my abilities to bear on it, without a deep dive into the code. The file format is opaque (and a real pain to keep checked into git with all the log files); the user interface is often confusing, but there's no benefit to its learning curve, it never seems to get better than manually entering data into its ledger, or perhaps importing data from a bank. I've never found the reports very useful.

I've got perhaps a year of data in GnuCash, but it's fragmented and incomplete and not something I've been able to motivate myself to keep up with. So I have less financial data than I'd like. I'm hoping ledger will change that.

ledger

I've known about ledger for a while, at least since This Linux Weekly News article. It's a quintessential Unix tool, that simply processes text files. The genius of it is the simplicity of the file format, that gets the essence and full power of double entry bookeeping down to something that approaches a meme. Once you get the file format stuck in your head, you're done for.

2004/05/27 Book Store
      Expenses:Books                 $20.00
      Liabilities:Visa

starting to use hledger

Now as a Haskell guy, I was immediately drawn to the Haskell clone, hledger. It's nice that there are two (mostly) compatable implementations of ledger too. So from here on, I'll be talking about hledger.

I sat down and built a hledger setup for myself the other day. I started by converting the GnuCash books I have been keeping up-to-date, for a small side business (a rental property). It quickly turned into something like programming, in the best way, as I used features like:

  • Include directives, so I can keep my business data in its own file, while pulling it into my main one.
  • Simple refactorings, like putting "Y 2012" at the top, so I don't have to write the year in each transaction.
  • Account aliases, so I can just type "rent", rather than "income:rental" and "repairs:contractor" rather than "expenses:home repair:contractor"
  • All the power of my favorite editor.

a modern unix program

While I've been diving into hledger, I've been drawing all kinds of parallels between it and other modern Unix-friendly programs I use lately. I think we've gone over a bit of a watershed recently. Unix tools used to be either very simple and crude (though often quite useful), or really baroque and complex with far too many options (often due to 10-30 years of evolution). Or they were a graphical user interface, like GnuCash, and completely divorced from Unix traditions.

The new unix programs have some commonalities...

  • They're a single command, with subcommands. This keeps the complexity of doing any one thing down, and provides many opportunities for traditional unix tools philosophy, without locking the entire program into being a single-purpose unix tool.

    hledger's subcommands range from querying and reports, to pipable print, to a interactive interface.

  • They have a simple but powerful idea at their core, that can be expressed with a phrase like "double entry accounting is a simple file format" (ledger), or "files, trees, commits" (git).

    Building a tool on a central idea is something I strive to do myself. So the way ledger manages it is particularly interesting to me.

  • They are not afraid to target users who have a rather special kind of literacy (probably the literacy you need to have gotten to here in this post). And reward them with a lot of power.

    Ledger avoids a lot of the often confusing terminology around accounting, and assumes a mind primed with the Unix tools philosophy.

  • If there's a GUI, it's probably web based. There's little trust in traditional toolkits having staying power, and the web is where the momentum is. The GUI is not the main focus, but does offer special features of its own.

    hledger's web UI completely parallels what I've doing with the git-annex webapp, right down to being implemented using Yesod -- which really needs to be improved to use some methods I've developed to make it easier to make webapps that integrate with the desktop and are more secure, if there are going to be a lot more programs like this using it.

importing data

After manually converting my GnuCash data, I imported all my PayPal history into hledger. And happily it calculates the same balance Paypal does. It also tells me I've paid PayPal $180 in transaction fees over the years, which is something PayPal certianly doesn't tell you on their website. (However, my current import from PayPal's CSV files is a hackish, and only handles USD currency, so I miss some currency conversion fees.)

I also imported my Amazon Payments history, which includes all the Kickstarter transactions. I almost got this to balance, but hledger and Amazon disagree about how many hundreths of a cent remain in my account. Still, pretty good, and I know how much I paid Amazon in fees for my kickstarter, and how much was kicked back to Kickstarter as well. (Look for a cost breakdown in some future blog post.)

At this point, hledger stats says I have 3700 transactions on file, which is not bad for what was just a few hours work yesterday.

One problem is hledger's gotten a little slow with this many transactions. It takes 5 seconds to get a balance. The ledger program, written in C, is reportedly much faster. hledger recently had a O(n^2) slowdown fixed, which makes me think it's probably only starting to be optimised. With Haskell code, you can get lucky and get near C (language, not speed of light) performace without doing much, or less lucky and get not much better than python performance until you dive into optimising. So there's hope.

harnessing haskell

If there's one place hledger misses out on being a state of the art modern Unix program, it's in the rules files that are used to drive CSV imports. I found these really hard to use; the manual notes that "rules file parse errors are not the greatest"; and it's just really inflexible. I think the state of the art would be to use a Domain Specific Language here.

For both my Amazon and PayPal imports I had CVS data something like:

date, description, amount, fees, gross

I want to take the feeds into account, and make a split transaction, like this:

date description
    assets:accounts:PayPal             $9.90
    expenses:transaction fees:PayPal   $0.10
    income:misc:PayPal                 -$10.00

This does not seem possible with the rules file. I also wanted to combine multiple CVS lines, to do with currency conversions, into a single transaction, and couldn't.

The problem is that the rules file is an ad-hoc format, not a fully programmable one. If instead, hledger's rules files were compiled into standalone haskell programs that performed the import, arbitrarily complicated conversions like the above could be done.

So, I'm thinking about something like this for a DSL.. I doubt I'll get much further than this, since I have a hacked together multi-pass importer that meets my basic needs. Still, this would be nice, and being able to think about adding thing kind of thing to a program cleanly is one of the reasons I reach for the Haskell version when possible these days.

First, here's part of one of my two paypal import rules files (the other one extracts the transaction fees):

amount-field 7
date-field 0
description-field %(3) - %(4) - %(11)
base-account assets:accounts:PayPal

Bank Account
assets:accounts:checking

.*
expenses:misc:PayPal

That fills out the basic fields, and makes things with "Bank Account" in their description be categorised as bank transfers.

Here's how it'd look as Haskell, carefully written to avoid the $ operator that's more than a little confusing in this context. :)

main :: IO ()
main = convert paypalConveter

paypalConverter :: [CSVLine] -> [Maybe Transaction]
paypalConverter = map convert
  where
    convert = do
        setAmount =<< field 7
        setDate =<< field 0
        setDescription =<< combine " - " [field 3, field 4, field 11]
        defaultAccounts
            "assets:accounts:PayPal" ==> "expenses:misc:PayPal"
        inDescription "Bank Account" ?
            "assets:accounts:PayPal" ==> "assets:accounts:checking"

That seems like something non-haskell people could get their heads around, especially if they didn't need to write the boilerplate function definitions and types at the top. But I may be biased. :)

Internally, this seems to be using a combination Reader and Writer monad that can get at fields from a CSV line and build up a Transaction. But I really just made up a simple DSL as I went along and thew in enough syntax to make it seem practical to implement. :)

Of course, a Haskell programmer can skip the monads entirely, or use others they prefer. And could do arbitrarily complicated stuff during imports, including building split transactions, and combining together multiple related CVS lines into a single transaction.

Double-entry accounting

Dull and specialized? Small business ain't dull, and with fewer employers wanting to hire, is getting less specialized. To the detriment of my peers (I'm a professional bookkeeper), more small business owners are doing their own books. Here in Maine, and I believe the rest of USA, the only real game in town is Quickbooks Pro. People complain a lot about Microsoft's lock-in, but that's nothing compared to Intuit's stranglehold on American small business.

GnuCash is a personal finance manager that recently got the double-entry religion. It's clunky, alright. SQLedger is a full-fledged accounting system, and is what I plan to use for my next business. Even so, neither enjoy any popularity; you actually have to learn accounting concepts to work them correctly. I very much hope that some professional accountants realize how the lack of good open accounting software (or at least it's popularity) is hurting our economy and sponsor one.

Good luck with hledger. Sorry, but it looks to me about as much fun as COBOL (which is also awesome at accounting).

Comment by jimcooncat
GnuCash log files

GnuCash is clunky ... The file format is opaque (and a real pain to keep checked into git with all the log files)

The .gnucash and .log files are just backups, so they don't need to be checked into version control.

Comment by Jim
comments

Thanks for this thoughtful review! It was a pleasure to read.

I do believe there's a lot of room for more optimisation in hledger. It's not a small task, I take a swipe at it now and then when I can or need to. (If anyone else wants to take a look, there are some Makefile rules like quickprof, quickheap and quickbench.)

As you rightly point out, hledger's CSV conversion is currently not very powerful. As in hledger generally, being usable by non-programmers and "just working" as documented are high priorities. Currently, I think compilation of custom haskell code will not meet those goals, and that a better textual DSL can cover 90% of needs more cheaply and reliably (I have some ideas in mind). Happy to be proven wrong though. An additional EDSL like you describe for power users would be cool. One way I have thought of easily interfacing with custom haskell code is to extend the plugin system so that eg executables in the path named hledger-read-csv-*[.hs] would be applied in this situation.

Comment by Simon
PS
I'm very keen to investigate the discrepancy in hledger/Amazon balance!
Comment by Simon
no sql ledger, please
@jimcooncat: I have been using SQL-Ledger for years, and it's really hard to wane from it, because you need to re-construct the accounting info in a different package and get your existing data converted. But the code, also that of it's descendants, is a horrible mess, unreliable, and fraught with security problems. I can only recommend staying clear of that. Instead, I recommend you look into Tryon or it's ancestor, Open-ERP.
Comment by johndoe1234567 [myopenid.com]
comment 6

hledger++ I've been using it for a year and a half now and it's the only thing that gets me to track my financial data.

I'm really looking forward to improvements in the CSV importer Simon!

Comment by Christine