Three thousand lines of code is not a huge program, but it is enough to get a pretty good feel for a language. Now that I've completed my first real Haskell program, I feel that I've gotten over several of the humps in the learning curve and am starting to get a good feel for it.
Actually, I've written closer to five thousand lines, since there were several big refactorings. One was when I stopped manually threading my program state around and added a StateT monad. I did know from the beginning I would need one, but it seemed easier and a better learning exercise to let the program start out with a vestigial tail and gills before growing up into a modern Haskell program. (I suppose it's still written in baby-Haskell, really.)
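Here is a minimal sketch of the two styles, to make the StateT refactoring concrete. The AppState record, its fields, and the noteFile functions are hypothetical, since git-annex's real state is not shown in this post. The sketch uses StateT from the transformers package.

```haskell
import Control.Monad (foldM, when)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, get, modify, runStateT)

-- Hypothetical program state, for illustration only.
data AppState = AppState { filesSeen :: Int, verbose :: Bool }

-- Before: the state is passed in and handed back explicitly on every call.
noteFileManual :: AppState -> FilePath -> IO AppState
noteFileManual st file = do
  when (verbose st) $ putStrLn ("processing " ++ file)
  return (st { filesSeen = filesSeen st + 1 })

countFilesManual :: [FilePath] -> IO Int
countFilesManual files =
  filesSeen <$> foldM noteFileManual (AppState 0 False) files

-- After: StateT carries the state implicitly through the IO code.
type App = StateT AppState IO

noteFile :: FilePath -> App ()
noteFile file = do
  st <- get
  when (verbose st) $ lift (putStrLn ("processing " ++ file))
  modify (\s -> s { filesSeen = filesSeen s + 1 })

countFiles :: [FilePath] -> IO Int
countFiles files = do
  (_, final) <- runStateT (mapM_ noteFile files) (AppState 0 False)
  return (filesSeen final)
```

Both versions compute the same count. The difference is that in the StateT version, no function signature has to mention the state being passed in and returned.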
Another refactoring came when I realized I needed to use a custom data type, not String, to represent keys. That was a great experience in type-based refactoring. Being able to keep typing ':make' and landing on the next bit of code that needed fixing was great, and simply adding that one type exposed several non-obvious bugs.
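The following is a minimal sketch of that kind of change. It assumes a key consists of a backend name and a backend-specific name joined by a colon. Key, readKey, formatKey, and objectPath are hypothetical illustrations, not git-annex's actual definitions. The point is that once a function takes a Key, the compiler rejects every call site that still passes a raw String.

```haskell
-- Hypothetical key representation; not git-annex's real Key type.
data Key = Key
  { keyBackend :: String  -- illustrative backend name
  , keyName    :: String  -- illustrative backend-specific name
  } deriving (Eq, Ord, Show)

-- Parsing is the one place a String becomes a Key, so malformed
-- input is rejected here instead of deep inside the program.
readKey :: String -> Maybe Key
readKey s = case break (== ':') s of
  (backend, ':' : name)
    | not (null backend) && not (null name) -> Just (Key backend name)
  _ -> Nothing

formatKey :: Key -> String
formatKey (Key backend name) = backend ++ ":" ++ name

-- Hypothetical storage layout: a filename can no longer be passed here by mistake.
objectPath :: Key -> FilePath
objectPath k = "objects/" ++ formatKey k
```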
I found myself writing code that is much more solid and reusable than normally comes easily. And yet it's also very malleable. Actually, pulling out better data types and abstractions can get a bit addictive.
When I realized that I had a similar three-stage control flow being used for each of git-annex's subcommands, and factored that control flow out into a function that used the three types below, I felt I'd gone down that rabbit hole perhaps far enough for now.
type SubCmdStart = String -> Annex (Maybe SubCmdPerform)
type SubCmdPerform = Annex (Maybe SubCmdCleanup)
type SubCmdCleanup = Annex Bool
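The sketch below shows one way a driver for these three stages could look. Annex is assumed here to be a StateT over IO with a placeholder state. The meanings given to Nothing are also assumptions, not git-annex's documented behavior: Nothing after start means skip, and Nothing after perform means failure. runSubCmd, runAnnex, and exampleStart are hypothetical names.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, evalStateT)

-- Assumption: placeholder state; the real Annex monad carries git-annex's state.
type AnnexState = ()
type Annex = StateT AnnexState IO

type SubCmdStart = String -> Annex (Maybe SubCmdPerform)
type SubCmdPerform = Annex (Maybe SubCmdCleanup)
type SubCmdCleanup = Annex Bool

runAnnex :: Annex a -> IO a
runAnnex action = evalStateT action ()

-- Run the three stages in order. In this sketch, Nothing from the start
-- stage means "nothing to do" (counted as success), and Nothing from the
-- perform stage means failure.
runSubCmd :: SubCmdStart -> String -> Annex Bool
runSubCmd start param = do
  mperform <- start param
  case mperform of
    Nothing -> return True
    Just perform -> do
      mcleanup <- perform
      case mcleanup of
        Nothing -> return False
        Just cleanup -> cleanup

-- Hypothetical subcommand: skips an empty name; otherwise it performs,
-- and its cleanup reports failure only for the name "fail".
exampleStart :: SubCmdStart
exampleStart "" = return Nothing
exampleStart name = return (Just perform)
  where
    perform :: SubCmdPerform
    perform = do
      lift (putStrLn ("performing " ++ name))
      return (Just cleanup)
    cleanup :: SubCmdCleanup
    cleanup = return (name /= "fail")
```

Because each stage returns the next one as a value, a driver like this could also collect all the perform actions first and run them later. That is one route to the parallelism mentioned below.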
(That will allow for some nice parallelism later though, and removed dozens of lines of code, so was worth it.)
Since git-annex is a very Real World Haskell type program, there is a lot of impure code in it. I could probably do better at factoring out more pure code. I count 117 impure functions, and only 37 pure.
Anyhow, from my perspective as a long-time perl programmer, here are some other random impressions:
- ghc --make is handy, but every time it spits out a new 13 MB executable I can feel my laptop's SSD groan!
- It was surprisingly easy to get into nasty situations with recursive dependencies between the 19 Haskell modules I wrote. Sometimes solving them was really messy. I lost hours to this, more time than I've lost to the problem in all other languages combined over 15 years. It's not clear to me whether it was due to the overall design of my program, or whether Haskell's types tend to encourage this problem. Or if there's some simple "please let me have recursive dependencies" switch to ghc that I missed...
- I'm used to being able to use man to get at multiple books' worth of detailed documentation for perl, and to work easily offline or with limited bandwidth. With Haskell, I spend much more time searching online for documentation than I am comfortable with (although Hoogle is pretty neat). And the haddock-produced documentation is often pretty sketchy. The saving grace is that the source to any library function is a click away, and tends to be very readable.
- I'm used to being able to use pretty much any Unix syscall by name from perl: mkdir, chmod, rename, etc. In Haskell, there is a Windows smell to the names, like createDirectoryIfMissing and setPermissions. And there are pointless distinctions like renameFile vs renameDirectory. These long names are not memorable and I have to look them up every time. Most of POSIX is available, but it's scattered among many disparate libraries, and I can't find an interface for sysconf(3) at all. There is a certain temptation, which I am so far resisting, to make a library for C/perl refugees that exports the sane Unix names for everything.
- Anything involving the IO monad, or probably most monads,
has a certain level of syntactic clumsiness about it. Compare:
if ($flag{foo} && length($l = <>)) {
vs
foo <- getFlag "foo"
l <- getLine
if (foo && not $ null l) then do
When writing lots of impure code, that got old, and while I could use ifM, or make up some other similar thing, its syntax would also be somewhat clumsy.
- The fixity levels for a lot of stuff seem a bit off. I too often
found myself writing error $ "foo: " ++ (show bar) or return $ Just $ ... (Still a lot better than Scheme, thanks to the $ operator!)
- I've leveled up a couple of times now, but this particular video game seems to have more levels going up and up, forever. Can't even see the top from here!
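For reference, here is a hand-rolled ifM of the sort mentioned above, along with the perl line translated using it. flaggedLine and its arguments are hypothetical. The line-reading action is passed in so that, like perl's &&, it only runs when the flag is set. Some utility libraries ship a similar ifM, but it is defined here so the sketch needs only the Prelude.

```haskell
-- Hand-rolled ifM: run a monadic test, then pick one of two actions.
ifM :: Monad m => m Bool -> m a -> m a -> m a
ifM test thenBranch elseBranch = do
  b <- test
  if b then thenBranch else elseBranch

-- Hypothetical translation of: if ($flag{foo} && length($l = <>)) { ... }
-- The reader action runs only when the flag is set, mirroring perl's
-- short-circuiting &&; an empty line yields Nothing.
flaggedLine :: IO Bool -> IO String -> IO (Maybe String)
flaggedLine flag readLine =
  ifM flag
    (do l <- readLine
        return (if null l then Nothing else Just l))
    (return Nothing)
```

In a real program this would be called as flaggedLine (getFlag "foo") getLine, with getFlag being whatever flag lookup the program provides.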
Reading your post and browsing your Haskell code, a few things jumped out at me that might help.
It takes a lot of getting used to for a C or scripting language hacker, but functions really do bind more tightly than operators, so you can write things like this:
Or:
You don't need to put parentheses around a function application.
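The following small illustration of the precedence rules described above is a sketch with made-up names. Function application binds more tightly than any operator, and $ is infixr 0, the lowest fixity. As a result, each of these parses as intended without extra parentheses.

```haskell
-- No parentheses needed around show bar: application binds tightest,
-- and $ applies error to the whole concatenation.
failWith :: Show a => a -> b
failWith bar = error $ "foo: " ++ show bar

-- Parsed as (show n) ++ " items".
label :: Int -> String
label n = show n ++ " items"

-- Three equivalent spellings of the post's return $ Just $ ...
wrap1, wrap2, wrap3 :: Int -> IO (Maybe Int)
wrap1 x = return $ Just $ x + 1
wrap2 x = return (Just (x + 1))
wrap3 x = return . Just $ x + 1
```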
Also, language constructs like if have higher syntactic "precedence" than anything else, so you can write:
You might also find the when function useful; it works like if in a monad, but it has no else, and assumes you don't want a result; so, for instance, in tryRun':
(Note the lack of an else return ().)
And since you write Perl, you'll probably appreciate the corresponding unless.
In the GitRepo module, you have two different constructors for Repo, but they share three out of four fields, and they differ only in whether you have a FilePath or a URI. You might consider changing that to use a single constructor and have one of the fields use a data type that itself can contain either a FilePath or a URI. That would simplify workTree, repoFromPath, and repoFromUrl, among many other functions.
You really want to use pattern matching rather than conditionals. In general, you rarely want if; think of pattern matching as your primary control structure, and if as a special case. You can write things like:
Or (assuming you unify the two Repo constructors):
Or:
(Pattern matching has become such a natural control structure for me that I had to resist the impulse to write the one-line-longer-but-more-pattern-matchy Just "true" -> ... version. Also, I think Git has a default for deciding about bare/non-bare repositories, based on whether the path ends in .git; you might consider using that default to avoid error.)
You should probably also have far fewer calls to error, but that's a different and somewhat more difficult problem. At a minimum, you might consider using something like the ErrorT monad rather than thrown and caught exceptions. You also almost never want pure code generating exceptions of any kind.
(Because $ has lower precedence than ++, you can write error $ foo ++ bar with a single $; and because function application binds more tightly than any operator, you can write show foo ++ "str" with no parentheses.)
Josh, thanks for the comments. I know I'm writing baby Haskell -- I had not thought about using pattern matching to dig inside record types, and will take that on board. I use ifs when I'm thinking procedurally. :) I suspect I should also use guards more (well, at all), but I've not internalized that syntax yet either.
I had thought about unifying the two Repo types, and could very well not have made the best decision there, but it seemed that adding a new type like RepoLocation = Url URI | Dir FilePath would involve lots more tedious pulling apart the nested data types in the places that need to get at those values. With @ pattern matching inside record types, it's not too bad.
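Here is a sketch of the RepoLocation idea with record pattern matching. To keep it self-contained, URI is just a String synonym (real code would use the URI type from Network.URI). The Repo fields other than location are also hypothetical.

```haskell
-- Assumption: stand-in for Network.URI's URI type.
type URI = String

data RepoLocation = Url URI | Dir FilePath
  deriving (Eq, Show)

-- Single constructor; the config field is a hypothetical placeholder.
data Repo = Repo
  { location :: RepoLocation
  , config   :: [(String, String)]
  } deriving Show

-- A record pattern reaches through Repo to the location in one step.
workTree :: Repo -> Maybe FilePath
workTree (Repo { location = Dir d }) = Just d
workTree (Repo { location = Url _ }) = Nothing

-- An as-pattern (r@...) names the whole record while also matching a field.
describe :: Repo -> String
describe r@(Repo { location = Url u }) =
  u ++ " (" ++ show (length (config r)) ++ " settings)"
describe (Repo { location = Dir d }) = d
```

Functions that care only about the location can match on it directly, as in workTree. Functions that need other fields can still get at the whole record through the as-pattern, so there is little manual unpacking of the nested type.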
Managing exceptions seems like one of the bigger cans of worms in Haskell.
(when is nice to know -- pity about all that punctuation needed, though.)
BTW, Git's use of dir.git for bare repos is mostly a heuristic or UI abbreviation and not to be trusted.
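The following sketch uses when and unless in a loop similar to tryRun'. runAll and its verbose flag are hypothetical. The point is that neither branch needs an else return (). The $ before each body is the punctuation complained about above; parentheses around the body would work just as well.

```haskell
import Control.Monad (unless, when)

-- Hypothetical stand-in for tryRun': run every action, optionally log,
-- report failures, and return the 1-based indices of failed actions.
runAll :: Bool -> [IO Bool] -> IO [Int]
runAll verbose actions = do
  results <- sequence actions
  let failed = [i | (i, ok) <- zip [1 ..] results, not ok]
  when verbose $
    putStrLn ("ran " ++ show (length results) ++ " actions")
  unless (null failed) $
    putStrLn ("failed: " ++ show failed)
  return failed
```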
Regarding the bare repo detection, even if git's own detection only works as a heuristic, making git-annex match git's behavior seems preferable to simply erroring out if you don't have the configuration option.
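Here is a sketch of the suggested fallback, written in the pattern-matching style from the earlier comment. It assumes the core.bare setting arrives as a Maybe String, and it treats a path ending in .git as bare when the setting is missing. Whether that exactly matches git's own rules is an assumption. Only the literal values "true" and "false" are recognized here, even though git accepts other boolean spellings.

```haskell
import Data.List (isSuffixOf)

-- An explicit core.bare value wins; otherwise fall back to the ".git"
-- suffix heuristic instead of calling error.
isBare :: Maybe String -> FilePath -> Bool
isBare (Just "true")  _    = True
isBare (Just "false") _    = False
isBare _              path = ".git" `isSuffixOf` path
```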
And yes, exceptions do indeed represent quite a can of worms. I personally fall in the camp that thinks Haskell ought to require declaring possible thrown exceptions as part of types, or not have them at all. They feel like a dynamic sore thumb sticking out of an otherwise static language, and they break my usual heuristic of "it compiles, it must be correct". :)
That said, I won't necessarily argue that you need to get rid of your use of exceptions entirely. And obviously you have to deal with the exceptions generated by other code regardless. But you might consider adding your own explicit exception type, and then only catching the exceptions you know how to deal with. And I'd highly recommend not catching the exception "error" generates, and not using that for expected failures. Most Haskell code I've seen reserves "error" for "program logic error that I couldn't detect until runtime" (such as head [] or fromJust Nothing).
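Below is a minimal sketch of the explicit-exception approach using only Control.Exception from base. AnnexError, lookupKey, lookupOrDefault, and safeLookup are hypothetical names. Expected failures are thrown with throwIO rather than error. The handler's type restricts catch to AnnexError, so an ErrorCall from error or an unrelated IOException still propagates.

```haskell
import Control.Exception (Exception, catch, throwIO, try)

-- Hypothetical application-specific exception type for expected failures.
data AnnexError
  = NoSuchKey String
  | NotARepo FilePath
  deriving Show

instance Exception AnnexError

-- An expected failure is signalled with throwIO, not error.
lookupKey :: [(String, String)] -> String -> IO String
lookupKey table k = case lookup k table of
  Just v  -> return v
  Nothing -> throwIO (NoSuchKey k)

-- Catch only AnnexError; handle the case we understand, rethrow the rest.
lookupOrDefault :: [(String, String)] -> String -> IO String
lookupOrDefault table k = lookupKey table k `catch` handler
  where
    handler :: AnnexError -> IO String
    handler (NoSuchKey _) = return "default"
    handler e = throwIO e

-- Or turn the failure into an ordinary Either value at the boundary.
safeLookup :: [(String, String)] -> String -> IO (Either AnnexError String)
safeLookup table k = try (lookupKey table k)
```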