I've finally started writing my first real program in Haskell. I've been learning Haskell since 2007. That is to say, I've been reading about it, over and over.
Progress learning has been annoyingly slow because I have not been able to commit to using it in a real program. Because when I'm writing a real program I want the program now, and I can code in other languages ten to one hundred times faster than I can currently code in Haskell. Nasty vicious cycle. In hindsight, it's probably best not to get really good at specific programming languages, to avoid it.
Writing a real program in Haskell is important because I mostly learn by doing real things. Exercises don't work well for me. Up till now the only real thing I did in Haskell was write a 250 line xmonad config file in it. Which really helped a lot, but mostly only with beating some of its syntax into my head.
I have a few hundred lines of git-annex done. I don't need this urgently so I am willing to spend a week writing it instead of dashing off a perl version tonight.
I seem to have had bad luck on the parts I tackled first, and run into some swampier parts of the Haskell platform. I hope these problems are behind me, they sure made the first 8 hours fun.
time
First thing I needed was a data structure with a time stamp. There are multiple modules to handle time, mostly seemingly incompatible, deprecated, and/or possibly broken.
Data.DateTime seems to be the current choice, but I don't entirely trust it. Check this weird thing out, it thinks that 0 seconds and 1 second after the Unix epoch are the same second.
Data.DateTime> map (toSeconds . fromSeconds) [-1,0,1,2,100,1000,100000,100000,1000000,100000000,100000000,1000000000,10000000000]
[-2,0,0,1,99,999,99999,99999,999999,99999999,99999999,1000000000,10000000000]
Rounding error?
(Now it's not all bad... it does support dates right up until the heat death of the universe, on 32 bit even. No Y2038 bug here.)
locking
Huh, I read Real World Haskell twice and never noticed that it seems to omit POSIX file locking. Which in my Real World is a necessity.
Probably because it's so hard in Haskell, with multiple gotchas.
First there's the lazy IO problem, which means that if you explicitly close a file, code that read from it may not have really run yet. I ended up needing this scariness from System.IO.Strict, which is sadly not packaged in Debian or part of the Haskell Platform:
hGetContentsStrict h = hGetContents h >>= \s -> length s `seq` return s
Then there's Haskell's interface for fcntl locking, which exposes rather more of fcntl locking than I like to think about, being spoiled by perl's 2-parameter interface to it.
waitToSetLock lockfd (ReadLock, AbsoluteSeek, 0, 0)
And then there's that lockfd, which is not a regular file handle but an fd number, which has to be obtained by calling handleToFd.
And then there's the crazy interface that has handleToFd close the input handle! Which is problematic if you were going to use the file after you locked it..
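One possible way around the closed-handle problem, as a rough sketch rather than anything definitive: grab the fd with handleToFd, lock it, then wrap the same fd in a fresh Handle using fdToHandle from System.Posix.IO. The name openLockedForRead is made up for illustration, and this assumes the unix package on a POSIX system. Since fcntl locks are dropped when the descriptor is closed, closing the returned handle should release the lock.

```haskell
import System.IO
import System.Posix.IO (LockRequest (..), fdToHandle, handleToFd, waitToSetLock)

-- Hypothetical helper: open a file, take a shared (read) fcntl lock on
-- all of it, and hand back a Handle that is still usable afterwards.
openLockedForRead :: FilePath -> IO Handle
openLockedForRead path = do
  h <- openFile path ReadMode
  -- handleToFd closes h as a side effect; only fd is valid from here on.
  fd <- handleToFd h
  -- Start offset 0 with length 0 means "the whole file" to fcntl.
  -- This call blocks until the lock can be acquired.
  waitToSetLock fd (ReadLock, AbsoluteSeek, 0, 0)
  -- Wrap the locked descriptor in a new Handle for normal reads.
  fdToHandle fd
```

Reads then go through the returned handle as usual, followed by an hClose when done.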
The lazy I/O problem only applies to calls like hGetContents which return the entire file as a lazy string. Those calls do things like "unsafeInterleaveIO", which can lead to problems like those you encountered. However, other calls like hGetLine or hGetBuf will read strictly. Also, depending on what you want to do, you almost certainly want the functions in Data.ByteString instead, all of which read strictly. (Some of the functions in Data.ByteString.Lazy do too.)
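For example, here is a minimal sketch using the strict Data.ByteString.Char8 API from the bytestring package. The names slurp and countLinesStrict are hypothetical, just for illustration:

```haskell
import qualified Data.ByteString.Char8 as B

-- Read a whole file strictly. By the time this returns, all the bytes
-- are in memory and the file is closed, so nothing is left pending.
slurp :: FilePath -> IO B.ByteString
slurp = B.readFile

-- Count the lines of a file with no lazy IO involved.
countLinesStrict :: FilePath -> IO Int
countLinesStrict path = length . B.lines <$> slurp path
```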
Alternatively, you can use lazy IO, and then just make sure that you consume everything you want from the data before closing the file. For instance, note that if you transform the data and write the transformed data elsewhere with IO, the writing will finish before that IO action completes. Just fully consume the data while still inside the "withFile" or "withBinaryFile" call.
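A small sketch of that second approach, assuming all you need from the file is a line count (countLinesLazy is a made-up name). The read itself is lazy, but evaluate forces the count while the handle is still open:

```haskell
import Control.Exception (evaluate)
import System.IO

-- Lazy read, but the result is fully computed before withFile
-- closes the handle.
countLinesLazy :: FilePath -> IO Int
countLinesLazy path = withFile path ReadMode $ \h -> do
  s <- hGetContents h
  -- Computing the length walks the whole list of lines, which in turn
  -- pulls the entire file through the lazy string.
  evaluate (length (lines s))
```

Returning s itself from the withFile block, without forcing it, would bring back the problem described in the post.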
Regarding time, you actually want the Data.Time module, from the "time" package. It has a precision of picoseconds within a day, and uses Integer for days. (Data.DateTime appears to do some strange rounding math on top of that, which you clearly don't want.) If you also need the ability to convert between Data.Time's types and POSIX seconds-since-the-epoch timestamps, use Data.Time.Clock.POSIX.
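As a hedged sketch of that conversion (the helper names fromEpochSeconds, toEpochSeconds and roundTrip are mine, not part of the time package), the round trip from your GHCi session could be written like this:

```haskell
import Data.Time.Clock (UTCTime)
import Data.Time.Clock.POSIX (posixSecondsToUTCTime, utcTimeToPOSIXSeconds)

-- Whole seconds since the Unix epoch to a UTCTime.
fromEpochSeconds :: Integer -> UTCTime
fromEpochSeconds = posixSecondsToUTCTime . fromInteger

-- Back to whole seconds, rounding any fractional part down.
toEpochSeconds :: UTCTime -> Integer
toEpochSeconds = floor . utcTimeToPOSIXSeconds

-- The same experiment as in the post, using Data.Time instead.
roundTrip :: [Integer] -> [Integer]
roundTrip = map (toEpochSeconds . fromEpochSeconds)
```

With whole-second inputs such as [-1,0,1,2,100], roundTrip should hand back the list unchanged, since the conversion is done in exact fixed-point arithmetic.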
So: git-annex? Sounds fun. What does it do?