Debian wheezy includes a bunch of excellent new Haskell libraries. I'm going to highlight one that should be interesting to non-Haskell developers, who may have struggled with writing non-buggy threaded programs in other languages: libghc-stm-dev
I had given up on most threaded programs before learning about Software Transactional Memory. Writing a correct threaded program, when multiple threads needed to modify the same state, needed careful uses of locking. In my experience, locking is almost never gotten right the first time.
A real life example I encountered is an app that displays a queue of files to be downloaded, and a list of files currently downloading. Starting a new download would go something like this:
startDownload = do
file <- getQueuedFile
startDownloadThread file
push file currentDownLoads
But there's a point in time in which another thread, that refreshes the display, could then see an inconsistent state, where the file is in neither place. To fix this, you'd need to add lock checking around all accesses to the download queue and current downloads list, and lock them both here. (And be sure to always take the locks in the same order!)
But, it's worse than that, because how is getQueuedFile
implemented?
If the queue is empty, it needs to wait on a file being added. But how
can a file be added the queue if we've locked it in order to perform this
larger startDownload
operation? What should be really simple code
has become really complex juggling of locks.
STM deals with this in a much nicer way:
startDownload = atomically $ do
file <- getQueuedFile
startDownloadThread file
push file currentDownLoads
Now the two operations are performed as one atomic transaction. It's not possible for any other thread to see an inconsistent state. No explicit locking is needed.
And, getQueuedFile
can do whatever waiting it needs to, also using STM.
This becomes part of the same larger transaction, in a way that cannot
deadlock. It might be implemented like this:
getQueuedFile = atomically $
size <- getSize downLoadQueue
if size == 0
then retry
else pop downloadQueue
When the queue is empty and this calls "retry", STM automatically waits for the queue to change before restarting the transaction. So this blocks until a file becomes available. It does it without any locking, and without you needing to explicitly tell STM what you're waiting on.
I find this beautiful, and am happier with it the more I use it in my code.
Functions like getQueuedFile
that run entirely in STM are building blocks
that can be snapped together without worries to build more and more complex
things.
For non-Haskell developers, STM is also available in Clojure, and work is underway to add it to gcc. There is also Hardware Transactional Memory coming, to speed it up. Although in my experience it's quite acceptably fast already.
However, as far as I know, all these other implementations of STM leave developers with a problem nearly as thorny as the original problem with locking. STM inherently works by detecting when a change is made that conflicts with another transaction, throwing away the change, and retrying. This means that code inside a STM transaction may run more than once.
Wait a second.. Doesn't that mean this code has a problem?
startDownload = atomically $ do
file <- getQueuedFile
startDownloadThread file
push file currentDownLoads
Yes, this code is buggy! If the download thread is started, but then STM restarts the transaction, the same file will be downloaded repeatedly.
The C, Clojure, etc, STM implementations all let you write this buggy code.
Haskell, however, does not. The buggy code I showed won't even compile. The
way it prevents this involves, well, monads. But essentially, it is able to
use type checking to automatically determine that startDownloadThread
is
not safe to put in the middle of a STM transaction. You're left with no
choice but to change things so the thread is only
spawned once the transaction succeeds:
startDownload = do
file <- atomically $ do
f <- getQueuedFile
push f currentDownLoads
return f
startDownloadThread file
If you appreciate that, you may want to check out some other #newinwheezy stuff like libghc-yesod-dev, a web framework that uses type checking to avoid broken urls and XSS attacks, and also makes heavy use of threading, so is a great fit for using with STM. And libghc-quickcheck2-dev, which leverages the type system to automatically test properties about your program.