That's a seemingly simple question I started asking various people two weeks ago. I didn't get many useful answers, but now I have experience doing it myself, and so here's a blog post brain dump.
I have been trying to convert git-annex to use GHC's threaded runtime, for
a variety of reasons. Naively adding the -threaded option resulted in a
git-annex command that seemed to randomly freeze, but only sometimes
(and, infuriatingly, never when I straced it), and a test suite that froze
at a random point almost every time I ran it. Not a good result, and
lacking any knowledge about gotchas with using the threaded runtime, I was
at a loss for a long time (most of DebConf) about how to fix it.
I now know of at least three classes of problems that enabling the threaded runtime can turn up in programs that have been developed using the non-threaded runtime.
accessing an MVar after forkProcess can hang
MissingH has some code similar to this, which works ok with the non-threaded runtime:
    forkProcess $ do
        debugM "about to run some command"
        executeFile ...
In the above example, debugM accesses an MVar. Doing that after
forkProcess can result in an MVar deadlock, as it tries to access an MVar
value that is, apparently, not accessible to the forked process.
(Bug report with test case)
So, using System.Cmd.Utils from MissingH is asking for trouble.
I switched all my code to the newer and, apparently, threaded-runtime-safe
System.Process.
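As a rough illustration of that switch, here is a minimal sketch that runs an external command through System.Process rather than forkProcess and executeFile. The command "echo" and its argument are placeholders chosen for the example, not anything taken from git-annex.

```haskell
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- Run an external command without calling forkProcess from Haskell code.
-- "echo" and "hello" are placeholder values for illustration only.
main :: IO ()
main = do
  -- The final argument is the string fed to the command's stdin.
  (code, out, err) <- readProcessWithExitCode "echo" ["hello"] ""
  case code of
    ExitSuccess   -> putStr out
    ExitFailure n ->
      putStrLn ("command failed with exit code " ++ show n ++ ": " ++ err)
```

readProcessWithExitCode collects all of the command's output before returning, so it only suits short-lived commands. For anything long-running or interactive, createProcess from the same module is probably the better starting point.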
forkProcess is a massively bad idea
Even when not accessing an MVar after forkProcess, it's very unsafe to
use. It's easy to miss the warning attached to forkProcess, when the code
seems to work. But with the threaded runtime, I've found that most
any call to forkProcess will sometimes succeed, and sometimes freeze
the whole program. This might only happen around 1 time in 1000.
Then you'll find this warning and do a lot of head-scratching about what
it really means:
forkProcess comes with a giant warning: since any other running threads are not copied into the child process, it's easy to go wrong: e.g. by accessing some shared resource that was held by another thread in the parent.
The hangs I saw could be due to laziness issues deferring code to run
after the forkProcess that you'd expect to have run before it ... or
who knows what else.
It's not clear to me that it's possible to use forkProcess safely in
Haskell code. I think it's notable that System.Process runs the whole
fork/exec in C code instead.
unsafe FFI calls block
According to most of the documentation you'll find in, e.g., the Haskell
wiki, Real World Haskell, etc, the only difference between the safe and
unsafe imports in the FFI is that unsafe is faster, and shouldn't be
used for C code that calls back into Haskell code.
But the documentation is out of date. Actually, if you're using the FFI,
and the foreign function can block, you need to use safe. When using
unsafe, a blocking foreign function can block all threads of the program.
In my case, I was using kqueue to wait for changes to files, and this
indeed blocked my whole program when linked with -threaded. Marking it
safe fixed this.
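To make the distinction concrete, here is a small sketch of a blocking foreign call imported as safe. It is not the kqueue binding from git-annex. It assumes a POSIX system, uses sleep(3) as a stand-in for any foreign function that can block, and is meant to be built with ghc -threaded.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Foreign.C.Types (CUInt (..))

-- sleep(3) stands in for any foreign function that can block
-- (assumption: a POSIX system providing it in unistd.h).
-- Because it can block, it is imported "safe".
foreign import ccall safe "unistd.h sleep"
  c_sleep :: CUInt -> IO CUInt

main :: IO ()
main = do
  done <- newEmptyMVar
  -- This Haskell thread blocks inside C for about a second.
  _ <- forkIO $ do
    _ <- c_sleep 1
    putMVar done ()
  -- With a safe import and the threaded runtime, the main thread
  -- should be able to keep running while the other one is in C.
  mapM_ (\n -> putStrLn ("tick " ++ show n) >> threadDelay 200000)
        [1 .. 3 :: Int]
  takeMVar done
  putStrLn "foreign call returned"
```

Changing the import to unsafe and rebuilding with -threaded is one way to try reproducing the stall described above, though how visible it is will depend on scheduling.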
The details are well described in this paper: http://community.haskell.org/~simonmar/papers/conc-ffi.pdf
Somewhat surprisingly, this blocking only happens when using the threaded
runtime. If you're using the non-threaded runtime with unsafe blocking
FFI functions, your other pseudo-threads won't be blocked. This is because
the non-threaded runtime has a SIGALRM timer that interrupts (most)
blocking system calls. This leads to other troubles of its own (like
needing to restart interrupted FFI functions, or blocking the other
pseudo-threads from running if the C code ignores SIGALRM), but that's
off-topic for this post.
summary
Converting a large Haskell code base from the default, non-threaded runtime to the threaded runtime can be quite tricky. None of the problems are the sort of thing that Haskell helps to manage either. I will be developing new programs using the threaded runtime from the beginning from now on.
By the way, don't take this post to say that threading in Haskell sucks.
I've really been enjoying writing threaded Haskell code. The control
Haskell gives over isolation of state to threads, and the excellent and
wide-ranging suite of thread communications data types (MVar, Chan,
QSemN, TMVar, SampleVar, etc) have made developing a complex threaded
program something I feel comfortable doing for the first time, in any
language.
I saw a talk about tools for easier management of these sorts of tasks in Haskell: http://skillsmatter.com/podcast/home/high-performance-concurrency
The package introduced in that talk is async: http://hackage.haskell.org/packages/archive/async/2.0.1.4/doc/html/Control-Concurrent-Async.html
Essentially, it saves you from manipulating threads and MVars directly.
Will