When making a service, program, or even a complex CLI, one almost always wants the threaded RTS. It seems well past time to make the threaded RTS the default RTS. The arguments are numerous but generally boil down to correctness and expected operation. What blockers, if any, do people see in changing the default RTS?
It’s significantly slower than the single threaded runtime.
Do you have any benchmarks?
I don’t think it’s worth breaking backward compatibility.
I don’t have benchmarks atm, sorry. But I’ve seen significant differences on a few occasions. One I remember off the top of my head was a wc clone using conduit that went from something like 800ms to 300ms just by removing -threaded.
I believe the culprit is usually parallel garbage collection. Although the only thing I can find is this old wiki page: Performance/Parallel - HaskellWiki.
Agreed - for many e.g. command line tools, the single threaded runtime is what you want. It can be a lot faster. I think this isn’t especially important either way, given that you can (iirc) specify the binary’s default threadedness at build time.
There’s an accepted GHC proposal to switch to -threaded by default: Compile with threaded RTS by default by ulysses4ever · Pull Request #240 · ghc-proposals/ghc-proposals · GitHub
When trying to implement it I wasn’t able to figure out a good way to deal with GHC’s test suite. The MR is still around but it’s probably worth nothing at this point because of how old it is: starting from scratch may be easier.
I encourage you to try implementing it if you have the energy, and I would be happy to participate. But you’d need guidance from someone on the GHC team who understands the inner workings of the test suite (probably @bgamari).
Also, -threaded on its own has nothing to do with parallel GC, which, indeed, is a major source of worries about performance. For parallel GC to kick in, you need to use -N.
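To make the distinction between -threaded and -N concrete, here is a minimal sketch that reports which configuration a binary is actually running under. It uses only Control.Concurrent from base; the helper describeRts and its wording are made up for illustration, and the remark about parallel GC reflects the claim above rather than anything the program measures.

```haskell
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)

-- | Hypothetical helper: summarise the runtime configuration in one line.
-- The first argument says whether the threaded RTS is linked in,
-- the second is the number of capabilities (set with +RTS -N<n>).
describeRts :: Bool -> Int -> String
describeRts threaded caps
  | not threaded = "non-threaded RTS"
  | caps == 1    = "threaded RTS, 1 capability (GC runs on a single thread)"
  | otherwise    = "threaded RTS, " ++ show caps
                     ++ " capabilities (parallel GC possible)"

main :: IO ()
main = do
  -- rtsSupportsBoundThreads is True when the program was linked with -threaded.
  caps <- getNumCapabilities
  putStrLn (describeRts rtsSupportsBoundThreads caps)
```

Building this once with and once without -threaded, and running the threaded build with different +RTS -N values (which may require -rtsopts at link time), should show which of the three cases applies.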
Otherwise, according to anecdotes, performance changes are negligible. Of course, having a thorough evaluation would be better.
I think it segfaults when profiling sometimes.
I find 20% and 35% degradation with -threaded (-N1), depending on the frontend, in LambdaHack the game. Last benchmarked half a year ago.
Mikolaj, this is interesting, thank you. I believe the idea was that making -threaded the default would push to explore such cases more actively. They should be treated as bugs. Regardless of the default, posting them on the bug tracker would increase the chance of fixing.
Yes, I’m neutral on changing the default. I’m sure this is already on the bug tracker, but if anybody is actually working on it, I’d gladly provide the repro instructions (cabal build; make benchFrontendCrawl).
I don’t currently know of any issues with profiling in the threaded RTS but please do open a ticket if you have a reproducer.
I’m not sure this is entirely true. Some performance degradation of -N1 relative to the non-threaded RTS is unavoidable. For instance, much of the runtime system needs to be protected by various mutexes; taking and releasing these mutexes is work, however small.
That being said, I would think that the degradation should be on the 1% scale, nowhere near 10%. Having a ticket to track @Mikolaj’s case would be great.
All I’ve heard in the negative so far is a bit of grumbling about performance. This seems pale weighed against correct and predictable behavior.
And thanks to @artem (EDIT: oops, not @jaror. Thanks) for pointing out this decision was already made!
Indeed, there doesn’t seem to be a GHC ticket about the slowdown. I must have reported it in some rooms, not on the tracker. I’ve opened a ticket: 40% to 100% slowdown from -threaded (#21274) · Issues · Glasgow Haskell Compiler / GHC · GitLab
You should expect to see no more than 1-2% regression from -threaded. Any more than that, and it’s likely an issue that you can fix yourself by changing your code. The most common problem is that the main thread is a bound thread while most other threads (such as those created by forkIO or withAsync) are not bound, which means that context-switching between the main thread and other threads with -threaded is an expensive OS-level context switch. The fix is to not use the main thread for anything, just create a new thread using withAsync immediately.
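A minimal sketch of that fix, under one stated substitution: instead of withAsync from the async package, it uses runInUnboundThread from base, which serves the same purpose here (moving the real work off the bound main thread) without an extra dependency. The names realMain and workerIsBound are hypothetical stand-ins for the program's actual entry point.

```haskell
import Control.Concurrent (isCurrentThreadBound, runInUnboundThread)

-- | Hypothetical check: is the thread that runs our real work bound?
-- runInUnboundThread runs the action in a fresh unbound thread when the
-- caller is bound, and runs it directly otherwise.
workerIsBound :: IO Bool
workerIsBound = runInUnboundThread isCurrentThreadBound

-- | Hypothetical stand-in for the program's real entry point.
realMain :: IO ()
realMain = do
  bound <- isCurrentThreadBound
  putStrLn ("worker thread bound: " ++ show bound)

main :: IO ()
main = do
  -- Under -threaded, the main thread is a bound thread.
  bound <- isCurrentThreadBound
  putStrLn ("main thread bound: " ++ show bound)
  -- Do all real work in an unbound thread so that switching between it
  -- and forkIO'd threads does not need an OS-level context switch.
  runInUnboundThread realMain
```

With the async package, the equivalent shape would be main = withAsync realMain wait. Whether this recovers the slowdown in a given program is something to measure rather than assume.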