Make GHC -threaded the default RTS

Do you have any benchmarks?

I don’t think it’s worth breaking backward compatibility.

I don’t have benchmarks atm, sorry. But I’ve seen significant differences on a few occasions. One I remember off the top of my head was a wc clone using conduit that went from roughly 800ms to 300ms just by removing -threaded.

I believe the culprit is usually parallel garbage collection. Although the only thing I can find is this old wiki page: Performance/Parallel - HaskellWiki.

Agreed. For many programs, command-line tools for example, the single-threaded runtime is what you want. It can be a lot faster. I think this isn’t especially important either way, given that you can (iirc) specify the binary’s default threadedness at build time.

There’s an accepted GHC proposal to switch to -threaded by default: “Compile with threaded RTS by default” by ulysses4ever (ghc-proposals/ghc-proposals, Pull Request #240 on GitHub).

When trying to implement it I wasn’t able to figure out a good way to deal with GHC’s test suite. The MR is still around but it’s probably worth nothing at this point because of how old it is: starting from scratch may be easier.

I encourage you to try implementing it if you have the energy, and I would be happy to participate. But you’d need guidance from someone on the GHC team who understands the inner workings of the test suite (probably @bgamari).

Also, -threaded on its own has nothing to do with parallel GC, which, indeed, is a major source of worries about performance. For parallel GC to kick in, you need to use -N.
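To make the distinction between -threaded and -N concrete, here is a minimal sketch that reports which runtime a binary was linked against and how many capabilities it is running with. `rtsSupportsBoundThreads` and `getNumCapabilities` are real functions from `Control.Concurrent`; the helper `describeRts` and its wording are my own assumptions for illustration, and the "parallel GC" remarks only restate the point above rather than inspect the GC itself.

```haskell
module Main (main) where

import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)

-- | Hypothetical helper: summarise the runtime configuration.
-- Pure, so it can be checked without running the RTS in a given mode.
describeRts :: Bool -> Int -> String
describeRts threaded caps
  | not threaded = "non-threaded RTS"
  | caps <= 1    = "threaded RTS, 1 capability (parallel GC has nothing to parallelise)"
  | otherwise    = "threaded RTS, " ++ show caps ++ " capabilities (parallel GC can run)"

main :: IO ()
main = do
  -- Number of capabilities, i.e. the value set by +RTS -N (1 if unset).
  caps <- getNumCapabilities
  -- rtsSupportsBoundThreads is True only when linked with -threaded.
  putStrLn (describeRts rtsSupportsBoundThreads caps)
```

Building it once with `ghc Main.hs` and once with `ghc -threaded -rtsopts Main.hs`, then running the second binary with and without `+RTS -N4`, should show the three cases separately.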

Otherwise, according to anecdotes, performance changes are negligible. Of course, having a thorough evaluation would be better.

I think it segfaults when profiling sometimes.

I find 20% and 35% degradation with -threaded (-N1), depending on the frontend, in the game LambdaHack. I last benchmarked it half a year ago.

Mikolaj, this is interesting, thank you. I believe the idea was that making -threaded the default would push to explore such cases more actively. They should be treated as bugs. Regardless of the default, posting them on the bug tracker would increase the chance of fixing.

Yes, I’m neutral on changing the default. I’m sure this is already on the bug tracker, but if anybody is actually working on it, I’d gladly provide the repro instructions (cabal build; make benchFrontendCrawl).

I don’t currently know of any issues with profiling in the threaded RTS but please do open a ticket if you have a reproducer.

I’m not sure this is entirely true. Some performance degradation of -N1 relative to the non-threaded RTS is unavoidable. For instance, much of the runtime system needs to be protected by various mutexes; taking and releasing these mutexes is work, however small.

That being said, I would think that the degradation should be on the 1% scale, nowhere near 10%. Having a ticket to track @Mikolaj’s case would be great.

All I’ve heard in the negative so far is a bit of grumbling about performance. That seems to pale when weighed against correct and predictable behavior.

And thanks to @artem (EDIT: oops, not @jaror. Thanks) for pointing out this decision was already made!

I think you mean @artem’s comment

Here’s an open ticket about that. @vmchale: are you aware of any other reproducers?

Indeed, there doesn’t seem to be a GHC ticket about the slowdown. I must have reported it in some rooms, not on the tracker. I’ve opened a ticket: “40% to 100% slowdown from -threaded” (GHC issue #21274 on GitLab).

You should expect to see no more than 1-2% regression from -threaded. Any more than that, and it’s likely an issue that you can fix yourself by changing your code. The most common problem is that the main thread is a bound thread while most other threads (such as those created by forkIO or withAsync) are not bound, which means that context-switching between the main thread and other threads with -threaded is an expensive OS-level context switch. The fix is to not use the main thread for anything, just create a new thread using withAsync immediately.
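As a rough illustration of that fix, the sketch below moves the real work off the bound main thread. It deliberately swaps in `runInUnboundThread` from `Control.Concurrent` (part of base) instead of `withAsync`, only to avoid depending on the async package; `realMain` and `threadKind` are hypothetical names. `isCurrentThreadBound` is used just to show which kind of thread each part runs on.

```haskell
module Main (main) where

import Control.Concurrent
  ( isCurrentThreadBound
  , rtsSupportsBoundThreads
  , runInUnboundThread
  )

-- | Hypothetical helper: describe a thread for the report below.
threadKind :: Bool -> String
threadKind bound
  | bound     = "bound (tied to an OS thread)"
  | otherwise = "unbound (lightweight)"

-- | Stand-in for the application's real entry point (an assumption).
realMain :: IO ()
realMain = do
  bound <- isCurrentThreadBound
  putStrLn ("realMain runs on a thread that is " ++ threadKind bound)

main :: IO ()
main = do
  putStrLn ("threaded RTS: " ++ show rtsSupportsBoundThreads)
  bound <- isCurrentThreadBound
  putStrLn ("main runs on a thread that is " ++ threadKind bound)
  -- If the caller is bound, this forks a temporary unbound thread,
  -- runs the action there, and waits for it to finish.
  runInUnboundThread realMain
```

With -threaded I would expect `main` to report a bound thread and `realMain` an unbound one; without -threaded both should report unbound, since the non-threaded RTS has no bound threads.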

Thank you for the tips (and for writing them down as excellent Haddocks at least a decade ago). Unfortunately, that fix is impossible in my case. SDL2 (due to OpenGL on OSX, possibly also on Windows, but I’m not sure if that changed) needs to run on the main thread. Thank you, Apple. Of course, I can stop using any threads except the main thread, just as pragmatic programmers do, and forgo abstraction and isolation of components.

Ah, so what you want for this use case is support for running lightweight threads in a bound thread. The non-threaded RTS does this, but the threaded RTS doesn’t. And there’s a good reason it doesn’t support that: if you make a safe foreign call from some thread, then the other threads are supposed to keep running (that’s what the threaded RTS guarantees), but we wouldn’t be able to run the bound thread if its OS thread was in a foreign call made by some other Haskell thread.

So the specific tradeoff you’re making here is to give up some of the progress guarantees that you get from the threaded RTS in exchange for being able to schedule lightweight threads on the main thread. That’s fine I suppose. We could still make the threaded RTS the default, it would just mean that specialised use cases like this would need to use -single-threaded.
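The progress guarantee in question can be observed with a small experiment, sketched here under a few assumptions: a POSIX system (it imports `sleep` from `unistd.h`), and hypothetical names `c_sleep` and `verdict`. The main thread makes a safe foreign call while a second Haskell thread counts ticks; the count afterwards hints at whether other threads ran during the call. Exact numbers will vary, and `sleep` may return early if a signal interrupts it.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Control.Concurrent (forkIO, rtsSupportsBoundThreads, threadDelay)
import Control.Monad (forM_)
import Data.IORef (modifyIORef', newIORef, readIORef)
import Foreign.C.Types (CUInt (..))

-- A *safe* foreign call: the threaded RTS lets other Haskell threads
-- keep running while it is in progress.
foreign import ccall safe "unistd.h sleep"
  c_sleep :: CUInt -> IO CUInt

-- | Hypothetical helper: interpret the tick count seen after the call.
verdict :: Int -> String
verdict n
  | n > 0     = "other Haskell threads kept running during the foreign call"
  | otherwise = "no other Haskell thread ran during the foreign call"

main :: IO ()
main = do
  ticks <- newIORef (0 :: Int)
  -- Ticker thread: bump the counter every 100ms, for at most 2 seconds.
  _ <- forkIO $ forM_ [1 .. 20 :: Int] $ \_ -> do
    threadDelay 100000
    modifyIORef' ticks (+ 1)
  -- Block the main thread's OS thread in C for about one second.
  _ <- c_sleep 1
  n <- readIORef ticks
  putStrLn ("threaded RTS: " ++ show rtsSupportsBoundThreads)
  putStrLn ("ticks observed: " ++ show n ++ ", so " ++ verdict n)
```

Compiled with -threaded, the ticker should make progress during the sleep; compiled without it, I would expect the whole runtime to wait on the foreign call, which is the guarantee being traded away above.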
