Non-Moving Garbage Collector causing issues

We’ve just updated our production code base to GHC 9.2.8 (lts-20.24) and decided to try out the --nonmoving-gc RTS option. The weird thing, though, is that when we pushed this to the test environment, about 50% of the running services started restarting every few hours.
This normally happens when they run out of memory: the platform notices failing health checks, kills the running service, and starts a new one. But when we checked memory usage, it was only at about 50% of the allowed amount, and we were getting the following error message just after the failed health checks:

Control.AutoUpdate.mkAutoUpdate: worker thread exited with exception: thread blocked indefinitely in an MVar operation

(In our code, auto-update is pulled in both by the logging code, through monad-logger’s dependency on fast-logger, and by warp, our main workhorse.)
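That message is the `show` output of `BlockedIndefinitelyOnMVar`, which GHC’s runtime throws to a thread that is blocked on an MVar once a garbage collection finds that no other live thread can still reach that MVar. The minimal sketch below triggers it on purpose; the helper name `waitOnOrphanedMVar` is made up for illustration. Because the exception is driven by the collector’s reachability analysis, a GC bug that gets reachability wrong could plausibly produce the same message in auto-update’s worker thread, but that is only a hypothesis, not something confirmed here.

```haskell
import Control.Concurrent.MVar (MVar, newEmptyMVar, takeMVar)
import Control.Exception (BlockedIndefinitelyOnMVar (..), try)

-- Hypothetical helper: block on an MVar that no other thread can ever fill.
-- When a GC finds that nothing else can reach the MVar, the RTS throws
-- BlockedIndefinitelyOnMVar to the blocked thread instead of leaving it asleep.
waitOnOrphanedMVar :: IO (Either BlockedIndefinitelyOnMVar ())
waitOnOrphanedMVar = do
  mv <- newEmptyMVar :: IO (MVar ())
  try (takeMVar mv)

main :: IO ()
main = do
  r <- waitOnOrphanedMVar
  case r of
    -- show e is the same text as in the auto-update error above
    Left e   -> putStrLn ("caught: " ++ show e)
    Right () -> putStrLn "unexpectedly received a value"
```

With the threaded RTS, the detection may wait for an idle GC, so it can take a moment before the exception arrives.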

Now, this never happens when we run the services without the --nonmoving-gc option. Does anyone know what’s going on?
We have noticed that the CPU usage doubles when using --nonmoving-gc, but I’m not sure if that is related in any way.
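To check whether the extra CPU is really going into the collector, one option is to read the RTS’s own counters from `GHC.Stats`. This is a rough sketch, assuming GHC 8.10 or later (where `RTSStats` gained the `nonmoving_gc_*` fields) and a binary started with `+RTS -T` (which needs the executable linked with `-rtsopts`, or `-T` baked in via `-with-rtsopts`). `gcCpuSummary` is a hypothetical helper name, and the exact meaning of each field is worth double-checking in the `GHC.Stats` docs.

```haskell
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import Text.Printf (printf)

-- Hypothetical helper: summarise CPU time spent in the mutator and the GC.
-- Returns Nothing unless RTS statistics are enabled (+RTS -T),
-- because getRTSStats fails when statistics collection is off.
gcCpuSummary :: IO (Maybe String)
gcCpuSummary = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then pure Nothing
    else do
      s <- getRTSStats
      -- RtsTime values are nanoseconds (Int64); convert to seconds.
      let secs ns = fromIntegral ns / 1e9 :: Double
      pure . Just $
        printf
          "total CPU %.2fs, mutator %.2fs, GC %.2fs, nonmoving (concurrent) %.2fs, nonmoving (sync) %.2fs"
          (secs (cpu_ns s))
          (secs (mutator_cpu_ns s))
          (secs (gc_cpu_ns s))
          (secs (nonmoving_gc_cpu_ns s))
          (secs (nonmoving_gc_sync_cpu_ns s))

main :: IO ()
main = gcCpuSummary >>= maybe (putStrLn "run with +RTS -T to enable RTS statistics") putStrLn
```

Logging this periodically for the same workload with and without --nonmoving-gc should show whether the doubled CPU usage is collector time or comes from somewhere else.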

Does anyone have ideas, or advice on how to remedy this phenomenon? Is GHC 9.2 just a bit buggy w.r.t. the non-moving GC, and would using GHC 9.4 maybe help?

Lots of non-moving GC bugs, including some race conditions, were fixed in GHC 9.4.5; see section 2.1, "Version 9.4.5", of the Glasgow Haskell Compiler 9.4.5 User's Guide.

Ah, that’s good to know; we’ll try it again when we move to GHC 9.4+ :slight_smile:
