Hi all!
Haskell is said to be great with concurrency, so they say, even though I haven’t really explored that avenue much. Recently I heard a podcast where someone talked about a little challenge/scenario they built to compare structured concurrency in several languages. That person was coming from a Scala background, but there are solutions in many languages.
Haskell is missing, so I thought, that should be easy! Did not turn out so easy.
The main Scala implementations seem to use functions for racing several concurrent tasks, that only consider a “successfully” finished task to have won the race. The Haskell async package doesn’t seem to have such a function, but no problem, I’ll write my own:
-- Race a list of actions and return the first one to
-- _successfully_ finish. Fails if all actions fail.
raceAnySuccess :: [IO a] -> IO a
raceAnySuccess tasks =
bracket
(trace "acquire" $ mapM async tasks)
(trace "release" cancelMany)
(trace "waitForSuccess" waitForSuccess)
where
waitForSuccess :: [Async a] -> IO a
waitForSuccess [] = error "All tasks failed!"
waitForSuccess asyncs = do
(completed, result) <- waitAnyCatch asyncs
case result of
Right val -> trace "success" $ return val
Left _ ->
let remaining = trace "filtering" $ filter (/= completed) asyncs
in trace "recursive" $ waitForSuccess remaining
Great. Now, once scenario requires you to start 10,000 tasks, race the one that finishes first, cancel the others.
I wrote a little reproducer to highlight my issue: Haskell async - race successful tasks · GitHub
In that example, I start one task that just waits (threadDelay) for 1 second and then finishes. The other ones are running infinitely, just calling threadDelay again and again.
So my expectation: Run 1 second + whatever overhead you need to start and cancel n tasks.
Works fine for 1,000 tasks:
real | 0m1,123s |
---|---|
user | 0m0,987s |
sys | 0m0,339s |
How long does it take for 5,000 tasks? 1 Second + 5 times the overhead? I.e. linear increase? Nope:
real | 2m52,497s |
---|---|
user | 2m52,823s |
sys | 0m0,907s |
I’m using GHC 9.6.7 (ghcup recommended). Cabal 3.12.1.0. On Ubuntu. I ran the tests with:
cabal build && time cabal run all -- +RTS -N6 -RTS
(on an 8 core machine).
It prints the waitForSuccess
message and then stalls. I have 1 single core at 100% utilization.
I used the -B
option (yeah, really), to see if it was the GC and heard a bing.
Looking at the RTS opts docs, I tried -xn
(non-moving GC), but that’s somehow worse:
real | 3m38,358s |
---|---|
user | 3m39,272s |
sys | 0m0,654s |
The parallel GC options should all be enabled by default, according to the docs, that is -qg
, -qb
, -qn
, even though I only see one core engaged.
Is there anything obviously wrong with my code? If not, can anyone suggest some tuning options, or ways how to debug this?
Would appreciate any input. Thank you.