Another thing from the article:
Performance always has room for improvement, whereas correctness improvements are often impossible due to foundational interface compatibility. This asymmetry is crucial and will be discussed later.
The first sentence is not true. The easiest example is the polysemy memory leak mentioned above: suppose that you later discover that heftia exhibits the same memory issue and that it is inherent to the free-monad-based implementation. Then you either rewrite the internals on top of IO + delimited continuations (which will most likely require some sort of API breakage) or “it’s over”.
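To make the contrast concrete, here is a deliberately simplified sketch of the two styles of carrier involved. Neither is the real definition from heftia, effectful or any other library; it only illustrates why moving from a reified-program representation to an IO-backed one changes the core type, and with it, most likely, parts of the public API:

{-# LANGUAGE GADTs #-}

-- Deliberately simplified carriers, for illustration only.

-- A free(r)-monad-style carrier: the program is reified as heap-allocated
-- Pure/Bind nodes that an interpreter walks at runtime, so every bind
-- allocates and the whole structure is data the GC has to deal with.
data Free f a where
  Pure :: a -> Free f a
  Bind :: f x -> (x -> Free f a) -> Free f a

-- An IO-plus-environment carrier: a program is just a function from the
-- effect environment to IO, so interpretation adds very little on top of
-- what the program itself allocates.
newtype EffIO env a = EffIO { runEffIO :: env -> IO a }

Handlers and combinators written against the first shape have no direct counterpart in the second, which is roughly where the expected API breakage would come from.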
Speaking of memory: the benchmarks in effectful also show bytes allocated/copied and peak memory for each benchmark (thanks to tasty-bench), and I just remembered that I saw weird things there in the past. So I went back and applied this patch:
diff --git a/effectful/bench/Main.hs b/effectful/bench/Main.hs
index 2d998bf..43b0994 100644
--- a/effectful/bench/Main.hs
+++ b/effectful/bench/Main.hs
@@ -19,9 +19,9 @@ main :: IO ()
 main = defaultMain
   [ concurrencyBenchmark
   , unliftBenchmark
-  , bgroup "countdown" $ map countdown [1000, 2000, 3000]
-  , bgroup "countdown (extra)" $ map countdownExtra [1000, 2000, 3000]
-  , bgroup "filesize" $ map filesize [1000, 2000, 3000]
+  , bgroup "countdown" $ map countdown [1000000]
+-- , bgroup "countdown (extra)" $ map countdownExtra [1000000]
+-- , bgroup "filesize" $ map filesize [1000000]
   ]
 
 countdownExtra :: Integer -> Benchmark
@@ -56,15 +56,15 @@ countdown :: Integer -> Benchmark
 countdown n = bgroup (show n)
   [ bench "reference (pure)" $ nf countdownRef n
   , bench "reference (ST)" $ nf countdownST n
-  , bgroup "effectful (local/static)"
+{- , bgroup "effectful (local/static)"
     [ bench "shallow" $ nf countdownEffectfulLocal n
     , bench "deep" $ nf countdownEffectfulLocalDeep n
-    ]
+    ]-}
   , bgroup "effectful (local/dynamic)"
     [ bench "shallow" $ nf countdownEffectfulDynLocal n
     , bench "deep" $ nf countdownEffectfulDynLocalDeep n
     ]
-  , bgroup "effectful (local/dynamic/labeled/send)"
+{- , bgroup "effectful (local/dynamic/labeled/send)"
     [ bench "shallow" $ nf countdownEffectfulLabeledDynSendLocal n
     , bench "deep" $ nf countdownEffectfulLabeledDynSendLocalDeep n
     ]
@@ -79,7 +79,7 @@ countdown n = bgroup (show n)
   , bgroup "effectful (shared/dynamic/labeled/send)"
     [ bench "shallow" $ nf countdownEffectfulLabeledDynSendShared n
     , bench "deep" $ nf countdownEffectfulLabeledDynSendSharedDeep n
-    ]
+    ]-}
 #ifdef VERSION_cleff
   , bgroup "cleff (local)"
     [ bench "shallow" $ nf countdownCleffLocal n
to disable unnecessary benchmarks, then ran the benchmark for effectful, mtl, freer-simple, polysemy and fused-effects on GHC 9.8.4 with cabal run bench -- --stdev Infinity -p LIBRARY (--stdev Infinity forces a single run of each benchmark for fairness). Here is sample data from a run:
$ cabal run bench -- --stdev Infinity -p "mtl"
Configuration is affected by the following files:
- cabal.project
- cabal.project.local
Created semaphore called cabal_semaphore_0 with 16 slots.
All
  countdown
    1000000
      mtl
        shallow: OK
          59.1 ms, 467 MB allocated, 124 KB copied, 6.0 MB peak memory
        deep:    OK
          386 ms, 3.0 GB allocated, 722 KB copied, 6.0 MB peak memory

All 2 tests passed (0.45s)
And here’s the aggregated data for all libraries as a table (courtesy of chatgpt):
Effect System       | Run Type | Time    | Allocation | Bytes Copied | Peak Memory
--------------------|----------|---------|------------|--------------|------------
effectful (static)  | shallow  | 4.84 ms | 15 MB      | 23 KB        | 6.0 MB
                    | deep     | 4.86 ms | 17 MB      | 69 KB        | 6.0 MB
effectful (dynamic) | shallow  | 20.6 ms | 130 MB     | 73 KB        | 6.0 MB
                    | deep     | 20.9 ms | 131 MB     | 84 KB        | 6.0 MB
mtl                 | shallow  | 60.2 ms | 467 MB     | 93 KB        | 6.0 MB
                    | deep     | 389 ms  | 3.0 GB     | 697 KB       | 6.0 MB
freer-simple        | shallow  | 160 ms  | 413 MB     | 319 MB       | 187 MB
                    | deep     | 227 ms  | 1.2 GB     | 476 MB       | 653 MB
polysemy            | shallow  | 191 ms  | 1.5 GB     | 216 MB       | 122 MB
                    | deep     | 283 ms  | 3.7 GB     | 318 MB       | 437 MB
fused-effects       | shallow  | 329 ms  | 1.0 GB     | 601 MB       | 414 MB
                    | deep     | 898 ms  | 5.5 GB     | 713 MB       | 873 MB
So, runtime is not the only metric to consider. Memory usage and GC pressure are also important, and we can see that for everything apart from effectful and mtl (credit where credit’s due: the allocation sucks, but at least it doesn’t leak) the numbers are abysmal.
In particular, it looks like memory usage for these libraries increases linearly with the amount of code that’s executed, so I suspect that GHC, instead of treating the body of countdown as a loop, fully unrolls it (Alexis seems to have encountered this before). A disaster.
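For context, the countdown program that all of these benchmarks run is essentially a State-driven loop; an illustrative mtl-style version (not the exact code from the effectful repository) looks roughly like this:

import Control.Monad.State.Strict (State, get, put, runState)

-- Decrement a counter until it hits zero. A compiler that treats this as a
-- loop should run it in constant space; the linear growth in the table
-- suggests that is not what happens for the slower libraries.
program :: State Integer Integer
program = do
  n <- get
  if n <= 0
    then pure n
    else put (n - 1) >> program

countdownMtl :: Integer -> (Integer, Integer)
countdownMtl n = runState program n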
I don’t have benchmarks for heftia, but I’d be very surprised if it didn’t exhibit behaviour similar to freer-simple in this regard.