[ANN] A series of articles on Heftia: The Next Generation of Haskell Effects Management

Another thing from the article:

Performance always has room for improvement, whereas correctness improvements are often impossible due to foundational interface compatibility. This asymmetry is crucial and will be discussed later.

The first sentence is not true. The easiest example is the polysemy memory leak mentioned above: suppose you later discover that heftia exhibits the same memory issue and that it is inherent to a free-monad-based implementation. Then you either rewrite the internals with IO + delimited continuations (which will most likely need some sort of API breakage) or “it’s over”.

Speaking of memory: benchmarks in effectful also show bytes allocated/copied and peak memory for each benchmark (thanks to tasty-bench) and I just remembered that I saw weird things there in the past. So I went back, applied this patch:

diff --git a/effectful/bench/Main.hs b/effectful/bench/Main.hs
index 2d998bf..43b0994 100644
--- a/effectful/bench/Main.hs
+++ b/effectful/bench/Main.hs
@@ -19,9 +19,9 @@ main :: IO ()
 main = defaultMain
   [ concurrencyBenchmark
   , unliftBenchmark
-  , bgroup "countdown" $ map countdown [1000, 2000, 3000]
-  , bgroup "countdown (extra)" $ map countdownExtra [1000, 2000, 3000]
-  , bgroup "filesize" $ map filesize  [1000, 2000, 3000]
+  , bgroup "countdown" $ map countdown [1000000]
+--  , bgroup "countdown (extra)" $ map countdownExtra [1000000]
+--  , bgroup "filesize" $ map filesize  [1000000]
   ]
 
 countdownExtra :: Integer -> Benchmark
@@ -56,15 +56,15 @@ countdown :: Integer -> Benchmark
 countdown n = bgroup (show n)
   [ bench "reference (pure)" $ nf countdownRef n
   , bench "reference (ST)"   $ nf countdownST n
-  , bgroup "effectful (local/static)"
+{-  , bgroup "effectful (local/static)"
     [ bench "shallow" $ nf countdownEffectfulLocal n
     , bench "deep"    $ nf countdownEffectfulLocalDeep n
-    ]
+    ]-}
   , bgroup "effectful (local/dynamic)"
     [ bench "shallow" $ nf countdownEffectfulDynLocal n
     , bench "deep"    $ nf countdownEffectfulDynLocalDeep n
     ]
-  , bgroup "effectful (local/dynamic/labeled/send)"
+{-  , bgroup "effectful (local/dynamic/labeled/send)"
     [ bench "shallow" $ nf countdownEffectfulLabeledDynSendLocal n
     , bench "deep"    $ nf countdownEffectfulLabeledDynSendLocalDeep n
     ]
@@ -79,7 +79,7 @@ countdown n = bgroup (show n)
   , bgroup "effectful (shared/dynamic/labeled/send)"
     [ bench "shallow" $ nf countdownEffectfulLabeledDynSendShared n
     , bench "deep"    $ nf countdownEffectfulLabeledDynSendSharedDeep n
-    ]
+    ]-}
 #ifdef VERSION_cleff
   , bgroup "cleff (local)"
     [ bench "shallow" $ nf countdownCleffLocal n

to disable unnecessary benchmarks, then ran the benchmarks for effectful, mtl, freer-simple, polysemy and fused-effects on GHC 9.8.4 with cabal run bench -- --stdev Infinity -p LIBRARY (--stdev Infinity forces a single run of each benchmark for fairness). Here is sample output from one run:

$ cabal run bench -- --stdev Infinity -p "mtl"
Configuration is affected by the following files:
- cabal.project
- cabal.project.local
Created semaphore called cabal_semaphore_0 with 16 slots.
All
  countdown
    1000000
      mtl
        shallow: OK
          59.1 ms,          467 MB allocated, 124 KB copied, 6.0 MB peak memory
        deep:    OK
          386  ms,          3.0 GB allocated, 722 KB copied, 6.0 MB peak memory

All 2 tests passed (0.45s)

And here’s the aggregated data for all libraries as a table (courtesy of ChatGPT):

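| Effect System | Run Type | Time | Allocation | Bytes Copied | Peak Memory |
|---|---|---|---|---|---|
| effectful (static) | shallow | 4.84 ms | 15 MB | 23 KB | 6.0 MB |
| effectful (static) | deep | 4.86 ms | 17 MB | 69 KB | 6.0 MB |
| effectful (dynamic) | shallow | 20.6 ms | 130 MB | 73 KB | 6.0 MB |
| effectful (dynamic) | deep | 20.9 ms | 131 MB | 84 KB | 6.0 MB |
| mtl | shallow | 60.2 ms | 467 MB | 93 KB | 6.0 MB |
| mtl | deep | 389 ms | 3.0 GB | 697 KB | 6.0 MB |
| freer-simple | shallow | 160 ms | 413 MB | 319 MB | 187 MB |
| freer-simple | deep | 227 ms | 1.2 GB | 476 MB | 653 MB |
| polysemy | shallow | 191 ms | 1.5 GB | 216 MB | 122 MB |
| polysemy | deep | 283 ms | 3.7 GB | 318 MB | 437 MB |
| fused-effects | shallow | 329 ms | 1.0 GB | 601 MB | 414 MB |
| fused-effects | deep | 898 ms | 5.5 GB | 713 MB | 873 MB |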

So, the runtime is not the only metric to consider. Memory usage and GC pressure matter too, and apart from effectful and mtl (credit where credit’s due: mtl allocates a lot, but at least it doesn’t leak) the numbers are abysmal: hundreds of megabytes copied and peak memory between 122 MB and 873 MB, against a flat 6.0 MB for effectful and mtl.

In particular it looks like memory usage for these libraries increases linearly with the amount of code that’s executed, so I suspect that GHC, instead of treating the body of countdown as a loop, fully unrolls it (Alexis seems to have encountered this before). A disaster.
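For readers who haven’t looked at the benchmark, here is a rough sketch of the shape of the countdown program. It is not the code from effectful’s bench suite: the State monad is hand-rolled so the example needs only base, and the names (State, countdown, countdownRef) are only illustrative. The point is that the whole workload is a tiny get/put loop, which should run in constant memory.

```haskell
module Main (main) where

-- Minimal hand-rolled State monad (an assumption made to keep the
-- example dependency-free; the real benchmarks use the libraries' own
-- State effects).
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> case g s of (a, s') -> (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    case f s of
      (h, s') -> case g s' of
        (a, s'') -> (h a, s'')

instance Monad (State s) where
  State g >>= k = State $ \s -> case g s of (a, s') -> runState (k a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

-- Pure reference: count down to zero, return (result, final state).
countdownRef :: Integer -> (Integer, Integer)
countdownRef n = if n <= 0 then (n, n) else countdownRef (n - 1)

-- The same loop written against the State interface.
countdown :: State Integer Integer
countdown = do
  n <- get
  if n <= 0 then pure n else put (n - 1) >> countdown

main :: IO ()
main = do
  print (countdownRef 1000000)
  print (runState countdown 1000000)
```

Both versions only ever need the current counter, so peak memory that grows with the iteration count points at the effect library’s encoding or at how GHC compiles it, not at the program itself.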

I don’t have benchmarks for heftia, but I’d be very surprised if it didn’t exhibit the behaviour similar to freer-simple in this regard.
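To make the “free monad based” part concrete, here is a toy freer-style encoding of the same countdown. This is an assumption-laden sketch: it is neither freer-simple’s nor heftia’s actual implementation (freer-simple, for one, uses a type-aligned queue of continuations), and the names Freer, StateOp, send and runStateF are made up for illustration. It only shows that in this style every get and put becomes a heap-allocated node that an interpreter later walks.

```haskell
{-# LANGUAGE GADTs #-}
module Main (main) where

-- State operations as plain data (hypothetical names).
data StateOp s a where
  Get :: StateOp s s
  Put :: s -> StateOp s ()

-- Toy freer monad: a program is either a result or an operation
-- paired with a continuation.
data Freer f a where
  Pure :: a -> Freer f a
  Bind :: f x -> (x -> Freer f a) -> Freer f a

instance Functor (Freer f) where
  fmap f (Pure a)    = Pure (f a)
  fmap f (Bind op k) = Bind op (fmap f . k)

instance Applicative (Freer f) where
  pure = Pure
  mf <*> ma = mf >>= \f -> fmap f ma

instance Monad (Freer f) where
  Pure a     >>= k = k a
  Bind op k' >>= k = Bind op (\x -> k' x >>= k)

-- Lift a single operation into a program.
send :: f a -> Freer f a
send op = Bind op Pure

-- The countdown loop, now building Bind nodes instead of running directly.
countdown :: Freer (StateOp Integer) Integer
countdown = do
  n <- send Get
  if n <= 0 then pure n else send (Put (n - 1)) >> countdown

-- Interpreter: walks the nodes one at a time, threading the state.
runStateF :: s -> Freer (StateOp s) a -> (a, s)
runStateF s (Pure a)           = (a, s)
runStateF s (Bind Get k)       = runStateF s (k s)
runStateF _ (Bind (Put s') k)  = runStateF s' (k ())

main :: IO ()
main = print (runStateF 1000000 countdown)
```

Whether those nodes are short-lived garbage or get retained is exactly what the bytes copied and peak memory columns measure, so this sketch by itself says nothing about a leak; it would have to be benchmarked the same way.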
