[ANN] A series of articles on Heftia: The Next Generation of Haskell Effects Management

I was unsure about many things, so over the past few months I have been looking into performance whenever I had time. Writing it all out again now, it seems the performance in free monad based effect system libraries with respect to time complexity can be decomposed into the following three factors. (I have not yet investigated space complexity. What would be the best way to approach it?)

  1. The O(m^2) overhead with respect to the number of instructions m in the program, as pointed out in the Reflect w/o Remorse paper

This is what I previously advertised in heftia as the speedup from freer-simple’s FTCQueue implementation. It is exactly as the paper says.

  1. The O(n) overhead with respect to the depth of the effect stack, that is the number of effects in the type level list, due to linear search for handlers (https://x.com/ymdfield/status/1946881981981261831, Are complaints about Free Monad performance pointless, and no different to a corresponding Monad construction? - #17 by ymdfield )

This was something heftia did not solve. As the effectful family does, this can be solved by adopting an evidence-passing approach that treats a vector of handlers as the environment of ReaderT. Regarding this, the free monad itself is not the cause of the performance degradation. It is a matter of how to maintain the handler environment. We need to change the way instructions are injected into an Open Union and placed as nodes in the free monad tree. I plan to try some combinations of approaches and will report results when I have them.

  1. The overhead of delimited continuations

This differs depending on whether you use GHC primops or a free monad. This was hinted at in earlier eff benchmarks ( heftia/benchmark/bench-result/coroutine-shallow.svg at v0.7.0.0 · sayo-hs/heftia · GitHub ), but when using primops the coroutine effect becomes extremely slow (O(m^2)). Why this difference arises is a matter of the primops implementation and is beyond my current understanding, but at least this problem does not occur with a free monad based approach. However, outside cases like this, primops should be faster than free by a constant factor only, although I have not been able to test it yet. At this point, I think the best approach is to provide both implementations and let users choose via a compile flag depending on whether they use coroutines.

Overall, as before, I do not think the free monad is the cause of slowness. I cannot state it definitively, but I do think there is a slightly misleading aspect or a subtle point there. At the very least, it may be true for a naive implementation, but that is ultimately a question of how you use it as a building block to assemble an effect system. Well, it might turn out not to be the case. I will report concrete benchmark results as they become available.

5 Likes