What happens now after "delimited continuations" is merged to GHC?

I watched Alexis King’s talk “Effect for Less” a while ago and followed the progress of her eff library. Since those were from 2020, I had been thinking it was mostly stalled, which is also speculated so in effectful library’s README.

But recently (3 months ago), the “Native, first-class, delimited continuations to GHC” PR was merged.

Does anyone know what should we expect from this progress?

[edit] fwiw, found about this later https://www.reddit.com/r/haskell/comments/zkxdcx/comment/j040vqm/

4 Likes

The approach outlined by Alexis’ talk can be implemented now (or when GHC 9.6 is released), though I’m not sure about its exact performance characteristics. It would certainly be pretty good compared to mainstream effect libraries, but I don’t know if it would be significantly better than a similar approach that doesn’t require patching GHC.

There are still some problems, mainly unsoundness in some edge cases, that we’d like to see solved, perhaps by limiting its expressiveness a bit. Beyond MonadIO, many people also want MonadUnliftIO or MonadBaseControl IO in their effect system, but it’s still an open problem as to how to make it work with delimited continuations (I bet it’s impossible).

Andrzej thinks that hasura/eff#12 is a bug, which is what I originally thought too. But it’s really an absence of feature rather than a bug, because the behavior described in the issue is entirely in line with the semantics of delimited continuations. So I think this problem is not essential to the development of eff in general.

But even with all these caveats, it’ll surely be a vast improvement from the previous approaches to effect systems.

1 Like

Just to provide a picture for the general landscape, eff belongs to a family of effect libraries that uses a technique called “evidence-passing” to avoid the high performance cost brought by free monads, mtl typeclasses and the likes. They don’t necessarily have delimited continuations but all have very good performance compared to say, polysemy or unspecialized fused-effects:

  • effectful doesn’t have delimited continuation but has generally very good support for IO.
  • cleff is a library that I wrote that is intended to be beginner-friendly, it also doesn’t have delimited continuation.
  • eveff and mpeff are proof-of-concept libraries of the paper that first formally described evidence-passing

I personally think that evidence-passing is the future of Haskell effect libraries, because it’s easy to understand, has low performance overhead and works very well with other techniques (including either delimited continuation or MonadUnliftIO, it’s delimited continuation and MonadUnliftIO that don’t work well with each other).

2 Likes

eveff doesn’t solve the same performance problems. In the effects for less talk it is briefly mentioned that an approach based on lambdas (which is what eveff is) is still basically the same as building up a concrete tree of your whole program.

@jaror My impression is that with the Ctl monad eveff provides, you only build a tree (i.e. closure) when control is actually called, and otherwise no allocation happens? Please tell me if I’m missing something here.

That’s true, but knowing whether control (known as yield in eveff) is called requires you to know the effect handler that you will eventually use. Because the handler decides whether to use a function (doesn’t use yield) or operation (does use yield).

If you write your effectful program and its handler in separate files and no inlining happens, then that information is obviously unknown and GHC cannot use it for optimizations. I’ve also tested what happens when you do put them in the same file and force inlining, but I found that GHC was still not able to propagate that information from the handler to the usage sites.

Here’s the link to that part from Alexis’ talk by the way: Alexis King - “Effects for Less” @ ZuriHac 2020 - YouTube (at 49:47).

1 Like

I see what you mean: with the approach of eveff we can avoid allocating closures, but we lose the ability to store a lot of values on the stack. However this portion of the talk seems a bit handwavy as it lumps together the problem of allocating closures themselves and extra allocation due to the possibility of creating closures.

I guess my next question would be: how much performance loss is caused by each of them? I have run benchmarks of eveff against mainstream effect libraries (without specialization or cross module inlining, of course) and the improvement is already very significant. How much do we gain from here by storing values on the stack?

I’d be interested in seeing those results. Benchmarks I’ve done showed no significant difference compared to other effect libraries. The only significant improvement I’ve seen is when I use the Local state effect which has a built-in handler (which is guaranteed to not use yield so GHC can optimize it).

Here - it’s for a talk I gave at HOPE '22.

Several observations are:

  • Polysemy is really slow
  • Fused-effects is really slow if you give it a large effect stack
  • MpEff is fast even if you give it a large stack, except if you yield a lot, in which case it slows down but is still faster than other libraries

I guess the “very significant” is subjective (and maybe exaggerated) but you can see that it is overall an improvement from the current implementations. I think a big factor in these performance improvements (specifically in the case of deep effect stacks) is evidence-passing, which doesn’t require adding extra computational structure to the monad for each effect you add.

Now let’s talk about effects-zoo: it’s strange that the BigStack microbenchmark in effect-zoo generates no performance difference for fused-effects. I guess it is because of a combination of

  • all the extra effects are more “outer” than the main effect
  • the extra effects have no computational content.

The FileSize microbenchmark is really not a good one IMO, as it’s more IO bound than computation bound. The Reinterpretation microbenchmark on the other hand looks like it’s just benchmarking effect invocation and bind, but specifically with “reinterpretation” i.e. calling into another effect in the handler. I’d love to see how this benchmark turns out for higher n and deeper effect stacks, but it’s pretty late here so maybe another day.

I think I would agree that EvEff and maybe also MpEff can be around 15-30% faster in general and in the case of big stacks even more.

However, eff does promise to be around 80-90% faster (looking at the charts from the Effects for Less talk), so that is perhaps why I have a different notion of “very significant” performance improvement. I believe it would basically allow you to get the performance of the Local effect with that CountDown benchmark but then without having to implement it with a built-in handler.

The charts in the Effects for Less talk shows

library time (μs)
eff 125
mtl 640
fused 1162

So the first question is: what benchmark is Alexis talking about here? She didn’t explicitly mention it (or I didn’t catch it), but we can infer from the context that it’s likely to be the countdown benchmark. Then, what is the n she used? I tested n = 10000 on my machine and got the following results:

library time (μs)
mtl 569
fused 1140

which is pretty close if you ask me. How, then, did eveff and my proof-of-concept sp do?

library time (μs)
sp 203
ev 384
mtl 569
fused 1140

sp is only less than 1x slower than eff! Note that sp uses the same Ctl monad as eveff, and I only improved on the evidence-passing part. So, yes, even with Ctl you can have an 80% performance improvement, at least in this benchmark.

This is obviously not the whole story, and It’d be very interesting to see how eff performs in the other benchmarks.

1 Like

Wow, I’m pretty surprised by those results.

My diagnosis is that the countdown benchmark is not that great to show the difference between sp/ev and eff. The reason is that the bind doesn’t really get in the way of other optimizations. Here’s another benchmark adapted from @lexi.lambda’s talk at 43:36:

lookupResult :: Int -> Maybe Int
{-# NOINLINE lookupResult #-}
lookupResult n
  | n >= 0 = Just n
  | otherwise = Nothing

programSp :: S.Error String S.:> es => Int -> S.Eff es Int
programSp n = do
  nums <- 
    case lookupResult n of
      Nothing -> S.throw "not found"
      Just val -> return [1..val]
  return $ sum nums
{-# NOINLINE programSp #-}

fooSp :: Int -> Either String Int
fooSp n = S.runEff $ S.runError (programSp n)

This has been designed such that the monadic bind actually breaks the fusion of the list producer [1..val] and the consumer sum. My hypothesis (and Alexis’ implicit claim) is that eff still allows GHC to fuse these two operations while mtl (and maybe all other effect systems?) don’t.

Unfortunately, I don’t have easy access to a version of GHC with the delimited continuations primops, so I can’t really confirm that eff is much faster. But I have been able to confirm that sp does in fact obstruct the list fusion.

Although it does seem like this problem doesn’t necessarily require the delimited continuation primops. Looking at the core that GHC produces, it seems like GHC just missing an optimization akin to the state hack for IO.

1 Like

I tried to copy down the Eff monad in eff and test your code out (I exchanged the throw with pure []):

lookupResult :: Int -> Maybe Int
{-# NOINLINE lookupResult #-}
lookupResult n
  | n >= 0 = Just n
  | otherwise = Nothing

programEff :: Int -> Eff es Int
programEff n = do
  nums <-
    case lookupResult n of
      Nothing  -> pure []
      Just val -> return [1..val]
  return $ sum nums
{-# NOINLINE programEff #-}

and this is the Core I got:

$wlookupResult
  = \ ww ->
      case >=# ww 0# of {
        __DEFAULT -> Nothing;
        1# -> Just (I# ww)
      }

lvl = \ rs s -> (# s, Result rs [] #)

Rec {
$w$wgo1
  = \ w w1 ->
      case w of {
        [] -> w1;
        : y ys -> case y of { I# y1 -> $w$wgo1 ys (+# w1 y1) }
      }
end Rec }

programEff
  = \ @es n ->
      let {
        ds
          = case n of { I# ww1 ->
            case $wlookupResult ww1 of {
              Nothing -> lvl;
              Just val ->
                let {
                  a1
                    = case val of { I# y ->
                      case ># 1# y of {
                        __DEFAULT ->
                          letrec {
                            go9
                              = \ x ->
                                  : (I# x)
                                    (case ==# x y of {
                                       __DEFAULT -> go9 (+# x 1#);
                                       1# -> []
                                     }); } in
                          go9 1#;
                        1# -> []
                      }
                      } } in
                (\ rs s -> (# s, Result rs a1 #))
            }
            } } in
      (\ rs1 s ->
         case ds rs1 s of
         { (# ipv, ipv1 #) ->
         case ipv1 of { Result rs2 a1 ->
         (# ipv,
            Result rs2 (case $w$wgo1 a1 0# of ww { __DEFAULT -> I# ww }) #)
         }
         })

…okay, it’s not fused. Maybe it would fuse on an older GHC version? My understanding is that Alexis just used this example to show that specialized/concrete (>>=) has the ability to expose further optimizations to the compiler, not necessarily this specific one. With that understanding, what we can expect for eff in terms of performance really boils down to two points:

  • You can avoid the performance cost of the Ctl monad, a monad more complicated than IO
  • More values can stay on the stack instead of the heap.
3 Likes

Trying to understand the implication of the feature itself. Is 🤖 Introduction to Abilities: A Mental Model · Unison programming language also an applicable mental model for thinking about using delimited continuation for modeling effects?

Probably not. The API to the effects is fairly decoupled from the implementation strategy. Alexis’s implementation of delimited continuations is a change to the implementation strategy which will have minimal impact on the API. After having looked at Unison Abilities briefly I think they’re largely about the API. You could use delimited continuations to implement them, or alternatively some completely different backend.

Abilities seems to be just another name for algebraic effects. One usage of delimited continuations is to implement algebraic effects, but we can have many other uses for it too. I saw someone implement autodiff with the new primops the other day and the performance seems to be pretty good.

1 Like