[ANN] A series of articles on Heftia: The Next Generation of Haskell Effects Management

First, thank you for reading the article carefully and providing objective and well-founded feedback. Thanks to you, I feel I can now see the issue from a broader perspective.

Let me begin by responding to the point I most want to discuss at the moment. I apologize that this response has become rather long and may not be well organized, but I would appreciate it if you could review it again to see whether it can withstand objective critique.

These statements are based on my experience when I attempted to improve the performance of my library by forking an existing IO-wrapper implementation. (Incidentally, while I succeeded in making it work correctly, I failed to achieve any speed improvements.)

What I encountered at that time was that, even with types, runtime errors and segfaults could occur. This was something I had not experienced either in my library or in my previous, type-protected Haskell programming, and it was a harsh experience that reminded me of my days working in C.

A typical bug was runtime access to an uninitialized handler in the evidence vector. This occurs, for example, when runState and runReader are composed in a particular order. In the type-safe version of my library, it is simply impossible to compose them in that order (although the matter is a bit more complicated: it is not merely about the order of runState and runReader, but about the compatibility between higher-order effects and delimited continuations). Any such operation always triggers a type error and cannot even be written. In other words, the IO-wrapper approach, in its default state, allows operations that are essentially wrong, and to prevent this, one must retroactively guarantee interface type safety by isolating and hiding unsafe modules. (This corresponds to the reverse of making invalid states unrepresentable.)
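To make the contrast concrete, here is a minimal toy sketch (not Heftia's actual API; all names are hypothetical) of the idea that a typed handler stack makes an invalid handler order unrepresentable: each runner can only peel the head of a type-level effect list, so a composition the types don't permit is a compile error rather than a runtime crash.

```haskell
{-# LANGUAGE DataKinds, TypeOperators, KindSignatures #-}
import Data.Kind (Type)

-- Toy effect markers (no operations; ordering is the point here).
data State s
data Reader r

-- A degenerate "Eff" indexed by its remaining effect list.
newtype Eff (es :: [Type]) a = Eff { runEff :: a }

-- Each runner peels exactly the head of the effect list.
runState :: s -> Eff (State s ': es) a -> Eff es (a, s)
runState s (Eff a) = Eff (a, s)

runReader :: r -> Eff (Reader r ': es) a -> Eff es a
runReader _ (Eff a) = Eff a

prog :: Eff '[State Int, Reader Bool] Int
prog = Eff 42

-- This order typechecks; swapping runState and runReader here is
-- rejected by GHC, mirroring how a typed effect stack rules out
-- invalid handler orders at compile time instead of at run time.
ok :: (Int, Int)
ok = runEff (runReader True (runState (0 :: Int) prog))
```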

Thus, there are two contrasting development processes here:

  1. Starting with complete safety guarantees that are too strict to be practical and gradually relaxing them (while still remaining safe as long as typing is preserved).
  2. Starting with an interface that may have safety gaps and filling them in as they are discovered.

I do not deny the possibility that this is an overgeneralization of my own experience, but fundamentally, IO-wrapper libraries tend to fall under category 2. Of course, my library also contains some elements of 2—such as open union or certain functions that are not sufficiently generalized—but in terms of the sheer number of cases and their locality, I would say it is small.

During the development of bluefin and effectful, I believe there were several occasions when such safety holes in the interface were discovered and then patched. This includes not only issues recorded in issue trackers but also minor fixes applied immediately when tests revealed problems.

For example, aside from runtime errors, there may have been cases where IORef combined with certain interpreter mechanisms or concurrency exhibited unintelligible behavior. If you have indeed encountered almost none of these issues, I would like to know how you managed to prevent them. I do not know how to prevent such issues in advance. My understanding is that they can only be dealt with reactively after they occur.

(Moreover, I am particularly interested in looking further back to the period when people were experimenting with whether UnliftIO and delimited continuations could coexist.)

In other words, what I’m ultimately trying to say is that there are two design philosophies here, each holding that:

  1. If there is a possibility of bugs, you should eliminate that possibility. Better yet, you should demonstrate that there is no possibility of such bugs.
  2. If you say there is a possibility of bugs, you should show the basis for that claim.

As for this framing, stepping back, I do not at the moment know how one should think about it. However, I believe this is simply a difference in philosophy, and it is not a matter of one being right and the other wrong. What do you think?

I had a misunderstanding about this. I did not distinguish between issues recorded in the issue tracker and those discovered and fixed on the spot through implementation and testing. In my next article, I will correct this and revisit the outcomes of our discussion here.

I understand this, but I feel it overlooks the distinction between the level of technical elements and the protocols that connect them.

Isn’t this specifically a guarantee regarding resource safety, rather than a guarantee of the safety of the entire effect system?

It cannot be denied that it may be used as a buzzword in common parlance, but there is at least a clear definition that people ought to rely on, and I adhere to it. This is what is presented in Plotkin’s work and in the literature on the eff language. I intend to write about it in an article in due course.

That is helpful. Thank you.

Ah, I see! So you are talking about your experience as an implementor of an IO-wrapper effect system, not the experience of a user. Indeed, the experience of J Launchbury and SPJ when they implemented ST was that they had to be careful to get the primitives correct. But once the primitives were correct the experience of the user was free from type unsafety issues. Naturally it’s better if the implementation can be guided by types, as well as the end use. Nonetheless, Bluefin’s implementation is far simpler than effectful’s so I think it’s easier to convince oneself that it’s correct.

I think this story would be far more valuable than an unsupported claim that IO-based effect systems are less type safe in some unexplained way.

I don’t see it as the opposite. I see it as another form of the same: the analytic approach to making invalid states unrepresentable (correct by careful inspection), as opposed to the synthetic approach (correct by construction). Doubtless the latter is better all other things being equal, but I am not yet convinced that all other things are equal.

I would say it’s small in Bluefin too (and there are ways it could be smaller yet). Have you looked?

It’s surprising to me that you’re willing to believe this without being able to provide evidence. However, you are actually correct! These two issues were reported to me recently, and fixed. They are not design flaws. They are implementation oversights. No churn of the implementation is required, and end user programs were not affected by the fix. Still, I take your point that it is better if Haskell’s type system can help. I suppose it could if I took more care to use less unsafe stuff in the implementation of Bluefin and developed a small “trusted core”, but it really doesn’t seem to be worthwhile at the moment. The risks of something going wrong in that regard are minimal.

I can confirm that no such holes have been found that are not recorded in the issue tracker.

I have not experienced such issues. I don’t know why you think IORef combined with “interpreter mechanisms” could pose a problem. Perhaps I’m just not seeing it because Bluefin doesn’t have an “interpreter mechanism”? Regarding concurrency, yes, Bluefin gets away with it by not yet having a native concurrency story. But it will continue to get away with it by having well-designed concurrency primitives that don’t cause such problems. See https://github.com/tomjaguarpaw/bluefin/issues/34 for up-to-date discussion on concurrency on Bluefin.

I certainly believe that the synthetic approach is better when it is sufficient, but I don’t yet see that it is sufficient. For example, I don’t yet see how one can use MonadUnliftIO to bracket IO operations in Heftia as one would want.
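For reference, this is roughly what the ReaderT-over-IO side gets here: a hand-rolled sketch of MonadUnliftIO-style bracketing, specialized to `ReaderT r IO`. The real `withRunInIO` from the unliftio package has this shape in general; the specialized definitions below are my own illustration, not any library's code.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Exception (bracket)
import Control.Monad.Trans.Reader (ReaderT (..), runReaderT)

-- Unlifting for ReaderT r IO: expose a function that runs the
-- transformed action in plain IO, holding the environment fixed.
-- (This is what MonadUnliftIO's withRunInIO generalizes.)
withRunInIO :: ((forall a. ReaderT r IO a -> IO a) -> IO b) -> ReaderT r IO b
withRunInIO inner = ReaderT $ \r -> inner (\m -> runReaderT m r)

-- bracket lifted through the transformer: acquire/release/use all
-- run in ReaderT, but exception safety comes from IO's bracket.
bracketR
  :: ReaderT r IO a            -- acquire
  -> (a -> ReaderT r IO c)     -- release
  -> (a -> ReaderT r IO b)     -- use
  -> ReaderT r IO b
bracketR acquire release use =
  withRunInIO $ \run -> bracket (run acquire) (run . release) (run . use)
```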

No, I don’t think so, unless you’re using a different definition of “safety of the entire effect system” to me. It’s a guarantee that ST contains no type holes. What do you mean by it?

Very well, I am content to remain ignorant for the time being.

You’re welcome!


My intuition has always been that algebraic effects is a system where arbitrary effects can be combined into new effects. In this sense, `foo :: (Log :> es, DB :> es) => Eff es ()` doesn’t have two effects, but one anonymous effect arising from this combination. Granted, this is also how I’d give a quick explanation of what relational algebra is about: combining relations into new relations. Hopefully it’s not much more complicated than that.
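As a sketch of that intuition (using hypothetical MTL-style `Log` and `DB` classes of my own invention, not any particular effect library's API): the two constraints combine into what behaves like a single anonymous effect.

```haskell
{-# LANGUAGE ConstraintKinds #-}

-- Two hypothetical effects, phrased as type classes.
class Monad m => Log m where
  logMsg :: String -> m ()

class Monad m => DB m where
  query :: String -> m [String]

-- The combination acts like one new, anonymous effect.
type App m = (Log m, DB m)

-- A program against the combined effect.
report :: App m => m ()
report = do
  rows <- query "SELECT name FROM users"
  mapM_ logMsg rows

-- A toy interpretation into IO, just for demonstration.
instance Log IO where logMsg = putStrLn
instance DB IO where query _ = pure ["alice", "bob"]
```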

But that aside, what I’m interested in is how Heftia deals with concurrency. This is where those effect systems that upload state into the cloud (monad) usually struggle. If you could do a chapter on that, I’d love to read it.


I would like to focus on this point. That is, on how to guarantee that the primitives are correct.

In the IO-wrapper approach, guaranteeing correctness, as far as I know, requires a large amount of work and presents challenges in terms of maintainability. Specifically, you must formalize axioms for the IO monad in accordance with the GHC specification and perform the same kind of formal verification as for a typical procedural language. It is very elaborate, and the benefits do not justify the cost, so in reality it will not be carried out.

On the other hand, with the algebraic data type method, the proof basically consists of transforming types and functions according to a few rules, so it does not require much mental effort. Once you get used to it, you can readily prove correctness.

That said, even without formal guarantees, writing a sufficient number of tests might make practical correctness virtually certain. However, I do not know whether that would be a good choice in the long term.

Moreover, at present, the specification that each effect library adheres to, that is, the criteria for what behavior can be considered correct, is not clear. It seems to allow room for subtle behaviors. In my methodology, the theoretical foundation is at least clear, and I can provide an answer when asked to explain behavior or validity.

With regard to this, it is based on my experience implementing the IO-wrapper approach and on judgments formed from seeing several issues. However, those might have been with libraries other than effectful or bluefin:

https://github.com/re-xyr/cleff/issues/15
https://github.com/hasura/eff/issues/13

I have also confirmed segfaults with eveff and mpeff, although they have not been reported.

I would appreciate it if you could share your reasons for thinking that. To me, this does not appear minimal, but rather a relatively imminent risk. Even if I take the stance that the delimited continuations of algebraic effects are unnecessary, I cannot help but consider the possibility that people will build an ecosystem relying on concurrency interfaces that are potentially unsafeCoerce-able. This is not about bluefin, but about ReaderT IO in general. If you have any ideas on how this risk could be kept sufficiently small, or on any way to recover without issues should it materialize, I would be grateful if you could share them.

Regarding the interpreter mechanisms, it may not apply to bluefin. polysemy or effectful might be closer. When I implemented an IO-based interpreter related to concurrency, I felt, based on that experience, that it was difficult to track how IORef propagates where when multiple effect interpreters interact in a complex way.

By that I mean the absence of runtime errors and semantic correctness, that is, the theoretical guarantee that the effect and interpreter will not produce any problematic results.


I’d like to write about the concurrency. Thank you! However, it might differ slightly from your expectations in the sense that it doesn’t use the standard concurrency primitives like MVar or STM, but instead has a more strictly typed interface. Because of that, it may feel somewhat restrictive in terms of flexibility.


Maybe. I’m not sure it’s too elaborate. There are only really two primitive effects in Bluefin: State and Exception. State has already been proved safe in the ST paper. Extending it to Exception doesn’t seem too challenging. Beyond that there would have to be some work to prove that Bluefin’s use of threads is valid when it comes to connectCoroutines, but I also guess that’s not much. Still, I accept that that is work requiring type theory expertise, and knowledge of how GHC works. Yet you are using work that required category theory expertise, so work requiring expertise can’t be a blocker. Furthermore, you are using GHC, which is implemented using IORefs and exceptions, so you’re already accepting, at least implicitly, that they work how users think they do.
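For readers unfamiliar with the ST argument being referenced here: the rank-2 quantifier in `runST` is what keeps mutable references from escaping their region, and this is the mechanism the safety claim leans on. A small standard-library example:

```haskell
import Control.Monad.ST (runST)
import Data.STRef (modifySTRef, newSTRef, readSTRef)

-- Local mutation, pure result: runST's type
--   runST :: (forall s. ST s a) -> a
-- forces the region variable `s` to stay local, so an STRef s
-- cannot leak out of the computation that created it.
counter :: Int
counter = runST $ do
  ref <- newSTRef (0 :: Int)
  modifySTRef ref (+ 1)
  modifySTRef ref (+ 1)
  readSTRef ref
-- Returning `ref` itself from the runST block instead would be a
-- type error: `s` would escape its scope.
```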

Still, when the theoretical work is complete there is still the practical work of implementation, and that is harder in the analytic case because the type system is less helpful. Ideally one would come up with a minimal “trusted core”. I put my commits where my mouth is and spent some time reducing the Bluefin “trusted core”. You can see the results at #55. I suspect it can be reduced further.

But yes, you’re right: it’s better to always be able to rely on the type system. But that is not the only concern. In Haskell there are the risks of space leaks and unexpected slowdowns due to closures having structure other than we expect. That’s one of the big benefits of the IO-wrapper approach. The operational semantics are much clearer.

I believe this to be true of Bluefin also. Try me!

Ah, thank you for sharing! I think your post would have been a lot stronger if you had included links to these issues. Effectful has a similar issue (prevented at run time): https://github.com/tomjaguarpaw/bluefin/issues/15#issuecomment-2098343722. Bluefin is immune, however, due to its ST-style type system, so please don’t make the claim that all IO-wrapper effect systems suffer this problem.

The risk is minimal for State and Exception since I’m assuming their safety is already proven by the ST paper, or a simple extension. Of course I could be wrong, since the proof does not exist yet, but I consider this very unlikely. For more sophisticated scenarios such as connectCoroutines I guess the analysis must be correspondingly more sophisticated, but I don’t see anything to be concerned about. Regarding concurrency, well, Bluefin does not have native concurrency, and I will not add any until I am convinced there is a safe API. Regarding concurrency and ReaderT IO I’m not sure I really follow. Are you really saying you think there are type safety holes in the base concurrency and IO story provided by GHC?

Well, OK, understood, but please don’t extend the judgement to all IO-wrapper effect systems.

I still don’t really know what that means.


At least based on the explanation provided, I am somewhat convinced of bluefin’s methodology. Putting aside higher-order effects and concurrency for the moment, it seems that the essential primitives can be understood as operations related to “zero- or one-shot continuations,” which correspond to Exception and State, respectively. If this intuition is correct, it is indeed understandable that a small, explainable core could implement a subset of algebraic effects. In particular, I liked #55.

However, regarding the group of functions such as connectCoroutines, which internally use concurrency directly rather than going through the trusted core (is my understanding correct here?), it still seems difficult to predict their behavior and guarantee correctness when users combine them in complex ways. In any case, the uncertainty has decreased, except for the concurrency aspects. Thank you for the explanation!

Okay. Perhaps this is because bluefin does not have the ReaderT part of ReaderT IO…


connectCoroutines would be considered part of the trusted core. However, it is not a complex use of coroutines. In particular there is no parallel execution. Execution switches between the threads deterministically at each yield, so I don’t think it would be hard to write down the properties that it satisfies (though somewhat harder to prove them).
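A toy, pure model of that deterministic hand-off (this is my own sketch, not Bluefin's implementation of connectCoroutines): control alternates between the two sides at each yield, so there is no parallel execution to reason about.

```haskell
-- A coroutine that yields values of type o, resumes with values of
-- type i, and finally returns an a.
data Co o i a = Done a | Yield o (i -> Co o i a)

-- Connect a producer to a consumer. Execution switches sides
-- deterministically at each yield; whichever side finishes first
-- decides the result.
connect :: Co o i a -> (o -> Co i o b) -> Either a b
connect (Done a) _ = Left a
connect (Yield o k1) k2 =
  case k2 o of
    Done b       -> Right b
    Yield i k2'  -> connect (k1 i) k2'

-- Producer: yields n, n-1, ..., 1, resuming on ().
countdown :: Int -> Co Int () ()
countdown 0 = Done ()
countdown n = Yield n (\() -> countdown (n - 1))

-- Consumer: sums exactly n received values.
sumN :: Int -> Int -> Int -> Co () Int Int
sumN 1 acc x = Done (acc + x)
sumN n acc x = Yield () (sumN (n - 1) (acc + x))
```

For example, `connect (countdown 3) (sumN 3 0)` hands 3, 2, 1 across and the consumer finishes with their sum.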

Bluefin doesn’t keep around any state in its Eff monad, if that’s what you mean?


That’s not quite what I meant, but it’s hard to express clearly… I think I need to take some time to organize my thoughts.

I hope to write an article about it eventually.

Probably, yes, I was referring to the fact that the internal definition of the Eff monad is just IO a.


That’s the filesize benchmark actually, which does IO. The “reference” baseline is 2.10ms.

So it’s really a 0.08ms vs 1.00ms overhead difference. ~10x slower! Which is roughly in line with the “deep” state difference (268us vs 2.40ms).

And heftia’s difference between shallow and deep is notable, whereas effectful has barely any difference.

So for a boring web app without tight SLAs… who cares, those are peanuts. But if I’m managing a 16ms frame time budget in my game, I wouldn’t bother with heftia and would stick to effectful (or cleff, which is similar).


Oh, I should also say that HandleReader would have to be part of the trusted core, and it’s rather complicated. I don’t fully understand it yet. Also, I did come across (and fix) a type safety violation in an unused and deprecated part of the codebase: PR #55, “Implement internals using fewer directly unsafe features” (tomjaguarpaw/bluefin). But again, the design is correct so the fix is only to the implementation. Users don’t have to change anything (even though there probably are none for this feature).


Here is my proof that it is race free and that the threads don’t execute concurrently (except, briefly, at the very start).

Now, the question is does my specification match your implementation. I think so, but can’t prove that. It’s better than nothing, though!
