[ANN] A series of articles on Heftia: The Next Generation of Haskell Effects Management

First, thank you for reading the article carefully and providing objective and well-founded feedback. Thanks to you, I feel I can now see the issue from a broader perspective.

Let me begin by responding to the point I most want to discuss at the moment. I apologize that this response has become rather long and may not be well organized, but I would appreciate it if you could review it again to see whether it can withstand objective critique.

These statements are based on my experience when I attempted to improve the performance of my library by forking an existing IO-wrapper implementation. (Incidentally, while I succeeded in making it work correctly, I failed to achieve any speed improvements.)

What I encountered at that time was that, even with types, runtime errors and segfaults could occur. This was something I had not experienced either in my library or in my previous, type-protected Haskell programming, and it was a harsh experience that reminded me of my days working in C.

A typical bug was runtime access to an uninitialized handler in the evidence vector. This occurs, for example, when runState and runReader are composed in a particular order. In the type-safe version of my library, it is simply impossible to compose them in that order (although the matter is a bit more complicated: it is not merely about the order of runState and runReader, but about the compatibility between higher-order effects and delimited continuations). Any such operation always triggers a type error and cannot even be written. In other words, the IO-wrapper approach, in its default state, allows operations that are essentially wrong, and to prevent this, one must retroactively guarantee interface type safety by isolating and hiding unsafe modules. (This corresponds to the reverse of making invalid states unrepresentable.)
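To make the contrast concrete, here is a minimal toy sketch (not Heftia's actual API; all names are hypothetical) of the idea that a typed handler stack makes an invalid handler order unrepresentable: each runner can only peel the head of a type-level effect list, so a composition the types don't permit is a compile error rather than a runtime crash.

```haskell
{-# LANGUAGE DataKinds, TypeOperators, KindSignatures #-}
import Data.Kind (Type)

-- Toy effect markers (no operations; ordering is the point here).
data State s
data Reader r

-- A degenerate "Eff" indexed by its remaining effect list.
newtype Eff (es :: [Type]) a = Eff { runEff :: a }

-- Each runner peels exactly the head of the effect list.
runState :: s -> Eff (State s ': es) a -> Eff es (a, s)
runState s (Eff a) = Eff (a, s)

runReader :: r -> Eff (Reader r ': es) a -> Eff es a
runReader _ (Eff a) = Eff a

prog :: Eff '[State Int, Reader Bool] Int
prog = Eff 42

-- This order typechecks; swapping runState and runReader here is
-- rejected by GHC, mirroring how a typed effect stack rules out
-- invalid handler orders at compile time instead of at run time.
ok :: (Int, Int)
ok = runEff (runReader True (runState (0 :: Int) prog))
```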

Thus, there are two contrasting development processes here:

  1. Starting with complete safety guarantees that are too strict to be practical and gradually relaxing them (while still remaining safe as long as typing is preserved).
  2. Starting with an interface that may have safety gaps and filling them in as they are discovered.

I do not deny the possibility that this is an overgeneralization of my own experience, but fundamentally, IO-wrapper libraries tend to fall under category 2. Of course, my library also contains some elements of 2—such as open union or certain functions that are not sufficiently generalized—but in terms of the sheer number of cases and their locality, I would say it is small.

During the development of bluefin and effectful, I believe there were several occasions when such safety holes in the interface were discovered and then patched. This includes not only issues recorded in issue trackers but also minor fixes applied immediately when tests revealed problems.

For example, aside from runtime errors, there may have been cases where IORef combined with certain interpreter mechanisms or concurrency exhibited unintelligible behavior. If you have indeed encountered almost none of these issues, I would like to know how you managed to prevent them. I do not know how to prevent such issues in advance. My understanding is that they can only be dealt with reactively after they occur.

(Moreover, I am particularly interested in looking further back to the period when people were experimenting with whether UnliftIO and delimited continuations could coexist.)

In other words, what I’m ultimately trying to say is that there are two design philosophies here, each holding that:

  1. If there is a possibility of bugs, you should eliminate that possibility. Better yet, you should demonstrate that there is no possibility of such bugs.
  2. If you say there is a possibility of bugs, you should show the basis for that claim.

As for this framing, stepping back, I do not at the moment know how one should think about it. However, I believe this is simply a difference in philosophy, and it is not a matter of one being right and the other wrong. What do you think?

I had a misunderstanding about this. I did not distinguish between issues recorded in the issue tracker and those discovered and fixed on the spot through implementation and testing. In my next article, I will correct this and revisit the outcomes of our discussion here.

I understand this, but I feel it overlooks the distinction between the level of technical elements and the protocols that connect them.

Isn’t this specifically a guarantee regarding resource safety, rather than a guarantee of the safety of the entire effect system?

It cannot be denied that it may be used as a buzzword in common parlance, but there is at least a clear definition that people ought to rely on, and I adhere to it. This is what is presented in Plotkin’s work and in the literature on the eff language. I intend to write about it in an article in due course.

That is helpful. Thank you.

Ah, I see! So you are talking about your experience as an implementor of an IO-wrapper effect system, not the experience of a user. Indeed, the experience of J Launchbury and SPJ when they implemented ST was that they had to be careful to get the primitives correct. But once the primitives were correct the experience of the user was free from type unsafety issues. Naturally it’s better if the implementation can be guided by types, as well as the end use. Nonetheless, Bluefin’s implementation is far simpler than effectful’s so I think it’s easier to convince oneself that it’s correct.

I think this story would be far more valuable than an unsupported claim that IO-based effect systems are less type safe in some unexplained way.

I don’t see it as the opposite. I see it as another form of the same: the analytic approach to making invalid states unrepresentable (correct by careful inspection), as opposed to the synthetic approach (correct by construction). Doubtless the latter is better all other things being equal, but I am not yet convinced that all other things are equal.

I would say it’s small in Bluefin too (and there are ways it could be smaller yet). Have you looked?

It’s surprising to me that you’re willing to believe this without being able to provide evidence. However, you are actually correct! These two issues were reported to me recently, and fixed. They are not design flaws. They are implementation oversights. No churn of the implementation is required, and end user programs were not affected by the fix. Still, I take your point that it is better if Haskell’s type system can help. I suppose it could if I took more care to use less unsafe stuff in the implementation of Bluefin and developed a small “trusted core”, but it really doesn’t seem to be worthwhile at the moment. The risks of something going wrong in that regard are minimal.

I can confirm that no such holes have been found that are not recorded in the issue tracker.

I have not experienced such issues. I don’t know why you think IORef combined with “interpreter mechanisms” could pose a problem. Perhaps I’m just not seeing it because Bluefin doesn’t have an “interpreter mechanism”? Regarding concurrency, yes, Bluefin gets away with it by not yet having a native concurrency story. But it will continue to get away with it by having well-designed concurrency primitives that don’t cause such problems. See https://github.com/tomjaguarpaw/bluefin/issues/34 for up-to-date discussion on concurrency on Bluefin.

I certainly believe that the synthetic approach is better when it is sufficient, but I don’t yet see that it is sufficient. For example, I don’t yet see how one can use MonadUnliftIO to bracket IO operations in Heftia as one would want.
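For reference, this is roughly what the ReaderT-over-IO side gets here: a hand-rolled sketch of MonadUnliftIO-style bracketing, specialized to `ReaderT r IO`. The real `withRunInIO` from the unliftio package has this shape in general; the specialized definitions below are my own illustration, not any library's code.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Exception (bracket)
import Control.Monad.Trans.Reader (ReaderT (..), runReaderT)

-- Unlifting for ReaderT r IO: expose a function that runs the
-- transformed action in plain IO, holding the environment fixed.
-- (This is what MonadUnliftIO's withRunInIO generalizes.)
withRunInIO :: ((forall a. ReaderT r IO a -> IO a) -> IO b) -> ReaderT r IO b
withRunInIO inner = ReaderT $ \r -> inner (\m -> runReaderT m r)

-- bracket lifted through the transformer: acquire/release/use all
-- run in ReaderT, but exception safety comes from IO's bracket.
bracketR
  :: ReaderT r IO a            -- acquire
  -> (a -> ReaderT r IO c)     -- release
  -> (a -> ReaderT r IO b)     -- use
  -> ReaderT r IO b
bracketR acquire release use =
  withRunInIO $ \run -> bracket (run acquire) (run . release) (run . use)
```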

No, I don’t think so, unless you’re using a different definition of “safety of the entire effect system” to me. It’s a guarantee that ST contains no type holes. What do you mean by it?

Very well, I am content to remain ignorant for the time being.

You’re welcome!


My intuition has always been that algebraic effects is a system where arbitrary effects can be combined into new effects. In this sense, `foo :: (Log :> es, DB :> es) => Eff es ()` doesn’t have two effects, but one anonymous effect arising from this combination. Granted, this is also how I’d give a quick explanation of what relational algebra is about: combining relations into new relations. Hopefully it’s not much more complicated than that.
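As a sketch of that intuition (using hypothetical MTL-style `Log` and `DB` classes of my own invention, not any particular effect library's API): the two constraints combine into what behaves like a single anonymous effect.

```haskell
{-# LANGUAGE ConstraintKinds #-}

-- Two hypothetical effects, phrased as type classes.
class Monad m => Log m where
  logMsg :: String -> m ()

class Monad m => DB m where
  query :: String -> m [String]

-- The combination acts like one new, anonymous effect.
type App m = (Log m, DB m)

-- A program against the combined effect.
report :: App m => m ()
report = do
  rows <- query "SELECT name FROM users"
  mapM_ logMsg rows

-- A toy interpretation into IO, just for demonstration.
instance Log IO where logMsg = putStrLn
instance DB IO where query _ = pure ["alice", "bob"]
```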

But that aside, what I’m interested in is how Heftia deals with concurrency. This is where those effect systems that upload state into the cloud (monad) usually struggle. If you could do a chapter on that, I’d love to read it.


I would like to focus on this point. That is, on how to guarantee that the primitives are correct.

In the IO-wrapper approach, guaranteeing correctness, as far as I know, requires a large amount of work and presents challenges in terms of maintainability. Specifically, you must formalize axioms for the IO monad in accordance with the GHC specification and perform the same kind of formal verification as for a typical procedural language. It is very elaborate, and the benefits do not justify the cost, so in reality it will not be carried out.

On the other hand, with the algebraic data type method, the proof basically consists of transforming types and functions according to a few rules, so it does not require much mental effort. Once you get used to it, you can readily prove correctness.

That said, even without formal guarantees, writing a sufficient number of tests might make practical correctness virtually certain. However, I do not know whether that would be a good choice in the long term.

Moreover, at present, the specification that each effect library adheres to, that is, the criteria for what behavior can be considered correct, is not clear. It seems to allow room for subtle behaviors. In my methodology, the theoretical foundation is at least clear, and I can provide an answer when asked to explain behavior or validity.

With regard to this, it is based on my experience implementing the IO-wrapper approach and on judgments formed from seeing several issues. However, those might have been with libraries other than effectful or bluefin:

https://github.com/re-xyr/cleff/issues/15
https://github.com/hasura/eff/issues/13

I have also confirmed segfaults with eveff and mpeff, although they have not been reported.

I would appreciate it if you could share your reasons for thinking that. To me, this does not appear minimal, but rather a relatively imminent risk. Even if I take the stance that the delimited continuations of algebraic effects are unnecessary, I cannot help but consider the possibility that people will build an ecosystem relying on concurrency interfaces that are potentially unsafeCoerce-able. This is not about bluefin, but about ReaderT IO in general. If you have any ideas on how this risk could be kept sufficiently small, or on any way to recover without issues should it materialize, I would be grateful if you could share them.

Regarding the interpreter mechanisms, it may not apply to bluefin. polysemy or effectful might be closer. When I implemented an IO-based interpreter related to concurrency, I felt, based on that experience, that it was difficult to track how IORef propagates where when multiple effect interpreters interact in a complex way.

By that I mean the absence of runtime errors and semantic correctness, that is, the theoretical guarantee that the effect and interpreter will not produce any problematic results.


I’d like to write about the concurrency. Thank you! However, it might differ slightly from your expectations in the sense that it doesn’t use the standard concurrency primitives like MVar or STM, but instead has a more strictly typed interface. Because of that, it may feel somewhat restrictive in terms of flexibility.


Maybe. I’m not sure it’s too elaborate. There are only really two primitive effects in Bluefin: State and Exception. State has already been proved safe in the ST paper. Extending it to Exception doesn’t seem too challenging. Beyond that there would have to be some work to prove that Bluefin’s use of threads is valid when it comes to connectCoroutines, but I also guess that’s not much. Still, I accept that that is work requiring type theory expertise, and knowledge of how GHC works. Yet you are using work that required category theory expertise, so work requiring expertise can’t be a blocker. Furthermore, you are using GHC, which is implemented using IORefs and exceptions, so you’re already accepting, at least implicitly, that they work how users think they do.
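For readers unfamiliar with the ST argument being referenced here: the rank-2 quantifier in `runST` is what keeps mutable references from escaping their region, and this is the mechanism the safety claim leans on. A small standard-library example:

```haskell
import Control.Monad.ST (runST)
import Data.STRef (modifySTRef, newSTRef, readSTRef)

-- Local mutation, pure result: runST's type
--   runST :: (forall s. ST s a) -> a
-- forces the region variable `s` to stay local, so an STRef s
-- cannot leak out of the computation that created it.
counter :: Int
counter = runST $ do
  ref <- newSTRef (0 :: Int)
  modifySTRef ref (+ 1)
  modifySTRef ref (+ 1)
  readSTRef ref
-- Returning `ref` itself from the runST block instead would be a
-- type error: `s` would escape its scope.
```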

Still, when the theoretical work is complete there is still the practical work of implementation, and that is harder in the analytic case because the type system is less helpful. Ideally one would come up with a minimal “trusted core”. I put my commits where my mouth is and spent some time reducing the Bluefin “trusted core”. You can see the results at #55. I suspect it can be reduced further.

But yes, you’re right: it’s better to always be able to rely on the type system. But that is not the only concern. In Haskell there are the risks of space leaks and unexpected slowdowns due to closures having structure other than we expect. That’s one of the big benefits of the IO-wrapper approach. The operational semantics are much clearer.

I believe this to be true of Bluefin also. Try me!

Ah, thank you for sharing! I think your post would have been a lot stronger if you had included links to these issues. Effectful has a similar issue (prevented at run time): https://github.com/tomjaguarpaw/bluefin/issues/15#issuecomment-2098343722. Bluefin is immune, however, due to its ST-style type system, so please don’t make the claim that all IO-wrapper effect systems suffer this problem.

The risk is minimal for State and Exception since I’m assuming their safety is already proven by the ST paper, or a simple extension. Of course I could be wrong, since the proof does not exist yet, but I consider this very unlikely. For more sophisticated scenarios such as connectCoroutines I guess the analysis must be correspondingly more sophisticated, but I don’t see anything to be concerned about. Regarding concurrency, well, Bluefin does not have native concurrency, and I will not add any until I am convinced there is a safe API. Regarding concurrency and ReaderT IO I’m not sure I really follow. Are you really saying you think there are type safety holes in the base concurrency and IO story provided by GHC?

Well, OK, understood, but please don’t extend the judgement to all IO-wrapper effect systems.

I still don’t really know what that means.


At least based on the explanation provided, I am somewhat convinced of bluefin’s methodology. Putting aside higher-order effects and concurrency for the moment, it seems that the essential primitives can be understood as operations related to “zero- or one-shot continuations,” which correspond to Exception and State, respectively. If this intuition is correct, it is indeed understandable that a small, explainable core could implement a subset of algebraic effects. In particular, I liked #55.

However, regarding the group of functions such as connectCoroutines, which internally use concurrency directly rather than going through the trusted core (is my understanding correct here?), it still seems difficult to predict their behavior and guarantee correctness when users combine them in complex ways. In any case, the uncertainty has decreased, except for the concurrency aspects. Thank you for the explanation!

Okay. Perhaps this is because bluefin does not have the ReaderT part of ReaderT IO…


connectCoroutines would be considered part of the trusted core. However, it is not a complex use of coroutines. In particular there is no parallel execution. Execution switches between the threads deterministically at each yield, so I don’t think it would be hard to write down the properties that it satisfies (though somewhat harder to prove them).
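A toy, pure model of that deterministic hand-off (this is my own sketch, not Bluefin's implementation of connectCoroutines): control alternates between the two sides at each yield, so there is no parallel execution to reason about.

```haskell
-- A coroutine that yields values of type o, resumes with values of
-- type i, and finally returns an a.
data Co o i a = Done a | Yield o (i -> Co o i a)

-- Connect a producer to a consumer. Execution switches sides
-- deterministically at each yield; whichever side finishes first
-- decides the result.
connect :: Co o i a -> (o -> Co i o b) -> Either a b
connect (Done a) _ = Left a
connect (Yield o k1) k2 =
  case k2 o of
    Done b       -> Right b
    Yield i k2'  -> connect (k1 i) k2'

-- Producer: yields n, n-1, ..., 1, resuming on ().
countdown :: Int -> Co Int () ()
countdown 0 = Done ()
countdown n = Yield n (\() -> countdown (n - 1))

-- Consumer: sums exactly n received values.
sumN :: Int -> Int -> Int -> Co () Int Int
sumN 1 acc x = Done (acc + x)
sumN n acc x = Yield () (sumN (n - 1) (acc + x))
```

For example, `connect (countdown 3) (sumN 3 0)` hands 3, 2, 1 across and the consumer finishes with their sum.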

Bluefin doesn’t keep around any state in its Eff monad, if that’s what you mean?


That’s not quite what I meant, but it’s hard to express clearly… I think I need to take some time to organize my thoughts.

I hope to write an article about it eventually.

Probably, yes, I was referring to the fact that the internal definition of the Eff monad is just IO a.


That’s the filesize benchmark actually, which does IO. The “reference” baseline is 2.10ms.

So it’s really a 0.08ms vs 1.00ms overhead difference. ~10x slower! Which is roughly in line with the “deep” state difference (268us vs 2.40ms).

And heftia’s difference between shallow and deep is notable, whereas effectful has barely any difference.

So for a boring web app without tight SLAs… who cares, those are peanuts. But if I’m managing a 16ms frame time budget in my game, I wouldn’t bother with heftia and would stick to effectful (or cleff, which is similar).


Oh, I should also say that HandleReader would have to be part of the trusted core, and it’s rather complicated. I don’t fully understand it yet. Also, I did come across (and fix) a type safety violation in an unused and deprecated part of the codebase: PR #55, “Implement internals using fewer directly unsafe features” (tomjaguarpaw/bluefin). But again, the design is correct so the fix is only to the implementation. Users don’t have to change anything (even though there probably are none for this feature).


Here is my proof that it is race free and that the threads don’t execute concurrently (except, briefly, at the very start).

Now, the question is does my specification match your implementation. I think so, but can’t prove that. It’s better than nothing, though!
