Whence monad transformers? Where are distributive laws?

kindaro · March 25, 2023, 5:43am

example

Say I have a dynamical process f: α → α. I can iterate it to my heart’s happiness: f ∘ f: α → α, f ∘ f ∘ f: α → α and so on. Soon I should want to add some effects to it, like say non-determinism with ndf: α → [α] or a progress bar with iof: α → IO α. So far so good, ndf <=< ndf: α → [α] and iof <=< iof: α → IO α. Hey, why not both? Welcome iondf: α → IO [α] and ndiof: α → [IO α]. There is nothing special about [ ] too, I can instead ask for a random outcome with some kind of a random monad, or even say «any monad will do». But now there is a bunch of questions before me.

questions

Are IO [α] and [IO α] in principle the same? We can mediate one way with sequence. We cannot mediate the other way, can we? There must be some information sequence loses? So what is the difference, what is this lost information?
Do we have a monad again?
- We have an inlay η = pure @ IO ∘ pure @[ ]: α → IO [α].
- We have a map map = fmap @ IO ∘ fmap @[ ]: (α → β) → IO [α] → IO [β] .
- But is there a function μ = join: IO [IO [α]] → IO [α] such that μ ∘ η = id, μ ∘ map η = id and μ ∘ μ = μ ∘ map μ?
  - The obvious candidate is fmap μ ∘ μ ∘ fmap sequence. We only have to check the equations, right? So how do we check the equations?
  - For the other composition, [IO α], this definition is not going to work, because IO is not traversable. So, if there is a natural function that works for any monads, ours is not it. (It turns out there is no natural function because some monads do not compose.)
What is IO, anyway? How do we know there is a monad IO to begin with? There are some pure specifications of something that looks like IO (say this one by Wouter Swierstra and Thorsten Altenkirch), but how can we know they share properties with the real IO thing?

transformers

The standard solution is not to do this at all. A programmer is expected to wield something called monad transformers (offered by Sheng Liang, Paul Hudak and Mark Jones) instead. It seems to be a safety feature. For example, monad transformers allow you to build a monad of type input → Either left right, but not Either left (input → right) — it so happens the latter is not a monad (as offered by Petr Pudlák). We are expected to trust that whoever wrote the monad transformers has divine insight into the laws of the Haskell universe. It is not ever justified anywhere by anyone why we should have monad transformers when we could just compose (many, though not all) monads, is it? (Not that the justification is ever asked for either.) Why EitherT either (Reader reader) and not simply Compose (Either either) (Reader reader)? Who is this EitherT guy anyway?

composition

Curiously, Mark P. Jones and Luc Duponcheel also outlined the conditions needed for two monads to compose right away, without any transformer fuss. You only need a polymorphic function ∀α. t (m α) → m (t α), such as sequence, that also obeys the laws S₂–S₄ on page 9 of the article; the law S₁ is fulfilled by parametricity. This looks like something that can be encoded as a class of kind (★ → ★) → (★ → ★) → Constraint.

I reckon Mark and Luc were not aware of earlier work by Jon Beck, Distributive Laws, in Lecture Notes in Mathematics, No. 80, Springer-Verlag, 1969. There he offers that a «distributive law» should be a natural transformation tm → mt fulfilling 4 equations. Two of these equations are S₂ and S₃; S₄ can be shown from the other two. Jon goes on to show that his distributive law is enough to prove monad laws for the composition mt. So, Mark and Luc independently arrive to almost the same result as Jon.

transformers versus composition

Monad composition turns out to be a well known thing. Kitteh can haz! But it only works on a case by case basis: for every two monads, we need a special sequence that fits the definition of distributive law. I have found examples of sequence in Haskell that break S₂. It also seems that laws of traversable and monad together are not enough to prove S₂–S₄. Long tale short, there is no ground to think that traversable affords us a distributive law.

At the same time, monad transformers are seemingly parametric. A given monad transformer can transform any monad. So, if, say, the monad transformer ReaderT is the same as composition with the monad Reader (on whichever side), it must imply a distributive law between Reader and any monad (on the fitting side). They know something we do not!

so

So I made the case that there is ground for questions. I have no answers. I have not found any literature that connects IO to monads, monads to monad transformers, monad transformers to distributive laws.

All the counterexamples to composition of a monad with a traversable monad that I have found so far are with the trivial Const ( ) monad and its sequence as the (broken) distributive law. Then again, I only checked the simplest monads: Reader, Writer, Maybe. I am also suspicious about Compose [ ] [ ] because it hangs. Also, it is known thanks to Bartek Klin and Julian Salamanca that the twin co-variant power set functor (one of a few such functors known to mankind) is not a monad, and [ ] is kind of alike to a co-variant power set functor. Then again, could be a fault in my code. Anyway, where are other, more interesting counterexamples to composition of a monad and a traversable monad?

Lastly: if many monads compose just fine, why not have a class for distributive laws?

jaror · March 25, 2023, 8:53am

The first thing you can do is to try to find a counterexample; I’ve found this one:

import Control.Monad
newtype IOL a = IOL { unIOL :: IO [a] }

instance Functor IOL where
  fmap f = IOL . fmap (fmap f) . unIOL
instance Applicative IOL where
  pure = IOL . pure . pure
  (<*>) = ap
instance Monad IOL where
  IOL m >>= k = IOL (m >>= fmap join . traverse (unIOL . k))

liftIO :: IO a -> IOL a
liftIO = IOL . fmap pure

liftL :: [a] -> IOL a
liftL = IOL . pure

ex1 = void . unIOL $
  liftL [putStrLn "hello", putStrLn "world"] >>= \x ->
  liftIO x >>
  liftL [putStrLn "goodbye", putStrLn "world"] >>= \y ->
  liftIO y

ex2 = void . unIOL $
  (liftL [putStrLn "hello", putStrLn "world"] >>= \x ->
  liftIO x) >>
  liftL [putStrLn "goodbye", putStrLn "world"] >>= \y ->
  liftIO y

main = ex1 >> putStrLn "" >> ex2

According to the associativity monad law, ex1 and ex2 should be the same, but the result of running this program shows that they are different:

hello
goodbye
world
world
goodbye
world

hello
world
goodbye
world
goodbye
world

kindaro · March 25, 2023, 12:41pm

Yep, same thing with Writer and [ ] — they break S₄. Since IO works as a writer here, it is also broken. I did not find this because my Writer example was with a commutative monoid, which makes it a commutative monad. If IO sorted its output, it would also be somewhat like a commutative monad, and it would be at least somewhat more lawful.

Here is my monad checking set up, by the way. (The package is unrelated, I hijacked it because I was lazy to create a new package.)

But I do not feel too good about your example because of this: you can easily write a program that, when run two times, writes out lines in a different order. (You can use a random thread delay, or you can simply spawn two threads and have them race. I remember doing that some time ago.) So if we accept the legitimacy of this testing method, we have to concede that x ≠ x: IO ( ) for some programs x. What I am saying is that we need first to sort out what IO is, what its equality is.

As this example shows, I have altogether overlooked the distinction between commutative and non-commutative monads. Looking into it now, I am finding more research that has something to say about commutativity and composition together.

kindaro · March 25, 2023, 3:01pm

Just found this article — Monads and Effects by Nick Benton, John Hughes and Eugenio Moggi — that seems to be the first source for monad transformers. (Dated 2002.) Hopefully it will answer some of my questions.
Also, this article — Layer by Layer – Combining Monads by Fredrik Dahlqvist, Louis Parlant and Alexandra Silva — offers sufficient conditions for there to be a distributive law.
Also, this article — No-Go Theorems for Distributive Laws by Maaike Zwart and Dan Marsden — gives conditions under which monads do not compose.

Between the negative results of Maaike and Dan, and the positive results of Fredrik, Louis and Alexandra, how many monads could remain for which we cannot quickly decide either way?

I have not looked into details, this will take time.

daylily · April 8, 2023, 1:58pm

copied from fp discord:

the question about monad composition is a very interesting one. Nicholas Wu wrote a paper Effect Handlers In Scope that does something similar to that and it turns out the semantics are not good (perhaps because he didn’t follow the distributive laws). people have invented things like algebraic effects that (partially) solves the monad composition (or rather effect composition) problem. i think the main problem to a distributive property class is that many useful monads won’t fit into it… but that’s just a guess.

edit: Nicholas’ paper did state some distributive laws so maybe it’s just the implementations that based on this paper (polysemy, fused-effects, etc) that did not follow the laws entirely behave undesirably in certain cases. I did not try to formally prove anything in the paper so take this with a grain of salt…

kindaro · June 25, 2023, 7:40am

Another article Composing Monads Using Coproducts by Christoph Lüth and Neil Ghani offers another way to put monads together: sum in the category of monadic functors and natural transformations that commute with join and pure. It is somewhat technical, and I have never seen anything like this put to work. I should look into it more carefully some time.

As section 3.4 and theorem 3.1 say, a monad transformer arises from a sum — curious!

atravers · June 25, 2023, 10:43am

Semantics of fixIO by Levent Erkok, John Launchbury and Andrew Moran.
Tackling the Awkward Squad: … by Simon Peyton Jones (from section 3.2 on page 20 onwards).

duplode · June 25, 2023, 1:38pm

Another way of motivating the distributive laws is deriving them by breaking down the composed monads (that is, those which arise by mere functor composition) in terms of either the Kleisli or the Eilenberg-Moore adjunctions. I have discussed that in Do monad transformers, generally speaking, arise out of adjunctions?, a Stack Overflow Q&A.

kindaro · June 25, 2023, 4:44pm

Cool!

I wish you wrote that in mathematical notation instead of suggestive Haskell notation. I recall Stack Overflow does not support Latex…

duplode · June 25, 2023, 5:24pm

Side note: I’ve just noticed the Proxy/Const () counterexample you mention at the opening post demonstrates a violation of law # 2 for the compose-on-the-inside, Traversable-based transformers. I’ve updated the SO answer to mention that.

kindaro · June 18, 2024, 4:25pm

summary of the laws
required for the composition of monads to be a monad

notation

Capital letters like F and N stand for functors. In Haskell they are usually represented as type constructors with a Functor instance.
- Application of a functor F to an object X is denoted as function application FX, or F X in Haskell.
- Application of a functor F to an arrow f is denoted as function application Ff, but in Haskell we write fmap @F f.
- The exponential notation Y^X and the arrow notation X → Y denote the arrow functor of X, applied here to Y. The application of this functor to functions is denoted f^X or fmap @((→) X) f. (Sadly, the superscript notation is also used for other things, like say monad operations, so confusion is possible — beware.)
Greek letters like τ: F → G stand for natural transformations. They are often equipped with a subscript which denotes specialization to a specific object, like so: τ_X: FX → GX. In Haskell they are usually represented as polymorphous functions, like τ ∷ ∀ α. F α → G α.
The circle ∘ denotes composition of all kinds of arrows. We often neglect to write it, as confusing as this may be.
- g ∘ f is the composition of function g after f. We often simply write gf. In Haskell we write g . f .
- G ∘ F is the composition of functor G after F. We often simply write GF. In Haskell we write Compose G F after import Data.Functor.Compose.

definitions

The function η^M is the unit of a monad M. In Haskell we write pure @M.
The function µ^M is the join of a monad M. In Haskell we write join @M after import Control.Monad.

result

Let l: NM → MN be the distributive law of monads M and N, written in Haskell as swap ∷ ∀ α. N (M α) → M (N α). Then we have MN a monad with η^MN = η^M ∘ η^N and µ^MN = Mμ^N ∘ µ^M ∘ Ml: MNMN → MN so long as the following conditions are satisfied:

∀f. l ∘ NMf = MNf ∘ l — in Haskell this is ensured by parametricity.
l ∘ η^N = Mη^N: M → MN
l ∘ Nη^M = η^M: N → MN
There are two options. Either implies the other one (according to me).
- after Mark Jones and Luc Duponcheel
  
  p ∘ Nd = d ∘ p: NMNM → MN where d = µ^M ∘ Ml: MNM → MN and p = Mµ^N ∘ l: NMN → MN
- after Jon Beck
  1. µ^M ∘ Ml ∘ l = l ∘ Nµ^M: NMM → MN
  2. Mµ^N ∘ l ∘ Nl = l ∘ µ^N: NNM → MN

example

Let X and Y be some types. Then we have M α = α^X and N α = α^Y, written in Haskell as type constructors ((→) X) and ((→) Y) ∷ ★ → ★. We prove that they are functors and monads and that their composition is also a monad.

These monads are instances of the familiar Reader monad, and we should expect their composition to be a Reader monad as well, since we have ⟨curry; uncurry⟩: X → Y → α ⇄ X × Y → α. The distributive law in this case is flip.

All proofs are done by translation to λ calculus, βη reduction and α equivalence. Some of this work can be automated with tools like https://lambdacalc.io/, it takes a while if done by hand.

they are functors

Define identity like so: id ≔ λ x. x. Define composition like so: ∘ ≔ λ g f x. g (f x). Define f^X ≔ λ g. λ f. g ∘ f. Check that (h ∘ g) ∘ f = h ∘ (g ∘ f). Check that id^X = id. Check that (g ∘ f)^X = g^X ∘ f^X. Done!

they are monads

Define η ≔ λ a b. a. Define µ ≔ λ f x. f x x. Check that µ ∘ η = id, µ ∘ η^X = id, µ ∘ µ = µ ∘ µ^X. Done!

their composition is a monad

Define l ≔ λ f x y. f y x. Check the laws outlined above. Done!

atravers · April 25, 2025, 12:24am

Non-deterministic:

[…] I/O is non-deterministic, time-consuming, idiosyncratic, and error-prone, while pure functions embody the antithesis of this.

Turn that code upside down :) | Andrew T. Biehl
You can’t predict what the user is going to choose, making the decision non-deterministic, and hence impure.

input - Getting I/O in a functional program - Stack Overflow
Supporting IO would require us to extend our heap model to include relevant aspects of the outside system, plus some kind of nondeterminism for IO actions to take into account that we can never model the world in its entirety.

Imperative Functional Programming with Isabelle/HOL (page 15 of 16).
There are at least two ways in which nondeterminism arises: one, the action of an
external agent, e.g., the human dealing with email. Here a deterministic program is
modelled by a nondeterministic operator because it is a useful simplification. Also this
illustrates how finite approximation need not be recursive: it could be given by human
interaction. This will be our leading example of nondeterminism through the course
of the paper. Second, from concurrency.

A representable approach to finite nondeterminism (page 2 of 23).

So any useful notion of equality for IO a (if one exists at all) must cope with the fundamental fact that reality isn’t always predictable.

olf · April 28, 2025, 11:12pm

Some remarks:

Remark 0: Trying to build an mtl-style library on adjunctions quickly leads into a hell of overlapping instances and ambiguous types. Despite the instance definitions in mtl growing quadratically with the number of transformers, it seems to be a sweet spot for practical purposes.

Remark 1: Some monad transformers are explicitly stated to yield lawful monads only when applied to commutative monads. Hence it is not surprising to find counterexamples with non-commutative monads such as [] or IO.

Remark 2: I found the formulation of a Traversable-style monad transformer as a functor on Kleisli categories interesting: The fmap of a type constructor t on Kleisli m has the same type signature as traverse:

(a -> m b) -> t a -> m (t b)

With that one can construct the composite Kleisli lifting:

liftFromFmap :: (Monad m, Monad t) => 
    (forall x y. (x -> m y) -> t x -> m (t y)) -> 
    (    a  -> m (t b)) -> 
    m (t a) -> m (t b)
liftFromFmap fmapKleisli k = fmap join . (=<<) (fmapKleisli k)

Remark 3: I tried to realize in Haskell the counterexample that Klin and Salamanca gave for the double powerset monad of Manes and Mulry. Either I must have made a mistake implementing the defunct Kleisli lifting of Manes or the mistake that Klin and Salamanca found is actually not a mistake.

import Data.Set (Set)
import qualified Data.Set as Set

-- Choice operator.
-- Note: computes choice relations, not choice functions
ch :: (Foldable set, Ord a) => set (Set a) -> Set (Set a)
ch = let
    -- non-empty powerset
    nep :: Set a -> Set (Set a)
    nep = Set.filter (not. Set.null) . Set.powerSet
    -- every possible way to combine the elements of the two sets
    liftSemi :: (Semigroup p, Ord p) => Set p -> Set p -> Set p
    liftSemi xs ys = foldMap (\x -> Set.map (x <>) ys) xs
    in foldl
        (\cs a -> liftSemi cs (nep a))
        (Set.singleton (Set.empty)) -- this is @ch Set.empty@

-- Kleisli lifting of double powerset 
-- given by Manes and Mulry (I think)
manes f = foldMap (Set.map Set.unions . ch . Set.map f)

Why do I think there might be a mistake in the mistake? There are categories (e.g. domains, locales) with two powerset-like monads S and H (Smyth and Hoare powerspace, used to model angelic and demonic non-determinism) so that

SH and HS are both monads and naturally isomorphic, the isomorphism realized via choice of finite sets,
For every object x, SH x is isomorphic to the double contravariant power of x.
We could therefore with some justification say that, not neglecting partial functions, the type (x -> ()) -> () is the type of closed sets of compact sets of x, or the type of compact sets of closed sets of x.

Remark 4:

I read that paper some years ago and as far as I remember it is a kind of free GADT-like construction. Should not be hard to implement in Haskell. Perhaps that is what the effect folks do.

Topic		Replies	Views
Why are there so many libraries for algebraic effects? Learn	57	1746	April 20, 2025
Monad instance trouble: Any way to implement unionWith on some kind of map type, without an Ord constraint? Learn	25	875	December 21, 2024
A new kind of Continuation-like Monad? Learn	22	2084	November 12, 2024
Using unsafePerformIO safely?	79	7931	January 2, 2023
Dialogues vs continuations (and algebraic effects) to implement I/O	18	951	December 10, 2024

Whence monad transformers? Where are distributive laws?

example

questions

transformers

composition

transformers versus composition

so

summary of the lawsrequired for the composition of monads to be a monad

notation

definitions

result

example

they are functors

they are monads

their composition is a monad

Related topics

summary of the laws
required for the composition of monads to be a monad