Whence monad transformers? Where are distributive laws?

example

Say I have a dynamical process f: α → α. I can iterate it to my heart's content: f ∘ f: α → α, f ∘ f ∘ f: α → α and so on. Soon I should want to add some effects to it, like say non-determinism with ndf: α → [α] or a progress bar with iof: α → IO α. So far so good: ndf <=< ndf: α → [α] and iof <=< iof: α → IO α. Hey, why not both? Welcome iondf: α → IO [α] and ndiof: α → [IO α]. There is nothing special about [ ] either: I can instead ask for a random outcome with some kind of a random monad, or even say «any monad will do». But now there is a bunch of questions before me.
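
For illustration, here is how this could look in Haskell. The step functions are invented for the sake of example; only <=< is from the standard library.

import Control.Monad ((<=<))

-- a pure dynamical step (a made-up example)
step :: Double -> Double
step x = x * (1 - x)

-- the same step with non-determinism: two possible successors
ndStep :: Double -> [Double]
ndStep x = [step x / 2, (step x + 1) / 2]

-- the same step with a progress report
ioStep :: Double -> IO Double
ioStep x = putStrLn "still going" >> pure (step x)

-- iteration is Kleisli composition
ndTwice :: Double -> [Double]
ndTwice = ndStep <=< ndStep

-- and both effects at once, in either order
iondStep :: Double -> IO [Double]
iondStep x = putStrLn "still going" >> pure (ndStep x)

ndioStep :: Double -> [IO Double]
ndioStep x = [putStrLn "still going" >> pure y | y <- ndStep x]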

questions

  • Are IO [α] and [IO α] in principle the same? We can mediate one way with sequence. We cannot mediate the other way, can we? So sequence must lose some information. What is the difference, what is this lost information?

  • Do we have a monad again?

    • We have an inlay η = pure @ IO ∘ pure @[ ]: α → IO [α].
    • We have a map map = fmap @ IO ∘ fmap @[ ]: (α → β) → IO [α] → IO [β].
    • But is there a function μ = join: IO [IO [α]] → IO [α] such that μ ∘ η = id, μ ∘ map η = id and μ ∘ μ = μ ∘ map μ?
      • The obvious candidate is fmap μ ∘ μ ∘ fmap sequence (spelled out in Haskell after this list). We only have to check the equations, right? So how do we check the equations?
      • For the other composition, [IO α], this definition is not going to work, because IO is not traversable. So, if there is a natural function that works for any monads, ours is not it. (It turns out there can be no such natural function, because some monads do not compose at all.)
  • What is IO, anyway? How do we know there is a monad IO to begin with? There are some pure specifications of something that looks like IO (say this one by Wouter Swierstra and Thorsten Altenkirch), but how can we know they share properties with the real IO thing?
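
Here is the candidate μ from the second question spelled out, as a sketch; the name joinIOList is mine:

import Control.Monad (join)

-- μ = fmap (join @[ ]) ∘ join @IO ∘ fmap sequence
joinIOList :: IO [IO [a]] -> IO [a]
joinIOList = fmap join . join . fmap sequence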

transformers

The standard solution is not to do this at all. A programmer is expected to wield something called monad transformers (offered by Sheng Liang, Paul Hudak and Mark Jones) instead. It seems to be a safety feature. For example, monad transformers allow you to build a monad of type input → Either left right, but not Either left (input → right); it so happens the latter is not a monad (as shown by Petr Pudlák). We are expected to trust that whoever wrote the monad transformers has divine insight into the laws of the Haskell universe. No one ever justifies why we should have monad transformers when we could just compose (many, though not all) monads. (Not that the justification is ever asked for, either.) Why EitherT either (Reader reader) and not simply Compose (Either either) (Reader reader)? Who is this EitherT guy anyway?
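
To see the two shapes side by side, here is a sketch in types; ExceptT is the modern name for EitherT in the transformers library:

import Control.Monad.Trans.Except (ExceptT)
import Control.Monad.Trans.Reader (Reader)
import Data.Functor.Compose (Compose)

-- ExceptT left (Reader input) α unfolds to input → Either left α: a monad.
type Transformed input left = ExceptT left (Reader input)

-- Compose (Either left) (Reader input) α unfolds to
-- Either left (input → α): not a monad.
type Composed input left = Compose (Either left) (Reader input)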

composition

Curiously, Mark P. Jones and Luc Duponcheel also outlined conditions under which two monads compose right away, without any transformer fuss. You only need a polymorphic function ∀α. t (m α) → m (t α), such as sequence, that also obeys the laws S₂–S₄ on page 9 of the article; the law S₁ is fulfilled by parametricity. This looks like something that can be encoded as a class of kind (★ → ★) → (★ → ★) → Constraint, as sketched below.
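
A hedged sketch of such a class; the names Distributes and swap are mine, not an existing library API:

{-# LANGUAGE MultiParamTypeClasses #-}
import Control.Monad (join)

-- instances must make swap obey S₂–S₄; S₁ comes for free from parametricity
class (Monad t, Monad m) => Distributes t m where
  swap :: t (m a) -> m (t a)

-- given a lawful instance, the composition is a monad:
joinC :: Distributes t m => m (t (m (t a))) -> m (t a)
joinC = fmap join . join . fmap swap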

I reckon Mark and Luc were not aware of earlier work by Jon Beck: Distributive Laws, in Lecture Notes in Mathematics, No. 80, Springer-Verlag, 1969. There he offers that a «distributive law» should be a natural transformation tm → mt fulfilling 4 equations. Two of these equations are S₂ and S₃; S₄ can be shown from the other two. Jon goes on to show that his distributive law is enough to prove the monad laws for the composition mt. So Mark and Luc independently arrived at almost the same result as Jon.

transformers versus composition

Monad composition turns out to be a well known thing. Kitteh can haz! But it only works on a case by case basis: for every two monads, we need a special sequence that fits the definition of a distributive law. I have found examples of sequence in Haskell that break S₂. It also seems that the laws of traversable and monad together are not enough to prove S₂–S₄. Long story short, there is no ground to think that traversable affords us a distributive law.

At the same time, monad transformers are seemingly parametric: a given monad transformer can transform any monad. So, if, say, the monad transformer ReaderT is the same as composition with the monad Reader (on whichever side), it must imply a distributive law between Reader and any monad (on the fitting side). They know something we do not!
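
For ReaderT at least, such a law is easy to exhibit. A sketch; the name distReader is mine:

-- ReaderT reader m α unfolds to reader → m α, which is Reader reader
-- composed on the outside of m. The distributive law this calls for
-- exists for any functor m whatsoever:
distReader :: Functor m => m (reader -> a) -> (reader -> m a)
distReader w r = fmap ($ r) w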

so

So I have made the case that there is ground for questions. I have no answers. I have not found any literature that connects IO to monads, monads to monad transformers, or monad transformers to distributive laws.

All the counterexamples to the composition of a monad with a traversable monad that I have found so far involve the trivial Const ( ) monad and its sequence as the (broken) distributive law. Then again, I have only checked the simplest monads: Reader, Writer, Maybe. I am also suspicious about Compose [ ] [ ] because my check of it hangs. Also, it is known thanks to Bartek Klin and Julian Salamanca that the covariant power set functor composed with itself (one of the few such examples known to mankind) is not a monad, and [ ] is somewhat like a covariant power set functor. Then again, it could be a fault in my code. Anyway, where are other, more interesting counterexamples to the composition of a monad and a traversable monad?
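
For the record, here is the trivial counterexample spelled out, with Proxy standing in for Const ( ) and Writer as the other monad; the names are mine. The candidate distributive law sequenceA throws the effect away, so S₂ fails:

import Data.Proxy (Proxy (..))
import Control.Monad.Writer (Writer, runWriter, tell)

effect :: Writer String ()
effect = tell "effect"

-- left side of S₂, l ∘ η: the log is lost
lhs :: Writer String (Proxy ())
lhs = sequenceA (pure effect :: Proxy (Writer String ()))

-- right side of S₂, fmap η: the log survives
rhs :: Writer String (Proxy ())
rhs = fmap pure effect

-- runWriter lhs = (Proxy, "")
-- runWriter rhs = (Proxy, "effect")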

Lastly: if many monads compose just fine, why not have a class for distributive laws?

3 Likes

The first thing you can do is to try to find a counterexample; I’ve found this one:

import Control.Monad

-- the candidate composite monad: IO on the outside, [ ] on the inside
newtype IOL a = IOL { unIOL :: IO [a] }

instance Functor IOL where
  fmap f = IOL . fmap (fmap f) . unIOL

instance Applicative IOL where
  pure = IOL . pure . pure
  (<*>) = ap

instance Monad IOL where
  -- bind mediates with traverse (that is, with sequence) and joins the lists
  IOL m >>= k = IOL (m >>= fmap join . traverse (unIOL . k))

-- inject each of the two effects into the composite
liftIO :: IO a -> IOL a
liftIO = IOL . fmap pure

liftL :: [a] -> IOL a
liftL = IOL . pure

ex1 = void . unIOL $
  liftL [putStrLn "hello", putStrLn "world"] >>= \x ->
  liftIO x >>
  liftL [putStrLn "goodbye", putStrLn "world"] >>= \y ->
  liftIO y

ex2 = void . unIOL $
  (liftL [putStrLn "hello", putStrLn "world"] >>= \x ->
  liftIO x) >>
  liftL [putStrLn "goodbye", putStrLn "world"] >>= \y ->
  liftIO y

main = ex1 >> putStrLn "" >> ex2

According to the monad associativity law, ex1 and ex2 should be the same, but running this program shows that they are different:

hello
goodbye
world
world
goodbye
world

hello
world
goodbye
world
goodbye
world
3 Likes

Yep, it is the same with Writer and [ ]: they break S₄. Since IO works as a writer here, it is broken too. I did not find this myself because my Writer example used a commutative monoid, which makes Writer a commutative monad. If IO sorted its output, it would also be somewhat like a commutative monad, and it would be at least somewhat more lawful.
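
To see it without IO, here is your counterexample replayed with Writer [String] in place of IO; this transcription is mine. Being pure, it sidesteps the question of what equality of IO programs means:

import Control.Monad
import Control.Monad.Writer

newtype WL a = WL { unWL :: Writer [String] [a] }

instance Functor WL where
  fmap f = WL . fmap (fmap f) . unWL
instance Applicative WL where
  pure = WL . pure . pure
  (<*>) = ap
instance Monad WL where
  WL m >>= k = WL (m >>= fmap join . traverse (unWL . k))

say :: String -> WL ()
say s = WL (tell [s] >> pure [()])

branch :: [a] -> WL a
branch = WL . pure

logOf :: WL a -> [String]
logOf = execWriter . unWL

-- reassociating >>= changes the log, against the associativity law:
-- left  = ["a","x","y","b","x","y"]
-- right = ["a","b","x","y","x","y"]
left, right :: [String]
left  = logOf (branch "ab" >>= \c -> say [c] >> branch "xy" >>= \d -> say [d])
right = logOf ((branch "ab" >>= \c -> say [c]) >> branch "xy" >>= \d -> say [d])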

Here is my monad checking setup, by the way. (The package is unrelated; I hijacked it because I was too lazy to create a new package.)

But I do not feel too good about your example, because of this: you can easily write a program that, when run twice, writes out its lines in a different order. (You can use a random thread delay, or you can simply spawn two threads and have them race; I remember doing that some time ago.) So, if we accept the legitimacy of this testing method, we have to concede that x ≠ x: IO ( ) for some programs x. What I am saying is that we first need to sort out what IO is and what its equality is.
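
Something like this minimal sketch, reconstructed from memory rather than from the original program:

import Control.Concurrent

-- run it twice; the two lines may come out in either order
main :: IO ()
main = do
  done <- newEmptyMVar
  _ <- forkIO (putStrLn "first thread" >> putMVar done ())
  putStrLn "main thread"
  takeMVar done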

As this example shows, I have altogether overlooked the distinction between commutative and non-commutative monads. Looking into it now, I am finding more research that has something to say about commutativity and composition together.

2 Likes
  • Just found the article Monads and Effects by Nick Benton, John Hughes and Eugenio Moggi, which seems to be the first source for monad transformers. (Dated 2002.) Hopefully it will answer some of my questions.
  • Also, the article Layer by Layer – Combining Monads by Fredrik Dahlqvist, Louis Parlant and Alexandra Silva offers sufficient conditions for there to be a distributive law.
  • Also, the article No-Go Theorems for Distributive Laws by Maaike Zwart and Dan Marsden gives conditions under which monads do not compose.

Between the negative results of Maaike and Dan, and the positive results of Fredrik, Louis and Alexandra, how many monads remain for which we cannot quickly decide either way?

I have not looked into the details yet; this will take time.

1 Like

Copied from the FP Discord:

The question about monad composition is a very interesting one. Nicolas Wu wrote a paper, Effect Handlers in Scope, that does something similar to that, and it turns out the semantics are not good (perhaps because he did not follow the distributive laws). People have invented things like algebraic effects that (partially) solve the monad composition (or rather, effect composition) problem. I think the main problem with a distributive property class is that many useful monads would not fit into it… but that is just a guess.

Edit: Nicolas' paper did state some distributive laws, so maybe it is just the implementations based on this paper (polysemy, fused-effects, etc.) that do not follow the laws entirely and hence behave undesirably in certain cases. I did not try to formally prove anything from the paper, so take this with a grain of salt…

4 Likes

The article Composing Monads Using Coproducts by Christoph Lüth and Neil Ghani offers yet another way to put monads together: the sum in the category of monadic functors and natural transformations that commute with join and pure. It is somewhat technical, and I have never seen anything like this put to work. I should look into it more carefully some time.

As section 3.4 and theorem 3.1 say, a monad transformer arises from a sum — curious!

1 Like



Another way of motivating the distributive laws is to derive them by breaking down the composed monads (that is, those which arise by mere functor composition) in terms of either the Kleisli or the Eilenberg–Moore adjunctions. I have discussed that in Do monad transformers, generally speaking, arise out of adjunctions?, a Stack Overflow Q&A.

1 Like

Cool!

I wish you had written that in mathematical notation instead of suggestive Haskell notation. I recall Stack Overflow does not support LaTeX…

Side note: I've just noticed that the Proxy/Const () counterexample you mention in the opening post demonstrates a violation of law 2 for the compose-on-the-inside, Traversable-based transformers. I've updated the SO answer to mention that.

1 Like

summary of the laws required for the composition of monads to be a monad

notation

  • Capital letters like F and N stand for functors. In Haskell they are usually represented as type constructors with a Functor instance.
    • Application of a functor F to an object X is denoted as function application FX, or F X in Haskell.
    • Application of a functor F to an arrow f is denoted as function application Ff, but in Haskell we write fmap @F f.
    • The exponential notation Y^X and the arrow notation X → Y both denote the arrow functor of X, applied here to Y. The application of this functor to an arrow f is denoted f^X, or fmap @((→) X) f. (Sadly, the superscript notation is also used for other things, like say monad operations, so confusion is possible; beware.)
  • Greek letters like τ: F → G stand for natural transformations. They are often equipped with a subscript that denotes specialization at a specific object, like so: τ_X: FX → GX. In Haskell they are usually represented as polymorphic functions, like τ ∷ ∀ α. F α → G α.
  • The circle denotes composition of all kinds of arrows. We often neglect to write it, as confusing as this may be.
    • g ∘ f is the composition of function g after f. We often simply write gf. In Haskell we write g . f .
    • G ∘ F is the composition of functor G after F. We often simply write GF. In Haskell we write Compose G F after import Data.Functor.Compose.

definitions

  • The function η^M is the unit of a monad M. In Haskell we write pure @M.
  • The function µ^M is the join of a monad M. In Haskell we write join @M after import Control.Monad.

result

Let l: NM → MN be the distributive law of monads M and N, written in Haskell as swap ∷ ∀ α. N (M α) → M (N α). Then MN is a monad with η^MN = η^M ∘ η^N and µ^MN = Mµ^N ∘ µ^M ∘ Ml: MNMN → MN, so long as the following conditions are satisfied:

  1. ∀f. l ∘ NMf = MNf ∘ l — in Haskell this is ensured by parametricity.

  2. l ∘ η^N = Mη^N: M → MN

  3. l ∘ Nη^M = η^M: N → MN

  4. There are two options; either one implies the other (according to me).

    • after Mark Jones and Luc Duponcheel

      p ∘ Nd = d ∘ p: NMNM → MN, where d = µ^M ∘ Ml: MNM → MN and p = Mµ^N ∘ l: NMN → MN

    • after Jon Beck

      1. µ^M ∘ Ml ∘ l = l ∘ Nµ^M: NMM → MN
      2. Mµ^N ∘ l ∘ Nl = l ∘ µ^N: NNM → MN
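
In Haskell, the unit and join of the composition read as follows; this is a direct transcription of the formulas above, with swap standing for l and assumed to satisfy the conditions:

{-# LANGUAGE RankNTypes #-}
import Control.Monad (join)

-- η^MN = η^M ∘ η^N
pureMN :: (Monad m, Monad n) => a -> m (n a)
pureMN = pure . pure

-- µ^MN = Mµ^N ∘ µ^M ∘ Ml, reading right to left
joinMN :: (Monad m, Monad n)
       => (forall x. n (m x) -> m (n x))  -- l, the distributive law
       -> m (n (m (n a))) -> m (n a)
joinMN l = fmap join . join . fmap l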

example

Let X and Y be some types. Then we have M α = α^X and N α = α^Y, written in Haskell as type constructors ((→) X) and ((→) Y) ∷ ★ → ★. We prove that they are functors and monads and that their composition is also a monad.

These monads are instances of the familiar Reader monad, and we should expect their composition to be a Reader monad as well, since we have ⟨curry; uncurry⟩: (X → Y → α) ⇄ (X × Y → α). The distributive law in this case is flip.
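
Concretely, in Haskell (a small sketch; the name joinXY is mine):

-- l: N (M α) → M (N α) is flip
l :: (y -> x -> a) -> (x -> y -> a)
l = flip

-- the composite join µ^MN: MNMN α → MN α simply reads
-- each of the two environments twice
joinXY :: (x -> y -> x -> y -> a) -> (x -> y -> a)
joinXY f x y = f x y x y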

All proofs are done by translation to λ calculus, βη reduction and α equivalence. Some of this work can be automated with tools like https://lambdacalc.io/; it takes a while if done by hand.

they are functors

Define identity like so: id ≔ λ x. x. Define composition like so: ∘ ≔ λ g f x. g (f x). Define f^X ≔ λ g. f ∘ g. Check that (h ∘ g) ∘ f = h ∘ (g ∘ f). Check that id^X = id. Check that (g ∘ f)^X = g^X ∘ f^X. Done!

they are monads

Define η ≔ λ a b. a. Define µ ≔ λ f x. f x x. Check that µ ∘ η = id, µ ∘ η^X = id and µ ∘ µ = µ ∘ µ^X. Done!
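
For instance, the first check goes like this: µ (η m) = µ (λ b. m) = λ x. (λ b. m) x x = λ x. m x = m, where the last step is an η reduction. The other checks proceed in the same fashion.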

their composition is a monad

Define l ≔ λ f x y. f y x. Check the laws outlined above. Done!