example
Say I have a dynamical process f: α → α. I can iterate it to my heart's content: f ∘ f: α → α, f ∘ f ∘ f: α → α and so on. Soon I will want to add some effects to it, say nondeterminism with ndf: α → [α] or a progress bar with iof: α → IO α. So far so good: ndf <=< ndf: α → [α] and iof <=< iof: α → IO α. Hey, why not both? Welcome iondf: α → IO [α] and ndiof: α → [IO α]. There is nothing special about [ ] either: I can instead ask for a random outcome with some kind of random monad, or even say «any monad will do». But now there is a bunch of questions before me.
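To make the setup concrete, here is a minimal sketch of the three kinds of step and their iteration. The bodies of f, ndf and iof are made up purely for illustration; only their types come from the text above.

```haskell
import Control.Monad ((<=<))

-- A pure step (hypothetical body: doubling).
f :: Int -> Int
f = (* 2)

-- A nondeterministic step (hypothetical body: stay or advance by one).
ndf :: Int -> [Int]
ndf x = [x, x + 1]

-- A step with an IO effect standing in for the progress bar.
iof :: Int -> IO Int
iof x = do
  putStrLn ("step from " ++ show x)
  pure (x + 1)

-- Pure steps iterate with (.), effectful ones with Kleisli composition (<=<).
twice :: Int -> Int
twice = f . f

twiceNd :: Int -> [Int]
twiceNd = ndf <=< ndf

twiceIo :: Int -> IO Int
twiceIo = iof <=< iof
```

Neither iondf: α → IO [α] nor ndiof: α → [IO α] can be iterated this way until the stacked type is known to be a monad, which is what the questions are about.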
questions

Are IO [α] and [IO α] in principle the same? We can mediate one way with sequence. We cannot mediate the other way, can we? There must be some information sequence loses? So what is the difference, what is this lost information?
Do we have a monad again?
 We have an inlay η = pure @ IO ∘ pure @[ ]: α → IO [α].
 We have a map map = fmap @ IO ∘ fmap @[ ]: (α → β) → IO [α] → IO [β] .
 But is there a function μ = join: IO [IO [α]] → IO [α] such that μ ∘ η = id, μ ∘ map η = id and μ ∘ μ = μ ∘ map μ?
 The obvious candidate is fmap μ ∘ μ ∘ fmap sequence, where the μ inside fmap is the join of [ ] and the other μ is the join of IO. We only have to check the equations, right? So how do we check the equations?
 For the other composition, [IO α], this definition is not going to work, because IO is not traversable. So, if there is a natural function that works for any monads, ours is not it. (It turns out there is no natural function because some monads do not compose.)

What is IO, anyway? How do we know there is a monad IO to begin with? There are some pure specifications of something that looks like IO (say this one by Wouter Swierstra and Thorsten Altenkirch), but how can we know they share properties with the real IO thing?
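The first two questions can be poked at in code. The sketch below writes out the candidate join for IO [α] and shows one way to observe the asymmetry between [IO α] and IO [α]: a list of actions lets me run just some of them, while the sequenced action runs all of them. The helper names (tick, effectsOfFirst, effectsOfSequenced) are made up for illustration, and nothing here checks the monad laws.

```haskell
import Control.Monad (join)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- The candidate join: fmap (join of []) . (join of IO) . fmap sequence.
joinIOList :: IO [IO [a]] -> IO [a]
joinIOList = fmap concat . join . fmap sequence

-- An action that bumps a counter and returns its argument.
tick :: IORef Int -> a -> IO a
tick ref x = modifyIORef' ref (+ 1) >> pure x

-- With [IO a] in hand I can choose to run only the first action.
effectsOfFirst :: IO Int
effectsOfFirst = do
  ref <- newIORef 0
  _ <- sequence (take 1 (map (tick ref) "abc"))
  readIORef ref

-- After sequence, all effects happen even if I keep only the first result.
effectsOfSequenced :: IO Int
effectsOfSequenced = do
  ref <- newIORef 0
  _ <- take 1 <$> sequence (map (tick ref) "abc")
  readIORef ref
```

Here effectsOfFirst counts one effect and effectsOfSequenced counts three, so sequence at least forgets which effect belongs to which element.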
transformers
The standard solution is not to do this at all. A programmer is expected to wield something called monad transformers (offered by Sheng Liang, Paul Hudak and Mark Jones) instead. It seems to be a safety feature. For example, monad transformers allow you to build a monad of type input → Either left right, but not Either left (input → right): it so happens the latter is not a monad (as shown by Petr Pudlák). We are expected to trust that whoever wrote the monad transformers has divine insight into the laws of the Haskell universe. Is it ever justified anywhere why we should have monad transformers when we could just compose (many, though not all) monads? (Not that the justification is ever asked for either.) Why EitherT either (Reader reader) and not simply Compose (Reader reader) (Either either)? Who is this EitherT guy anyway?
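For reference, here is a small sketch of how the transformer stack and the plain composition line up as types. It assumes the EitherT of the text corresponds to ExceptT from the transformers package; safeDiv is a made-up example. Note the argument order of Compose: the outer functor comes first, so the stack unwraps to Compose (Reader r) (Either e), that is, r → Either e a.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (ExceptT (..), runExceptT, throwE)
import Control.Monad.Trans.Reader (Reader, ask, runReader)
import Data.Functor.Compose (Compose (..))

-- A hypothetical computation: divide by a divisor read from the environment.
safeDiv :: Int -> ExceptT String (Reader Int) Int
safeDiv n = do
  d <- lift ask
  if d == 0 then throwE "division by zero" else pure (n `div` d)

-- Peeling off both newtypes leaves a plain function into Either.
unwrap :: ExceptT e (Reader r) a -> (r -> Either e a)
unwrap = runReader . runExceptT

-- The same data seen as a composition of two functors, and back.
asCompose :: ExceptT e (Reader r) a -> Compose (Reader r) (Either e) a
asCompose = Compose . runExceptT

fromCompose :: Compose (Reader r) (Either e) a -> ExceptT e (Reader r) a
fromCompose = ExceptT . getCompose
```

The two types carry the same data, but Data.Functor.Compose gives Compose only Functor and Applicative instances (among others), not Monad; the Monad instance lives on the transformer.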
composition
Curiously, Mark P. Jones and Luc Duponcheel also outlined the conditions needed for two monads to compose right away, without any transformer fuss. You only need a polymorphic function ∀α. t (m α) → m (t α), such as sequence, that also obeys the laws S₂–S₄ on page 9 of the article; the law S₁ is fulfilled by parametricity. This looks like something that can be encoded as a class of kind (★ → ★) → (★ → ★) → Constraint.
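Such a class might look like the sketch below. The names Distributes, distribute and joinVia are hypothetical, and the class cannot enforce S₂–S₄: they remain a promise made by whoever writes an instance. The one instance given, Maybe against any monad, is the swap that I believe underlies MaybeT.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}

import Control.Monad (join)

-- A hypothetical class of kind (* -> *) -> (* -> *) -> Constraint.
-- Instances are expected (not checked) to satisfy the laws S2-S4.
class (Monad t, Monad m) => Distributes t m where
  distribute :: t (m a) -> m (t a)

-- Maybe swaps past any monad: an absent value becomes a pure Nothing.
instance Monad m => Distributes Maybe m where
  distribute Nothing = pure Nothing
  distribute (Just ma) = fmap Just ma

-- The join of the composite m (t a), built from the component joins.
joinVia :: Distributes t m => m (t (m (t a))) -> m (t a)
joinVia = fmap join . join . fmap distribute
```

For example, with m = [ ] and t = Maybe, joinVia [Just [Just 1, Nothing], Nothing] flattens to [Just 1, Nothing, Nothing].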
I reckon Mark and Luc were not aware of earlier work by Jon Beck, Distributive Laws, in Lecture Notes in Mathematics, No. 80, Springer-Verlag, 1969. There he proposes that a «distributive law» should be a natural transformation tm → mt fulfilling 4 equations. Two of these equations are S₂ and S₃; S₄ can be shown from the other two. Jon goes on to show that his distributive law is enough to prove the monad laws for the composition mt. So, Mark and Luc independently arrive at almost the same result as Jon.
transformers versus composition
Monad composition turns out to be a well known thing. Kitteh can haz! But it only works on a case by case basis: for every two monads, we need a special sequence that fits the definition of distributive law. I have found examples of sequence in Haskell that break S₂. It also seems that the laws of traversable and monad together are not enough to prove S₂–S₄. Long story short, there is no ground to think that traversable affords us a distributive law.
At the same time, monad transformers are seemingly parametric. A given monad transformer can transform any monad. So, if, say, the monad transformer ReaderT is the same as composition with the monad Reader (on whichever side), it must imply a distributive law between Reader and any monad (on the fitting side). They know something we do not!
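For Reader the implied law can be written down directly. The sketch below works on the unwrapped representation r → m α rather than on ReaderT itself; swapReader and the two joins are names I made up. It compares the join obtained from the swap with a join written the way I understand ReaderT's bind to work, without proving that they agree in general.

```haskell
import Control.Monad (join)

-- Candidate distributive law: any functor m swaps past the reader (->) r.
swapReader :: Functor m => m (r -> a) -> (r -> m a)
swapReader mf r = fmap ($ r) mf

-- The join of the composite r -> m a, assembled from the swap.
-- The outer fmap and join are those of the function monad (->) r.
joinViaSwap :: Monad m => (r -> m (r -> m a)) -> (r -> m a)
joinViaSwap = fmap join . join . fmap swapReader

-- The join written directly: run the outer layer, then feed the same
-- environment to the inner layer.
joinDirect :: Monad m => (r -> m (r -> m a)) -> (r -> m a)
joinDirect f r = f r >>= \g -> g r
```

On small inputs with m = Maybe and m = [ ] the two joins give the same answers, which is at least consistent with ReaderT being composition with Reader on the outside.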
so
So I have made the case that there is ground for questions. I have no answers. I have not found any literature that connects IO to monads, monads to monad transformers, and monad transformers to distributive laws.
All the counterexamples to composition of a monad with a traversable monad that I have found so far are with the trivial Const ( ) monad and its sequence as the (broken) distributive law. Then again, I only checked the simplest monads: Reader, Writer, Maybe. I am also suspicious about Compose [ ] [ ] because it hangs. Also, it is known thanks to Bartek Klin and Julian Salamanca that the covariant power set functor composed with itself (one of a few such functors known to mankind) is not a monad, and [ ] is somewhat like a covariant power set functor. Then again, it could be a fault in my code. Anyway, where are other, more interesting counterexamples to composition of a monad and a traversable monad?
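The trivial counterexample can be reproduced in a few lines. This sketch rests on two assumptions: that S₂ reads sequence ∘ pure = fmap pure, with the pure of the traversable monad t on both sides (my reading of the paper, not checked against it here), and that Proxy from Data.Proxy can stand in for the trivial monad, since Const ( ) has no Monad instance in base.

```haskell
import Data.Proxy (Proxy (..))

-- Left side of the assumed S2 with t = Proxy and m = Maybe:
-- wrap the value in Proxy, then swap with sequence.
lhs :: Maybe Int -> Maybe (Proxy Int)
lhs x = sequence (pure x :: Proxy (Maybe Int))

-- Right side of the assumed S2: wrap in Proxy underneath Maybe.
rhs :: Maybe Int -> Maybe (Proxy Int)
rhs = fmap pure

-- Whether the two sides agree at a given input.
s2HoldsAt :: Maybe Int -> Bool
s2HoldsAt x = lhs x == rhs x
```

The two sides agree at Just 1 but not at Nothing: Proxy has already forgotten the Nothing, so sequence can only answer Just Proxy, while fmap pure keeps the Nothing.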
Lastly: if many monads compose just fine, why not have a class for distributive laws?