Monad of No Return: Issues with `(>>) = (*>)`

A handful of months ago, I brought the monad of no return proposal to discourse, and now many months later I bring what I hope will be my last question regarding one issue with one part of the proposal.

TL;DR: I’m looking for examples of fixes of memory leaks or issues about memory leaks due to the default definition of (*>).

To summarise the intent of the proposal, there are methods between the Applicative and Monad type classes that should always behave identically, if the type they are defined on is lawful. These methods can therefore be “canonically” defined: return = pure, and (>>) = (*>). We wish to get rid of the redundancy here, and make these the top level definitions.

There is an issue with (>>) = (*>) specifically, however. Currently, (>>) is defined as (>>) a b = a >>= \_ -> b (which I’ll alternatively call thenM), while (*>) a b = (id <$ a) <*> b (which I’ll call thenA). In making (>>) be equivalent to its canonical definition (so, (>>) = (*>)), we’d be changing its default definition for virtually every Monad in existence, and there have been issues with the definition of (*>) in the past. One example of that can be found in issue 33 of the transformers package. The common problem seems to be that the definition thenA leaks some amount of memory, but we’re unclear as to why.

My personal theory is this: only in the cases of monad transformers (or equivalents) will we see these performance regressions. If this is the case, then potential regressions industry wide are very low compared to the alternative worst case.

I’m therefore looking for more examples of badly behaved default (*>) definitions. If they’re transformer-like, that’s good to have more examples, but if they’re not transformer-like, I’m especially interested.

If you want to ask more about the proposal(s), I’m also happy to answer those questions here.

15 Likes

I don’t think there’s any lack of clarity why: x *> y and x >> y are for “tail calls”, i.e. after x, execution can be handed over entirely to y, without keeping anything around. The default definition of *> (thenA) is troublesome for this: (id <$ x) <*> y will, by default, needlessly keep the result of id <$ x (i.e. just id) around whilst it executes y. The default definition of >> (thenM) won’t.

What is unclear is the extent to which optimizations mitigate or resolve this issue, and how trouble the issue would actually cause in the wild.

5 Likes

To me it’s unclear why the optimisations are unable to eliminate keeping the result around in most of the cases seen, hence why I want more examples. I have been doing some tinkering with transformers as I mentioned in the proposal, and am still trying to understand what I’ve gotten out of it. When I can access my computer again, I’ll bring what I’ve got here for discussion before going back to the proposal.

1 Like

Maybe your ‘by default’ is hedging against this, but it’s worth noting that for some Applicatives, there’s nothing to keep around even ignoring optimizations. Consider Applicative ((->) a):

instance Applicative ((->) r) where
    pure = const
    (<*>) f g x = f x (g x)

With zero optimizations, here’s how thenA reduces:

thenA x y z
((id <$ x) <*> y) z
(id <$ x) z (y z)
(fmap (const id) x) z (y z)
(const id . x) z (y z)
const id (x z) (y z)
id (y z)
y z

So y ends up in tail position after all!

Re examples, I’ve just noticed that the RWS.CPS and Writer.CPS transformers, which were added after #33 was merged, need their (*>)s patched. If someone who already is handy with hub.darcs.net could submit that patch, I’d appreciate it; otherwise I’ll probably get around to setting up an account and installing darcs and such at… some point.

2 Likes

Yes, and I probably should have said “can, in general” rather than “will, by default”.

Indeed, and Identity is an even more stark example. The Applicatives that experience the issue are those that “hold a resource” “during evaluation”. (Strict) StateT and IO are examples of those. They “hold stack” during the evaluation of the case scrutinee ((a, s) in the former case, (# State# RealWorld, a #) in the latter).

2 Likes

As promised, here’s some of the stuff I was looking at.

I set up some instances of transformers, and have been looking at ExceptT’s Core specifically. I’ve cleaned up a touch for readability.

-- default impl -O0
$fApplicativeExceptT_$c*>
  = \ @f_a5U3 @e_a5U4 $dMonad_a5U5 @a_a5Vz @b_a5VA before after ->
      let { applicativeDict = $p1Monad $dMonad_a5U5 } in
      let { functorDict = $p1Applicative applicativeDict } in
      (>>=
         $dMonad_a5U5
         (fmap functorDict (const id <$>) (before `cast` <Co:4> :: ...))
         (\ a ->
            case a of {
              Left l -> pure applicativeDict (Left l);
              Right rf ->
                fmap
                  functorDict
                  (\ b ->
                     case b of _ {
                       Left _ -> b;
                       Right r -> Right (rf r)
                     })
                  (after `cast` <Co:4> :: ...)
            }))
      `cast` <Co:5> :: ...

in the above we use fmap twice, and have to pass around id within an either.

-- default impl -O2
$fApplicativeExceptT_$c*>
  = \ @f_a5U3 @e_a5U4 $dMonad_a5U5 @a_a5Vz @b_a5VA before after ->
      case $p1Monad $dMonad_a5U5 of
      { C:Applicative functorDict pure ww2_s87U ww3_s87V ww4_s87W
                      ww5_s87X ->
      case functorDict of { C:Functor fmap ww7_s87R ->
      (>>=
         $dMonad_a5U5
         (fmap (const id <$>) (before `cast` <Co:4> :: ...))
         (\ a ->
            case a of {
              Left l -> pure (Left l);
              Right rf ->
                fmap
                  (\ b ->
                     case b of _ {
                       Left _ -> b;
                       Right r -> Right (rf r)
                     })
                  (after `cast` <Co:4> :: ...)
            }))
      `cast` <Co:5> :: ...
      }
      }

virtually identical, just allowing the dictionaries to be unpacked

-- explicit O0
-- just uses the monad implementation
$c*>_r68H
  = \ @f_a5Sr @e_a5Ss $dMonad_a5St @a_a5TX @b_a5TY m_a52L k_a52M ->
      $c>>=_r68E $dMonad_a5St m_a52L (\ _ -> k_a52M)

$c>>=_r68E
  = \ @f_a5Rt @e_a5Ru $dMonad_a5Rv @a_a5RE @b_a5RF before afterF ->
      let { $dApplicative_a5S0 = $p1Monad $dMonad_a5Rv } in
      $ ((\ ds_d67c -> ds_d67c) `cast` <Co:11> :: ...)
        (>>=
           $dMonad_a5Rv
           (runExceptT before)
           (\ a ->
              case a of {
                Left l -> return $dApplicative_a5S0 (Left l);
                Right r -> runExceptT (afterF r)
              }))

this simply uses the monad implementation and doesn’t inline anything, which should be expected

-- explicit -O2
$fApplicativeExceptT_$c*>
  = \ @f_a5U5 @e_a5U6 $dMonad_a5U7 @a_a5VB @b_a5VC before after ->
      case $p1Monad $dMonad_a5U7 of
      { C:Applicative ww_s87G pure ww2_s87I ww3_s87J ww4_s87K
                      ww5_s87L ->
      (>>=
         $dMonad_a5U7
         (before `cast` <Co:4> :: ...)
         (\ b ->
            case b of {
              Left l -> pure (Left l);
              Right _ -> after `cast` <Co:4> :: ...
            }))
      `cast` <Co:5> :: ...
      }

finally, this implementation is about as good as we can get, casting in and out of ExceptT and very simply binding into a case statement.

It’s interesting that there are so many unnecessary operations even in the more optimised default implementations. I was hoping to see an obvious reason why the first few would result in space leaks and the latter don’t, but my brain isn’t able to see such a pattern currently.

This is an interesting piece of information. Does this suggest that the benchmarks (which use IO by default) wouldn’t trigger a memory issue if they had Identity as the base Monad?

1 Like

For some, this is the case.

  • ExceptT, MaybeT, CPS.RWST, Strict.RWST, Strict.StateT, CPS.WriterT, and Strict.WriterT have the same memory usage in IO and Identity.
  • IdentityT, ReaderT, SelectT, and Lazy.StateT use bounded memory with thenA in Identity but unbounded in IO. (All of these have had their (*>) patched to use bounded memory.)
  • AccumT, Lazy.RWST, and Lazy.WriterT use unbounded memory with both (*>) and (>>) in IO, but bounded with both in Identity.

Here’s the script I’ve been using to check up on various monads. It covers all of transformers (except for ContT) and a handful of things from base. Add your own monads, add the packages they come from to the stack script block, and run it with stack. If a row turns red, you’ve found a monad that needs its (*>) patched. If the first column (thenA) disagrees with the second column ((*>)), that’s a monad that either needed patching and got it, or got optimized by GHC at the package level (this is what I think is happening with Either).

{- stack script
   --compile
   --ghc-options -O0
   --ghc-options -Wno-x-partial
   --ghc-options -with-rtsopts=-K100k
   --snapshot nightly-2025-06-11
   --package transformers
-}
{-# LANGUAGE GHC2021 #-}
{-# LANGUAGE LambdaCase #-}

import Control.Exception
import Control.Monad
import Data.Either
import Data.Maybe
import Type.Reflection

import Control.Monad.Trans.Accum
import Control.Monad.Trans.Except
import Control.Monad.Trans.Identity
import Control.Monad.Trans.Maybe
import Control.Monad.Trans.RWS.CPS qualified as CPS
import Control.Monad.Trans.RWS.Lazy qualified as Lazy
import Control.Monad.Trans.RWS.Strict qualified as Strict
import Control.Monad.Trans.Reader
import Control.Monad.Trans.Select
import Control.Monad.Trans.State.Lazy qualified as Lazy
import Control.Monad.Trans.State.Strict qualified as Strict
import Control.Monad.Trans.Writer.CPS qualified as CPS
import Control.Monad.Trans.Writer.Lazy qualified as Lazy
import Control.Monad.Trans.Writer.Strict qualified as Strict
import Data.Functor.Identity
import Data.List.NonEmpty (NonEmpty)
import Data.List.NonEmpty qualified as NE

thenA :: Applicative f => f a -> f b -> f b
thenA x y = (id <$ x) <*> y

times :: (f a -> f a -> f a) -> Integer -> f a -> f a
times _ 1 m = m
times op n m = m `op` times op (n - 1) m

doTest :: Applicative f => (f () -> IO ()) -> (f () -> f () -> f ()) -> IO Bool
doTest runMonad op = try (evaluate <=< runMonad $ times op 10000 $ pure ()) >>= \case
  Right ()           -> pure True
  Left StackOverflow -> pure False
  Left e             -> throwIO e

printWithModule :: TypeRep a -> IO ()
printWithModule tr = putStr (tyConModule (typeRepTyCon tr)) *> putStr "." *> print tr

class Testable a where
  finish :: a -> IO ()

instance Testable () where
  finish = pure

instance Testable (IO ()) where
  finish = id

experiment :: forall m a. Typeable m => Monad m => Testable a => (m () -> a) -> IO ()
experiment runMonad = do
  r1 <- doTest (finish . runMonad) thenA
  r2 <- doTest (finish . runMonad) (*>)
  r3 <- doTest (finish . runMonad) (>>)
  let problem = r2 /= r3
  when problem $ putStr "\027[1;31m"
  putBool r1
  putBool r2
  putBool r3
  putStr "-- "
  printWithModule (TypeRep @m)
  when problem $ putStr "\027[0m"
  where
  putBool = \case True -> putStr "True  "; False -> putStr "False "

main :: IO ()
main = do
  putStrLn "thenA (*>)  (>>)  -- True if space use is bounded"
  putStrLn "-------------------------------------------------"

  experiment @Identity runIdentity
  experiment @Maybe fromJust
  experiment @(Either ()) $ fromRight undefined
  experiment @((->) ()) ($ ())
  experiment @((,) ()) snd
  experiment @[] head
  experiment @NonEmpty NE.head

  experiment @(Accum ()) $ \x -> evalAccum x ()
  experiment @(AccumT () IO) $ \x -> evalAccumT x ()
  experiment @(Except ()) $ fromRight undefined . runExcept
  experiment @(ExceptT () IO) $ fmap (fromRight undefined) . runExceptT
  experiment @(IdentityT Identity) $ runIdentity . runIdentityT
  experiment @(IdentityT IO) $ runIdentityT
  experiment @(MaybeT Identity) $ fromJust . runIdentity . runMaybeT
  experiment @(MaybeT IO) $ fmap fromJust . runMaybeT
  experiment @(CPS.RWS () () ()) $ \x -> fst $ CPS.evalRWS x () ()
  experiment @(CPS.RWST () () () IO) $ \x -> fst <$> CPS.evalRWST x () ()
  experiment @(Lazy.RWS () () ()) $ \x -> fst $ Lazy.evalRWS x () ()
  experiment @(Lazy.RWST () () () IO) $ \x -> fst <$> Lazy.evalRWST x () ()
  experiment @(Strict.RWS () () ()) $ \x -> fst $ Strict.evalRWS x () ()
  experiment @(Strict.RWST () () () IO) $ \x -> fst <$> Strict.evalRWST x () ()
  experiment @(Reader ()) $ \x -> runReader x ()
  experiment @(ReaderT () IO) $ \x -> runReaderT x ()
  experiment @(Select ()) $ \x -> runSelect x id
  experiment @(SelectT () IO) $ \x -> runSelectT x pure
  experiment @(Lazy.State ()) $ \x -> Lazy.evalState x ()
  experiment @(Lazy.StateT () IO) $ \x -> Lazy.evalStateT x ()
  experiment @(Strict.State ()) $ \x -> Strict.evalState x ()
  experiment @(Strict.StateT () IO) $ \x -> Strict.evalStateT x ()
  experiment @(CPS.Writer ()) $ fst . CPS.runWriter
  experiment @(CPS.WriterT () IO) $ fmap fst . CPS.runWriterT
  experiment @(Lazy.Writer ()) $ fst . Lazy.runWriter
  experiment @(Lazy.WriterT () IO) $ fmap fst . Lazy.runWriterT
  experiment @(Strict.Writer ()) $ fst . Strict.runWriter
  experiment @(Strict.WriterT () IO) $ fmap fst . Strict.runWriterT
3 Likes

One reason here is that optimizations are not allowed to change a bottom into non-bottom or vice versa. For instance, thenA for (strict) StateT s IO can be simplified like

thenA :: StateT s IO a -> StateT s IO b -> StateT s IO b
-- thenA m1 m2 = (id <$ m1) <*> m2
thenA m1 m2 = StateT $ \s0 ->
  runStateT m1 s0 >>= \(_x1, s1) ->
  runStateT m2 s1 >>= \(x2, s2) -> pure (x2, s2)

But it cannot be simplified further into

thenM m1 m2 = StateT $ \s0 ->
  runStateT m1 s0 >>= \(_x1, s1) ->
  runStateT m2 s1

Because if the tuple (x2, s2)is bottom, it changes the result of thenA from a bottom IO action into a non-bottom IO action (with a bottom inside).

2 Likes
4 Likes

This is great, but why not improve (<.) and (<*) at the same time? Those could benefit from tail recursion too, for instance with Backwards IO.

(<*) isn’t relevant for the MNR proposal. This is already a big enough undertaking, on largely speculative grounds, and I don’t see anyone else here making PRs.

5 Likes

Fair enough :slightly_smiling_face:

1 Like

Ah, so the pattern matches on the Eithers have to stay in place, because if you removed the latter one it might have been bottom. This makes sense in the Applicative case because if either is bottom, it evaluates both immediately to get bottom, but in the Monad case, it just needs to evaluate the first to get the result.