Trick to lift IO into any monad

facundominguez · May 18, 2024, 6:01pm

Dear Haskellers,

I recently concocted a definition of a lifting function for IO with a type signature like

liftIOtoM :: Monad m => IO a -> m a

This is very similar to liftIO, but it depends on a Monad m constraint rather than MonadIO m. I share the implementation below, of which I wish to ask the community for assessment. But first let me go into a bit of motivation.

Depending on Monad m makes it possible to implement a version of traceM that doesn’t have the caveats about messages being printed less often than desired.

traceM = liftIOtoM . hPutStrLn stderr

Similarly, it also allows me to define measureM in timestats for any monad instead of only for MonadIO m.

Here’s the implementation.

liftIOtoM :: Monad m => IO a -> m a
liftIOtoM m = do
    -- The fictitious state is only used to force @unsafePerformIO@
    -- to run @m@ every time @liftIOtoM m@ is evaluated.
    s <- getStateM
    let p = unsafePerformIO $ do
          r <- m
          pure (s, r)
    case p of
      (_, r) -> pure r
  where
    -- We mark this function as NOINLINE to ensure the compiler cannot reason
    -- by unfolding that two calls of @getStateM@ yield the same value.
    {-# NOINLINE getStateM #-}
    getStateM = pure True

As you can see, it depends on unsafePerformIO which requires special attention to not introduce unwanted behavior. I have spotted a couple of mistakes in my initial attempts, which I have fixed for this presentation. So here is the question: do you think this code could do ok for the above use cases?

Thanks!
Facundo

ocramz · May 18, 2024, 6:36pm

I mean, good trick, but this is dangerous stuff that should be kept tucked away

Bodigrim · May 18, 2024, 6:44pm

This would be a better implementation of traceM in base indeed.

michaelpj · May 18, 2024, 7:04pm

I think if you called this unsafePerformIOM it could be a useful addition to base to sit next to unsafePerformIO.

tomjaguarpaw · May 18, 2024, 7:51pm

What’s the intention for measureM for a monad that runs its continuation multiple times? For example, consider the list monad program below. The output is

("slow first",TimeStats {timeStat = 1526291194, countStat = 2})
("slow second",TimeStats {timeStat = 1988, countStat = 2})

so it seems like only the first branch of the branching monadic computation is timed.

import Control.Exception (evaluate)
import Debug.TimeStats (collect, measureM)
import System.Environment (setEnv)

slow n = if n >= 1000 * 1000 then [()] else slow (n + 1)

fast = [()]

example = last (measureM "slow first" (slow 0 ++ fast) *> [1.. 10])

example2 = last (measureM "slow second" (fast ++ slow 0) *> [1.. 10])

main = do
  setEnv "DEBUG_TIMESTATS_ENABLE" "Any value"
  evaluate example
  evaluate example2
  mapM_ print =<< collect

facundominguez · May 18, 2024, 9:21pm

This one took me a bit to unravel. I would expect to measure the time of all the executions. This is not happening in your example though.

On the one hand, since evaluate forces evaluation of the first element of the resulting list, I wouldn’t expect all branches to be evaluated. But even if this were not a problem, I think the example points at a fundamental limitation of liftIOtoM. Consider the following example:

print $ do
      liftIOtoM $ putStrLn "a"
      [True, False]
      liftIOtoM $ putStrLn "b"

The above code only prints

a
b
[(), ()]

I think this is because a monadic action is a list, so liftIOtoM (putStrLn "b") == [()], and nothing causes [()] to be recomputed when it is used a second time.

putStrLn "b" has a better chance to run a second time if there is some value fed into the continuation, like with a Reader env monad. But I could imagine a compiler optimizing things, discovering that liftIOToM is always called with the same env value, and deduplicating the effects with CSE.

darkxero · May 18, 2024, 9:57pm

Would this implementation work?

liftIOtoM :: Monad m => IO a -> m a
liftIOtoM io = return $! unsafePerformIO io

If you change it to:

print $ do
      liftIOtoM $ putStrLn "a"
      _ <- [True, False]
      liftIOtoM $ putStrLn "b"

Then you get:

a
b
b
[(), ()]

Is this what you expect from [True, False]?

I wanted to ask: what’s the use case?
You want to print something and return a previous value?

tomjaguarpaw · May 19, 2024, 6:54am

N.B. I applied last to the lists, which is why evaluate evaluates all branches.

facundominguez · May 19, 2024, 9:33pm

There are a few factors that cause timestats to not account correctly for the time spent in all branches. Thanks for sharing this example, it has been rather enlightening.

At the moment, either following the execution with traceM statements or measuring wall clock time in monadic contexts.

I think in the case of the list monad, your implementation is simpler and has the same behavior as mine. With a monad like IO, I would expect it to be subject to the same caveats as in the documentation of Debug.Trace.taceM while mine would not.

However, here’s another example where this trick doesn’t do better than Debug.Trace.traceM either.

import qualified Control.Monad.Free as Free

main :: IO ()
main = Free.retract $ do
  let m = liftIOtoM (putStrLn "a")
  m
  m

The output is

$ ghc -package base -O Test.hs && ./Test
a

where I wanted

$ ghc -package base -O Test.hs && ./Test
a
a

Probably a is printed twice only if the bind operation of the monad is not strict on the result of the second argument. In the following program, the printing is executed when forcing the monadic computation to WHNF.

import qualified Control.Monad.Free as Free

main :: IO ()
main = do
  let p :: Free.Free IO ()
      p = do let m = liftIOtoM (putStrLn "a")
             m
             m
  seq p (return ())

Thanks all. I’m a tad less excited about this trick now

darkxero · May 20, 2024, 1:47am

I don’t think it is possible to change liftIOtoM to make m run twice if you bind it with a let like that.
However, I ran:

module Main (main) where

import qualified Control.Monad.Free as Free
import System.IO.Unsafe (unsafePerformIO)

liftIOtoM :: Monad m => IO a -> m a
liftIOtoM m = do
    -- The fictitious state is only used to force @unsafePerformIO@
    -- to run @m@ every time @liftIOtoM m@ is evaluated.
    s <- getStateM
    let p = unsafePerformIO $ do
          r <- m
          pure (s, r)
    case p of
      (_, r) -> pure r
  where
    -- We mark this function as NOINLINE to ensure the compiler cannot reason
    -- by unfolding that two calls of @getStateM@ yield the same value.
    {-# NOINLINE getStateM #-}
    getStateM = pure True

liftIOtoM2 :: Monad m => IO a -> m a
liftIOtoM2 io = return $! unsafePerformIO io

main :: IO ()
main = Free.retract $ do
    let m = liftIOtoM (putStrLn "a")
    liftIOtoM (putStrLn "a")
    liftIOtoM (putStrLn "a")
    let n = liftIOtoM2 (putStrLn "b")
    liftIOtoM2 (putStrLn "b")
    liftIOtoM2 (putStrLn "b")
    m
    m
    n
    n

to compare both implementations and I got

a
b
b
b

So you might find my lift useful if you don’t put it in a let.