How RealWorld passes to the main?

Hi all, I’m studying how I/O works and wondering how RealWorld passes to the program. I think it is somewhere in RTS but I’m not sure and I can’t find it

From what I know, we can rewrite the following code

main :: IO ()
main = do
  c <- getChar
  print c

as

main :: IO ()
main = IO $ \s ->        -- how RealWorld appears here?
  case unIO getChar s of
    (# s', c #) ->
      case unIO (print c) s' of
        x@(# s'', _ #) -> x

Could someone help me with that, please?

In the end, State# RealWorld is nothing. It is just a token used during optimizations to make sure everything is sequenced correctly. When GHC produces the final executable, all the State# RealWorld tokens are removed from the program.

2 Likes

ah, I thought about it

do you have some links about that behaviour? if so, could you provide the link where I can read about? (I would like to know more information, because I didn’t find that in papers such as “Tackling the Awkward Squad”, “Imperative functional programming”)

I don’t have any links. I think to know more you’ll have to look at GHC’s source code. What questions do you still have?

big thanks for the answer, will try to read GHC source code in next 4 months, cause I have many unfinished topics

um, actually I’ve forgotten to ask one more question and If you could answer on it, I would appreciate it

if we write that example

main :: IO ()
main = IO $ \realWorld ->
	case putStrLn "hello" of
    	IO f1 -> case putStrLn "goodbye" of
        	IO f2 -> case f2 realWorld of
            	(# realWorld', _ #) -> f1 realWorld' -- use realWorld of goodbye

and evaluate it, we will have this

goodbye
hello

in that case we chained the world, right?

however, if we write like this

main :: IO ()
main = IO $ \realWorld ->
	case putStrLn "hello" of
    	IO f1 -> case putStrLn "goodbye" of
        	IO f2 -> case f2 realWorld of
            	(# _realWorld, _ #) -> f1 realWorld -- use main's realworld

we will have the same thing, even though I’ve not threaded it

goodbye
hello

I really don’t understand that behaviour.
According IO Inside, IO monad was born because of chaining “realWorld”, there were examples where realWorld was another datatype such as Time or Int, and then it became a RealWorld

sorry for English, if so

Apologies for being late - today was a busy one!

So before I (attempt to) answer your questions, you need to be aware of the status of IO a in Haskell since it was introduced in 1996:

This is because there is more than one way to define IO a - Philip Wadler provides some examples in How to Declare an Imperative (1997), of which the closest to the approach used in GHC would be the linear logic style:

So unless you’re e.g. attempting to add some new low-level feature to GHC, using the definition of IO a that GHC relies on to try to understand how I/O can work in Haskell can lead to difficulties later when attempting to explain how more advanced features (such as those discussed in Tackling the Awkward Squad (2001).


I’m […] wondering how RealWorld passes to the program.

There are two answers here:

  • In terms of semantics: an initial RealWorld is provided to unIO main by the Haskell implementation.

  • In GHC’s sources: just as @jaror described; the process of doing this is usually referred to as type erasure.

main :: IO ()
main = IO $ \s ->        -- how RealWorld appears here?
  case unIO getChar s of
    (# s', c #) ->
      case unIO (print c) s' of
        x@(# s'', _ #) -> x
2 Likes

Do you have sources for this? AFAIK GHC really does not even try to apply main to an argument and just drops the argument at some point. And I believe the Haskell 2010 standard does not specify that main will be run using an unIO function, instead it just says:

When the program is executed, the computation main is performed, and its result (of type τ) is discarded.

There is no unIO that I’m aware of.

If you don’t properly chain the State# RealWorld token through your program then your program may be evaluated in any order, but in case of very simple programs it is still likely that the order will stay the same. GHC just happens to have no reason to change the order in your example. Also note that optimizations levels might change the order.

1 Like

got it, thank you, I’ve compiled that example with O2 but I’ve not seen any order changes. Maybe in more sophisticated programs there will be another order

According IO Inside, IO monad was born because of chaining “realWorld”, there were examples where RealWorld was another datatype such as Time or Int, and then it became a RealWorld

As the most recent contributor to that page…I’m open to suggestions as to how it can be explained better from the perspective of new Haskellers (because clearly my most recent attempt has failed).


Note: IO Inside was originally written by someone else, so I’m reluctant to make “grand sweeping changes” because that risks replacing one set of “mannerisms” (from the original author) by another set (mine), which would make such large changes useless.

1 Like

See section 2.2 of Tackling the Awkward Squad (remembering that IO a in GHC is defined as a newtype).

1 Like

That section does not mention unIO. It just says (this is the most relevant bit I could find in that section):

A complete Haskell program defines a single big I/O action, called main, of type IO (). The program is executed by performing the action.

1 Like

It shouldn’t have to: the presence of unIO is merely an artifact of the decision to define IO a as a newtype in GHC.

This is now probably getting too abstract for the OP - I suggest starting a new thread for this (sub)topic, because I suspect it will be a very long thread…

1 Like

One consequence of re-using State# RealWorld is that the compile may optimize away the IO actions.

If you change the argument to putStrLn like this:

main :: IO ()
main = IO $ \realWorld ->
  case putStrLn "hello" of
      IO f1 -> case putStrLn "hello" of
          IO f2 -> case f2 realWorld of
              (# _realWorld, _ #) -> f1 realWorld

Then the compiler may optimize the program to:

main :: IO ()
main = IO $ \realWorld ->
  case putStrLn "hello" of
      IO f -> case f realWorld of
          result@(# _realWorld, _ #) -> result -- putStrLn is called only once

Interestingly, in linear-base (an alternative to base which makes use of LinearTypes), the constructor of the IO type is exposed:

IO (State# RealWorld %1 -> (# State# RealWorld, a #))	 

Note that this is the same definition as the standard IO monad, but with a linear arrow enforcing the implicit invariant that IO actions linearly thread the state of the real world. Hence, we can safely release the constructor to this newtype.

2 Likes

But that’s no panacea either: you can still define functions that reuse State# (...) values:

spare :: State# RealWorld -> (# State# RealWorld, State# RealWorld #)
spare s = (# s, s #)

So you could write something resembling e.g. what’s described here, which could privately duplicate the state:

https://mail.haskell.org/pipermail/haskell-cafe/2008-May/043694.html

…you just can’t use it with the linear IO data constructor to define new I/O actions, just like:

qux = IO spare {- linearity error; state duplicated -}

But this is also probably getting too abstract for the OP…

Your program has a Main.main :: IO (). The compiler would wrap it with a “top handler” that handles exceptions uncaught in user code to form :Main.main :: IO (), after Z-encoding it’s a static heap object ZCMain_main_closure.

Now, at run-time, ZCMain_main_closure is evaluated at here. rts_evalLazyIO is one of the RTS API functions that takes a closure (heap object) with type IO r and executes the side effects, returning r without forcing it. If you visit its definition, it calls createIOThread, which creates the main Haskell thread that’s used for executing this IO r thing.

All Haskell threads are first created with createThread that pushes a stop frame on the bottom of the stack. Now, createIOThread will first push a stg_ap_v stack frame, then on top of that, a stg_enter frame with the ZCMain_main_closure as its payload. The stg_enter frame “enters” a closure, which means it will evaluate its payload to WHNF. Given ZCMain_main_closure is a thunk and needs to be evaluated first, this makes sense.

After it’s forced to a function closure, stg_ap_v will actually invoke the function by passing nothing to it. So this is where OP’s question really gets answered. The ap stands for “apply” and v stands for void, which is State# RealWorld in the Haskell land, not backed by anything at runtime. IO r in Haskell is State# RealWorld -> (# State# RealWorld, r #), and at Cmm level it’s a function closure that doesn’t really take any argument and only returns the r closure. It’s a zero-argument function, and the crucial difference between IO r and r is you can invoke the IO r many times and the side effects will be executed each time, but with r it will be executed only once and then the result is memoized.

8 Likes