Confusion about the operational semantics of the let binding (LET rule) in the STG language

So I was reading the push/enter vs eval/apply paper. In the paper they describe the operational reading of the let binding (in section 4.1) as allocating space on the heap for the rhs of the = sign and then substituting the returned pointer for the lhs in the continuation of the let expression.
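(For reference, the rule in question, as it is also quoted later in this thread, reads roughly: let x = obj in e; s; H => e[x'/x]; s; H[x' -> obj], with x' a fresh heap address.)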


I’m a bit confused about how this is actually implemented in practice, because the continuation should be compiled code and you can’t just modify compiled code. Also, the heap object needs to live long enough before it can be GC’d, so it must somehow be reachable from a GC root, which I assume is the stack; but paradoxically the rule doesn’t touch the stack at all.


Since support for let(rec) STG-expressions is shared by both versions of the STGM in that paper, the earlier paper:

Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine (1992)

…is relevant here - page 49 of 76 provides an overview of how let(rec) STG-expressions are compiled.

It doesn’t have to live. If there’s no reference to x then it is immediately garbage. In other words the code is the thing that is keeping the memory alive. In practice that may be the registers or the “stack”. But note that they use a very particular stack in the paper:

The stack s is a stack of continuations that says what to do when the current expression is evaluated. We use the notation “:” to mean cons in the context of a stack.

Right, I should have added the condition “if lhs is used in the continuation”.

The original paper just mentions that they are stored in a “global environment”, which I think is very vague and isn’t what’s actually implemented.

I’ve often been confused by this stuff in the past too. It feels like the translation is given at too high an abstraction level. I’d rather see it treat closures more explicitly (or maybe I’m just missing the interpretation by which it does treat closures explicitly). In this case I think e[x'/x] means form the closure of the code of e with the value x', i.e. make some sort of data structure containing both of them. I think you have to go to the Cmm level before you can actually see what this looks like at a concrete level though.
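One concrete (if toy) way to read that, with all names invented for this post and nothing to do with GHC’s actual layout:

-- A closure pairs the static, compiled-once code of e with the dynamic values
-- of its free variables, such as the pointer x'.
type Addr = Int                   -- a heap address, e.g. x'

data Code = CodeLabel String      -- stands in for the compiled code of e

data Closure = Closure
  { closureCode :: Code           -- shared by every closure built from this e
  , closureFree :: [Addr]         -- the captured pointers, e.g. [x']
  }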

I think the abstraction level was chosen on purpose to avoid cluttering the specification with details about the machine running the program.

If you want to see an abstract machine for a lazy functional language, then I’d recommend this paper: Sestoft’s Deriving a Lazy Abstract Machine (1997).

The mark 2 (Fig. 5) and mark 3 (Fig. 6) machines don’t use any substitution. The mark 3 machine does, however, use de Bruijn indices instead of named binders, hence the difference in input language syntax. So I recommend just looking at the mark 2 machine.

Here’s a simple Haskell implementation of the mark 2 machine:

import qualified Data.Map.Strict as Map

data Exp = Var String | Lam String Exp | App Exp String | Let String Exp Exp deriving Show

data State = State Heap Control Environment Stack deriving Show

data Heap = Heap !Int !(Map.Map String (Exp, Environment)) deriving Show  -- next fresh address, plus a map from addresses to closures (code, environment)
type Control = Exp
type Environment = Map.Map String String  -- variable name -> heap address
type Stack = [(Bool, String)]  -- (False, p): pending argument pointer p; (True, p): update marker for address p

alloc :: Heap -> (String, Heap)
alloc (Heap i m) = (show i, Heap (i + 1) m)

set :: Heap -> String -> Exp -> Environment -> Heap
set (Heap i m) p e ev = Heap i (Map.insert p (e, ev) m)

get :: Heap -> String -> (Exp, Environment)
get (Heap _ m) p = m Map.! p

machine :: State -> Maybe State
-- app 1
machine (State g (App e x) ev s) = let p = ev Map.! x in
  Just $ State g e         ev ((False, p) : s)
-- app 2
machine (State g (Lam y e) ev                  ((False, p) : s)) = 
  Just $ State g e         (Map.insert y p ev) s 
-- var 1
machine (State g (Var x) ev  s) = let p = ev Map.! x; (e', ev') = get g p in
  Just $ State g e'     ev' ((True, p) : s) 
-- var 2
machine (State g              e@Lam{} ev ((True, p) : s)) =
  Just $ State (set g p e ev) e       ev s 
-- let
machine (State g                (Let x e e') ev  s) = let (p, g') = alloc g; ev' = Map.insert x p ev in 
  Just $ State (set g' p e ev') e'           ev' s
-- halt
machine _ = Nothing

loop :: State -> State
loop x = maybe x loop $ machine x

-- >>> loop (State (Heap 0 mempty) (Let "y" (Lam "z" (Var "z")) (App (Lam "x" (Var "x")) "y")) mempty [])
-- State (Heap 1 (fromList [("0",(Lam "z" (Var "z"),fromList [("y","0")]))])) (Lam "z" (Var "z")) (fromList [("y","0")]) []
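To connect that result with the rules: on this input the machine fires let (allocating address "0" for the identity lambda and binding "y" to it), then app 1 (pushing the pointer for "y"), app 2 (binding "x" to that pointer), var 1 (following "x" to the heap and pushing an update marker for "0"), and var 2 (writing the, here unchanged, lambda back to "0" and popping the marker); after that no rule matches and loop returns the state shown above. (Trace worked out by hand against the code, so worth double-checking.)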

At compile time, the register allocator can already decide, statically, that the compiled code for e should expect the address of obj to be in, say, register r14 (true story). IOW, morally, the whole thing goes like:

p := allocate()
put obj at p
r14 := p
code for e, it uses r14

At GC time, clearly, all registers and the stack become roots.

You can play with these examples and see what GHC compiles them to:

mk :: Int -> Maybe Int
mk x = Just (x + 1)

rev :: [a] -> [a] -> [a]
rev acc (x:xs) = rev (x:acc) xs
rev acc [] = acc

You can use -O -ddump-stg-final -ddump-cmm -ddump-asm -ddump-to-file -dsuppress-all to see all the intermediate code, and use the higher-level dumps to help you make sense of the lower-level ones.
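For example, something like ghc -O -ddump-stg-final -ddump-cmm -ddump-asm -ddump-to-file -dsuppress-all M.hs (the module name is purely illustrative) writes each dump to its own file rather than to stderr, so you can read the STG, Cmm and assembly for the same function side by side.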

You may also find this useful: https://github.com/takenobu-hs/haskell-ghc-illustrated


I see. I assume that if there are too many let bindings to fit into registers, those that cannot fit will be placed on top of the stack instead?

So in this interpretation

let x = obj in e; s; H => e[x'/x]; s; H[x' -> obj]

tells you how to compile obj? That is, you allocate a register x' pointing to obj (or a slot on the stack) and then compile e with that statically known information, so that its references to obj go through that register?

As others have pointed out, the abstract machine described is still only a transition system, not a low-level implementation of the latter.

In that light, I, too, find the use of textual substitution rather than delayed substitution as in Sestoft’s Mark II machine (see link by @jaror above) regrettable. No translation to machine code can handle textual substitution! It is important that there is only a finite set of expressions representing the program, each of which needs to be compiled to machine code. And that is what you achieve in a model where you delay the substitution via explicit environments.

So it would be far more instructive to see machine configurations with 4 components:

  1. Control, or focus, expression e
  2. Environment ρ (or perhaps E), mapping free vars in e to pointers/addresses
  3. Heap μ (or H as in the paper), mapping addresses to closures (ρ,e)
  4. Continuation κ, or Stack S.

And this is exactly Sestoft’s Mark II machine. (The similarity to Felleisen’s CESK machine is no coincidence. Same concept for call-by-value. By contrast, the paper gives a CEK machine.)
Since the program has only finitely many expressions e to begin with, we can compile code for each of them, assuming that we have available a (think “this”) pointer to the environment ρ.
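A small sketch of the “finitely many expressions” point, reusing the Exp type from the Mark 2 implementation earlier in the thread (the helper is made up for this post):

-- Exp repeated from the Mark 2 sketch above.
data Exp = Var String | Lam String Exp | App Exp String | Let String Exp Exp

-- Every expression a delayed-substitution machine can have in focus is a
-- subexpression of the original program, so this finite list covers all the
-- code that ever needs to be compiled. Textual substitution instead builds
-- expressions like e[x'/x] that need not appear in it.
subExps :: Exp -> [Exp]
subExps e = e : case e of
  Var _       -> []
  Lam _ b     -> subExps b
  App f _     -> subExps f
  Let _ rhs b -> subExps rhs ++ subExps b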

This is how you should think about the machine in the paper (which IIRC is rather like Sestoft’s Mark I machine). But the main point of the paper is not the machine, it is how the machine is extended to support multi-arity, under- or oversaturated calls, which is entirely orthogonal to whether the transition system uses textual or delayed substitution.

How to follow GC’able pointers is again an orthogonal issue, I suppose. But GHC associates metadata, called the object’s info table (think C++ RTTI), with the code generated for every expression, as well as with stack frames, heap objects, etc. IIRC, inside the metadata of a chunk of code there is a bitmask outlining which stack slots are GC’d pointers. When returning from a call, the continuation code even has subsequent bitmasks to refine this chunk, in order to collect heap objects that were pointed to by stack slots that will never be accessed again.
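Just to illustrate the bitmask idea (a toy sketch; GHC’s real info tables live in the RTS and look nothing like this):

import Data.Bits (testBit)

type Slot = Int   -- stand-in for one word of a stack frame's payload

-- Bit i of the mask set means slot i holds a pointer the GC must treat as a root.
gcRoots :: Integer -> [Slot] -> [Slot]
gcRoots bitmask slots =
  [ slot | (i, slot) <- zip [0 ..] slots, testBit bitmask i ]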


An earlier description of an eval-apply STG machine can be found here:

Compiling Lazy Functional Programs to Java on the basis of Spineless Tagless G-Machine with Eval-Apply Model (2001)

While the text is mostly Korean, the figures and examples should be reasonably self-explanatory to those with prior exposure to abstract-machine definitions, e.g. the paper by Sestoft.

Thank you all, I’m definitely going to read the Sestoft paper. My initial motivation for reading the push/enter vs eval/apply paper is to understand the operational details of Haskell programs compiled by GHC. I wanted to understand the semantics of STG first because it’s the lowest-level IR in which I can still recognize the program structure (to me Cmm is totally unrecognizable, and the same of course goes for assembly). I chose that paper because it contains the most up-to-date description of the operational semantics of STG that I could find.

But it seems that the semantics described by the eval/apply paper is still too high-level for my liking. In particular, I still do not understand how the objects (heap/stack objects) in the paper map onto the GHC implementation (which I assume is declared in rts/include/rts/storage/Closures.h, but I’m not sure).


I think understanding how STG is implemented is very closely related to understanding Cmm. Perhaps this talk is what you’re looking for:


Yes, definitely go for the Sestoft paper first. It made a lot of things “click” for me.

Perhaps also have a look at Felleisen’s work, whose main motivation with CESK etc. IIRC was to find a simple call-by-value machine that can cope with mutable state and control operators. After he found out about CESK, he tried to “make it smaller” again by using syntactical substitution. The point is, I suppose, that there are many different flavours of operational semantics and it’s good to know how they are all interchangeable in some way, because for a given problem you will want to pick the simplest representation of the machine possible.
For example, syntactical substitution makes some reasoning simpler, but it won’t help you if you want to compile code. Knowing that you chose delayed substitution as your reference model for that reason reassures you that you couldn’t have made your job any simpler.

You might also be confused by the relation of abstract machines to contextual semantics (both small-step semantics), in which the configurations of the semantics are simply closed expressions. Every step in the transition system will “search” for the next redex by splitting e into an evaluation context E and the regex e', such that e = E[e']. Such a decomposition is always unique (needs proof). But if you spend some time looking at these, you will see that “abstract machines” are simply an efficient way to do the “search”; E is really (ρ,μ,κ) (env, heap, stack) from above. Yet it is often simpler to have a statement about a closed expression E[e'] than about a machine configuration (e',ρ,μ,κ).
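For the application-only fragment that “search” is easy to write down, and the context it finds is essentially the machine’s argument stack (a sketch reusing the Exp type from the Mark 2 code above; variables, let and updates would additionally need the heap, which is omitted here):

-- Exp repeated from the Mark 2 sketch above.
data Exp = Var String | Lam String Exp | App Exp String | Let String Exp Exp

-- An evaluation context for the application fragment: the pending argument
-- names, top of the (would-be) argument stack first.
type Ctx = [String]

-- Split e into a context and the expression in focus, so that e = E[focus].
decompose :: Exp -> (Ctx, Exp)
decompose (App f x) = let (ctx, focus) = decompose f in (ctx ++ [x], focus)
decompose e         = ([], e)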

The pointer to Ben’s RTS talk is great because it discusses things like closure layout, but I wouldn’t say that Cmm is that important to understand the implementation of the STG machine as a concept; if it weren’t for unrestricted tail calls, one could simply generate C (the Lean compiler does this).
Also, as the similarity to CESK suggests, by-value machines are not so different. Lean uses much the same compilation scheme as GHC, except that they do lambda lifting instead of closure conversion, so they have to pass all free variables at call sites and allocate more partial applications to do so.
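A tiny, purely schematic illustration of that difference (neither Lean’s nor GHC’s actual output; all names invented):

-- Source fragment with one free variable:  \x -> x + y

-- Lambda lifting: y becomes an extra parameter, so every call site has to
-- supply it, and an escaping use of the lambda allocates a partial
-- application (fLifted y).
fLifted :: Int -> Int -> Int
fLifted y x = x + y

-- Closure conversion: the code is fixed, and the free variable lives in an
-- environment record carried along with the code pointer.
data Clo = Clo { cloCode :: Clo -> Int -> Int, cloY :: Int }

mkClo :: Int -> Clo
mkClo y = Clo (\self x -> x + cloY self) y

applyClo :: Clo -> Int -> Int
applyClo clo = cloCode clo clo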


I guess you mean “redex” (reducible expression)?


Yes, thank you.


I find it a bit weird that the natural semantics cannot handle non-termination (non-terminating programs produce infinite derivation trees) :confused: