As others have pointed out, the abstract machine described is still only a transition system, not a low-level implementation of the latter.
In that light, I, too, find the use of textual substitution rather than delayed substitution as in Sestoft’s Mark II machine (see link by @jaror above) regrettable. No translation to machine code can handle textual substitution! It is important that there is only a finite set of expressions representing the program, each of which need to be compiled to machine code. And that is what you achieve in a model where you delay the substitution via explicit environments.
So it would be far more instructive to see machine configurations with 4 components:
- Control, or focus, expression
e
- Environment
ρ
(or perhaps E
), mapping free vars in e
to pointers/addresses
- Heap
μ
(or H
as in the paper), mapping addresses to closures (ρ,e)
- Continuation
κ
, or Stack S
.
And this is exactly Sestoft’s Mark II machine. (The similarity to Felleisen’s CESK machine is no coincidence. Same concept for call-by-value. By contrast, the paper gives a CEK machine.)
Since the program has only finitely many expressions e
to begin with, we can compile code for each of them, assuming that we have available a (think “this”) pointer to the environment ρ
.
This is how you should think about the machine in the paper (which IIRC is rather like Sestoft’s Mark I machine). But the main point of the paper is not the machine, it is how the machine is extended to support multi-arity, under- or oversaturated calls, which is entirely orthogonal to whether the transition system uses textual or delayed substitution.
How to follow GC’able pointers is again an orthogonal issue, I suppose. But GHC associates meta data with the code generated for every expression, as well as stack frames, heap objects, etc, called the object’s info table (think C++ RTTI). IIRC, inside that metadata of a chunk of code, there is a bitmask outlining which stack slots are GC’d pointers. When returning from a call, the continuation code even has subsequent bitmasks to refine this chunk in order to collect heap objects that were pointed to by stack slots that will never be accessed again.