Where are the two missing bits from `Int`?

In the Haskell Report we can find:

The finite-precision integer type Int covers at least the range [−2^29, 2^29−1].

Which is very curious, because on a 32-bit platform it is far more common to use one bit for the sign and the remaining 31 bits for the value, resulting in the range [-2^31, 2^31-1]. I am not aware of any other language that uses the 30-bit range.

Also, using `minBound` and `maxBound` from the `Bounded` typeclass it is quite easy to check that in modern-day GHC programs the actual range of `Int` is the full 32-bit resp. 64-bit range.
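For example, on a 64-bit machine a current GHCi session reports:

```
ghci> maxBound :: Int
9223372036854775807
ghci> minBound :: Int
-9223372036854775808
```

i.e. the full [-2^63, 2^63-1] range, not a 62-bit one.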

Which raises the question: what historic reason is there for this peculiar range? Why are there two bits ‘missing’?


Now I think I have an answer. However, it is very difficult to find resources about early Haskell development or about the internals of early GHC and other (pre-GHC) Haskell compilers. If someone who has been around in the Haskell community for longer can corroborate this I would be very happy :blush: !

1) Laziness tagging

To work with lazy values, you need some way to disambiguate not-yet-evaluated thunks from already-evaluated results.
A very simple way to do this is to use one bit in each pointer as a boolean ‘is this evaluated already?’ tag. (You can do this because on essentially all hardware a pointer has to be word-aligned, which means that the lowest few bits are always zero and therefore free to use.)
The big drawback of this approach, however, is that the generated code now needs to check this tag everywhere.
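A minimal sketch of that idea, modelling tagged pointers as plain machine words (purely illustrative; none of these names exist in any real compiler):

```haskell
import Data.Bits (complement, (.&.), (.|.))
import Data.Word (Word64)

-- Convention for this sketch: a word-aligned heap address always has its lowest
-- bits zero, so the lowest bit can flag "this still points at an unevaluated thunk".
type TaggedPtr = Word64

markAsThunk :: Word64 -> TaggedPtr
markAsThunk p = p .|. 1          -- set the low bit: not evaluated yet

isThunk :: TaggedPtr -> Bool
isThunk p = p .&. 1 == 1         -- this test has to happen at every use site

stripTag :: TaggedPtr -> Word64
stripTag p = p .&. complement 1  -- recover the real address before dereferencing
```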

This approach was used by the ‘Lazy Abstract Machine’ (LAM), which was used in GHC before the invention of the Spineless Tagless G-machine (STG).

The STG improved on this: instead of needing a bit in the pointer, it always stores a callable closure. It’s just that after a thunk has been evaluated for the first time, its closure is replaced with one that immediately returns the value. (In very broad strokes.)
This is why the STG is ‘Tagless’.
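Very roughly, that update mechanism can be modelled in Haskell itself with a self-overwriting IORef. This is only a toy model of the idea, not GHC's actual machinery:

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- An 'updatable closure': entering it always just runs the stored code, with no
-- tag check. On first entry that code overwrites itself with code that
-- immediately returns the already-computed value.
newtype Closure a = Closure (IORef (IO a))

mkThunk :: IO a -> IO (Closure a)
mkThunk compute = do
  ref <- newIORef (error "closure entered before initialisation")
  writeIORef ref $ do
    v <- compute
    writeIORef ref (pure v)   -- the update: later entries just return v
    pure v
  pure (Closure ref)

enter :: Closure a -> IO a
enter (Closure ref) = readIORef ref >>= id
```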

2) Garbage collection

The other bit might have been used by the garbage collector. In the past (again, before the STG), there was no special support for ‘fully unboxed’ values. But Haskell obviously still had its primitive types, which needed to be stored and manipulated in some way. So an Int would simply be a value stored on the stack. As mentioned above, one bit might have been set if it was still a pointer to a thunk; once evaluated, the plain value itself would be sitting on the stack.

This poses a problem for the garbage collector: all pointers on the stack need to be treated as ‘live roots’, the starting points of the collection process. But if we are mixing pointers and plain values on the stack, we need some way for the GC to know which is which!
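To make the problem concrete, here is one hypothetical stack encoding along those lines (again purely illustrative, and not what GHC does today); reserving a tag bit inside the plain value is exactly the kind of thing that would cost Int part of its range:

```haskell
import Data.Bits (complement, shiftL, (.&.), (.|.))
import Data.Word (Word64)

-- Convention for this sketch:
--   low bit 1  ->  still a pointer to a thunk, i.e. a GC root
--   low bit 0  ->  an already-evaluated plain value, stored shifted left by one
encodeValue :: Int -> Word64
encodeValue n = fromIntegral n `shiftL` 1   -- low bit 0: not a root

encodePointer :: Word64 -> Word64
encodePointer p = p .|. 1                   -- low bit 1: a root

-- The roots are exactly the stack words whose tag bit is set.
gcRoots :: [Word64] -> [Word64]
gcRoots stack = [ w .&. complement 1 | w <- stack, w .&. 1 == 1 ]
```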

I expect that the other bit of Int was used in the past for this reason. ‘Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine’ mentions this GC technique and notes that the STG machine wants to avoid it (Section 8.1).
(NOTE: however, I have not yet found concrete proof that this particular approach to garbage collection was actually used in the first versions of GHC before the STG, or in other Haskell compilers (Hugs? York? Nottingham? Yale? Utrecht?).)

2 Likes

Various papers about implementation matters can be found here:

Research papers/Runtime systems - HaskellWiki

1 Like

Exactly that wording is in the v1.0 Haskell Report, April 1990. At that time, not all hardware used a word size that was a power of two.

That 1990 section goes on:

Float is a floating-point type, also implementation-defined; it is desirable that this type be at least equal in range and precision to the IEEE single precision type.

So both stipulations are there to assure programmers of scientific applications that Haskell’s behaviour conforms to what they’re used to in Fortran.

None of those hypotheses make sense to me. Why would the implementation put tagging inside an Int? Tag pointers, certainly. Make your pointers point to blocks of 4 words so the low-order two bits aren’t needed and can carry tags, OK. But the low-order two bits of an Int certainly can’t be abused like that. And that sort of abuse doesn’t extend to Floats, Doubles or Chars.

Neither will that abuse be safe when passing an Int across the FFI, or when returning an Int to a caller using a Haskell function from some other language. Haskell is all about type safety.

1 Like

It’s difficult to answer where those two bits “are”, because the Haskell language spec is just a specification, not an implementation. Clearly it was written with some kind of tagged integer representation in mind, but at least as far as GHC is concerned, these tag bits inside the Int representation are not used. I know that, for example, the Lean theorem prover does use such tags to eliminate Int boxes, but GHC does not. It has other, more general means to get rid of those boxes that don’t require special-casing Ints.

GHC also makes use of pointer tagging, but for a different mechanism: encoding evaluatedness (your “Laziness tagging”) using the ordinal of the data constructor (+1) inside the tag. See pointer tagging · Wiki · Glasgow Haskell Compiler / GHC · GitLab.
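For a small data type the scheme looks roughly like this (a made-up `Colour` type for illustration):

```haskell
data Colour = Red | Green | Blue

-- On 64-bit platforms a pointer has three free low bits (two on 32-bit ones).
-- Roughly: tag 0 means "unknown, possibly a thunk"; tag n > 0 means "evaluated,
-- and built with the n-th constructor". So a pointer to an evaluated Red carries
-- tag 1, Green tag 2, Blue tag 3, and a case expression can often branch on the
-- tag without following the pointer at all.
```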

This pointer tagging mechanism is different to the tagging mechanism that is avoided in “tagless” STG; note that all heap objects have a uniform representation with something akin to runtime type information/virtual table that is accessed by indirection. Indirections are costly and GHC uses pointer tagging to avoid following those indirections in many cases by looking at the pointer tag.

In fact, for some heap objects GHC wants to adopt an “always tagged” invariant, in which case some entry code can be dropped altogether or simply made to crash, further subverting the “tagless” point that is part of STG.


Don’t think about the meaning of the acronym STG; every part of it is just confusing and irrelevant. STG is just (one of?) the first instances of a practically useful kind of intermediate representation for functional languages that became standard. The equivalent in OCaml would be flambda, I think: it is in ANF (the property of a program where every non-variable argument is let-bound and thus given a name) as well. Similarly, Lean has an IR called “IR”; it is in ANF as well, and supports saturated fast calls as well as unsaturated slow calls. The concepts are the same, the names are different.
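As a tiny illustration of ANF in plain Haskell (hypothetical before/after functions):

```haskell
-- Nested application: both arguments are compound expressions.
before :: (Int -> Int -> Int) -> (Int -> Int) -> (Int -> Int) -> Int -> Int -> Int
before f g h x y = f (g x) (h y)

-- The same function in A-normal form: every non-variable argument is let-bound
-- and thereby named. In a lazy language each of these lets allocates a thunk.
after :: (Int -> Int -> Int) -> (Int -> Int) -> (Int -> Int) -> Int -> Int -> Int
after f g h x y =
  let a = g x in
  let b = h y in
  f a b
```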

Why was STG the first? I think because ANF is a necessary requirement to compile lazy programs, plus Simon identified that it’s important to have a way to handle currying/bulk applications efficiently (previous calculi only apply one argument at a time). Compilation of call-by-value languages was comparatively simple, but ultimately all optimising compilers want to cope with currying well, and turning your program into ANF seems to be the only way to do so without losing your mind.

4 Likes

…and probably as a result, an alternative definition (from around the turn of the millennium, as I dimly recall) to search for is Shared Term Graph machine, e.g.:

https://hsyl20.fr/home/files/papers/2022-ghc-modularity.pdf

But most articles and papers still refer to the “older” definition.


At various times there was also interest in an intermediate language for both Haskell and Standard ML:

3 Likes

“Shared term graph”

Interesting … reinterpretation of the acronym. I’m not convinced we need the “graph” either (I don’t think that graph reduction is the most intuitive model to explain laziness, especially since https://dl.acm.org/doi/10.1145/199448.199507), but don’t have a replacement to offer.

At various times there was also interest in an intermediate language for both Haskell and Standard ML:

Both of these references are very interesting, thank you!

I think that for a pure variant of ML (“Strict Haskell” with different syntax, or PureScript/Idris), this language could in large part just be STG. Nonrec let is simply Case, letrec is STG’s Let (Rec ...). Note that ML’s value restriction on letrec means that we never have to ask “is this let binding strict or lazy?”. Of course, the ’98 paper is concerned with side-effects in SML, which warrants a more first-class integration of commands. But even newer strict languages try to isolate side-effects into certain type constructors, so I’m not sure that it’s worth having a separate language-level concept.
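A small sketch of the “non-recursive let is just Case” point, as Core-style notation plus a runnable Haskell approximation (illustrative only):

```haskell
{-# LANGUAGE BangPatterns #-}

-- In a strict setting,  let x = e in body  first evaluates e, which is exactly
-- what a case expression expresses in Core/STG:
--
--     let x = e in body    ~~>    case e of x -> body
--
strictLet :: Int -> Int
strictLet n =
  let !x = n * 2      -- behaves like: case n * 2 of x -> x + 1
  in  x + 1
```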

As it happens, we were just discussing today (not for the first time) how to equip STG with an “ABI” that allows us to have first-class call-by-value functions: #24136: Convey ABI/calling-convention between modules (even with -O0) · Issues · Glasgow Haskell Compiler / GHC · GitLab.
The goal is to form the bedrock for a truly levity-polymorphic code generation pipeline, where you can write strict programs just as well as lazy programs, with the stretch goal of re-using inherently strict functions such as foldl from the 2006 paper in both flavours. Still a lot of engineering to do here.

5 Likes

Have I got a deal for you!

(Alas, the acronym “STM” has already been taken…)


Of course, the ’98 paper is concerned with side-effects in SML, which warrants a more first-class integration of commands.

Not necessarily - read section 5.1 (First-order versus higher-order, page 29 of 33) in How to Declare an Imperative (1997). But failing that, there’s the evidence-passing style as used by Linear Haskell:


As it happens, we were just discussing today [again! ] how to equip STG with an “ABI” that allows us to have first-class call-by-value functions:

That’s odd: from Making a Fast Curry (2004), I thought the STG language was already strict.

Edit: from that paper, I presumed an implementation of the STG language in another strict language would be a simple matter of replacing the local bindings v = e in each Let-expression with calls to e.g. v = halloc(e); with the result of the Let-expression immediately following:

```
Let b1 = e1; ... bN = eN; in result
```

therefore being expanded to:

```
b1 = halloc(e1);
     ⋮
bN = halloc(eN);
{- code
    for
    result -};
```

So in that very specific sense, there’s already a strict variant of the STG language lurking in the original one - it just needs to be “exposed”.

Perhaps the “strict Core” variant devised in Types are Calling Conventions (2009) could be of some use, despite it being (apparently) superseded.


Both of these references are very interesting, thank you! […]

…getting back on-topic: were Standard ML’s int values also similarly limited in size back then (1990-1997)? If so, the appropriate standard reference for that language ought to have the answer to the original question.

1 Like

Neither in Hugs 2006. Int's range is [-2^31 , 2^31 - 1] – on a Windows machine.

1 Like

I think Yale Haskell might have inherited some limitations around integers from Common Lisp. There’s some info about the implementation of CMU Common Lisp here:

https://cmucl.org/docs/internals/html/Fixnums.html

40.4 Fixnums

A fixnum has one of the following formats in 32 bits:

```
-------------------------------------------------------
|        30 bit 2's complement even integer   | 0 0 0 |
-------------------------------------------------------
```

or

```
-------------------------------------------------------
|        30 bit 2's complement odd integer    | 1 0 0 |
-------------------------------------------------------
```

Effectively, there is one tag for immediate integers, two zeros. This buys one more bit for fixnums, and now when these numbers index into simple-vectors or offset into memory, they point to word boundaries on 32-bit, byte-addressable machines. That is, no shifting need occur to use the number directly as an offset.

This format has another advantage on byte-addressable machines when fixnums are offsets into vector-like data-blocks, including structures. Even though we previously mentioned data-blocks are dual-word aligned, most indexing and slot accessing is word aligned, and so are fixnums with effectively two tag bits.

Two tags also allow better usage of special instructions on some machines that can deal with two low-tag bits but not three.

Since the two bits are zeros, we avoid having to mask them off before using the words for arithmetic, but division and multiplication require special shifting.
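In other words, a small Haskell sketch of that encoding (using Int32 to stand in for a 32-bit machine word):

```haskell
import Data.Bits (shiftL, shiftR)
import Data.Int (Int32)

-- A fixnum n is stored as the 32-bit word n * 4: the lowest three bits are then
-- 000 for even n and 100 for odd n, matching the two formats shown above.
toFixnum :: Int32 -> Int32
toFixnum n = n `shiftL` 2

fromFixnum :: Int32 -> Int32
fromFixnum w = w `shiftR` 2

-- Addition works directly on the tagged words:
--     toFixnum a + toFixnum b  ==  toFixnum (a + b)
-- whereas multiplication needs one extra shift:
--     (toFixnum a `shiftR` 2) * toFixnum b  ==  toFixnum (a * b)
```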

2 Likes

Thanks. But, but! A type that represents “offset into memory” isn’t a (Haskell) Int. (That is, they might both use the same memory representation – that is, a full word – but the type system is all about the logical properties of a type, not its representation.)

The “index into simple-vectors” chimes with what the Haskell Report v1.0 (1990) goes on to say following @qqwy’s quote:

The range chosen [for Int] by an implementation must … and should be large enough to serve as array indices.

That talk of indices is no longer in the H2010 Report. And neither should it be: you can index your array by any type in Ix, I would have thought(?) (Class Ix is spec’d in the v1.0 Report.)

I’m saying the reason for those two missing bits is that the corresponding Common Lisp type, fixnum, is also missing two bits. And that in turn is because Common Lisp needs to retain all type information at run time. CMU Common Lisp has eight primitive (3-bit) type tags, and two of those are used for fixnum. The reason for choosing the specific tags 100 and 000 for fixnum is that this choice allows for faster array indexing.
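The indexing point as a small piece of arithmetic (hypothetical helper names; a 32-bit, byte-addressed machine assumed):

```haskell
-- Element i of a vector of 4-byte words lives at byte offset 4 * i, and the
-- tagged representation of the fixnum i is also 4 * i, so the tagged value can
-- be used as the memory offset without any shifting.
byteOffset :: Int -> Int
byteOffset i = 4 * i      -- what the addressing hardware needs

fixnumBits :: Int -> Int
fixnumBits i = 4 * i      -- what is already sitting in the register

-- byteOffset i == fixnumBits i for every i, which is the whole trick.
```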

While I’ve used it in the GHC modularity paper, I didn’t invent the different meaning for the STG acronym. Sadly I don’t remember exactly where I read that STG stood for “Shared Term Graph” when talking about the language, perhaps in http://ac.inf.elte.hu/Vol_036_2012/117_36.pdf, but I’m not sure.

o_0 …er, yes: I’ve assumed that the alternate definition, while not common knowledge, was known by some to be pre-existing; post updated.

After rummaging through some of my old archives… the earliest appearance I can find is in GHC 0.29, from ghc/compiler/stgSyn/StgSyn.lhs:


```
%
% (c) The GRASP/AQUA Project, Glasgow University, 1992-1995
%
\section[StgSyn]{Shared term graph (STG) syntax for spineless-tagless code generation}
```

Since then, that definition just languished in obscurity and continues to do so…

1 Like