In the Haskell Report we can find:
The finite-precision integer type Int covers at least
the range[−2^29 , 2^29−1]
.
Which is very curious, because on a 32-bit platform it is more common to use one bit for the sign and the other 31 bits for the number, resulting in the range [-2^31..2^31-1]
. I am not aware of any other language that uses the 30-bit range.
Also, it is quite easy to see using the methods of the Bounded
typeclass that in modern-day GHC programs, the actual range of Int
is the full 32-bit resp. 64-bit range.
Which begs the question: What historic reason is there for this peculiar range? Why are there two bits ‘missing’?
Now I think I have an answer. However, it is very difficult to find resources about early Haskell development or about the internals of early GHC and other (pre-GHC) Haskell compilers. If someone who has been around in the Haskell community for longer can corroborate this I would be very happy !
1) Laziness tagging
To work with lazy values, you need some way to disambiguate not-yet-evaluated thunks from already-evaluated results.
A very simple way to do this, is to use one bit in each pointer as a boolean ‘is this evaluated already?’ tag. (You can do this because on essentially all hardware a pointer always has to be word-aligned which means that the last few bits end up being unused.)
The big drawback of this approach however, is that everywhere in your code you will now need to check on this tag.
This approach was used by the ‘Lazy Abstract Machine’ (LAM), which was used in GHC before the invention of the Spineless Tagless G-machine (STG).
The STG improved on this, by instead of needing a bit in the pointer, always store a callable closure. It’s just that after evaluating a thunk for the first time, its closure will be replaced with a closure which just immediately returns the value. (In very broad strokes)
This is why the STG is ‘Tagless’.
2) Garbage collection
The other bit might have been used by the garbage collector. In the past (again, before the STG), there was no special support for ‘fully unboxed’ values. But obviously Haskell did still have its primitive types which needed to be stored and manipulated in some way. So an Int
would just be a value which was stored on the stack. As mentioned above, one bit might have been set if it was still a pointer to a thunk, but if it was already evaluated, it would result in a plain value being present on the stack.
This poses a problem for the garbage collector: All pointer-values on the stack need to be considered as ‘live roots’ for the starting point of the collection process. But if we are mixing pointers and plain values on the stack, we need some way for the GC to know which is which!
I expect that the other bit of Int
was used in the past for this reason. ‘Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine’ mentions this GC technique and that the STG wants to avoid it (Section 8.1).
(NOTE: I have not found concrete proof so far however that in the first versions of GHC before the STG or possibly in other Haskell Compilers (Hugs? York? Nottingham? Yale? Utrecht?) this particular approach to garbage collection was actually being used.)