There has been much talk over the years of the challenge of emancipating base from the GHC release process. Challenges have been listed. I have not followed these conversations deeply, but I respect those who have written about how hard it would be. In this post, I describe a different way to structure the relationship between GHC and base than we have today. I don’t have an opinion on whether this is a better way than the status quo, and I’m not advocating for change here; instead, I’m laying out an alternative design so that more-informed others can consider it and decide whether it is in fact better than the status quo.
Assumption
The key challenge of emancipating base is that GHC has special treatment of some of the definitions in base, notably Monad and friends (which GHC must know about in order to compile do blocks) and Num and friends (which GHC must know about in order to compile numeric literals). Because GHC knows about these definitions, the modules that define them must be shipped with GHC itself, causing a key challenge in emancipating (the rest of) base. This post is all about defining a different potential relationship between e.g. the definition of Monad and GHC.
Background
GHC currently is aware of at least three categories of definitions in libraries:
- Primitive types. There are a number of type definitions that have no phrasing in Haskell. A good example is Int#. GHC gives these types meaning with a little magic in the compiler (that is, they are treated in a different way from ordinary types), and they are exported from the magic module GHC.Prim. The GHC.Prim module has no source code (in its place is a stub used just for creating Haddock output) and lives in the special package ghc-prim. GHC knows that when GHC.Prim.Int# is looked up, it should provide its internal magical type definition; the type is identified by the name GHC.Prim.Int#.
- Primitive values. There are a number of primitive values, like negateInt#, that cannot be written in Haskell. For the purposes of this post, these are treated identically to primitive types, in that they are exported from GHC.Prim and are identified by their name (e.g. GHC.Prim.negateInt#). The GHC implementation routes these through quite a different path than primitive types, but the differences don’t matter here.
- Wired-in types. There are a number of Haskell type definitions that GHC knows intimately. A good example is Bool. GHC knows where Bool is declared, that it has exactly two constructors False and True, and what order these constructors are declared in. It must know this in order to compile guards and if expressions. Most of these live in the module GHC.Types in the ghc-prim package; some live elsewhere in the ghc-prim package. None lives in base. When GHC compiles the actual definition of these types (GHC.Types does have source code), it essentially ignores the definition provided and uses its own internal definition instead. These types are identified by their name (e.g. GHC.Types.Bool).
- Known-key types and values. There are a number of Haskell type and value definitions that GHC knows how to find. A good example is Monad. GHC knows the package, module, and name of Monad (to wit, base, GHC.Base, and Monad). When it needs to emit, say, a Monad m constraint, it looks up the type with that name from that module and proceeds. These definitions may appear outside of ghc-prim, and they pose the challenge: today, these definitions cannot be released separately from GHC.
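The first two categories can be seen from ordinary Haskell code, since GHC.Exts re-exports the contents of GHC.Prim. A minimal sketch (the module name Demo is mine):

```haskell
{-# LANGUAGE MagicHash #-}
module Demo where

-- GHC.Exts re-exports the primitives defined magically in GHC.Prim.
import GHC.Exts (Int (I#), Int#, negateInt#)

-- Int# is a primitive type and negateInt# a primitive value: neither
-- has a Haskell source definition; GHC itself supplies their meaning.
minusThree :: Int
minusThree = I# (negateInt# 3#)
```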
The challenge is all around these known-key definitions. I don’t think anyone is talking about making changes to the treatment or packaging of any of the first three cases – just known-key types cause pain.
“Don’t call us; we’ll call you!”
The key challenge here is simply that GHC knows exactly where the known-key definitions are declared, and thus the package including them must be released with GHC. My idea here is simply not to do this, but instead to have the types declare themselves to be e.g. the “real” Monad. This is what Agda does, for example, with its BUILTIN directive, and how OCaml works with its external declarations.
Concretely, we might imagine a declaration like
class {-# BUILTIN Monad #-} Applicative m => Monad m where ...
where the pragma tells GHC that this definition is the Monad definition. When GHC reads this definition, it remembers that the class being defined is the Monad, and so when it needs to produce a Monad m constraint, it knows where to go. This approach means that base is no longer special – instead, it just means that there must be one class definition labeled as Monad loaded when compiling any module that uses do. base can evolve independently from GHC – or community members could write completely alternative standard libraries, as long as they have a Monad class.
We could imagine an even-more-Haskelly approach of having class Builtin (name :: Symbol) (ty :: k), and the definition of Monad would come with instance Builtin "Monad" Monad. The class-instance solver would then be used to find the Monad class. This would likely be considerably less performant.
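To make this concrete, here is a sketch of what such a marker class could look like. It compiles with today’s GHC (though of course nothing in the compiler consults it), and the module and class names are my own invention:

```haskell
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE PolyKinds #-}
module Builtin where

import GHC.TypeLits (Symbol)

-- A hypothetical marker class: an instance says "the type 'ty' is the
-- definition to use for the builtin named 'name'".
class Builtin (name :: Symbol) (ty :: k)

-- The standard library would register its Monad class like so; GHC
-- could then run the instance solver on Builtin "Monad" to find it.
instance Builtin "Monad" Monad
```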
This route also interacts nicely with -XRebindableSyntax: instead of looking for one definition with BUILTIN Monad, GHC could look for the in-scope type with BUILTIN REBINDABLE Monad. This would likely be considerably easier to configure than the current -XRebindableSyntax.
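For contrast, here is a minimal example of how -XRebindableSyntax behaves today: do-notation desugars using whatever (>>=) and return are in scope, rather than the definitions GHC finds by name. (The module name and definitions are mine.)

```haskell
{-# LANGUAGE RebindableSyntax #-}
module Rebind where

-- RebindableSyntax implies NoImplicitPrelude, so Prelude names
-- must be imported explicitly.
import Prelude (Bool (False, True), IO, String, error, id, pure, (++), (==))

-- 'do' below uses these in-scope definitions instead of the
-- Monad class from base.
(>>=) :: String -> (String -> String) -> String
s >>= f = f s

return :: String -> String
return = id

greeting :: String
greeting = do
  s <- "hello"
  return (s ++ ", world")
-- greeting is "hello, world"
```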
Left unsolved: what to do with a NoImplicitPrelude module that doesn’t import anything. In this case, no other modules are loaded, and so GHC doesn’t have a chance to find the BUILTIN Monad. I actually think it’s reasonable to error in such a file if it uses a do, but others may feel differently (and this would not be backward compatible). We could alternatively say that NoImplicitPrelude really means import "base" GHC.Required () or something, which looks for a package named base and a module named GHC.Required, loading the definitions therein. Then we would make sure that base had such a module, and that it depended on all the modules with BUILTINs. But base could still evolve independently from GHC.
Conclusion
Perhaps I’m addressing the wrong problems here, and perhaps there are other unseen challenges in this approach. But it might be that this idea – changing GHC instead of base – allows things to move forward in this space.