The CLC is in a difficult position: `base` is dilapidated, causing harm, yet breaking changes to `base` that make it better also cause harm.
What most of the discourse prompted by individual proposed breaking changes misses is that there might not be a good one-size-fits-all rate of breakage acceptable to everyone, and if the magic rate doesn’t exist we should stop spilling ink on what it is. But if we default to doing nothing absent any consensus, the problem will only worsen, as the most prominent and easily accessed definitions available to Haskellers increasingly become a minefield to carefully step over.
The good news is that there doesn’t need to be one! We should instead have these goals:
- It is possible to upgrade across a GHC breaking change without undergoing a breaking change in `base`.
- It is possible to upgrade across a breaking change in `base` without undergoing a breaking change in GHC.
We can achieve (and make more concrete) these goals with a generalization of our good old 3 release policy:
- Every major version of `base` should support 3 adjacent major versions of GHC
- Every major version of GHC should support 3 adjacent major versions of `base`
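One way to make the two-way policy concrete is as a check over a support matrix. The sketch below is only an illustration: the `Support` type, the function names, and the integer version labels are all invented for this example (real GHC and `base` versions are not consecutive integers), and "adjacent" is read as "consecutive release indices".

```haskell
import Data.List (nub, sort)

-- Hypothetical labels: plain Ints standing in for release indices.
type BaseMajor = Int
type GhcMajor  = Int

-- A support matrix: which (base, GHC) pairs are known to build together.
type Support = [(BaseMajor, GhcMajor)]

-- Length of the longest run of consecutive integers in a list.
longestRun :: [Int] -> Int
longestRun xs = go (sort (nub xs))
  where
    go []       = 0
    go (y : ys) = walk 1 1 y ys
    walk best _   _    []       = best
    walk best cur prev (z : zs)
      | z == prev + 1 = walk (max best (cur + 1)) (cur + 1) z zs
      | otherwise     = walk best 1 z zs

-- All GHC releases a given base release is paired with, and vice versa.
ghcsFor :: Support -> BaseMajor -> [GhcMajor]
ghcsFor s b = [g | (b', g) <- s, b' == b]

basesFor :: Support -> GhcMajor -> [BaseMajor]
basesFor s g = [b | (b, g') <- s, g' == g]

-- Does this base release support at least n adjacent GHC releases?
baseOk :: Int -> Support -> BaseMajor -> Bool
baseOk n s b = longestRun (ghcsFor s b) >= n

-- Does this GHC release support at least n adjacent base releases?
ghcOk :: Int -> Support -> GhcMajor -> Bool
ghcOk n s g = longestRun (basesFor s g) >= n

-- A toy matrix: each release pairs with its neighbours on either side.
example :: Support
example = [(b, g) | b <- [10 .. 14], g <- [10 .. 14], abs (b - g) <= 1]
```

In this toy matrix both `baseOk 3 example 12` and `ghcOk 3 example 12` hold, while `baseOk 3 example 10` does not, because the oldest release in the table only has two partners. The policy is meant to be read per release in exactly this way, with the newest and oldest entries of any finite table being the ones still owed support.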
With this in place, everyone should be happy: the “conservative” faction gets ample warning and isn’t stuck on old GHCs, while the “revisionist” faction is free to make as many breaking changes to GHC as they like, provided they are willing to maintain all those new resulting major versions of `base` over 3 versions concurrently! (Per @Gabriella439’s favorite point, we should limit the number of breaking changes per major release even if it means adjacent breaking releases. In this way, the “revisionist” faction can’t skirt the maintenance obligations by doing one super-breaking release.)
Informally talking to people (I do bring this up a lot :)), I have never met anyone against this plan. Rather, I get shrugs that it has been tried before, and stalled out.
I admit I don’t fully get it. To me, decoupling base might be a bitter pill to swallow, but it is an absolutely necessary one, for the fate of most languages is for them and their standard libraries to ossify until the community drifts off and a new one forms around a new language in its stead. Haskell is far from alone in having a bad standard library, so let’s not act like the current situation is somehow exceptional.
Still, the decoupling is a decent amount of work and, perhaps more importantly, a project with a long runway before we get any payout: we do a bunch of boring work ripping two things apart in a way that is probably ugly initially, and only then do we get to make fun breaking changes.
@Kleidukos and I have discussed spending our own time on the first step below, but unless this goes dramatically better than I expect (and than past “break up base” attempts did), I suspect we will stall out at step 2 in the plan below, and certainly before the interface between the two is terribly satisfactory.
In the event we do not completely succeed, however, this is just the sort of high-value but un-fun task that the Haskell Foundation should be responsible for, I think. It even serves as a sort of principal-agent problem between the CLC and GHC worlds, in that the GHC world isn’t terribly incentivized to do the boring work to make yet another annoying submodule that the CLC desperately needs but doesn’t have much dev capacity to make itself. Exactly this situation is when we need a “higher body” like the HF to step in and break the coordination failure.
I am therefore also opening this thread in anticipation of there needing to be a HFTT proposal, for which the first step is a thread like this.
OK, let’s get down to actual concrete steps, because what this area lacks is planning, not pontificating by me :).
This is what @Kleidukos wrote in [RFC] Split the GHC.* hierarchy from `base` in its own package, `ghc-base` (#20647) · Issues · Glasgow Haskell Compiler / GHC · GitLab, where it is proposed to take the `GHC.*` modules and put them in a separate library.
Note that the motivation is somewhat different from what I wrote here: it’s that the `GHC.*` wild west is largely, in effect, a set of `*.Internal` modules, except not officially so, thus guaranteeing that the `base` major version changes with every major GHC release. By doing the split, we unlock the immediate benefit that a more stable `base` is at least possible. Recall that the motivation is that we want `base` to be simultaneously more and less stable.
(An unresolved question is whether `base` should immediately shed the `GHC.*` modules entirely, or rather remain some sort of legacy shim reexporting other libraries that people are encouraged to use instead.)
The end goal of this step should be something like this:
It’s relatively easy to split a library when module imports are already acyclic (which is not the case with `base` thanks to `hs-boot` files, mua ha ha, but I digress). The harder part is figuring out whether the split is well done. Remember, the point is to end up with a `base` that is decoupled from GHC, i.e. a `base` that isn’t full of implementation details liable to change.
We shouldn’t just wait and see what the next GHC looks like, but we can look at the previous GHC. We should repeat step 1 on the previous 1 or 2 versions of GHC:
Furthermore, we should ensure that our new base can work with older GHCs!
`ghc-modules` works with the `base` it was split from, and the latest `base`.
This is a lot of work, yes, but we want to make sure we are doing something that actually solves the problem we face. We will probably find issues, and then go back and adjust step 1, hewing the cleft a bit differently. We will probably also have multiple rounds of this.
While I do expect this cleaving of base to work, I don’t expect it to be pretty. In particular, many seemingly compiler-agnostic pure datatypes and classes need to be defined deep in the `GHC.*` hierarchy in order to avoid orphans and other issues. This means two things:
- `ghc-modules` will have too much GHC-agnostic stuff to give `base` the freedom to evolve it’s supposed to have, or vice versa: `base` will have too much GHC-specific stuff.
- Coping with the above when we make `base` work with multiple GHCs will involve an unsightly amount of CPP in either or both libraries. (`base` would have lots of CPP conditionals, and `ghc-modules` would need flags or similar to change things on behalf of `base`.)
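To give a feel for the kind of CPP in question, here is a minimal sketch. The function `machineWordBits` and the version threshold `906` are invented purely for illustration and do not correspond to any real change in GHC internals; `__GLASGOW_HASKELL__` is the macro GHC itself defines. In an actual split one would more likely test a Cabal-generated `MIN_VERSION_...` macro for the hypothetical `ghc-modules` package, but that needs a package setup rather than a single file.

```haskell
{-# LANGUAGE CPP #-}

import Data.Bits (finiteBitSize)

-- The type signature is the stable, compiler-agnostic part of the interface.
machineWordBits :: Int

-- HYPOTHETICAL: pretend the compiler-specific helper we rely on changed
-- shape at GHC 9.6. Both branches compute the same value in practice,
-- and the caller never sees which branch was picked.
#if __GLASGOW_HASKELL__ >= 906
machineWordBits = finiteBitSize (0 :: Word)
#else
machineWordBits = finiteBitSize (0 :: Int)
#endif
```

The point of the sketch is where the conditional lives: once, upstream, behind a signature that does not change, instead of being repeated in every downstream library that has to straddle two compiler releases.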
This will be gross, and people will be disappointed: what did we do all that work for anyways just to live in CPP hell? Nevertheless, I think it will be an accomplishment:
- We will know what the pain points actually are, as opposed to today, where we simply live in fear of the unknown. And at least they are not as bad as they used to be, when `(/)` → exceptions → `IO` infamously tangled everything.
- We will have a beachhead. With an initial split made, the community will be able to partake in the rather more fun and immediately rewarding task of massaging the boundary, moving things to the right side where possible. That means the HF (if it has, say, been pushing the project through the dark parts of step 2) can take a step back and let the work that’s been done marinate a bit in the delicious sauce of small individual contributions. (It will be much easier to refine the split in small PRs that scratch an itch.)
- Already, if not as much as we want, the CPP pain in `ghc-modules` will be pain that is shifted there from individual library authors. We want to push that burden as far upstream as possible, both to prevent the need for duplicate CPP and to place the burden and risk on the projects and bodies (`base`, GHC, HF) best able to bear it, freeing up everyone else.
At this point, the project will move back into a planning phase, as we decide what, if anything, needs to be done.
As I have written, I expect the decoupling to be messy. It will probably remind us of the present state of GNU libc, which faces a similar problem with many different syscall ABIs and few good abstractions. Still, I do think it will be good enough to buy us some time to enjoy the taste of the low-hanging fruit that has been plucked, but the CPP and flags maintenance burden is not something we will want to live with.
There are two tracks which I think we will want to pursue for a better decoupling:
Pop quiz, where is this defined?
`data [] a = [] | a : [a]`
No, it’s not in `base`, but in `ghc-prim`, a place so deep in the bowels I dar’st not mention it again after this paragraph. Also, half the code in `ghc-prim` just exists for Haddock and isn’t even real, so who knows, maybe it isn’t defined there either.
That means good old GHC-agnostic, hell, implementation-agnostic, `List` has in fact no chance of making it onto the `base` side of the cleft, the side it ought to be on. Bummer. At least this is a part of `base`'s public interface that doesn’t need changing!
Still, this is meant to illustrate all the innocent pure code stuck in the implementation-specific layers. We can’t easily fix that, but we can at least take its reflection and put it somewhere better: we can put the definition in a Backpack signature “above the cleft” in the GHC-agnostic libraries, and yet both instantiate (provide the implementation of) and import that signature in code below the cleft.
In this way, we can cut the Gordian knot, separating:
- Definitions trapped in implementation-specific code
- Uses in definitions that are morally implementation-agnostic
- Implementation-specific uses of those morally-implementation-agnostic uses.
With those all separated, we nicely handle dependency chains that criss-cross the implementation-dependent-agnostic divide arbitrarily many times.
The risk here is, of course, that Backpack isn’t much used, and we will probably need to fix a bunch of bugs in order to make wired-in items (that the compiler itself wants) not break with it.
The general prohibition against orphans tends to linearize dependencies. For example, we might have a package or module chain like:
- A datatype, and a class
- Some new datatypes, and classes the new and old dependencies implement
- Yet more new datatypes, and classes the new and old dependencies implement
Every step of the way, the burden of writing instances gets worse as more instances have to be written, and have to be written right there. While the specific order of items can sometimes be swapped, the need for a mostly total ordering is unavoidable.
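The chain above can be sketched in a single file. All the types and classes here (`Meters`, `Describe`, and so on) are made up for illustration; the comments mark where each instance would have to live if the three layers were separate packages and orphans were forbidden.

```haskell
-- Layer 1: a datatype and a class. One instance.
newtype Meters = Meters Double

class Describe a where
  describe :: a -> String

instance Describe Meters where
  describe (Meters m) = show m ++ " m"

-- Layer 2: a new datatype and a new class. Three instances.
newtype Seconds = Seconds Double

class Scale a where
  scale :: Double -> a -> a

instance Describe Seconds where   -- new type, old class: must sit with Seconds
  describe (Seconds s) = show s ++ " s"

instance Scale Meters where       -- old type, new class: must sit with Scale
  scale k (Meters m) = Meters (k * m)

instance Scale Seconds where
  scale k (Seconds s) = Seconds (k * s)

-- Layer 3: yet another datatype and class. Five instances.
data Speed = Speed Meters Seconds

class Zero a where
  zero :: a

instance Describe Speed where     -- new type, layer 1 class
  describe (Speed d t) = describe d ++ " per " ++ describe t

instance Scale Speed where        -- new type, layer 2 class
  scale k (Speed d t) = Speed (scale k d) t

instance Zero Meters where        -- layer 1 type, new class
  zero = Meters 0

instance Zero Seconds where       -- layer 2 type, new class
  zero = Seconds 0

instance Zero Speed where
  zero = Speed zero zero
```

Even in this toy, the instance count per layer goes 1, 3, 5, and none of the cross-layer instances can be moved to an earlier layer without that layer depending on a later one. That is the forced ordering the paragraph above describes.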
This is the inevitable looming problem which will grind Hackage to a halt, scary but a ways off, like the heat death of the universe. But that’s not the reason I bring it up. Rather, it’s that `base` arguably is a microcosm of Hackage as a whole with respect to this disease, exhibiting the advanced symptoms now, far before the rest of the ecosystem eventually will. Concretely, many of the classes or datatypes buried deep in `GHC.*` modules must be there for reasons of avoiding orphans. See the comment at the top of `libraries/base/GHC/Base.hs` for an explanation of one prominent example of this.
I wrote Rehabilitating Orphans with Order theory · Wiki · Glasgow Haskell Compiler / GHC · GitLab about this problem, with the vague idea that since something like this is probably ultimately necessary to solve the issues with `base`, we had better get started and clean it up first.
But novel academic research is too risky for even the HF, and currently I don’t see much ongoing academic interest in pursuing things like this, with the “expression problem” strain of research dying down in recent years. This convinced me that the incremental path whose trail head @Kleidukos found with the simple low-tech split is the right approach.
If we do everything else and find the orphan problem isn’t as bad as I thought, excellent! Conversely, if it is, we will have at least brought great attention to the issue, which will hopefully inspire the researchers in our midst to take another crack at it.
With both of the above implemented, we can not only decouple `base` from GHC, but also break up `base` itself. `liquid-base` could become more than a second-class “alternative prelude”-type project, as could a hypothetical `proven-base` for Dependent Haskell.
With the fundamental change-vs-stasis question bypassed, I wouldn’t be surprised if there is still much acrimony over what direction to take `base`. That’s great. With `base` split up, we can let different directions try themselves out before making a decision about what shim-`base` might reexport. (A shim-`base` is probably still a good idea to codify decisions by the CLC, though.)
Just in case this sounds all very difficult and slow, let me remark a bit on the goals and their hopeful effect.
With the ability to make more breaking changes and yet also force fewer people to deal with them at breakage time, I think Haskell will be well-positioned to undergo a renaissance of sorts. With the state of `Prelude` up for renegotiation, without the bit-rot of all that exists as collateral, I think we can not only better present to the world the Haskell we want to exist, but also better involve the community in what would be an exciting time to dust off and reinvent the old.
We can go from being a language that shows its age in its standard library (even as it hides it in the language itself with all those dank features), to having the most nimble standard library of them all, and thus the best first impression on new aspiring Haskellers.