Problem
The CLC is in a difficult position: base
is dilapidated, causing harm, yet breaking changes to base
that make it better also cause harm.
What most the discourse prompted by individual proposed breaking changes misses is that there might not be a good one-sized-fits all rate of breakage acceptable to everyone — and if the magic rate doesn’t exist we should stop spilling ink on what it is. But if we default to do nothing absent any consensus, the problem will only worsen as the most prominent and easily accessed definitions available to Haskellers are increasingly a minefield to carefully step over.
Goals
The good news is that there don’t need to be! We should instead have these goals:
- It is possible to upgrade across a GHC breaking change without undergoing a breaking change in
base
- It is possible to upgrade across a breaking change in
base
without a undergoing change in GHC.
We can achieve (and make more concrete) these goals with a generalization of our good old 3 release policy:
- Every major version of
base
should support 3 adjacent major versions of GHC - Every major version of GHC should support 3 adjacent major versions of
base
With this in place, everyone should be happy: the “conservative” faction gets ample warning and isn’t stuck on old GHCs, while the “revisionist” faction is free to make as many breaking changes to GHC as they like, provided they are willing to maintain all those new resulting major versions of base
over 3 versions concurrently! (Per @Gabriella439’s favorite point, we should limit the number of breaking changes per major release even if it means adjacent breaking releases. In this way, the “revisionist” faction can’t skirt the maintenance obligations by doing one super-breaking release.)
Why hasn’t this already happened?
Informally talking to people (I do bring this up a lot :)), I never met anyone against this plan. Rather I get shrugs that is been tried before, and stalled out.
I admit I don’t fully get it — to me decoupling base might be a bitter pill to swallow, but is an absolutely necessary one, for the fate of most languages is for they and their standard library to ossify until the community drifts off, and a new one around a new language forms in its stead. Haskell is far from alone in having a bad standard library, so let’s not act like the current situation is somehow exceptional.
Still, the decoupling a is a decent amount of work, and, perhaps more importantly, a project a long runway before we get any payout: we do a bunch of boring work ripping two things apart in an way that is probably ugly initially, and only then we get to make fun breaking changes.
@Kleidukos and I have discussed spending our own time on the first step below, but I unless this goes dramatically better than I expect (and past “break up base” attempts did) I suspect we will stall out at in step 2 in the plan below, and certainly before the interface between the two is terribly satisfactory.
In the event we do not completely succeed however, this is just the sort of high-value but un-fun task that the Haskell Foundation should be responsible for, I think. It even serves as a a sort of principle–agent problem between CLC and GHC worlds, in that the GHC world isn’t terribly incentivized to do the boring work to make yet another annoying submodule that the CLC desperately needs but don’t have much dev capacity to do itself. Exactly this situation is when we need a “higher body” like the HF to step in and break the coordination failure.
I am therefore also opening this thread in anticipation of there needing to be a HFTT proposal, for which the first step is a thread like this.
Concrete plan
OK, let’s get down to actual concrete steps, because what this area lacks is planning, not pontificating by me :).
1. Cleave base in two
This is what @Kleidukos wrote in [RFC] Split the GHC.* hierarchy from `base` in its own package, `ghc-base` (#20647) · Issues · Glasgow Haskell Compiler / GHC · GitLab, where it is proposed to take the GHC.*
modules and put them in a separate library.
Note that the motivation is somewhat different than what I wrote here: it’s the GHC.*
wild west that is largely effectively "*.Internal
modules, except not officially so, and thus guaranteeing that the base
major version changes every every major GHC release. By doing the split, we unlock the immediate benefit that a more stable base
is at least possible — for recall that the motivation is such that we want base to be simultaneously more and less stable.
(An unresolved question is whether base
should immediately shed the GHC.*
modules entirely, or rather remain some sort of legacy shim reexporting other libraries that people are encouraged to use instead.)
The end goal of this step should be something like this:
-
ghc-modules-9.3.a
→base-17.y
2. Prove the split works at all
It’s relatively easy to split a library when module imports are already acyclic (which is not the case with base
thanks to hs-boot
files, mua ha ha, but I digress). The harder part is figuring out whether the split is well done. Remember, the point is to end up with base
that is decoupled from GHC, i.e. a base that isn’t full of implementation details liable to change.
We shouldn’t just wait and see what the next GHC looks like, but we can look at the previous GHC. We should repeat step 1 on the previous 1 or 2 versions of GHC:
-
ghc-modules-9.0.a
→base-14.x
-
ghc-modules-9.2.b
→base-16.y
-
ghc-modules-9.3.c
→base-17.z
Furthermore, we should ensure that our new base can work with older GHCs!
-
ghc-modules-9.0.a
→base-17.z
-
ghc-modules-9.2.b
→base-17.z
-
ghc-modules-9.3.c
→base-17.z
So every ghc-modules
works with the base
it was split from, and the latest base
.
This is a lot of work, yes, but we want to make sure we are doing something that actually solves the probem we face. We will probably find issues, and then go back adjust step 1, hewing the cleft a bit differently. We will probably also have multiple round of this.
3. Asses the quality of the interface
While I do expect this cleaving of base to work, I don’t expect it to be pretty. In particular, many seemingly compiler agnostic pure datatypes and classes need to be defined deep in the GHC.*
hierarchy in order avoid orphans and other issues. This means two things:
-
Either
ghc-modules
will have too much GHC-agnostic stuff to givebase
the freedom to evolve it’s supposed to have, or vice versa:base
will have too much GHC-specific stuff. -
Coping with the above when we make
base
work with multiple GHCs will involve an unsightly amount of CPP in either or both libraries. (base
would have lots ofMIN_VERSION_ghc_modules
,ghc-modules
would need flags or similar to change things on behalf ofbase
.)
This will be gross, and people will be disappointed: what did we do all that work for anyways just to live in CPP hell? Nevertheless, I think it will be an accomplishment:
-
We will know what the pain points actually are, as opposed to do today where we simply live in fear of the unknown. And at least they are not as bad as they used to be, when
Monad
→fail
→String
→Char
→(/)
→ exceptions →IO
infamously tangled everything. -
We will have a beachhead. With an initial split made, the community will be able to partake in the rather more fun and immediately rewarding task of massaging the boundary, moving things to the right side were possible. That means the HF (if it has, say, been pushing the project through the dark parts of step 2) can take a step back and let the work that’s been done marinate a bit in the delicious sauce of small individual contributions. (It will be much easier to refine the split in small PRs that scratch an itch.)
-
Already, if not as much as we want, the CPP pain in
base
andghc-modules
will be pain that is shifted there from individual library authors. We want to push that burden as far upstream as possible, both to prevent the need for duplicate CPP to shoulder the burden/risk with the projects on bodies (base
, GHC, HF) best able to bare the burden, freeing up everyone else.
At this point, the project will move back into planning phase, as we decide what, if anything, needs to done.
4. Planning further improving the quality of the interface with Advanced Methods
As I have written, I expect the decoupling to be messy. It will probably remind us of the present state of GNU libc, which faces a similar problem with many different syscall ABIs and few good abstractions. Still, I do think it will be good enough to buy us some time and enjoy the taste of the low-hanging fruit that has been plucked, but the CPP and flags maintainence burden is not something we will want to live with.
There are two tracks which I think we will want to pursue for a better decoupling:
Backpack
Pop quiz, where is this defined?
data [] a = [] | a : [a]
No it’s not in base
, but in ghc-prim
, a place so deep in the bowels I dar’st not mention it again after this paragraph. Also, half the code in ghc-prim
just exists for Haddock, and isn’t even real so who knows maybe it isn’t defined there either.
That means good old GHC-agnostic, hell, implementation-agnostic, List
has in fact no chance of making on the base
side of ghc-modules
–base
cleft, the side it ought to be on. Bummer. At least this is a part of base
's public interface that doesn’t need changing!
Still though, this is meant to illustrate an example of all the innocent pure code stuck in in the implementation specific layers. We can’t easily fix that, but we can at least take it’s reflection and put it somewhere better: we can put the definition in a backpack signature “above the cleft” in the GHC-agnostic libraries, and yet both instantiate (provide the implementation of) and import that signature in code below the cleft.
In this way, we can cut the Gordian not, separating:
- Definitions trapped in implementation-specific code
- Uses in definitions that are morally implementation-agnostic
- Implementation-specific uses of those morally-implementation-agnostic uses.
- …
With those all separated, we nicely handle dependency chains that criss-cross the implementation-dependent-agnostic divide arbitrarily many times.
This risk here is, of course, that backpack isn’t much used, and we will probably need to fix a bunch of bugs in order to make wired-in items (that the compiler itself wants) not break with it.
Worry-free orphans
The general prohibition against orphans tends to linearize dependencies. For example, we might have a package or module chain like:
- A datatype, and a class
- Some new datatypes, and classes the new and old dependencies implement
- Yet more new datatypes, and classes the new and old dependencies implement
- …
Every step of the way, the burden of writing instances gets worse as more instances have to be written, and have to be written right there. While the specific order of items can sometimes be swapped, the need for a mostly total ordering is unavoidable.
This is the inevitable looming problem which will grind Hackage to a fault, scary but a ways off like the heat death of the universe. But that’s not the reason I bring it up. Rather, it’s that base
arguably is a microcosm of Hackage as a whole with respect to this disease, exhibiting the advanced symptoms now, far before the rest of the ecosystem eventually does. Concretely, many of the classes or datatypes buried deep in GHC.*
modules must be there for reasons of avoiding orphans. See the comment at the top of libraries/base/GHC/Base.hs
for an explanation of one prominent example of this.
I wrote Rehabilitating Orphans with Order theory · Wiki · Glasgow Haskell Compiler / GHC · GitLab about this problem, with the vague idea that since something like this ultimately probably necessary to solve the issues with base
, we better get started and clean it up first.
But novel academic research is too risky for even the HF, and currently I don’t see much ongoing academic interest in pursuing things like this, with the “expression problem” strain of research dying down in recent years. This convinced me that the incremental path whose trail head @Kleidukos found with the simple low-tech split is the right approach.
If we do everything else and find the orphan problem isn’t as bad as I thought, excellent! Conversely, if it is, we will have at least brought great attention to the issue, which will hopefully inspire the researchers in our midst to take another crack at it.
Finer-grained libraries, break up base
With both of the above implemented, we not only can decouple base
from GHC
, but also break up the base
further.
linear-base
and liquid-base
could become more than a second-class “alternative prelude”-type projects, as could a hypothetical proven-base
for Dependent Haskell, strict-base
, unboxed-base
, etc.
With the fundamental change-vs-stasis question bypassed, I wouldn’t be surprised if there is still much acrimony over what direction to take base
. That’s great. With base
split up, we can let different directions try themselves out before making a decision about what shim-base
might reexport.
(A shim-base
is probably still a good idea to codify decisions by the CLC, though.)
Outcomes
Just in case this sounds all very difficult and slow, let me remark a bit on the goals and their hopeful effect.
With the ability to make more breaking changes and yet also force fewer dealing with them less at breakage time, I think Haskell will be well-positioned to undergo a renaissance of sorts. Having the state of base
and Prelude
up for renegotiation without the bit-rot of all that exists as collateral, I think we can not only present the Haskell we want to exist to the world to a better extent, but also involve the community better in what would be an exciting time to dust off and reinvent the old.
We can go from being a language to shows its age in its standard library (even as it hides it in the language itself with all those dank features), to having the most nimble standard library of them all, and thus the best first impression on new aspiring Haskellers.