Pre-Pre-HFTP: Decoupling base and GHC

Problem

The CLC is in a difficult position: base is dilapidated, causing harm, yet breaking changes to base that make it better also cause harm.

What most of the discourse prompted by individual proposed breaking changes misses is that there might not be a good one-size-fits-all rate of breakage acceptable to everyone; if the magic rate doesn’t exist, we should stop spilling ink on what it is. But if we default to doing nothing absent any consensus, the problem will only worsen, as the most prominent and easily accessed definitions available to Haskellers increasingly become a minefield to carefully step over.

Goals

The good news is that there doesn’t need to be one! We should instead have these goals:

  • It is possible to upgrade across a breaking change in GHC without undergoing a breaking change in base.
  • It is possible to upgrade across a breaking change in base without undergoing a breaking change in GHC.

We can achieve (and make more concrete) these goals with a generalization of our good old 3 release policy:

  • Every major version of base should support 3 adjacent major versions of GHC
  • Every major version of GHC should support 3 adjacent major versions of base

With this in place, everyone should be happy: the “conservative” faction gets ample warning and isn’t stuck on old GHCs, while the “revisionist” faction is free to make as many breaking changes to GHC as they like, provided they are willing to maintain each resulting new major version of base across 3 GHC versions concurrently! (Per @Gabriella439’s favorite point, we should limit the number of breaking changes per major release even if it means adjacent breaking releases. That way, the “revisionist” faction can’t skirt the maintenance obligations by doing one super-breaking release.)
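To make the policy concrete with invented version numbers: if base-17 supports GHC 9.0, 9.2, and 9.3, while GHC 9.3 supports base-16, 17, and 18, then a library author can ride out either kind of upgrade with ordinary bounds, something like:

-- Hypothetical .cabal excerpt: bounds spanning three major versions
-- of base, so the package keeps building across the whole supported
-- window without needing a new release.
build-depends: base >= 16 && < 19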

Why hasn’t this already happened?

Informally talking to people (I do bring this up a lot :)), I have never met anyone against this plan. Rather, I get shrugs that it’s been tried before and stalled out.

I admit I don’t fully get it: to me, decoupling base might be a bitter pill to swallow, but it is an absolutely necessary one, for the fate of most languages is for them and their standard library to ossify until the community drifts off, and a new one forms around a new language in its stead. Haskell is far from alone in having a bad standard library, so let’s not act like the current situation is somehow exceptional.

Still, the decoupling is a decent amount of work, and, perhaps more importantly, a project with a long runway before we get any payoff: we do a bunch of boring work ripping two things apart in a way that is probably ugly initially, and only then do we get to make fun breaking changes.

@Kleidukos and I have discussed spending our own time on the first step below, but unless this goes dramatically better than I expect (and than past “break up base” attempts did), I suspect we will stall out in step 2 of the plan below, and certainly before the interface between the two is terribly satisfactory.

In the event we do not completely succeed, however, this is just the sort of high-value but un-fun task that the Haskell Foundation should be responsible for, I think. There is even a sort of principal–agent problem between the CLC and GHC worlds here, in that the GHC world isn’t terribly incentivized to do the boring work of making yet another annoying submodule, which the CLC desperately needs but doesn’t have much dev capacity to do itself. Exactly this situation is when we need a “higher body” like the HF to step in and break the coordination failure.

I am therefore also opening this thread in anticipation of there needing to be an HFTT proposal, for which the first step is a thread like this.

Concrete plan

OK, let’s get down to actual concrete steps, because what this area lacks is planning, not pontificating by me :).

1. Cleave base in two

This is what @Kleidukos wrote in [RFC] Split the GHC.* hierarchy from `base` in its own package, `ghc-base` (#20647) · Issues · Glasgow Haskell Compiler / GHC · GitLab, where it is proposed to take the GHC.* modules and put them in a separate library.

Note that the motivation is somewhat different from what I wrote here: the GHC.* wild west is largely a set of effectively *.Internal modules, except not officially so, thus guaranteeing that the major version of base changes every major GHC release. By doing the split, we unlock the immediate benefit that a more stable base is at least possible; recall that the motivation is that we want base to be simultaneously more and less stable.

(An unresolved question is whether base should immediately shed the GHC.* modules entirely, or rather remain as some sort of legacy shim reexporting other libraries that people are encouraged to use instead.)
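If the legacy-shim route is taken, a minimal sketch of one shim module might look like the following; the ghc-modules package name is my hypothetical placeholder, and PackageImports is one way (not necessarily the way) to disambiguate the two same-named modules:

{-# LANGUAGE PackageImports #-}

-- A shim module in base: same module name as before, but the real
-- definitions now live in the hypothetical ghc-modules package.
module GHC.Exts
  ( module ReExport
  ) where

import "ghc-modules" GHC.Exts as ReExport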

The end goal of this step should be something like this:

  • ghc-modules-9.3.a + base-17.y

2. Prove the split works at all

It’s relatively easy to split a library when module imports are already acyclic (which is not the case with base, thanks to hs-boot files, mua ha ha, but I digress). The harder part is figuring out whether the split is well done. Remember, the point is to end up with a base that is decoupled from GHC, i.e. a base that isn’t full of implementation details liable to change.
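To unpack that hs-boot digression for the uninitiated: hs-boot files are GHC’s mechanism for compiling mutually recursive modules, by letting one module import a forward-declaration stub of another. A minimal sketch, with hypothetical modules:

-- A.hs-boot: a stub declaring just enough of A to break the cycle.
module A where

data T

-- B.hs: imports the stub rather than the real A.
module B where

import {-# SOURCE #-} A (T)

f :: T -> T
f = id

-- A.hs: the real module, which itself imports B, closing the cycle.
module A where

import B (f)

data T = T

g :: T
g = f T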

We shouldn’t just wait and see how the split fares against the next GHC; we can also look at previous GHCs. We should repeat step 1 on the previous 1 or 2 versions of GHC:

  • ghc-modules-9.0.a + base-14.x
  • ghc-modules-9.2.b + base-16.y
  • ghc-modules-9.3.c + base-17.z

Furthermore, we should ensure that our new base can work with older GHCs!

  • ghc-modules-9.0.a + base-17.z
  • ghc-modules-9.2.b + base-17.z
  • ghc-modules-9.3.c + base-17.z

So every ghc-modules works with the base it was split from, and the latest base.
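Each cell of this compatibility matrix is just a pinned build. A sketch of one cell in CI, with the hypothetical package names and version numbers from above:

-- Hypothetical cabal.project for one cell of the matrix:
-- build the new base-17.z against GHC 9.0's ghc-modules.
with-compiler: ghc-9.0.2
packages: ./base
constraints: ghc-modules == 9.0.*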

This is a lot of work, yes, but we want to make sure we are doing something that actually solves the problem we face. We will probably find issues, and then go back and adjust step 1, hewing the cleft a bit differently. We will probably also have multiple rounds of this.

3. Assess the quality of the interface

While I do expect this cleaving of base to work, I don’t expect it to be pretty. In particular, many seemingly compiler-agnostic pure datatypes and classes need to be defined deep in the GHC.* hierarchy in order to avoid orphans and other issues. This means two things:

  1. Either ghc-modules will have too much GHC-agnostic stuff, denying base the freedom to evolve that it’s supposed to have, or vice versa: base will have too much GHC-specific stuff.

  2. Coping with the above while making base work with multiple GHCs will involve an unsightly amount of CPP in either or both libraries. (base would need lots of MIN_VERSION_ghc_modules guards; ghc-modules would need flags or similar to change things on behalf of base.)
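For a flavor of that, a minimal sketch; the MIN_VERSION_ghc_modules macro and the GHC.Example module are hypothetical (Cabal would generate such a macro once base declared a dependency on a package named ghc-modules):

{-# LANGUAGE CPP #-}

-- Sketch of conditional compilation inside the new base.
module Data.Example (intersperse') where

#ifndef MIN_VERSION_ghc_modules
-- Fallback so the sketch stands alone outside a Cabal build.
#define MIN_VERSION_ghc_modules(x,y,z) 0
#endif

#if MIN_VERSION_ghc_modules(9,3,0)
-- Newer ghc-modules exports the function directly (hypothetically).
import GHC.Example (intersperse')
#else
-- Older ghc-modules does not, so base carries its own definition.
intersperse' :: a -> [a] -> [a]
intersperse' _   []     = []
intersperse' _   [x]    = [x]
intersperse' sep (x:xs) = x : sep : intersperse' sep xs
#endif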

This will be gross, and people will be disappointed: what did we do all that work for anyways just to live in CPP hell? Nevertheless, I think it will be an accomplishment:

  1. We will know what the pain points actually are, as opposed to today, where we simply live in fear of the unknown. And at least they are not as bad as they used to be, when Monad → fail → String → Char → (/) → exceptions → IO infamously tangled everything.

  2. We will have a beachhead. With an initial split made, the community will be able to partake in the rather more fun and immediately rewarding task of massaging the boundary, moving things to the right side where possible. That means the HF (if it has, say, been pushing the project through the dark parts of step 2) can take a step back and let the work that’s been done marinate a bit in the delicious sauce of small individual contributions. (It will be much easier to refine the split in small PRs that scratch an itch.)

  3. Already, if not as much as we would like, the CPP pain in base and ghc-modules will be pain shifted there from individual library authors. We want to push that burden as far upstream as possible, both to prevent the need for duplicate CPP and to place the burden/risk with the projects and bodies (base, GHC, HF) best able to bear it, freeing up everyone else.

At this point, the project will move back into a planning phase, as we decide what, if anything, needs to be done.

4. Planning further improvements to the quality of the interface with Advanced Methods

As I have written, I expect the decoupling to be messy. It will probably remind us of the present state of GNU libc, which faces a similar problem with many different syscall ABIs and few good abstractions. Still, I do think it will be good enough to buy us some time and let us enjoy the taste of the low-hanging fruit that has been plucked, but the CPP and flags maintenance burden is not something we will want to live with.

There are two tracks which I think we will want to pursue for a better decoupling:

Backpack

Pop quiz, where is this defined?

data [] a = [] | a : [a]

No, it’s not in base, but in ghc-prim, a place so deep in the bowels I dar’st not mention it again after this paragraph. Also, half the code in ghc-prim just exists for Haddock and isn’t even real, so who knows, maybe it isn’t defined there either.

That means good old GHC-agnostic, hell, implementation-agnostic List in fact has no chance of making it onto the base side of the ghc-modules + base cleft, the side it ought to be on. Bummer. At least this is a part of base’s public interface that doesn’t need changing!

Still though, this is meant to illustrate an example of all the innocent pure code stuck in the implementation-specific layers. We can’t easily fix that, but we can at least take its reflection and put it somewhere better: we can put the definition in a Backpack signature “above the cleft” in the GHC-agnostic libraries, and yet both instantiate (provide the implementation of) and import that signature in code below the cleft.
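Here is a minimal sketch of that shape, with entirely hypothetical names. The signature lives in a GHC-agnostic library above the cleft; ghc-modules instantiates it below, wired up with a Cabal mixins: stanza:

-- ListSig.hsig, in a GHC-agnostic library above the cleft:
-- an abstract interface, with no implementation at all.
signature ListSig where

data List a

nil  :: List a
cons :: a -> List a -> List a

-- ListImpl.hs, below the cleft in ghc-modules: Backpack permits
-- filling an abstract data type with a synonym, so the real,
-- wired-in list type can instantiate the signature.
module ListImpl where

type List = []

nil :: List a
nil = []

cons :: a -> List a -> List a
cons = (:)

Morally implementation-agnostic code imports the signature and is checked against the interface alone; implementation-specific code sees it instantiated with the real type.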

In this way, we can cut the Gordian knot, separating:

  1. Definitions trapped in implementation-specific code
  2. Uses in definitions that are morally implementation-agnostic
  3. Implementation-specific uses of those morally-implementation-agnostic definitions.

With those all separated, we can nicely handle dependency chains that criss-cross the implementation-specific vs implementation-agnostic divide arbitrarily many times.

The risk here is, of course, that Backpack isn’t much used, and we will probably need to fix a bunch of bugs in order to make wired-in items (which the compiler itself wants) not break with it.

Worry-free orphans

The general prohibition against orphans tends to linearize dependencies. For example, we might have a package or module chain like:

  1. A datatype, and a class
  2. Some new datatypes, and classes that the new and old datatypes implement
  3. Yet more new datatypes, and classes that the new and old datatypes implement

Every step of the way, the burden of writing instances gets worse, as more instances have to be written, and they have to be written right there. While the specific order of items can sometimes be swapped, the need for a mostly-total ordering is unavoidable.
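As a tiny illustration of that pressure (hypothetical modules standing in for packages):

-- P1.hs (stands in for a package defining a datatype)
module P1 where

data T = T

-- P2.hs (stands in for a package defining a class)
module P2 where

class C a where
  c :: a -> String

-- The instance "C T" must be defined either in P1 (forcing P1 to
-- import P2) or in P2 (forcing P2 to import P1). Defining it in a
-- third module would make it an orphan, so the two packages are
-- forced into a dependency order.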

This is the inevitable looming problem which will grind Hackage to a halt, scary but a ways off, like the heat death of the universe. But that’s not the reason I bring it up. Rather, it’s that base is arguably a microcosm of Hackage as a whole with respect to this disease, exhibiting the advanced symptoms now, far before the rest of the ecosystem eventually does. Concretely, many of the classes and datatypes buried deep in GHC.* modules must be there to avoid orphans. See the comment at the top of libraries/base/GHC/Base.hs for an explanation of one prominent example of this.

I wrote Rehabilitating Orphans with Order theory · Wiki · Glasgow Haskell Compiler / GHC · GitLab about this problem, with the vague idea that since something like this is ultimately probably necessary to solve the issues with base, we had better get started and clean it up first.

But novel academic research is too risky for even the HF, and currently I don’t see much ongoing academic interest in pursuing things like this, with the “expression problem” strain of research dying down in recent years. This convinced me that the incremental path whose trailhead @Kleidukos found with the simple low-tech split is the right approach.

If we do everything else and find the orphan problem isn’t as bad as I thought, excellent! Conversely, if it is, we will have at least brought great attention to the issue, which will hopefully inspire the researchers in our midst to take another crack at it.

Finer-grained libraries, break up base

With both of the above implemented, we can not only decouple base from GHC, but also break up base further.
linear-base and liquid-base could become more than second-class “alternative prelude”-type projects, as could a hypothetical proven-base for Dependent Haskell, strict-base, unboxed-base, etc.

With the fundamental change-vs-stasis question bypassed, I wouldn’t be surprised if there is still much acrimony over what direction to take base in. That’s great. With base split up, we can let different directions try themselves out before making a decision about what a shim-base might reexport.
(A shim-base is probably still a good idea to codify decisions by the CLC, though.)

Outcomes

Just in case this all sounds very difficult and slow, let me remark a bit on the goals and their hoped-for effect.

With the ability to make more breaking changes and yet force fewer people to deal with them at breakage time, I think Haskell will be well-positioned to undergo a renaissance of sorts. Having the state of base and Prelude up for renegotiation, without the bit-rot of all that exists as collateral, I think we can not only present the Haskell we want to exist to the world to a better extent, but also better involve the community in what would be an exciting time to dust off and reinvent the old.

We can go from being a language that shows its age in its standard library (even as it hides it in the language itself with all those dank features), to having the most nimble standard library of them all, and thus the best first impression on new aspiring Haskellers.

23 Likes

This sounds like a good plan, I hope it takes off!

I’m going a bit off-topic, but the last time I read your wiki and Kilpatrick’s PhD thesis I was thinking that the world checking could usually be done in O(n) (where n is the number of instances), and only in relatively rare cases (e.g. multi-parameter type classes) would it blow up to O(n^2). It would be nice if it turned out that the extra compile time isn’t so bad after all. I wonder if that has ever really been tested.

You’re forgetting liquid-base! And I can also imagine a strict-base or unlifted-base.

2 Likes

Re: Lots of CPP.

There is precedent for this type of thing in the world of C++ standard libraries where they are obsessive about maintaining backward compatibility. They are typically littered with lots of CPP so that they can share a single implementation for various different standard versions. AFAIUI this is a successful pattern though it is obviously quite painful for the stdlib maintainers.

3 Likes

@jaror

This sounds like a good plan, I hope it takes off!

Thanks!

I’m going a bit off-topic, but the last time I read your wiki and Kilpatrick’s PhD thesis I was thinking that the world checking could usually be done in O(n) (where n is the number of instances), and only in relatively rare cases (e.g. multi-parameter type classes) would it blow up to O(n^2). It would be nice if it turned out that the extra compile time isn’t so bad after all. I wonder if that has ever really been tested.

Yeah, I am having a hard time imagining where the O(n^2) is going to come from (without Backpack at least), as it is quite constrained which modules might provide instances that could conflict with the orphans.

You’re forgetting liquid-base! And I can also imagine a strict-base or unlifted-base.

included!

@BardurArantsson

Indeed!

I think the larger lesson there is that it really should be possible for GHC/CLC folks to “cushion” most of the pain from breakages by supporting multiple versions. We’ve just assumed breaking changes immediately force pleasing half the community and pissing off the other half, and that’s suboptimal.

With this, we can always make more breaking releases of base: release enough new versions and we can do 10 years of breaking cleanups in a week! The point of contention then instead becomes when we will drop support for older versions of base with newer GHC, and I think that’s a much healthier conversation.

We’re going from a depressing “stability vs quality, pick 1”, to a thrilling “stability vs quality vs ease of maintenance, pick 2” tradeoff, where there’s no amount of “having our cake and eating it too” we can’t do, provided someone puts in the effort.

1 Like

Just out of interest, what dependency does Char have on (/)?

Sorry, I was recalling split base · Wiki · Glasgow Haskell Compiler / GHC · GitLab from memory, and misremembered it. It is something more like Char → Int → (/) then.

Just going to nitpick on language here: the opposite of “conservative” is not “revisionist” but “progressive”. A revisionist wishes to rewrite history, usually to suit an agenda [1].

[1] Revisionist - definition of revisionist by The Free Dictionary

Well, I was hoping to mark out a different axis, so people wouldn’t be too mad about what bucket they fell in, nor bring too much baggage from “real world” politics, but yes, I suppose I am bending the meaning of “revisionist”, unless one really does think base is the received wisdom of the elders we dare not corrupt :D.

The only part I disliked was the two typos here:

1 Like

Fixed :), along with a 3rd. Thanks for pointing them out.

“Reformist” would be correct as the opposition to “Conservative” I’d say. :wink:

2 Likes

Is there a taskforce for this or something?

1 Like

We discussed this at the Jan 4 HFTT meeting.

There was debate on how much this is worth it, e.g. pain caused by GHC-proper changes vs base changes, so if anyone has strong opinions there, do speak up. (I like what @cdsmith said in our defense, which is that while there has been more GHC pain in recent releases, before that, with things like Monoid-Semigroup and Applicative-Monad, it was the other way around, and by no means can we be certain that we would not want to do similarly big things to base in the future.)

But even more than that, there was a sense that @hecate and I need to find the time for that initial investigation, to boil away enough risk/uncertainty. There is a lot of detail in the plan, but it is mostly “theory-based”: thinking through the problem and reading up on past attempts. Having more concrete specifics would help this move to an actual HFTP, and thus to a more serious discussion where the resourcing is not just hypothetical.

5 Likes

Regarding instances, orphan or otherwise:

  1. Does an instance in GHC contain its module of origin?

  2. If so, can that data be used by the type checker to raise an error if it discovers multiple instances from different modules for the same class and type?

I’m very for this. I think it’s easy to underestimate the benefit of something like this.

can that data be used by the type checker to raise an error if it discovers multiple instances from different modules for the same class and type?

Isn’t that the status quo? The problem arises when the overlapping instances can’t be detected at compile time because they’re only brought together in a completely unrelated module.

The problem arises when the overlapping instances can’t be detected at compile time because they’re only brought together in a completely unrelated module.

From section 5.4 (page 87 of 329) in the Haskell 2010 Report:

All instances in scope within a module are always exported and any import brings all instances in from the imported module. Thus, an instance declaration is in scope if and only if a chain of import declarations leads to the module containing the instance declaration.

How then is it possible for a conflict between instances in an unrelated module to not be detected during compilation?

It’s really confusing to me… how can so many of us come to the conclusion that this would unblock other improvements we wish to make, yet a committee cannot arrive at the same conclusion?

Who is confused, and about what?

1 Like

Consider the collection of modules below. GHC does not detect the overlapping instance, and the output is:

C1
C2

In this case, perhaps no huge harm is done. If the overlapping instance were an Ord instance, and we were dealing with Data.Map, which uses Ord critically in its internal invariants, then it could be disastrous (sketched further below).

Why doesn’t GHC warn or error in this case? Because it only complains when an overlapping instance is used, not when it simply occurs. The instances in C1 and C2 are not overlapping (at that point).

I’m not sure why GHC behaves like this. Perhaps it’s possible to do better, although in the presence of polymorphic recursion I’d be hesitant to speculate.

-- A.hs
module A where

data A = A

-- C1.hs
module C1 where

import A

instance Show A where show = const "C1"

example :: String
example = show A

-- C2.hs
module C2 where

import A

instance Show A where show = const "C2"

example :: String
example = show A

-- Main.hs
module Main where

import C1
import C2

main :: IO ()
main = do
  putStrLn C1.example
  putStrLn C2.example
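To make the Ord scenario concrete, here is a sketch in the same style (hypothetical modules): the map is built under one ordering and queried under its reverse, so lookups silently fail.

-- K.hs
module K where

data K = K1 | K2 | K3 deriving (Eq, Show)

rank :: K -> Int
rank K1 = 1
rank K2 = 2
rank K3 = 3

-- B1.hs: builds the map with an ascending Ord instance.
module B1 where

import qualified Data.Map as Map
import K

instance Ord K where
  compare a b = compare (rank a) (rank b)

m :: Map.Map K ()
m = Map.fromList [(K1, ()), (K2, ()), (K3, ())]

-- B2.hs: queries with a conflicting, descending Ord instance.
-- Note it does not import B1, so it never sees B1's instance.
module B2 where

import qualified Data.Map as Map
import K

instance Ord K where
  compare a b = compare (rank b) (rank a)

check :: Map.Map K () -> Bool
check = Map.member K1

-- Main.hs: passes B1's map to B2's lookup. This compiles, yet
-- "check m" returns False even though K1 is a key of m: the tree
-- was built under one ordering and searched under the other.
module Main where

import B1 (m)
import B2 (check)

main :: IO ()
main = print (check m)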

I think the world semantics suggestion from Scott Kilpatrick’s PhD thesis is a very nice improvement. Basically it proposes checking for overlapping instances when importing modules. I believe an open problem is that the time complexity is O(n^2) in general (I believe for most common single parameter type classes you can do it in O(n log n)).
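As a toy illustration of those bounds, with instance heads simplified to (class, head type constructor) pairs, which roughly matches the single-parameter case:

import Data.List (group, sort, tails)

-- Toy model: an instance head is a class name plus the head type
-- constructor of its argument.
type Head = (String, String)

-- Naive world check: compare every pair of heads, O(n^2).
naiveConflicts :: [Head] -> [(Head, Head)]
naiveConflicts hs = [ (a, b) | (a:rest) <- tails hs, b <- rest, a == b ]

-- With simple heads we can instead sort and group, O(n log n).
sortedConflicts :: [Head] -> [[Head]]
sortedConflicts = filter ((> 1) . length) . group . sort

main :: IO ()
main = do
  let heads = [("Show", "A"), ("Show", "A"), ("Ord", "B")]
  print (naiveConflicts heads)   -- one conflicting pair
  print (sortedConflicts heads)  -- same conflict, found by sorting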

See also: Rehabilitating Orphans with Order theory · Wiki · Glasgow Haskell Compiler / GHC · GitLab by @Ericson2314.

3 Likes