Pre-Pre-HFTP: Decoupling base and GHC

Sure, I wish more people were gung-ho about the benefits, but there are also the costs. See for example what Richard wrote in Coordination for structured error messages by goldfirere · Pull Request #24 · haskellfoundation/tech-proposals · GitHub — it’s an open question how much the HF should directly do complicated projects itself.

I am OK with that for now. As the HF does more things, those questions will be resolved. If we need to work up to a project like this over a few months, so be it.

The O(n²) pessimistically assumes any instance could be anywhere. But I would argue orphans are in reality much more organized. Per my idea in the wiki page, we still know what instances may appear in what modules, so we ought to be able to do much cheaper local checks!

It would help to spell out which specific improvements can be unblocked.

I mean, it’s hard to dispute that having more flexibility is better than having less flexibility. If such an improvement could happen effortlessly, it would be most welcome. But this is a multiyear project, diverting valuable community resources not limited to the named proposers. Shall we prioritise it? If you need a base which provides a stable interface across several GHC versions, you can switch to an alternative prelude such as relude today. Am I missing something?

Well, changes like Generalise the instances of Eq, Ord, Read and Show for Compose · Issue #35 · haskell/core-libraries-committee · GitHub / Proposal: Relax instances for Functor combinators; put superclasses on <class>1 to make less-breaking · Issue #10 · haskell/core-libraries-committee · GitHub I think are really important, but it’s hard to summon the energy to work on them because realistically we at work only just got to GHC 8.10 (especially due to GHCJS), and it could be years before we get a compiler with a base reflecting changes we get in now.

That’s profoundly demoralizing.
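For concreteness, the Compose change is roughly this kind of instance generalisation. A simplified sketch using a stand-in newtype so it compiles on its own (MyCompose is hypothetical; the real proposal targets Data.Functor.Compose in base):

```haskell
{-# LANGUAGE FlexibleContexts #-}

-- Stand-in for Data.Functor.Compose, so the sketch is self-contained.
newtype MyCompose f g a = MyCompose (f (g a))

-- Today's instances route through the lifted classes, roughly:
--   instance (Eq1 f, Eq1 g, Eq a) => Eq (Compose f g a)
-- The proposed, less-demanding shape constrains the applied type directly:
instance Eq (f (g a)) => Eq (MyCompose f g a) where
  MyCompose x == MyCompose y = x == y
```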

Conversely, if we decoupled some modules retroactively, we could maintain a new base with an old GHC. That would mean seeing the benefits of getting something approved in maybe months.

The process of seeing what the breakage would be is also, fundamentally, making a new (unofficial) version of base for an existing GHC. So having the split would make that existing process better.

We had talked about how recently GHC breakage seems more noticeable than core library breakage. (Though I agree with @cdsmith that this could change, and if anything it represents us running out of energy after the Applicative-Monad and Semigroup-Monoid changes were so much work.) But what didn’t come up is that it’s precisely being stuck on old GHCs that makes not being able to swap in a newer base so frustrating!

relude can’t help with anything relating to type classes and instances.


Maybe I’m misunderstanding previous discussions on the topic, but as I understand it, this is about unblocking improvements to the process of releasing changes across the ecosystem.

E.g. the coupling constrains what’s possible for that process.


Yes, thank you. That is true, and simpler and better than what I was saying.

There are many things we might like to do, and could do today, but they would take a very long time and cause much controversy along the way, so we won’t do them.


Leaving aside GHCJS (which IMO is an entirely orthogonal issue), the Haskell ecosystem is largely stuck on GHC 8.10, because there is no Stackage LTS for GHC 9.0. And there is no LTS because GHC 9.0.1 was hardly usable. With the recent release of 9.0.2 and Stackage preparing for LTS 19, this difficulty is likely to be resolved pretty soon.

I strongly suspect that funnelling community efforts into faster migration of packages to newer GHCs has a better ROI than embarking on decoupling base from GHC (which just delays the key issue). And if we can make GHC evolve at the pace of a mature compiler and not an academic PoC, it would bring an even higher ROI overall.

I entirely agree that decoupling base from GHC is a laudable enterprise; my doubts are just about its potential ROI in comparison to other avenues we could choose to pursue (taking into account the limited resources of the community / HF / GHC developers).


Leaving aside GHCJS (which IMO is an entirely orthogonal issue),

Yes, I agree GHCJS is its own separate issue.

The Haskell ecosystem is largely stuck on GHC 8.10, because there is no Stackage LTS for GHC 9.0. And there is no LTS because GHC 9.0.1 was hardly usable. With the recent release of 9.0.2 and Stackage preparing for LTS 19, this difficulty is likely to be resolved pretty soon.

Sure, but I want base work to never be blocked no matter what GHC is up to, and GHC work to never be blocked no matter what base is up to.

I strongly suspect that funnelling community efforts into faster migration of packages to newer GHCs has a better ROI than embarking on decoupling base from GHC (which just delays the key issue).

Well, because of things like GHCJS, that wouldn’t really help me. That GHCJS is orthogonal is sort of my point here — it’s hard to foresee all the reasons one might be stuck on an old GHC or anything else.

I think we are just more robust if we make it so base is never blocked on GHC. Fixing the reasons why GHC would block it in the first place is great, don’t get me wrong! But it leaves in place the underlying fragility where many things can block many other things.

Your “key issue” is, to me, merely the key issue this time around.

I entirely agree that decoupling base from GHC is a laudable enterprise

🙂

my doubts are just about its potential ROI in comparison to other avenues we could choose to pursue (taking into account the limited resources of the community / HF / GHC developers).

The initial split into two pieces I don’t think will be so costly, and the ROI is not the split itself, but that it allows a bunch of things which today bog each other down to proceed in parallel with minimal synchronization. Without this, base is going to limp along like every other language’s stagnated standard library, because it is just too annoying to work on.

Hopefully @hecate and I will find the time to do it, and so some of the discussion here will be a moot point, but we both have many other things in progress so I am not sure when that would be. So I do think it is good to keep on discussing this stuff if only to hone the reasoning both ways.


And if we can make GHC evolve at the pace of a mature compiler and not an academic PoC, it would bring an even higher ROI overall.

Granted, I am veering into more subjective stuff here, but ultimately I do want them both to be able to evolve at a fast rate: base is quite bad, and the current hodge-podge of extensions we commonly use is also bad. So any plan that relies on GHC just slowing down (because we simply don’t have the resources yet to move fast and understand how bad each bit of breakage is) makes me nervous.


I guess we can only agree to disagree about our values. I believe that base is quite good; I believe that even a stagnating and limping base is not a problem in an ecosystem that allows for alternative preludes; and I believe that GHC must evolve more slowly.


Fair enough, that is a good articulation of where we disagree.

12 posts were split to a new topic: The evolution of GHC

I’m not following everything here, but there may be one thing that it’s easy to agree on:

  • ‘base’ consists of around 250 modules, of which many are not closely coupled to GHC at all; e.g. the I/O library

As the OP says, maybe we could split ‘base’ into bits that are somehow closely coupled to GHC, and bits that are “pure library code”. The latter can evolve separately without difficulty.

This is a pretty fuzzy distinction and we’d have to find a way to sharpen it up. But for at least some chunks of base, it might not be so hard.

Would that be a concrete step forward? If we agreed the strategy, a few motivated volunteers might well be able to execute on it.


This would also allow easier experimentation around backends, having a known-size and self-contained set of impure/implementation-specific code.


Another option could be a graphical depiction of the module dependencies in base, using colour to highlight SCCs (strongly-connected components); failing that, there’s the good ol’ topological sort with SCC detection. The non-SCC modules can then be considered first as candidates for “splitting off”.

Another bonus would be seeing just how tangled base really is, in order to determine an expedient process of splitting and how many motivated volunteers will be needed (ideally without having to find new ways to motivate them…).
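For what it’s worth, the non-graphical version of that analysis is only a few lines on top of Data.Graph from containers; a sketch, assuming the (module, imports) pairs have already been extracted from base somehow (e.g. with a tool like graphmod):

```haskell
import Data.Graph (SCC (..), stronglyConnComp)

-- Given (module, modules it imports) pairs, compute the strongly
-- connected components of base's import graph.
moduleSCCs :: [(String, [String])] -> [SCC String]
moduleSCCs deps = stronglyConnComp [ (m, m, imports) | (m, imports) <- deps ]

-- Modules that sit in no import cycle are the first candidates for
-- splitting off; each CyclicSCC is a knot that has to move (or stay) together.
splitCandidates :: [(String, [String])] -> [String]
splitCandidates deps = [ m | AcyclicSCC m <- moduleSCCs deps ]
```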

I think yes, that is the initial goal of “decoupling base and GHC”: start small and keep it simple, focusing on separating GHC’s dependencies so we have a cleaner interface to work from.

What’s fuzzy about this and what could be done to sharpen the definition here?

I think it’s also worth pointing out a few related resources/discussions; here are a few I’ve found (there are probably more):

The fuzziness is this: what is a sharp criterion that tells you whether a type or function belongs in the “GHC-specific” part or the “pure library” part?

One possibility is this. Consider an entity E, where “entity” means type, data constructor, class, or function.

  • If GHC has specific knowledge of E, then E is GHC-specific. In GHC jargon, these are the “wired-in” things and the “known-key” things.
  • If a GHC-specific entity E’ depends on E, then E is also GHC-specific.
  • Everything else is pure-library.

I’m not sure if that’s enough, but it’s a start.
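In other words, the GHC-specific side is the dependency closure of the wired-in and known-key entities. A small sketch of that closure, with the roots and the dependency function left abstract (both would have to come out of GHC itself; all names here are hypothetical):

```haskell
import           Data.Set (Set)
import qualified Data.Set as Set

-- Everything reachable from the wired-in/known-key entities along the
-- "depends on" relation is GHC-specific; the rest of base is pure-library.
ghcSpecific
  :: Ord entity
  => [entity]              -- wired-in and known-key entities (the roots)
  -> (entity -> [entity])  -- direct dependencies of an entity
  -> Set entity            -- the GHC-specific side
ghcSpecific roots depsOf = go Set.empty roots
  where
    go seen []       = seen
    go seen (e : es)
      | e `Set.member` seen = go seen es
      | otherwise           = go (Set.insert e seen) (depsOf e ++ es)
```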


Yes, that is exactly what I was thinking for the first step: “Cleave base in two”.

List and Maybe are perfectly innocent definitions, but as they are wired-in they will end up on the GHC-specific side. That’s not ideal, but it’s perfectly fine for the first stab at this.


In response to @AntC2 on another thread:

Me:

If it was just a matter of cleaving splitting tearing slicing cutting dividing chopping ripping up base into two or more pieces:

  • It should have been attempted at least once by now;
  • It would have been easier to do back when base was smaller e.g. back in 2010.

It seems to me that the simplest option is to just move the morass of code in base to a more GHC-centric package, then start afresh, either in an empty base or in an all-new package under a different name. This approach provides the luxury of ignorance: you can just start writing new code almost immediately instead of trying to pick a path of least resistance.


Ericson2314:

@atravers yes, base reexports ghc-base, and we move over to base proper just what is convenient.

The only difference is I think @Kleidukos’s heuristic of “just the non-GHC.* modules” is a better first shot at doing the rip, and I emphasize moving code, not copying code.

All that, though, just reflects on the first few hours of the attempt :). After that, I think it’s exactly the same.

@Ericson2314:

base reexports ghc-base, and we move over to base proper just what is convenient.

Alternatively, if the GHC-centric modules are the minority, ghc-base is initially an empty package which base imports. GHC-centric modules are then replicated or moved, leaving base with the implementation-independent modules.
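Either way, the mechanics on the portable side end up looking the same: base keeps its public module names and pulls the GHC-coupled definitions in from ghc-base through thin shims. A minimal sketch, assuming a hypothetical ghc-base that provides today’s GHC.IORef (mirroring the existing Data.IORef / GHC.IORef arrangement):

```haskell
-- A module in the slimmed-down base: little more than a re-export of
-- definitions that would now live in ghc-base (which provides GHC.IORef).
module Data.IORef
  ( IORef
  , newIORef
  , readIORef
  , writeIORef
  , modifyIORef
  ) where

import GHC.IORef (IORef, newIORef, readIORef, writeIORef)

-- Portable code can still be added and evolved on this side without
-- touching GHC at all.
modifyIORef :: IORef a -> (a -> a) -> IO ()
modifyIORef ref f = readIORef ref >>= writeIORef ref . f
```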


@Ericson2314:

[…] “only modularity and the flexibility it creates” will save us.
[…] we should have amazing libraries so GHC can be easily “remixed” for all manner of research prototypes.

…and right now (2022 Feb) that would require going as close to fully-parameterised (no type classes) as possible:

  • so more definitions like Data.List.nubBy :: (a -> a -> Bool) -> [a] -> [a]
  • and less like Data.List.nub :: Eq a => [a] -> [a]

…in the absence of a feasible solution to the problem of orphan instances. This is one reason why I keep suggesting starting afresh: writing fully-parameterised definitions directly seems a better option than trying to refactor overloaded definitions, which means dealing with all the class dependencies “there and then”.
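To make the contrast concrete, here is the shape being suggested: the class-free definition is the core, and the overloaded one can always be recovered as a thin wrapper over it (primed names used here only to avoid clashing with Data.List):

```haskell
-- Class-free core: the comparison is an explicit parameter.
nubBy' :: (a -> a -> Bool) -> [a] -> [a]
nubBy' eq = go []
  where
    go _    []       = []
    go seen (x : xs)
      | any (eq x) seen = go seen xs
      | otherwise       = x : go (x : seen) xs

-- Overloaded convenience wrapper, recoverable at any time.
nub' :: Eq a => [a] -> [a]
nub' = nubBy' (==)
```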

But if anyone else has had experience to the contrary, by all means let us know about it…


As for those who like their definitions overloaded…as previously noted by @Ericson2314, a viable solution to the orphan-instance problem would be very helpful. Having just given Scott Kilpatrick’s approach some more thought (along with a quick skim of section 4 of his thesis), I’m now wondering if that O(n²) complexity can be improved:

  • by computing the difference between worlds instead of a whole new world being generated each time a new module is processed.
  • assuming large differences are rare, the merging mechanism should be able to more quickly determine if there’s a conflict using those differences.

It’s another example of the ol’ DRY principle in action - don’t build new worlds with duplicated information; only work with the differences (as much as possible).
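Very roughly, the shape of that idea, with a module’s “world” boiled down to a map from instance heads to their defining modules (a hypothetical simplification for illustration only, not the thesis’s actual data structures):

```haskell
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

type InstanceHead = String                     -- e.g. "Eq [Foo]", kept abstract here
type Delta        = Map InstanceHead FilePath  -- the instances one module adds

-- Instead of rebuilding a whole "world" per module, merge only the deltas;
-- the conflict check touches roughly the size of the smaller of the two maps.
mergeDeltas :: Delta -> Delta -> Either [InstanceHead] Delta
mergeDeltas new old =
  case Map.keys (Map.intersection new old) of
    []        -> Right (Map.union new old)
    conflicts -> Left conflicts
```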

We don’t want to do that because we want (the new) base to be portable across GHC versions (and other hypothetical implementations) at all times. The empty library is trivially portable, and we move over definitions only if they are also.

…and right now (2022 Feb) that would require going as close to fully-parameterised (no type classes) as possible

I am interested in these things, but I don’t think overly opinionated instances in GHC are a problem. (Outputable is opinionated but structured errors mean that pretty-printing is increasingly moved to the outskirts!)

If you want to talk about orphan instances more, would you mind forking that off to a new thread? It is orthogonal to the low-hanging initial steps for both making GHC a real library and decoupling base.