Pre-Pre-HFTP: Decoupling base and GHC

Leaving aside GHCJS (which IMO is an entirely orthogonal issue), the Haskell ecosystem is largely stuck with GHC 8.10, because there is no Stackage LTS for GHC 9.0. And there is no LTS because GHC 9.0.1 was hardly usable. With the recent release of 9.0.2 and Stackage preparing for LTS 19, this difficulty is likely to be resolved pretty soon.

I strongly suspect that funnelling community efforts into faster migration of packages to newer GHCs has a better ROI than embarking on decoupling base from GHC (which just delays the key issue). And if we can make GHC evolve at the pace of a mature compiler and not an academic PoC, it would bring an even higher ROI overall.

I entirely agree that decoupling base from GHC is a laudable enterprise; my doubts are just about its potential ROI in comparison to other avenues we can choose to pursue (taking into account the limited resources of the community / HF / GHC developers).


Leaving aside GHCJS (which IMO is an entirely orthogonal issue),

Yes, I agree GHCJS is its own separate issue.

The Haskell ecosystem is largely stuck with GHC 8.10, because there is no Stackage LTS for GHC 9.0. And there is no LTS because GHC 9.0.1 was hardly usable. With the recent release of 9.0.2 and Stackage preparing for LTS 19, this difficulty is likely to be resolved pretty soon.

Sure, but I want base work to never be blocked no matter what GHC is up to, and GHC work to never be blocked no matter what base is up to.

I strongly suspect that funnelling community efforts into faster migration of packages to newer GHCs has a better ROI than embarking on decoupling base from GHC (which just delays the key issue).

Well, because of things like GHCJS, that wouldn’t really help me. That GHCJS is orthogonal is sort of my point here: it’s hard to foresee all the reasons one might be stuck on an old GHC or anything else.

I think we are just more robust if we make it so base is never blocked on GHC. Fixing the reasons why GHC would block base in the first place is great, don’t get me wrong! But it leaves in place the underlying fragility, where many things can block many other things.

Your “key issue” is, to me, merely the key issue this time around.

I entirely agree that decoupling base from GHC is a laudable enterprise

:)

my doubts are just about its potential ROI in comparison to other avenues we can choose to pursue (taking into account the limited resources of the community / HF / GHC developers).

The initial split into two pieces I don’t think will be so costly, and the ROI is not in the split itself, but in the fact that it allows a bunch of stuff that today bogs each other down to proceed in parallel with minimal synchronization. Without this, base is going to limp along like every other language’s stagnated standard library, because it is just too annoying to work on.

Hopefully @hecate and I will find the time to do it, and then some of the discussion here will become moot, but we both have many other things in progress so I am not sure when that would be. So I do think it is good to keep discussing this stuff, if only to hone the reasoning both ways.


And if we can make GHC evolve at the pace of a mature compiler and not an academic PoC, it would bring an even higher ROI overall.

Granted, I am veering into more subjective territory here, but ultimately I do want them both to be able to evolve at a fast rate: base is quite bad, and the current hodge-podge of extensions we commonly use is also bad. So any plan that relies on GHC just slowing down (because we simply don’t have the resources yet to move fast and understand how bad each bit of breakage is) makes me nervous.


I guess we can only agree to disagree about our values. I believe that base is quite good; I believe that even a stagnating and limping base is not a problem in an ecosystem that allows for alternative preludes; and I believe that GHC must evolve more slowly.


Fair enough, that is a good articulation of where we disagree.

12 posts were split to a new topic: The evolution of GHC

I’m not following everything here, but there may be one thing that it’s easy to agree on:

  • ‘base’ consists of around 250 modules, of which many are not closely coupled to GHC at all; e.g. the I/O library

As the OP says, maybe we could split ‘base’ into bits that are somehow closely coupled to GHC, and bits that are “pure library code”. The latter can evolve separately without difficulty.

This is a pretty fuzzy distinction and we’d have to find a way to sharpen it up. But for at least some chunks of base, it might not be so hard.

Would that be a concrete step forward? If we agreed the strategy, a few motivated volunteers might well be able to execute on it.


This would also allow easier experimentation around backends, having a known-size and self-contained set of impure/implementation-specific code.


Another option could be a graphical depiction of the module dependencies in base, using colour to highlight SCCs (or else the good ol’ topological sort with SCC detection); the non-SCC modules can then be considered first as candidates for “splitting off”.
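As a minimal sketch of that idea, assuming the import pairs have already been scraped (e.g. from ghc -M output or a tool like graphmod), the containers library can do the SCC detection directly:

```haskell
import Data.Graph (SCC (..), stronglyConnComp)

-- Hypothetical input: each module of base paired with the base modules
-- it imports, e.g. scraped from `ghc -M` output.
type ModGraph = [(String, [String])]

-- Modules in singleton (acyclic) SCCs are the first candidates for
-- "splitting off"; the cyclic SCCs are the tangled core.
splitCandidates :: ModGraph -> ([String], [[String]])
splitCandidates g = foldr go ([], []) (stronglyConnComp nodes)
  where
    nodes = [ (m, m, deps) | (m, deps) <- g ]
    go (AcyclicSCC m) (acyc, cyc) = (m : acyc, cyc)
    go (CyclicSCC ms) (acyc, cyc) = (acyc, ms : cyc)
```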

Another bonus would be seeing just how tangled base really is, in order to determine an expedient process of splitting and how many motivated volunteers will be needed (ideally without having to find new ways to motivate them…).

I think yes: that is the initial goal of “decoupling base and GHC”. Start small and keep it simple, focusing on separating out GHC’s dependencies so we have a cleaner interface to work from.

What’s fuzzy about this and what could be done to sharpen the definition here?

I think it’s also worth pointing out a few related resources/discussions. Here are a few I’ve found; there are probably more:

The fuzziness is this: what is a sharp criterion that tells you whether a type or function belongs in the “GHC-specific” part or the “pure library” part?

One possibility is this. Consider an entity E, where “entity” means a type, data constructor, class, or function.

  • If GHC has specific knowledge of E, then E is GHC-specific. In GHC jargon, these are the “wired-in” things and the “known-key” things.
  • If a GHC-specific entity E’ depends on E, then E is also GHC-specific.
  • Everything else is pure-library.

I’m not sure if that’s enough, but it’s a start.
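That second rule is just a transitive closure over dependencies, so the GHC-specific set falls out of a small fixed-point computation. A sketch, where Entity and depsOf are stand-ins for however entities and their dependencies end up being represented:

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

type Entity = String  -- placeholder for a real entity representation

-- Everything reachable, via dependencies, from the wired-in/known-key
-- seed set is GHC-specific; whatever is left over is pure-library.
ghcSpecific :: (Entity -> [Entity]) -> Set Entity -> Set Entity
ghcSpecific depsOf = go
  where
    go seen
      | Set.null new = seen
      | otherwise    = go (Set.union seen new)
      where
        new = Set.fromList (concatMap depsOf (Set.toList seen))
                `Set.difference` seen
```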


Yes, that is exactly what I was thinking for the first step: “Cleave base in two”.

List and Maybe are perfectly innocent definitions, but as they are wired-in they will end up on the GHC-specific side. That’s not ideal, but it’s perfectly fine for the first stab at this.


In response to @AntC2 on another thread:

Me:

If it was just a matter of “cleaving splitting tearing slicing cutting dividing chopping ripping up base into two or more pieces”:

  • It should have been attempted at least once by now;
  • It would have been easier to do back when base was smaller, e.g. back in 2010.

It seems to me that the simplest option is to just move the morass of code in base to a more GHC-centric package, then start afresh, either in an empty base or an all-new package under a different name. This approach provides the luxury of ignorance: you can just start writing new code almost immediately, instead of trying to pick a path of least resistance.


Ericson2314:

@atravers yes, base reexports ghc-base, and we move over to base proper just what is convenient.

The only difference is that I think @Kleidukos’s heuristic of “just the non-GHC.* modules” is a better first shot at doing the rip, and I emphasize moving code, not copying code.

All that, though, just reflects on the first few hours of the attempt :). After that, I think it’s exactly the same.
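For illustration, such a shim in the new base could be as small as this, where GhcBase.Data.Bool is a made-up name for wherever the code lands in ghc-base:

```haskell
-- A whole-module re-export: code importing Data.Bool from base is none
-- the wiser that the definitions now live in ghc-base.
module Data.Bool (module GhcBase.Data.Bool) where

import GhcBase.Data.Bool  -- hypothetical module in the ghc-base package
```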

@Ericson2314:

base reexports ghc-base, and we move over to base proper just what is convenient.

Alternately, if the GHC-centric modules are in the minority, ghc-base could initially be an empty package which base imports. GHC-centric modules are then replicated or moved, leaving base with the implementation-independent modules.


@Ericson2314:

[…] “only modularity and the flexibility it creates” will save us.
[…] we should have amazing libraries so GHC can be easily “remixed” for all manner of research prototypes.

…and right now (2022 Feb) that would require going as close to fully-parameterised (no type classes) as possible:

  • so more definitions like Data.List.nubBy :: (a -> a -> Bool) -> [a] -> [a]
  • and fewer like Data.List.nub :: Eq a => [a] -> [a]

…in the absence of a feasible solution to the problem of orphan instances. This is one reason why I keep suggesting starting afresh: writing fully-parameterised definitions directly seems a better option than trying to refactor overloaded definitions, which means dealing with all the class dependencies “there and then”.
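To make the distinction concrete, here is roughly what that pairing looks like for nub (a sketch that mirrors the usual keep-first-occurrence semantics):

```haskell
-- The fully-parameterised core: no class constraint, so no instances
-- to orphan, and it can be defined anywhere.
nubBy :: (a -> a -> Bool) -> [a] -> [a]
nubBy eq = go []
  where
    go _    []     = []
    go seen (x:xs)
      | any (eq x) seen = go seen xs
      | otherwise       = x : go (x : seen) xs

-- The overloaded version is then a one-liner, definable wherever the
-- Eq class lives:
nub :: Eq a => [a] -> [a]
nub = nubBy (==)
```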

But if anyone else has had experience to the contrary, by all means let us know about it…


As for those who like their definitions overloaded…as previously noted by @Ericson2314, a viable solution to the orphan-instance problem would be very helpful. Having just given Scott Kilpatrick’s approach some more thought (along with a quick skim of section 4 of his thesis), I’m now wondering if that O(n²) complexity can be improved:

  • by computing the difference between worlds, instead of generating a whole new world each time a new module is processed;
  • assuming large differences are rare, the merging mechanism should be able to determine more quickly whether there’s a conflict by using those differences.

It’s another example of the ol’ DRY principle in action - don’t build new worlds with duplicated information; only work with the differences (as much as possible).
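As a toy model of that idea, just to pin down its shape (the String representations of instance heads and defining modules here are stand-ins, not Kilpatrick’s actual formulation):

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- A "world" maps instance heads (e.g. "Eq [a]") to their defining module.
type World = Map String String

-- Rather than rebuilding a whole world per module, each module carries
-- only its delta, which gets checked against the accumulated world.
mergeDelta :: World -> World -> Either [String] World
mergeDelta world delta
  | null conflicts = Right (Map.union world delta)
  | otherwise      = Left conflicts
  where
    conflicts =
      [ hd | (hd, def) <- Map.toList delta
           , Just def' <- [Map.lookup hd world]
           , def' /= def ]
```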

We don’t want to do that, because we want (the new) base to be portable across GHC versions (and other hypothetical implementations) at all times. The empty library is trivially portable, and we move definitions over only if they are portable too.

…and right now (2022 Feb) that would require going as close to fully-parameterised (no type classes) as possible

I am interested in these things, but I don’t think overly opinionated instances in GHC are a problem. (Outputable is opinionated but structured errors mean that pretty-printing is increasingly moved to the outskirts!)

If you want to talk about orphan instances more, would you mind forking that off to a new thread? It is orthogonal to the low-hanging initial steps for both making GHC a real library and decoupling base.

Acknowledged. I’ve just searched for “orphan” here to see if there was an existing Discourse thread - this appeared in the results:

…which mentioned this:

which in turn refers to this:

May you be more successful.


Thanks! That is very useful. I am generally a fan of rebasing things no matter how ancient, so I should take a crack at rebasing @nomeata’s changes even though they are a decade old.

From its readme:

Some changes are just work-arounds due to GHC having the package name base hardcoded

Yes, we should definitely make that more flexible so we don’t need to rebuild the compiler when stuff merely moves around. A settings file, maybe?


In Proposal: Relax instances for Functor combinators; put superclasses on <class>1 to make less-breaking · Issue #10 · haskell/core-libraries-committee · GitHub, I wrote up a new benefit that even the most trivial “beachhead” of making ghc-base contain everything and base reexport it all would realize.


…therefore:

  • https://hackage.haskell.org/package/haskell-glasgow2021

  • https://hackage.haskell.org/package/haskell-glasgow2024

  • https://hackage.haskell.org/package/haskell2028

with each being a “standardised snapshot” of:

  • https://hackage.haskell.org/package/haskell

in which all the latest products of active research debut. The haskell package can also serve as the point of separation (or abstraction) between base and GHC, allowing the two to evolve at their own pace.

Everyone else can then choose the level of stability and compatibility most suitable for their Haskell project.


I have opened a draft tech proposal on this: Standard library reform by Ericson2314 · Pull Request #47 · haskellfoundation/tech-proposals · GitHub
