The evolution of: Decoupling base and GHC

n here is the number of instances declared for this class. If instance j is more specific than instance i, Hugs inserts j in the tree in front of i and is done: no need to search the rest of the tree. If j is apart from every other instance (no overlap), it goes in as a leaf of the tree, which takes (j - 1) comparisons to reach. (That’s a fairly crude algorithm, but simple to program.) So it’s only the very last instance that needs (n - 1) comparisons.
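The insertion strategy described here can be sketched roughly as follows (my own illustrative Haskell, not Hugs’s actual code; `Specificity` and `insertInst` are invented names):

```haskell
-- How one instance relates to another: strictly more specific,
-- or apart (no overlap). Invented for illustration.
data Specificity = MoreSpecific | Apart

-- Keep instances in a most-specific-first list. A new instance is
-- inserted in front of the first instance it is more specific than,
-- and the search stops there; an instance apart from all others
-- walks the whole list, costing (n - 1) comparisons.
insertInst :: (a -> a -> Specificity) -> a -> [a] -> [a]
insertInst _   new []       = [new]
insertInst cmp new (i : is) = case cmp new i of
  MoreSpecific -> new : i : is              -- done; rest of list untouched
  Apart        -> i : insertInst cmp new is -- keep searching
```

So only an instance apart from everything already in the structure pays the full cost, matching the claim above.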

It could be made more efficient, if that really becomes an issue. (I guess it might for large classes like Eq.) Organise the tree as a BST-lattice hybrid, sorted on the constructor of the first (probably only) type param. OVERLAPPING instances point to their most-specific OVERLAPPABLE. (That’s overkill for Eq, because I don’t think anybody wants overlapping instances there. Then declare in the class whether overlapping is allowed.)


Thanks, but this is exactly what nobody’s explaining. Let me try to spell it out. Tell me what I’m getting wrong.

“Well-typed programs can’t go wrong.” So if a new base has the same functions/methods with the same type, why “need to update other code”?

  • If base takes away a function/method, yes code will break.
  • If base merely moves/restructures existing functions/methods, who’ll notice? (But equally why should base bother if nobody’ll notice?)
  • If base adds a new class/function/method/datatype, yes that could clash with something of the same name in client code. (But I don’t see adding anything proposed in the threads.)
  • If base adds new instances for existing classes, yes that’ll clash with client code that had rolled their own. (This was one of the complaints with the ‘FTP changes’: not only were there new instances, they also had surprising semantics. In some cases I for one wanted those instances not declared at all, because no sensible semantics was possible.)
  • If base changes the signature of existing methods (for example making a free-standing function on Lists into a method of Foldable), this’ll lead to subtle breakages. This was the bulk of the complaints with the ‘FTP changes’. In theory List is an instance of Foldable so the change should have been invisible. In practice people had to add Foldable constraints all over their signatures.
  • If base imports some module/class/instances that were previously outside base, previously needing explicit import in client code to access them, the new base just piggy-backs on the original import. Or is the problem that client code wants to continue to use an older version of the import? Why, if its signature hasn’t changed?
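To illustrate the Foldable point concretely (hypothetical client code, not from any real package): a helper written against the list-only functions needed no class context, but once `length` and `sum` were generalised, reusing it at other containers surfaces the constraint in the client’s own signature:

```haskell
-- Pre-FTP this could carry the constraint-free signature
--   summarise :: [Int] -> (Int, Int)
-- Post-FTP, with length/sum generalised, using the helper at
-- Maybe, Set, etc. means the Foldable constraint must now appear
-- in the client's own signature:
summarise :: Foldable t => t Int -> (Int, Int)
summarise xs = (length xs, sum xs)
```

This is the “subtle breakage”: nothing about the call sites changed, yet signatures throughout client code had to grow a constraint.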

Or tell me straight: “Well-typed programs can’t go wrong” is bunkum.

You have listed at least four things that can lead to breakages and cause somebody to need to update their code. So I believe you have answered your own question – changes to base do cause the API surface (as reflected in exported functions, datatypes, and type signatures) to change.


As to why do we want to change base in the first place:

  1. Obscure things in GHC.* modules force a PVP breaking version number bump even if none of the main commonly-used stuff changed.

And also many different opinions:

  1. Too much Int where negative numbers make no sense (e.g. lengths of things)
  2. Classes like Num with no clear laws.
  3. Partiality, especially in methods like *1 in Foldable and with Enum
  4. Anything steering towards unsafeInterleaveIO
  5. String stuff is too accessible
  6. Stuff throwing synchronous exceptions is too accessible
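As a concrete illustration of opinion 1: a length whose type rules out negative values by construction. (Natural is in base, but this `lengthNat` helper is invented here for illustration, not anything actually proposed.)

```haskell
import Numeric.Natural (Natural)

-- A length that cannot be negative: the type says so, rather than
-- relying on the convention that Int lengths are non-negative.
lengthNat :: [a] -> Natural
lengthNat = foldr (\_ n -> 1 + n) 0
```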

These are opinions — you need not agree with them. But at the very least, even if we only add new features to the “main” parts of base and do nothing breaking, we still have the problems @sclv mentioned because GHC might have breaking changes, and people ought to be able to get new non-breaking base changes without GHC changes.


OK so counting from after the ‘FTP changes’ brouhaha, which of those breaking changes has base perpetrated? And how would a reorganised base have avoided the breakages, or at least minimised the impact?

@Ericson2314 seems to be throwing in the kitchen sink. To pick one point as an example:

  1. Classes like Num with no clear laws.

Has base made breaking changes to Num? Indeed, has base made any visible changes to Num? I rather thought Num is the same now as in Haskell 98. So don’t mix up complaints about breaking changes with complaints about (what turned out, with a great deal of hindsight, to be) poor design. In the early 1990s, when the Prelude was developed, who’d have expected typeclasses would be so wildly powerful, or that they’d connect to Category Theory?

As I mentioned, Num (and Integral, Fractional) are pretty much baked into the definition and syntax of Haskell. Nothing’s stopping you creating a whole bunch of other Numerical classes and operators with all the nice properties.

  1. Partiality, especially in methods like *1 in Foldable and with Enum

Has base changed methods/functions to be more partial? Again don’t mix up complaints about breakages with complaints about a design you don’t like. Again there’s nothing stopping you creating safeHead, etc. (I’m not going to defend design choices in Foldable; but any more changes to it had better have an enormous benefit/cost ratio.)

A usage of head or (!!) is not necessarily unsafe: it may be surrounded by checks to avoid calling it unsafely. Yes it’s unfortunate those aren’t type-safe checks, in a language which vaunts the benefits of type safety. “Well-typed programs can’t go wrong” is bunkum.
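For instance (my example, not from the thread), a partial function fenced off by a run-time check is perfectly safe in practice, just not type-safe:

```haskell
-- head is only reached once the null check has ruled out the empty
-- list, so this never throws - but nothing in the types records
-- that fact.
firstOrDefault :: a -> [a] -> a
firstOrDefault d xs = if null xs then d else head xs
```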

I think a better approach is through education: stop teaching newbies so much about Lists (including String) and so little about appropriate datatype design, especially the other off-the-shelf recursive data structures.

So is this (below) what you want? And is it the opinion of you all:

  • Reorganise base so a program can (for example) exclude Num and all its Pomps; then
  • instead import theoretically-pure ShinyNum using all the same class and operator ids; and
  • otherwise use standard Prelude.

How about GHC wired-in modules for implementation stuff like arithmetic on pointers and indexes? Is that also to use ShinyNum? How about modules like Vector with Int indexes; or Data.Set with a size :: Int embedded in every node and (Num.+) to calculate it? Checking for numeric over/underflow or index-out-of-array comes at computational cost. Those modules are already making limited checks, enough to avoid IllMemRefs. ShinyNum will duplicate work for programs that already don’t throw those exceptions. Who’s then responsible for addressing the performance degradation?

@antc2 you have misunderstood my post in the way I feared would happen.

Right now, we can’t even seriously debate all these changes to base, because there would be far too much breakage simply because users are stuck with the version of base GHC ships with. With the decoupling, we at least expand the Overton window on things the CLC can consider, and I think that is very good, even if none of those no-longer-beyond-the-pale changes end up being accepted.

Also, see what @atravers linked in Pre-Pre-HFTP: Decoupling base and GHC - #43 by atravers ; there is better motivation there than what I wrote.

Speaking of String, here’s a very good example. I actually have little problem with [Char] being easily obtained from literals — for if beginners need to eliminate strings, this is the gentlest way there is. The real problem with String is not that it exists, or even that interfaces that use it exist, but that type classes do ridiculous shit to support it.

For example, the Read class is a disaster:

class Read a where
  readsPrec :: Int   -> ReadS a
  readList :: ReadS [a]
  readPrec :: ReadPrec a
  readListPrec :: ReadPrec [a]

The bottom two are proposed to replace the top two… we should then have a deprecation cycle to move the old ones outside the class, just as return = pure should become mandatory. So:

class Read a where
  readPrec :: ReadPrec a
  readListPrec :: ReadPrec [a]

But then readListPrec is a hack for String. If we had

newtype String = String [Char]

that could have the string-literal instance, and then we could delete readListPrec

readListPrec and friends are, in my view, overloaded instances laundered as extra methods. Not good!


D’oh, please stop using apocalyptic language. You’re just undermining your case. (Are there more than two type classes with specific provision for String?) Read has been doing its job happily for 30 years. It’s not broken, so don’t fix it, and don’t use it as a pretext for breaking all sorts of other stuff.

Show uses exactly the same ruse for [Char] vs arbitrary [a]. So you’ll have to change that at the same time. But there might be good reasons I want to show a [MyType] in a special format and be able to read back in that format. Yes, it’s a cheat to avoid overlapping instances. But since overlapping instances still aren’t blessed, it’s the lesser of two evils.
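The mechanism is easy to demonstrate: the list instance of Show delegates to the element type’s showList, so overriding showList changes how lists of that type render, with no overlapping instance needed. (A toy example; `Tag` is an invented type.)

```haskell
newtype Tag = Tag Char

instance Show Tag where
  show (Tag c) = [c]
  -- Overriding showList is exactly the "ruse": the rendering of
  -- [Tag] is customised without declaring an overlapping [Tag]
  -- instance. Char's instance does the same to print "abc" rather
  -- than ['a','b','c'].
  showList ts s = "<" ++ concatMap show ts ++ ">" ++ s
```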

(The class gives default definitions for the two methods you want to move out, so I guess moving them will be mostly invisible. Unless somebody’s giving some custom overloading somewhere. I’m still not seeing any motivation for tinkering.)

There’s a thread discussing the threshold to clear for a breaking change. That seems to have fizzled out without conclusion. So isn’t the Read proposal going to go the same way as the proposal to move out (/=)? (Unusually for me – some would say – I didn’t comment; just couldn’t even … I do regret the community uses up so many cycles on such small issues.)

So let’s all now get out of @Ericson2314’s way and see what he can do with @nomeata’s previous work - unless I’ve missed something (again!), we’re not being asked to join in. @Ericson2314 may be able to apply fresh insights to the problem, which @nomeata wasn’t aware of in 2013.

Thanks for the explanation, I think I get it now. The instance tree is only forced to branch near the root, so most of the tree is just linear lists.

I do think that is rather limiting. For example, you wouldn’t be able to write the popular Data Types à la Carte subtyping class:

data (f :+: g) e = Inl (f e) | Inr (g e)

class (Functor sub, Functor sup) => (:<:) sub sup where
  inj :: sub a -> sup a

instance Functor f => (:<:) f f where
  inj = id

instance (Functor f, Functor g) => (:<:) f (f :+: g) where
  inj = Inl

instance (Functor f, Functor g, Functor h, (:<:) f h) => (:<:) f (g :+: h) where
  inj = Inr . inj

But that should probably be a closed type class anyway, so perhaps there are other ways to work around that limitation. I would love to see an analysis of what instance trees look like in Hackage packages.

Sure you could. It needs trickery to make it appear instances are in a strict substitution ordering. So you end up with difficult-to-follow code (untested):

instance {-# OVERLAPPING #-} Functor f => (:<:) f f where
  inj = id

instance {-# OVERLAPPABLE #-} (Functor f, Functor g', SubT' f g') => (:<:) f g' where
  inj = inj'

class (Functor sub, Functor sup) => SubT' sub sup  where
  inj' :: sub a -> sup a

instance {-# OVERLAPPING #-} (Functor f, Functor g) => SubT' f (f :+: g) where
  inj' = Inl

instance {-# OVERLAPPABLE #-} (Functor f, Functor g, Functor h, (:<:) f h) => SubT' f (g :+: h) where
  inj' = Inr . inj

(Needs UndecidableInstances.)

One of the limitations with Swierstra’s approach is that the tree is right-biased. (The instances don’t try to recurse to the left of (:+:) like (f :+: f') :+: (g :+: g'). [2008 JFP paper, middle of 2nd page of Section 4 “may fail to find an injection”.]) This technique can be expanded to cope with that. (You need two auxiliary SubT classes; and you need to choose which direction to prefer first. The instances get pretty ugly.)

You mean that feature that we do not have? And do not need?

Yes. In particular are there any ‘partially overlapping’ instances?:

instance C a Bool  ...
instance C Int b  ...

Allowing that really hampers GHC sharpening up its act on overlaps. And I suspect nobody actually wants it. (Or if they do, the above technique can handle it, at cost of forcing a choice.)


(It was your comment that drew me in.) @Ericson2314 doesn’t have to use up his time responding to me. His investigations are going to come back with the costs side, in terms of effort to reorganise base and breakages liable in client code.

The benefits side will still be unevaluated. So all this discussion will happen again. (Which seems to be the pattern with CLC.)

I thought that as I’m disinterested and largely uninterested, I could prod people to articulate the why – which would both build the evaluation for benefits and motivate where to put the split(s) in base.

I admit I’m in favor of the split, but I don’t understand how juicy, interesting, breaking changes to base are a benefit that the split would provide for us. Split or no split, everybody uses base, so breaking changes are just as hard to swallow either way.

Here, on the other hand, are concrete benefits to the split:

  1. Decreased cognitive load.
    1. Project maintenance gets more tractable. Building, releasing, packaging, etc.
    2. The blast radius for changes of any sort is decreased — both damage done by breaking changes, as well as reasoning about the breakiness of changes in the first place.
  2. Improved backward compatibility. base currently includes a bunch of GHC internals. Since they change with each version of GHC, they trigger a breaking change for all of base. If those changes were constrained to a different library altogether, GHC could stay backward compatible with older versions of base.

Can those out-of-place “GHC internal” definitions in base be replaced using e.g. Haskell 2010 FFI declarations and some extra C modules?

Can they? I thought they were mostly Haskell code in the first place. The GHC.* module hierarchy in base.


Yeah, some of the trickier stuff is when the RTS calls a Haskell function in base! We would arguably need cyclic Backpack to really do this sort of thing right.

…yeah, why bother with abstraction when there’s only one mainstream implementation of Haskell?

As for potential solutions:

  • if it’s a RTS primitive which is calling a base entity, there may be two options:

    • if the primitive is rarely used, you could try passing in the base entity as an extra argument;
    • if it’s widely used, another option is to replace the RTS primitive with a new base declaration via the FFI - it may still be easier to add that extra argument, and ignore the complaints…
  • if, for some reason, it’s some part of the fundamental workings of the RTS that make the call to base…let’s just say “tricky” is being diplomatic:

    • If the offending part can be easily isolated, you may be able to replace it as well using the FFI;
    • Otherwise, being lazy is probably the simplest option - log an issue report with the tracker for GHC, and make a note of the miscreant in your progress report.

…progress report? Yes: for a good example of their use, see Philip Herron’s series of reports about Rust-GCC - most other requests for updates can then be silently deleted (ideally automatically!). You can then more easily focus on this task, which will require plenty of your attention.

Those progress reports will be invaluable e.g. if this topic is still being debated in, let’s say, 2027 people can just be directed to read them to find out what happened…

@AntC2 you are most welcome to elaborate on these statements. You seem to be under an impression that CLC has an articulated opinion on the matter, but I have no clue what makes you think so.


I said: I thought that …, I could prod people to articulate the why

Because the impression I’m under is that the CLC does not have an articulated opinion on the matter, so is failing to give guidance where leadership is clearly needed. I have no clue what makes you think I’m seeing articulated opinions.

Those quotes from me you snipped are pretty close to links and observations. But in case you missed it …

And I see @Ericson2314 got there before me: “we get these massive threads”. And that was barely a third of the way through that thread. (Addit: And after plenty of discussion on the mailing list.) At the half way point the Committee decided/closed the issue. Then there were requests to re-open and more rounds of discussion – which came to no conclusion I can see.

So somebody thought it a good idea to have Principles that would guide future discussions. (yes please!) A thread of 63 comments so far. No conclusion I can see.

Guiding Principles are the sort of thing a Committee should be there for. The CLC’s Charter says there will be Controversial Decisions. (d’uh, well yes) And then … ?

Specifically where I came in on Decoupling base from GHC wired-in, there’s 42 replies on the original thread (which went off track) and 108 on the newly-spawned attempt to corral it. I see no criteria for how the Committee might evaluate any proposal. I’d expect the criteria to derive from the general Principles guiding assessing impacts of breaking changes vs expected benefits.

I get it that Discourse isn’t under the aegis of CLC; and that the bulk of comments on those GitHub Issues are not from Committee members. If the CLC had some guiding Principles, they could shape the discussion to tone down apocalyptic language and sweeping unquantifiable claims (like @chreekat’s “blast radius for changes of any sort is decreased” – of any sort?).

There’s been mutterings about the design of base all the time I’ve been following Haskell. I’ve seen (versions of) this wishlist plenty of times. Nearly every attempt at it has faltered – with the conspicuous exception of the bruising ‘FTP changes’.

@Ericson2314 is about to do what? Split out the GHC wired-in parts and their dependencies? Revive nomeata’s work from 2013? “Rip base in two” along its natural fracture points?

I get the impression @Ericson2314 is not doing (whatever it is) for his own entertainment. It sounds like a lot of work. Whatever results, is the Committee just going refuse point blank – because it’s too controversial or causes too much breakage? If there is to be a benefits vs cost assessment, what are the metrics for benefits? As opposed to chreekat’s ineffable claims.

Your interventions in these threads have generally been at the ultra-conservative (small-c) end:

It’s difficult to imagine slower than the pace of failing to make a decision about removing (/=) from Eq. For the sake of managing potential contributors’ expectations, what does “slower” mean? And is (whatever that means) the collective view of the Committee?


I am rather bemused by the whole idea that a set of source code (the libraries) could be insulated from the very compiler it relies on. If GHC (the compiler) changes – even with a tiny bugfix release – all the object code for those libraries must get regenerated.

Of course GHC developers are meticulous to avoid side-effects of changes; and to document everything that’s changed. Nevertheless, prudent software release practice is to trust nothing and re-run the whole test suite against the re-compiled application and libraries.

Reorganising base to minimise dependencies won’t remove the chief dependency: on the compiler. So how could changes to libraries be more conservative (small-c) than changes to the compiler?

Upgrading a compilation environment is different to upgrading an application environment, in which the asset to protect is its database content and continuing business operations that query and maintain it.

I don’t agree. Changes in the compiler should only affect a tiny fraction of the language and I think that has historically been the case (at least recently), so it isn’t necessary to consider adjacent compiler versions to be completely different. Of course you should still re-run test-suites and otherwise check that the changes don’t break things, but I don’t see how that makes it impossible to split base.