Yes, I’d like to lend my support to @bodigrim’s post, and add that I don’t think we will ever come up with a satisfactory formal definition of “stability” or “internal”. I wonder if that’s why it’s proving to be such a sticky issue in the Haskell world. We have a lot of world experts in expressing complex things formally, and we have a tendency to apply the same techniques to things where it doesn’t work so well.
What does this mean? Are you arguing for an absolute standard because a relative one sometimes gets it wrong? I am not sure what is being advocated here, though I do recognize the sentiment.
Over in OCaml-land, there is a compiler flag -principal which says whether or not the compiler should be principled in its type inference. (The flag name is about whether or not the compiler infers “principal types” – a concept from type theory – but the way most users experience this is that -principal type inference is more predictable, and hence principled.)
Maybe we can have something similar in GHC?
That is, the example of TypeFamilies is a good one. There are aspects of that feature of GHC which are more reliable than others. For example, if I have type instance F Int = Bool, and I see F Int in my code, I can be confident it means the same as Bool. But if I have G1 [x] = Double and G2 x = [G2 x] and I wish to work with G1 (G2 Bool), then it’s much subtler to determine that G1 (G2 Bool) should reduce to Double. So part of the feature is nice, and part is not so nice.
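For concreteness, here is a minimal sketch of the two situations (my own illustration; the module and binding names are made up):

{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE UndecidableInstances #-}
module Families where

type family F a
type instance F Int = Bool        -- the easy case: F Int plainly means Bool

type family G1 a
type instance G1 [x] = Double     -- only reduces once the argument is visibly a list

type family G2 a
type instance G2 x = [G2 x]       -- needs UndecidableInstances; unfolds forever

-- The reliable part: GHC definitely knows F Int is Bool.
easy :: F Int
easy = True

-- The subtle part: to see that G1 (G2 Bool) is Double, GHC must unfold
-- G2 Bool exactly one step (to [G2 Bool]) without spinning on the
-- non-terminating family; this is the kind of cleverness discussed above.
-- subtle :: G1 (G2 Bool)
-- subtle = 0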
I think we GHC developers might be able to differentiate these cases – at least on a best-effort basis. (OCaml has been more principled in their application of -principal than I think GHC can realistically be.) That is, there are many places in GHC where we try the “obvious” thing, it doesn’t work, and so we do something clever that happens to accept the handful of examples we have in front of us. We like to convince ourselves that we’re coming up with a general solution that will address all likely examples that will come up in practice, but experience suggests we’re actually just rationalizing accepting the few examples we’ve been creative enough to write down. Most of the time, we know when we’re playing this game – it’s usually evidenced by so-called Wrinkles (example) in the code base. So maybe we have a warning (off by default!) that bleats whenever we know that GHC is going outside its usual envelope?
This is something that would grow over time and would always be a partial solution, so maybe it’s just not fit for purpose. But my gut feeling is that, if we did this well, most of the type-checker changes that occur release-to-release would affect only code that uses these wrinkles. Unfortunately, the warnings produced would generally be unhelpful – something along the lines of GHC had to think hard to accept your program; your mileage -- and stability -- may vary. Doing better would be very hard, in general, and trying to do better would likely demotivate the GHC authors from adding new cases for the warning. But maybe the opaque warning (maybe with some lines highlighted) is enough?
Likewise we have historically considered some modules in base’s GHC.* namespace to be internal to GHC (something we are discussing addressing at the moment).
I really feel like all the drama here is a matter of unclear social signalling. I am very thrilled to see more people get excited about a ghc-base vs base split to help with library signalling [and the ball is squarely in my court to go edit that proposal!].
Likewise, things like -X.... and -fglasgow-exts make perfect sense in theory, but the fact that the stable thing they refer to (the Haskell report) does not have a position in our community like stable Rust (to compare with Rust unstable features) or standard C++ (to compare with -std=gnu++17) means that people have all sorts of diverging interpretations.
Either we revive the Haskell Report (C/C++ route) or we make a GHC-specific notion of stability (like what @rae and others earlier in this thread say) and then…problem solved!
All of this would “restore the social contract”, meaning we can fix bugs without acrimony – which is good.
Instead of attempting to define what the “stability” of a language extension (or feature) should be, perhaps a simpler heuristic could be based on “conflict counts” - the number of other extensions it conflicts with. An extension which conflicts with many others would then have an adverse “ranking”, alerting potential users to the possibility of future difficulties.
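A toy sketch of what such a ranking might look like, assuming we somehow had a hand-maintained table of known pairwise conflicts (the extension names below are purely made up, not claims about real conflicts):

module ConflictRank where

import qualified Data.Map.Strict as M

type Extension = String

-- Purely illustrative data; NOT claims about real extensions.
knownConflicts :: [(Extension, Extension)]
knownConflicts =
  [ ("HypotheticalExtA", "HypotheticalExtB")
  , ("HypotheticalExtA", "HypotheticalExtC")
  ]

-- Rank each extension by the number of known conflicts it participates in;
-- a higher count would signal a higher risk of future difficulties.
conflictCount :: [(Extension, Extension)] -> M.Map Extension Int
conflictCount = foldr bump M.empty
  where
    bump (a, b) m = M.insertWith (+) a 1 (M.insertWith (+) b 1 m)

-- e.g. conflictCount knownConflicts
--   == M.fromList [("HypotheticalExtA",2),("HypotheticalExtB",1),("HypotheticalExtC",1)]

The hard part, of course, is producing the conflict table in the first place – which is exactly the objection raised below.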
The problem is that “conflict count” isn’t something that we already know but haven’t yet written down. In many cases we don’t know that a “conflict” exists without hard thought and a bit of luck. There are countless ways in which extensions might interact and many of these live in the darker corners of the language. Of these potential interactions only some are problematic; seeing which those are can be a tricky exercise.
Having been on the receiving end of breaking changes in the past, I am quite sympathetic to @Bodigrim’s point. However, I have a hard time seeing how to turn it into something actionable. Perhaps we need to break less as a community. However, without knowing which past changes we would have decided differently, and why, it is hard to know how to get there.
So again, I think we need to make this discussion concrete: reflect on which past changes we feel were unduly costly, and what lessons we can distill from those for the future.
One potential problem with such a list is having to wade through, e.g., vague complaints about others not doing enough; then, when people point out what is being done, further complaints about how the same people who struggle to find time to do things also don’t have time to resolve them as nicely as the complainant would like; etc, etc, etc.
So let’s instead look at an analogous situation (well, it looks analogous to me) - how such an instability is managed in GHC itself:
…with all of that being attributed to a single cause:
So what was the chosen solution?
If only LLVM had some level of “stability” or “backwards compatibility”; then GHC could (for a time) just use whatever version is provided at the installation site (or from the LLVM site), freeing GHC devs from tending their own version/s of LLVM:
stability - needing one or more private, customised versions of GHC or LLVM should be optional, not mandatory.
- if there are technical problems to making this work: this is the Haskell Discourse; solving technical problems (even partially) ought to be one of our specialities!
- if there are social problems to making this work: that’s what the Haskell Foundation and its various (sub)committees, working groups, et al presumably are here for.
Will we all like the solutions? No: we’re a community, not a “hive-mind” collective. We are all individuals, so a few will be unhappy, in the same way some are still unhappy about using :: instead of : for type signatures.
Will those solutions always work? No: the aforementioned “corner cases” will continue to appear, so the solutions of the day will need to be reviewed with the intention of improvement (or replacement). But this being the Haskell Discourse, enough of us should also know about the concept of “diminishing returns” and Rice’s theorem - we can’t have everything…
I believe you bring up the LLVM backend on purpose? There is a stable IR: the binary bitcode IR. I’ve written a backend for it (you will likely be able to find my HiW talk on it somewhere from years ago). The textual IR has become a lot more stable since then, though, and the bitcode backend never made it into GHC. Toolchains suck, I’ll give you that. LLVM isn’t a great fit at the point where the LLVM codegen takes off: you’d ideally want this somewhere between STG and Cmm, and Cmm is already too low-level for LLVM IR. There are a lot of other reasons why LLVM is not an excellent target for GHC. There is lots of room for improvement.
Writing the AArch64 NCG was strongly motivated by the shortcomings of the LLVM pipeline.
Untethering GHC from system dependencies is also strongly motivated by resilience and stability. The whole reason for the Ar module in GHC is so we don’t have to deal with GNU or BSD ar on different systems.
The same reason lies behind spending a lot of time on the internal linker. Having to deal with system linkers and their defects/special needs across various platforms can be a major headache. It’s great when they work, but what if they don’t? Having a fallback option is valuable with respect to stability.
If LLVM has backward compatibility issues, why not target another, simpler backend like C? This would greatly reduce the complexity of the compiler. Performance of the compiled code could probably be on par with LLVM. This new language has a C backend, which seems to work fine.
No - I was attempting to show how “dedicated versions” are being used to counteract instability in dependencies:
- users of GHC frequently have multiple versions of the compiler to lower “the language-upgrade frequency”;
- GHC itself has its own version of LLVM to lower the “library-upgrade frequency” (according to that GHC wiki page).
…I’ve just looked at the current (17.0.0) LLVM developer policy - it only says:
…no mention of future versions, even in a reasonably-limited sense. Furthermore:
As for the prevailing stability of the textual format:
…so presumably GHC will continue having its own dedicated version of LLVM, so that the limited resources of GHC developers are spent more on improving the back-section for LLVM, and less on chasing after LLVM.
(…assuming the content of that wiki page is still reasonably accurate.)
…I had suspected as much.
…much like maintaining dedicated versions of LLVM for GHC, or specific versions of GHC for codebases…to help “isolate the irritation” by lowering the frequency of breakage.
This may help to explain why the C back-section for GHC is now so “rudimentary”:
For the avoidance of doubt, let me prefix this with: this is my opinion and not that of any current, past or future employer.
As much as I enjoy discussing (and fixing) toolchain issues, this was originally about stability and code being consistently broken by GHC without deprecation cycles, and my fear that this is happening again with 9.6.
Let’s assume we have a hypothetical codebase of ~5k Haskell files with ~500k+ lines of code, using a non-trivial number of dependencies from Hackage. Furthermore, let’s assume this code works perfectly fine with GHC 8.10.7, and has a battery of unit, property and integration tests, as well as manual testing to go through. We have to assume that the compiler is not perfect and not bug-free; that’s just an honest take on reality. There will be bugs, and we do our best to mitigate them. One of those strategies is to use strongly typed languages, and hope we thereby rule out a large class of potential pitfalls. But we cannot assume the compiler to be perfect and bug-free. It is what it is; as with pretty much everything, it’s best effort, and everyone does their best to produce as good a result as they can.
If we try to compile this code with 9.2 it’s virtually impossible. The probability that somewhere in that code (and quite likely somewhere in the dependency tree) something is rejected by the compiler is fairly large. Sure, it might only be a handful of packages that are impacted, but that impacts all their direct and indirect consumers.
“But it’s an easy fix,” you say. Fair point, it’s an easy fix to patch up that package. Maybe we are lucky and none of its consumers need to be adjusted, but maybe they do, and maybe it’s just a version bound that needs adjusting; no biggie, right? Well, this just ripples all the way through.
Ohh, and we used a dependency of which there is already a new major version. So most likely the old version we were depending on (could even be one of our own) won’t be updated, and we are urged to also upgrade that dependency. But this may now require changes to consumers of those libraries, because some interfaces changed. And so on and so forth. Once we end up having everything compatible with 9.2, we can basically throw away everything we knew about the existing codebase. So much has changed that could have impacts all across the codebase that it’s impossible to tell. Maybe there is some newly introduced unsoundness no one knows about yet? Maybe someone used a feature that has a bug?
Again we rely on our unit, property and integration tests, as well as manual testing to verify the software we build adheres to some quality gates we defined.
Ohh, and we need to make sure everything still works with 8.10, in case 9.2 (despite potentially fixing unsoundness or other bugs) regresses for us in some way. And what’s the value we got out of the migration? From the business perspective, technically none. We still build the same application that works (hopefully) the same as it did before the migration. The only reason we migrate is so that we do not incur too much technical debt, and remain compatible with the ecosystem as it evolves. (And, well, because we actually contributed something to GHC that we’d like to use: a codegen that significantly reduces compile times.) We already see some cross-compilation pipelines fail. Wonderful, let’s see what regressed there. Or maybe it’s stuff we had patches against 8.10 for – but that’s a dead end, so upstreaming to 8.10 is pretty pointless anyway; and, well, patching 9.2 (with 9.6 pretty much being the current release, and master even further along) again means no upstream contributions without significant work adapting patches to 9.6 and master.
“But wait”, you say, “why didn’t you check for regressions earlier, using the alphas? You could have told us if something regressed!”. Well could we? How? The compiler outright rejected our code.
And we are back in the same situation: we contribute to GHC, but won’t be able to use it until we manage to move our codebase to 9.6. Again, the value tradeoff is questionable.
So why don’t we just fork GHC? And implement what we need in our fork, and ignore upstream altogether? I guess that’s tempting. Except that I think this would be bad for the ecosystem, and could become a liability.
All this to say: continuously breaking the compiler causes massive costs to users when upgrading, for no obvious value other than “staying current”; it causes lots of code churn, and thus changes to dependencies, and therefore increases the risk of some new (and different) unexpected behaviour. It also means GHC developers will not get feedback (in the form of regression reports) from users, as their new compiler rejects existing code. And what probably makes me saddest is that it means patches will just bitrot, and contributions to GHC are significantly inhibited. We cannot work on recent compilers (forget about master), hoping to address issues in our codebase; our code is already rejected by the compiler.
And lastly let me say this again (as I said before). head.hackage is not a solution. It’s a symptom. All that head.hackage allows is to make hackage compatible with a (newly) broken compiler. This is crazy.
As the question about “what” breaks came up, let’s assume the following Broken.hs module:
-- compile with ghc -package bifunctors -package mtl Broken.hs
{-# LANGUAGE Haskell2010 #-} -- default-language: Haskell2010
{-# language BangPatterns #-}
{-# language TemplateHaskell #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE TypeApplications #-}
module Broken where
-- A few imports
import qualified Data.Bifunctor.TH as TH
import Data.Bifunctor (bimap)
import Control.Monad.Except
-- Define some data types
data X a = X { s :: ! a }
data Y a b = Y deriving Show
data D = D
data T = T
-- some classes
class C m a where f :: m a
-- and an instance
instance MonadError D m => C m T where
  f = pure T
-- some functions
g = bimap id id Y
h = read @ Int "5"
-- Derive some instances
TH.deriveBifunctor ''Y
This code compiles error-free with 8.10.7. I invite you to guess how many distinct rejections this file will cause when one tries to compile it with 9.2.
@atravers do note that that wiki page is indeed rather out-of-date at this point; the compatibility story of LLVM’s textual syntax is much better now and we can now support a range of LLVM versions.
Yes, but GHC does offer some level of backwards compatibility. In many cases we go to great lengths in this community to offer that stability. The problem here is not that we offer no stability guarantees; it’s rather that there’s disagreement about where in the continuum of trade-offs between “break everything all the time” and “never break anything ever” we should sit.
This is a very helpful example.
So there are indeed quite a few breakages here (a sketch of the corresponding fixes follows the list):
- FlexibleContexts is now required for the C m T instance (due to UndecidableInstances also allows flexible contexts (#19187) · Issues · Glasgow Haskell Compiler / GHC · GitLab, fixed in 9.2).
- The deriveBifunctor splice now must appear before g in order to satisfy the fix to #16980.
- The spaces after @ and ! must be removed due to Proposal #229.
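For illustration, here is a sketch of Broken.hs after addressing those three points (assuming these are the only changes GHC 9.2 asks of this particular file):

-- Fixed.hs: Broken.hs after the three fixes above (a sketch)
-- compile with ghc -package bifunctors -package mtl Fixed.hs
{-# LANGUAGE Haskell2010 #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}      -- (1) now required for the C m T instance
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE TypeApplications #-}
module Fixed where

import qualified Data.Bifunctor.TH as TH
import Data.Bifunctor (bimap)
import Control.Monad.Except

data X a = X { s :: !a }               -- (3) no space after '!'
data Y a b = Y deriving Show
data D = D
data T = T

class C m a where f :: m a

instance MonadError D m => C m T where
  f = pure T

-- (2) the splice now comes before g, which needs the derived Bifunctor instance
TH.deriveBifunctor ''Y

g = bimap id id Y
h = read @Int "5"                      -- (3) no space after '@'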
All three of these are great examples of breakage which could have been much less impactful. Thank you, @angerman.
In the case of (1) I agree that this should have had a deprecation cycle: the original issue was bothering no one, was not contradicting GHC’s documented behavior, was otherwise not urgent, and could easily be warned about. Happily, this is the sort of thing that I hope we will be more likely to catch now that we have head.hackage routinely testing user impact; while the head.hackage patch for this change wasn’t massive, I would hope it is large enough to raise some eyebrows and prompt us to reflect on alternatives. Concretely, I think the usual three-release dance would have helped here: 9.2 would have started throwing a -Wcompat warning, which would be enabled by default in 9.4. The breaking change would occur in 9.8. Naturally, while this wouldn’t have avoided the migration work entirely, it would have provided more runway for end users. Would this have helped, @angerman?
(2) is a bit trickier as it fixes a bug that could adversely affect some users. Moreover, it is fixing an infelicity relative to documented behavior: that top-level splices delimit declaration groups. Naturally, this is hardly consolation for the many users affected by the change. I’ll admit, I’m not familiar enough with the code in question to know what else could have been done here. Could we have issued a warning when information “leaks” from one group into a preceding group? Perhaps, but this does sound tricky to do. Could we have instead adjusted the specification to instead allow such programs to typecheck? Perhaps, but this sounds like a fast road to confusing errors at best and compiler non-termination at worst. In short, I’m really not sure what to do here. This is by far the hardest case of the three to extract generalizable lessons from.
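To make the declaration-group rule concrete, here is a minimal illustration of the behaviour behind (2) (my own sketch, using a made-up datatype):

{-# LANGUAGE TemplateHaskell #-}
module Groups where

import Data.Bifunctor (first)
import qualified Data.Bifunctor.TH as TH

data P a b = P a b

-- Before the fix to #16980, a binding placed *here* could accidentally use the
-- Bifunctor P instance generated by the splice below, i.e. information leaked
-- from the later declaration group into this earlier one:
--
--   tooEarly = first (+1) (P 0 0)   -- fine with 8.10, but after the fix there
--                                   -- is no Bifunctor P in scope at this point

TH.deriveBifunctor ''P   -- the top-level splice ends a declaration group

-- After the fix, uses of the derived instance must come after the splice:
lateEnough :: P Int Int
lateEnough = first (+1) (P (0 :: Int) 0)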
(3) I think points to an area where the GHC Proposals Process could be improved. The proposal recognizes that the change may break existing programs and says that the programs should adjust their whitespace to match the new whitespace rules. However, the proposal is silent on breakage extent, migration timeline, pre-breakage user warnings, or how tools might help with migration. The first item is understandable, as the proposal pre-dates head.hackage being an easily-used tool for characterising this sort of impact (although Ryan did adapt head.hackage). However, today we could decide that the proposal process should do better in all of these areas:
- we could require an approximate impact assessment for syntactic changes (e.g. how many occurrences of the affected token sequences does grep find on Hackage? A toy sketch of such a count follows this list.)
- alternatively, if grepping for affected syntax is too fiddly, the committee could instead require a second phase of approval such that, when an implementation is ready, a proper head.hackage study can be performed. In light of the results of this study the committee and proposal could design an appropriate migration strategy before final approval.
- the committee could adopt a strategy similar to that of the CLC, requiring a warning period for any change that breaks user programs
- with the continued progress of ghc-exactprint, I am hopeful that we are now in a better position to provide automated refactoring tools to facilitate mechanical migration for this sort of change. This particular change would be a great test-bed for this sort of tooling.
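As a toy example of the first bullet (purely a sketch: a real assessment would lex the source rather than substring-match, and would worry about file encodings; the patterns and paths here are crude stand-ins):

module ImpactGrep where

import Data.List (isInfixOf)

-- Reads a list of .hs file paths on stdin (e.g. piped in from
-- `find unpacked-hackage -name '*.hs'`) and counts lines containing token
-- sequences affected by the whitespace proposal, approximated here as
-- " @ " and ":: ! ".
main :: IO ()
main = do
  paths  <- lines <$> getContents
  counts <- mapM countIn paths
  putStrLn ("suspicious lines: " ++ show (sum counts))
  where
    suspicious l = " @ " `isInfixOf` l || ":: ! " `isInfixOf` l
    countIn path = length . filter suspicious . lines <$> readFile path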
To summarize, of the three breakages above, I think two could have had a warning period fairly easily. In the case of (3) we might have decided to avoid the breakage entirely via automated refactoring.
I do hope others have some more examples of breakage; the above was a very useful exercise.
This is what I’ve been getting at. I am not fundamentally against breakage. I am fundamentally and absolutely against breakage without lead time and warning.
As I have outlined in the previous post, the inability to use a recent compiler with a codebase that is accepted error-free by a compiler just two versions prior is – to me – unacceptable. It prohibits:
- testing that compiler against large codebases;
- reporting regressions, bugs, … during alpha, beta, … releases, leading to bug reports late in the release cycle;
- contributing to the compiler in any meaningful way;
- reaping the benefits of contributions to the compiler.
It also:
- causes massive amounts of churn throughout the whole ecosystem;
- puts a massive burden on maintainers;
- invalidates all assumptions about the code and its dependencies.
I wouldn’t mind testing nightly releases against large codebases and reporting bugs, regressions, …
The current assumption is: don’t bother. There is no point. It won’t accept the existing code anyway, and the ecosystem likely won’t be ready until we are at least at .4.
If you meant 9.8 here, I agree. If you meant 9.6, I don’t. It should be a warning by default for at least two versions (1 yr). If I can opt in to get the warning 3 versions (18 mo) ahead, that would be good as well.
I think your feelings about breaking changes are extremely valid. Having to do work like this can be very draining and thankless.
Yet, I think we should be careful not to fall into despair. The more people who do this upgrading work, the less work it becomes (at least on the ecosystem end).
My work codebase is probably not as large or as complex as yours, so I was able to test it against GHC 9.6.1-alpha1 last week using head.hackage, and in doing so I spotted an unintended breaking change that has now been swiftly fixed (Changes to OverloadedLabels cause (#.) to fail to parse as identifier in GHC-9.6.1-alpha (#22821) · Issues · Glasgow Haskell Compiler / GHC · GitLab). So this idea of testing production codebases against alphas can work. It does require work, though, and the larger your codebase the more work it requires. So it’s reasonable that at your scale it is out of reach at the moment.
I think as well that the upgrade from 8.10 is particularly painful, but the upgrade from 9.0 to 9.2, in my experience at least, has been pretty painless.
I too thought that Moritz’s concrete example was very helpful. Like Ben, I think we are more likely to make progress if we focus on specifics than on generalities.
The generalities are things that (I think) we all agree on:
- No one wants breakage
- Everyone wants bugs to be fixed
- No one wants Haskell to be frozen so that innovation ceases – or becomes terribly costly.
But these general goals are in tension. Fixing bugs can break code that (however inadvertently) relied on the bug. It is really hard to innovate with zero breakage. Moreover, responsibility is diffuse; breakage may happen because the language spec changed (GHC Steering Committee), the implementation fixed a bug (GHC team), the API of base changed (Core Libraries Committee), or the API of some widely used library changed (maintainers of that library).
It’s hard to elicit general lessons from this complex interaction of generalities. I think we are better off considering specific cases, and seeing if we can distil specific lessons – as Ben has done above from Moritz’s example. Let’s do more of that.
Our goals are really well aligned. Our experiences are very different. I think we can work together to make things better. Moritz’s point is helpful here:
This is what I’ve been getting at. I am not fundamentally against breakage. I am fundamentally and absolutely against breakage without lead time and warning.
I think we could all agree with that.
I think that one specific thing that would help with Moritz’s goal is to get very early warning of breakage. While Moritz says that “head.hackage isn’t a solution”, I think we’d all agree that it is a help with getting early warnings. Simply by adding a “user-facing” tag to an MR, we can find out if that MR will break any package in head.hackage. That’s incredibly useful. But it would be even more useful if we could (substantially) extend head.hackage’s coverage – and that is something that everyone can help with.