Language, library, and compiler stability (moved from GHC 9.6 Migration guide)

To me, Rust’s nightly, beta, and stable are different compilers; I don’t think we need to go full fork. We already have something like this: the debug, profiled, and other build flavours are different compilers. Instead of an -fexperimental runtime flag, there could just be a different build with e.g. a -DEXPERIMENTAL compile-time flag; that compiler might not understand the same flags as the non-experimental compiler, might know about more language features, …
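A minimal sketch of what build-time gating might look like, assuming a hypothetical EXPERIMENTAL define set when the compiler itself is built (the module and function names here are invented for illustration; this is not GHC’s actual build machinery):

```haskell
{-# LANGUAGE CPP #-}

-- Hypothetical illustration: the same source tree produces two compilers,
-- and feature availability is decided when the compiler is built
-- (e.g. with -DEXPERIMENTAL) rather than by a user-facing runtime flag.
module ExperimentalGate (experimentalBuild, gateExtension) where

experimentalBuild :: Bool
#if defined(EXPERIMENTAL)
experimentalBuild = True
#else
experimentalBuild = False
#endif

-- An extension marked experimental is simply unknown to the stable build,
-- so it cannot be enabled there at all.
gateExtension :: String -> Bool -> Maybe String
gateExtension extName isExperimental
  | isExperimental && not experimentalBuild = Nothing
  | otherwise                               = Just extName
```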

I hope this cleared up any confusion as to what I meant.

1 Like

You wouldn’t be able to compile 95% of Hackage.

2 Likes

It’s deprecated, but GHC still supports the -fglasgow-exts flag - maybe it could be remade into -fexperimental or -DEXPERIMENTAL

To elaborate on the Stability Working Group, I would say its most important function is facilitating communication and sharing information related to breaking changes, such as the cost of breaking changes, potential upcoming breaking changes and how we can prevent them or mitigate them. Representatives of GHC, Cabal and Stack attend.

I do not see the SWG (despite its name) as a way of getting work done. It’s not a group where we dole out work, perform it, come back next time for more, and publish a record of results. I wouldn’t expect the explicit lists of “tasks accomplished” to be particularly long. It’s one of those things where, if you’re doing your job right, no one knows you’ve done anything at all.

As someone who feels strongly about reducing the frequency and severity of breaking changes in the Haskell ecosystem, I would encourage like-minded others to join, because it’s an effective way of helping maintainers of critical ecosystem projects to learn about the costs of breakage and give them information that can help them reduce or mitigate it.

4 Likes

I would like to second @tomjaguarpaw here. The SWG has been a remarkably useful body thus far due to the discussions that it has fostered that otherwise likely would not have happened. I appreciate each of the contributors who take time out of their day every two weeks to reflect on the status quo and how it may be improved, and to work towards concrete solutions, even if progress may appear slow. The best way to change this is to come and contribute; we are all busy, but many hands make light work.

6 Likes

We took a first step in this direction in GHC #21475, which fell out of an SWG meeting and was implemented by @telser, the SWG chair. We discussed distinguishing “stable” from “less stable” extensions but ultimately were reluctant to do so as we struggled to find a definition of “stable” which would be both useful and accurate.

To pick a particular example, extensions like TypeFamilies are quite tricky; they are quite useful and therefore widely relied upon. However, they have no defined operational semantics; sadly, changes in these semantics can affect end-user programs. Sometimes these changes are merely reflected in compilation speed; more rarely they can change reduction behavior (e.g. where “stuck” families or UndecidableInstances are involved). The user’s guide is quite up-front about this semantic gap: there is an entire section explaining that reduction is driven heuristically.
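To make the “stuck families” point concrete, here is a small sketch (the family names are invented and not drawn from any particular package):

```haskell
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE UndecidableInstances #-}

module StuckFamilies where

-- A closed type family with no equation for Char: an application such as
-- F Char does not reduce; it is merely "stuck", not an error by itself.
type family F a where
  F Int = Bool

-- This still type-checks, because the stuck application only ever has to
-- be equal to itself; nothing forces it to a concrete type.
stuck :: F Char -> F Char
stuck = id

-- With UndecidableInstances, reduction may also fail to terminate.  How far
-- GHC unfolds an application like Loop Int before giving up (and what error
-- it reports) is a compiler heuristic, not something any specification
-- pins down.
type family Loop a where
  Loop a = Loop (Maybe a)
```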

Does this make TypeFamilies “unstable”? One could argue that the answer is “yes” and that it won’t be stable until we have a comprehensive formal definition of the extension’s behavior, both semantic and operational. However, this is probably okay: in most cases the utility of what we have likely outweighs the potential negative impact of such “instability”. Consequently, it’s hard to see the value in stamping an “unstable” label on the extension given how underdefined that term is.

TypeFamilies is merely one case of this. GHC is a composition of many research insights (things we Haskellers typically call “language extensions”) by many people. Consequently, we often find ourselves at the edge of humankind’s understanding of our craft. Sometimes we are aware that there are things we don’t know; in other cases we don’t even know what we don’t know. This is why we are reluctant to call “extensions” stable unless we have a fairly comprehensive theory (semantic and operational) accounting for the extension and its interactions with related extensions.

9 Likes

For the record these posts were originally made on the GHC 9.6 migration guide thread and have been moved here as they discuss stability more generally.

I think this discussion points me to a different conclusion. GHC has at times really highlighted that features are in some way experimental or have rough edges, and certainly people have over time encountered those rough edges themselves and learned the hard way. I think the issue is that many of the developers of libraries in Haskell, including real intended-for-production libraries, are by nature early adopters who are very excited about getting and using all the new features, and rush to try them out.

So I doubt that putting these features behind a further experimental or -f-i-solemnly-swear-i-invoke-experimental-features flag or whatever would make much difference – the same folks that enthusiastically rush to make use of InvertedGeneralizedFamilyRecordTypes or whatever will still rush to do so even with more signposting.

(And honestly, this is something I tend to think is a great feature of the Haskell world, culturally, and I don’t necessarily wish to discourage it – it is just that when one is at the edges of what is new, then there is necessarily going to be more maintenance as those things develop and firm up).

4 Likes

I think the issue here would be that the vast majority of Hackage would have -fexperimental turned on anyway. It’s the nature of the Haskell community at large to want new shiny things. Of course it might be useful to have libraries with huge sets of reverse dependencies (e.g. aeson, bytestring, text) opt out of ‘experimental’ features, but AFAICT those don’t tend to be the ones that break anyway when a new version of GHC is out. (Well, they might break internally, but nothing that forces a change in the external API.)

We can see this writ large with how macros in Scala 2.x played out. They were very clearly marked experimental, but they were just so incredibly useful (in spite of all the flaws) that libraries that are used nearly everywhere came to fundamentally rely on them. That left a huge porting and/or rewriting effort when migrating from Scala 2.x to Scala 3.x. I don’t see Haskell being much different in that regard.

(I don’t have any solutions to offer, I’m afraid. It’s an extremely hard problem. I do think industrial users could make a significant contribution here if they were so inclined, by throwing more money at the problem, though.)

3 Likes

This might be true. However, if the compiler rejected those packages as experimental (when told not to use experimental code), policies might make those packages simply not eligible for use in production in some places.

Enforcing a policy that says the codebase (including dependencies) must compile with -fno-experimental (ideally a stable compiler, and not a runtime flag), would be fairly easy for “industrial users”. Especially if that comes with a guarantee that there is no breakage.

I am starting to have serious doubts about this.


I really hope choosing Haskell doesn’t mean: either you accept eternal breakage, write every piece of library code yourself, or use Hackage.

I also have serious doubts that many maintainers enjoy the constant churn of making their code compile with the latest GHC release (even if that just means figuring out which of the libraries in their dependencies broke and needs to be updated, and thus that a new release needs to be cut).

I am skeptical that this is a viable option. As I say above, it’s not at all clear (to me, at least) what “experimental” means. There is a continuum of possible degrees of stability and historically Haskellers have been eager to use features which lie pretty far from the “stable” side of this continuum.

To continue with the TypeFamilies example used above: would you give up use of Generics, servant, lens, and vector just to gain a small degree of (likely only theoretical) “stability”? I suspect that most commercial users would not.

@angerman, if you have a list handy, it would be great to know which compiler/language changes have been the largest offenders in terms of upgrade cost. I don’t doubt that there are things we could do here, but it’s hard to know what they are without making the discussion a bit more concrete.

2 Likes

I doubt anyone wants to prevent Haskell pushing the boundaries of what we know.

This is not the stable/experimental distinction I would make. Stable would mean: we consider this feature to add enough value to commit to putting effort into proper deprecation notices and cycles when it becomes necessary to make changes.

Again, I am not against changing and breaking stuff as long as it comes with proper deprecation warnings and lead times. Stable does not necessarily have to mean: this will never change in all eternity. It (to me) just means: this feature is ready for production and changes to it will be preceded by deprecation warnings.

6 Likes

This is not the stable/experimental distinction I would make. Stable would mean: we consider this feature to add enough value to commit to putting effort into proper deprecation notices and cycles when it becomes necessary to make changes.

The problem is that we need to have a specification to say what “change” means. Many of the language extensions in common use today have no such specification; in many cases this is precisely why they haven’t been folded into the Haskell Report.

To take the GADT record example above, the previous behavior was neither intended nor documented. It was rather an emergent behavior resulting from the interaction of one loosely-specified feature (GADT record update) with another imprecisely-specified feature (type families). The specification we have, the Haskell Report, stipulates that programs of the form mentioned in the migration guide should be rejected, making the fact that they were previously accepted a bug.

5 Likes

To elaborate a bit here:

Would it have been technically possible to continue accepting these programs (presumably with a warning and deprecation cycle)? Very likely, yes.

However, we are very reluctant to start letting implementation bugs supersede the specification. After all, what if, when adding logic to accept these programs, we introduced yet another bug? Are we then obligated to emulate that bug as well, with its own deprecation cycle? Where does this end? How do we manage the technical complexity of validating these layers of bug-emulations? Avoiding this is why we specify behavior and reserve the right to change behavior which is not specified (e.g. the GADT update behavior noted in the migration guide).

Now, this is not to say that we will never add deprecation cycles around standards-non-compliant behavior. If the user-facing impact were high enough in this case, we could have done so here. However, we judged (using empirical evidence provided by head.hackage) that this particular issue was of sufficiently low impact that users would be better served by our time being spent elsewhere. As with most things in engineering, this is a trade-off that must be struck on a case-by-case basis.

6 Likes

I don’t follow. GHC is pretty much the de facto Haskell compiler. We all write GHC Haskell today. I doubt many people read the spec (if there is one) and write code according to it, only to be surprised that GHC doesn’t accept their code. We write code to be accepted by GHC. Thus if GHC starts rejecting code, whether or not that is technically a bug, it is a user-facing change.

head.hackage is pretty much the database of likely known “changes”.

Anything that makes a codebase which compiles (warning-free, and with a hypothetical -fno-experimental) under GHC N-2 fail to compile under GHC N-1 or GHC N is a change.

I think that collecting examples of these would be one of the highest-value activities we could do around improving the stability story.

3 Likes

The role of a specification is not solely to inform users of what behavior to expect; it is equally (if not more) important as a tool for implementers to judge what behavior is correct. Without a specification, it is impossible to say whether a particular behavior is self-consistent, interacts poorly with other behaviors, or is implemented correctly.

This is why language extensions tend to change: we cannot predict all interactions without a thorough specification (and even with one it is quite hard). Consequently, we may later notice (often by chance) an internal inconsistency. We generally try to fix such inconsistencies by appealing to whatever existing specification applies, since even under ideal conditions developing a language implementation is hard; doing so without a clear specification is significantly harder.

Yes, this is why I’m asking for concrete examples; when I look at head.hackage I see very few changes that stem from language or compiler changes. The vast majority of patches are due to:

  • changes in base
  • changes in template-haskell
  • changes in the boot libraries

Changes in the surface language itself are quite rare. The two notable exceptions that I can think of are:

  1. simplified subsumption, which we reverted
  2. 9.2’s change in representation of boxed sized integer types, which was sadly necessary for correctness on AArch64/Darwin (a brief sketch follows below).
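To make the second item concrete, here is a hedged sketch of the 9.2 breakage pattern; mainly code that reached through the unboxed constructors (or the corresponding sized primops) was affected:

```haskell
{-# LANGUAGE MagicHash #-}

-- Sketch only: GHC <= 9.0 defined   data Int8 = I8# Int#
--              GHC >= 9.2 defines   data Int8 = I8# Int8#
-- so low-level pattern matches on the constructor had to be updated.
module SizedIntRepr where

import GHC.Exts (Int8#)
import GHC.Int  (Int8 (I8#))

-- On 9.2 and later the payload is an Int8#; on 9.0 the same pattern
-- bound a full-width Int#.
payload :: Int8 -> Int8#
payload (I8# x) = x
```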

As @tomjaguarpaw says, I think starting to collect a list of high-cost past changes would be an incredibly useful exercise. We can’t know how to improve until we know what we have done poorly in the past. Certainly the examples above should be on this list but surely there are others as well.

7 Likes

I know I’m a tiny minority, but I try to write code that will be accepted by either GHC or Hugs. This includes features way beyond what’s documented in Haskell 2010.

Yeah, that would be nice: overlapping instances are poorly spec’d, especially how they interact with FunDeps. There are significant differences between what code the two compilers accept. GHC is too liberal (you can get incoherence); Hugs is too strict; and nowhere explains the differences or why.
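For flavour, here is a small sketch of the kind of incoherence GHC will let you write. It leans on IncoherentInstances rather than on the FunDep interaction described above, so it is only a cousin of those cases, but it shows how instance selection can give two different answers for the very same value (class and function names are invented):

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE IncoherentInstances #-}

module Incoherence where

class Named a where
  name :: a -> String

instance Named a   where name _ = "something"
instance Named Int where name _ = "an Int"

-- At this polymorphic use site GHC commits to the generic instance,
-- even though the value eventually passed in is an Int.
viaPoly :: a -> String
viaPoly = name

main :: IO ()
main = do
  putStrLn (name (1 :: Int))     -- "an Int"
  putStrLn (viaPoly (1 :: Int))  -- "something": same value, different answer
```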

1 Like

Let me suggest an extreme example, as a thought experiment. Suppose that a new feature, say, FrobnicateTypes, allowed users to write unsafeCoerce. And now suppose a check were added to GHC to disallow precisely those cases where users were writing unsafeCoerce. Would we want a deprecation cycle specifically for people who made use of what seems to me an evident soundness bug in the feature? I think we would not, and that suggests to me that there’s a continuum between “bug-like” and “unintended-feature-like” which requires a little care in evaluating in each specific case.

2 Likes

Should this warn loudly? Yes!
Should this abruptly break code? No.

1 Like