Thanks for the explanations.
Just to be clear… my comment wasn’t an attempt to discredit the release work, but to reflect on the incident as a whole.
From the distributor’s point of view, my concerns are as follows:
- the exposure time was 3 months (for everyone installing latest GHC)
- did end users actually receive the warning?
- how can distributors best deal with these situations?
- there’s a lot to be desired on the GHCup side of this
My perception of the issue was that it could have been fixed faster. I think one of the reasons that it wasn’t is that GHC does not have a process to expedite bugfixes/releases. It is hard to do a swift critical bugfix release if there is no process for it and it has to be squeezed into the rest of the daily work, other releases etc.
An expedited process is very standard in high-risk industries. I can’t really say what the best way would be to implement something like that for GHC, but a few common patterns I’ve seen are:
- all other release work stops (at least if resources are shared)
- no other patches are considered… the scope of the bugfix release is only that one fix (that seems to have been the case here)
- normal lead times (e.g. for release candidates) don’t apply
- certain parts of the release process may be skipped or accelerated
It was also discussed that the perceived exposure was low because it was a “.1” release, and so did not warrant an excessive expedited process. While this makes sense for power users, it’s a bit problematic to make such assumptions in general… we could go so far as to exclude .1 releases entirely from GHCup if they are deemed experimental by upstream itself (or expose them only from the vanilla channel).
I think in the past there have been cases where very critical bugfixes would have benefited from an expedited process (such as correctness issues on the new M1 backend). So I want to kick off this discussion gently.