Pre-HFTT: Ongoing focus on migration tooling / language features

I think the HF could benefit from some big overarching projects that:

  • On the one hand, have plenty of concrete small steps so we can actually make progress, especially initially
  • On the other hand, offer a clear long-term vision so they won’t “stall out” after early progress, and so we can continue to rally ourselves around the vision and build real momentum.

And also:

  • On the one hand, directly address the needs of the larger institutions who are best able to fund the HF
  • On the other hand, also address all sorts of long-standing community concerns which affect the deep and shallow pockets alike.

Trying to cover the gamut on all four axes is no small task, but I think I’ve finally got something that fits: migration tools.

This is sort of a rough initial draft I hope to edit based on feedback in the thread.

Problem

Institutions

Larger companies over time increasingly find themselves busy with managing existing code rather than writing new code. Migrations, versioning, and the like suck up more and more time. While there are various language-agnostic methods (e.g. all the orgs that swear by monorepos), there is room for language tooling to help out too. Language tooling doesn’t compete with the language-agnostic methods, but rather complements them.

I don’t think many languages have very good stories around this. It is both an under-theorized and under-practiced area of work. If we improve our situation, we not only make life easier for existing large users, but we also give orgs that don’t currently use us a nice reason to check us out. Unlike “Haskell for correctness”, it is not a “high brow” use-case that carries a reputational double-edged sword.

The community

Firstly, the community as a whole still faces many of these issues: even though we may individually be small, our collective commons in the form of Hackage and other libraries makes us face many of the same problems. I would classify the GHC.X.Hackage proposal as a migration tool for GHC devs and the community alike, for example.

Additionally, many long-standing controversies are wrapped up in some ideal-state-of-things versus instability-is-a-pain logjam. Clearing those logjams, I hope, will be very inspiring after years of pent-up frustration, and create a nice “long-standing problem, we were stuck, the HF refereed us across the finish line” narrative.

Solutions

There is certainly no shortage of things we can do! I listed a bunch in Stability / Migration tools · Issue #11 · haskellfoundation/stability · GitHub, but not in any sort of feasibility order. As mentioned above, I would also put GHC.X.Hackage under this umbrella, and of course it is more ready since head.hackage already exists.

Deprecating reexports

I would put forth these on the short list of near-term initial tasks:

The former unlocks the main second step of Deprecate partial functions from the Prelude, giving them new quarantined homes · Issue #70 · haskell/core-libraries-committee · GitHub, which appears to be popular. One or both might help with mtl, as pointed out in Deprecating _source_ of an imported module. · Discussion #489 · ghc-proposals/ghc-proposals · GitHub. The latter also helps with Cabal reexporting the new `Cabal-Syntax`. I am sure there are more long-standing issues we can/should accumulate if this is to become an actual HFTT.

Other things

As I wrote above, there are a lot more bullets in Stability / Migration tools · Issue #11 · haskellfoundation/stability · GitHub; do help prioritize them if this sounds worth pursuing!

5 Likes

I think you’re on to something here. But I want to think a little bigger. Instead of just having the HF focus on these smaller steps, maybe it could aim for something amazing: an actual migration tool.

Specifically, the HF could midwife a GHC source plugin that migrates code from one version of GHC to another. With very few exceptions, GHC does not change in a way that invalidates previous parses. Even if it did, we could very likely continue to support parsing the old language, only to error later. And, once the old code is parsed, a source plugin can manipulate the source to its heart’s content. With GHC’s existing exact-print feature, the manipulated source code could either be printed back to the user (so they can update their sources) or just processed by the rest of GHC (if the user does not want to update their sources because, say, they are using an unmaintained library).

What’s beautiful here is that all the pieces already exist: the source-plugin architecture and the exact-print facility. Exact-print is even designed to allow manipulations to preserve user-written style choices. Thus I think this project is feasible. If done nicely (with a nice UI around it – this may be the hard part!), such a tool could become a killer app of Haskell and a strong part of our value proposition.
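To make the shape of such a tool concrete, here is a toy sketch of the rewrite-and-reprint idea. To be clear, this is not the real GHC source-plugin or exact-print API; it is an invented miniature expression type standing in for the parsed AST, with a migration pass and a printer. A real tool would parse with GHC, rewrite the genuine AST, and rely on exact-print to preserve the user’s layout and comments.

```haskell
-- Toy illustration only (NOT the GHC API): a tiny expression type,
-- one migration pass, and a printer.

data Expr
  = Var String        -- a reference to a name
  | App Expr Expr     -- function application
  deriving (Eq, Show)

-- One "migration": rewrite uses of a deprecated name to its new home,
-- leaving everything else untouched.
rename :: String -> String -> Expr -> Expr
rename old new = go
  where
    go (Var v) | v == old  = Var new
               | otherwise = Var v
    go (App f x) = App (go f) (go x)

-- Print the migrated source back out. (A real tool would use
-- exact-print here so user-written style survives the rewrite.)
render :: Expr -> String
render (Var v)   = v
render (App f x) = render f ++ " " ++ render x

main :: IO ()
main = putStrLn (render (rename "head" "headMay" (App (Var "head") (Var "xs"))))
-- prints "headMay xs"
```

The point of the sketch is the division of labor: the traversal is generic, while each concrete migration (here, a single rename) is a small pluggable piece.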

6 Likes

I do think “fully automated migrations” is a right and proper end goal, but even with the functionality in place, I am somewhat hesitant about how easy it is to correctly know what to do. E.g., writing an auto-eta-expansion tool for deep subsumption in general means, I think, getting an error, guessing, and starting over in a loop, due to the global ramifications of type checking.

So at this point I would feel more comfortable expressing the principles we want to end up with long term than the exact mechanism. I hope we can still build just as much excitement around that?

My best guess is we’ll get there by a mixture of changing how breakages occur and building tools to mitigate them; with proper deprecation windows there is a lot less “discovery” work that any migration tool needs to do.

Also, finally, remember this is about both language changes and library changes. The latter is, I think, especially interesting to companies, because while they may not work on GHC and cause those breakages, they probably do change their own in-house upstream code in ways that might break in-house downstream code. However much folks might imagine otherwise (“Well, I would just prefer it if GHC broke my code less”), the need to deal with “self-inflicted” migrations cannot so easily be wished away.

4 Likes

I’m all for principles, but I don’t see these in your proposal stab, which looks like more small technical improvements. That’s great, too! I was just proposing a more ambitious technical goal.

I have to say, also, that I really like the way you’ve expanded my idea to apply to corporate code bases. That makes me think that the best way to attack this is to create some infrastructure around code migrations, where the actual code-change bit can be easily swapped out. The infrastructure might allow you to choose migrations, preview diffs, try compiling a patched file, etc. Then we GHC authors can make migrations for GHC changes, and other authors can make migrations that are appropriate to them. It all fits together.

To be clear, I’m not convinced that we could have such a tool always work for all changes GHC might think of. I don’t think that 100% coverage needs to be the goal. (I do think we could have made it work for simplified subsumption, but it would have been hard, or just maybe impossible.) 90% coverage would be 90% more than we have now.

5 Likes

I’m all for principles, but I don’t see these in your proposal stab, which looks like more small technical improvements. That’s great, too!

Yes I most certainly have not written the thing I said should exist yet :).

was just proposing a more ambitious technical goal.

And yes we should certainly have them, right now the proposal does run out of runway real quick, which is exactly what I said shouldn’t be the case!

I have to say, also, that I really like the way you’ve expanded my idea to apply to corporate code bases.

Thank you! It really puts a smile on my face to read that :)

That makes me think that the best way to attack this is to create some infrastructure around code migrations, where the actual code-change bit can be easily swapped out. The infrastructure might allow you to choose migrations, preview diffs, try compiling a patched file, etc. Then we GHC authors can make migrations for GHC changes, and other authors can make migrations that are appropriate to them. It all fits together.

If I am understanding you correctly, I absolutely agree. Too often the things we care about, e.g.

  • References to declarations vs the declarations themselves, so we can deprecate them separately
  • The interface of a module, rather than the mix of interface, implementation, and other metadata that exists in `.hi` files, so we can compare interfaces for PVP violations.

are more concepts in our heads than concepts that exist in the code today. After the lower-hanging fruit, I very much believe the story here is chugging away on infrastructure. The tools themselves for concrete migration use-cases can, in comparison, basically be thrown together in a victory lap at the end.
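To make the second bullet above a little more concrete, here is a hedged sketch, with every name invented for illustration, of what a first-class “interface only” representation might look like, together with a comparison that classifies which PVP version bump an interface change requires:

```haskell
import qualified Data.Set as Set

-- Hypothetical, illustration-only type: a stripped-down module
-- "interface" containing only what the PVP cares about, unlike the mix
-- of interface, implementation, and metadata in a real .hi file.
-- (A real design would track types and instances too, not just names.)
newtype Iface = Iface { exports :: Set.Set String }

data Bump = Major | Minor | Patch
  deriving (Eq, Show)

-- Classify the version bump an interface change requires:
-- removing an export forces a major bump, purely adding exports
-- needs only a minor bump, and an unchanged interface needs neither.
requiredBump :: Iface -> Iface -> Bump
requiredBump old new
  | not (exports old `Set.isSubsetOf` exports new) = Major
  | exports old /= exports new                     = Minor
  | otherwise                                      = Patch
```

The point is that once the interface is a value we can compute with, “does this change violate the PVP?” becomes an ordinary function rather than a judgment call.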

2 Likes

We’ve done automated scripted migrations for smaller-scale things at awake, and built up some domain knowledge and also some scars. My inclination is that using the GHC API is fine and dandy, but shooting for a plugin seems like a bad idea: it puts more burden on an already not-quite-there plugin facility, and furthermore, in our experience, automation will usually only get you 90% of the way there, so it’s much better to always have the manual step built into the basic workflow.

In general, I don’t think we should consider a migration tool as something entirely unrelated to the existing tooling infrastructure we have. Migration could be an HLS-style refactor, and we could perhaps introduce some easier command-line ways to drive that for those who don’t want the full singing-and-dancing IDE. I’d rather focus efforts on making our existing tools really powerful than building a bunch more partial tools.

3 Likes

I implemented the migration support for the Dhall ecosystem, which underwent several breaking changes to the language, so I have some experience with this.

The way it works in Dhall is that dhall provides a dhall lint command which migrates code from older idioms to newer idioms. With a few rare exceptions, there is always a migration window for at least one release where both the old code and new code work so that users have a smooth migration path. The idea is that a user should (in principle) always be able to upgrade their code to the latest dhall version by:

  • Upgrading to each dhall version one release at a time
  • After each release, running dhall lint to upgrade to any new idioms since the last release

If breaking changes are infrequent, a user can skip a few steps by only upgrading along the way to releases that include migrations, instead of upgrading to every single release in sequence.
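That skipping logic can be sketched in a few lines of Haskell. This is a hypothetical illustration of the workflow described above, not anything dhall actually ships: the only intermediate stops a user needs are the releases that include migrations, plus the final target.

```haskell
-- Hypothetical sketch: given an ordered list of releases, a predicate
-- saying which releases ship a migration, and the current and target
-- versions, compute the releases a user must actually stop at.
upgradePath :: Ord v => [v] -> (v -> Bool) -> v -> v -> [v]
upgradePath releases hasMigration from to =
  [ v | v <- releases, v > from, v < to, hasMigration v ] ++ [to]

main :: IO ()
main = print (upgradePath [1 .. 7 :: Int] (`elem` [2, 5]) 1 7)
-- prints [2,5,7]: only the migration releases and the target are visited
```

If no intermediate release ships a migration, the path collapses to just the target, matching the “skip a few steps” behavior described above.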

This also implies that changes are never feature regressions: one should be able to do everything they could with older releases, albeit possibly with different code. The exception is that functionality may be removed if it comprises unambiguous misfeatures.

I believe those are fairly common migration principles, so there’s nothing particularly novel there. Now onto the more Dhall-specific bits:

In Dhall, the migration tool is bundled with the interpreter (it’s a subcommand). That ensures that the migration tool and the interpreter are more likely to be in sync. You don’t want to get into a situation where a user applies a migration that is inappropriate for the version of the language they are using.

Another advantage of bundling the migration tool with the interpreter/compiler is that it’s just plain convenient for end users to migrate. No need to teach them how to install and use a plugin system for their projects.

Also, notice that the migration step for Dhall is called dhall lint and not dhall migrate. Why? Because the dhall lint step actually applies other useful and unambiguous improvements to the code. The reason we do it that way is that we want to train users to always run dhall lint on their code (even when they’re not interested in migrating the language), similar to how developers will auto-format their code as a matter of good hygiene.

Bundling the migration tool with the interpreter/compiler implies that the migration tool is not suitable for use with upgrading packages, since there is not a 1-to-1 correspondence between language versions and packages. For GHC there appears to be a convention of the language version corresponding with base versions (and other boot package versions), but I believe that should not necessarily be the case and the design of any migration tool for Haskell should not assume any such correspondence.

We also might not necessarily want to conflate a migration tool for the language with a migration tool for packages. It wasn’t clear from the above discussion whether or not people had in mind both types of migration tools or only a migration tool for the language. The two types of tools might potentially have different requirements.

7 Likes

From the perspective of an HFTP, I am having a hard time seeing which concrete actions you’re intending to propose that the HF undertake. I think that proposals are most useful when they do one of the following:

  • Seek community input for something with ecosystem-wide impact (the current proposal process is not well-suited to this goal, but I’d like to make it be so)
  • Seek support from the HF, whether it be in funding, time, or organization, to complete a project of benefit to the Haskell community
  • Seek to coordinate disparate actors from multiple projects to achieve something that we can’t do on our own

But this proposal is instead one that would give a big overarching project and a long-term vision for the HF. I’m not so sure that the HFTP process is the right way to do these. If we accept a proposal to focus on project X, what do we do then? Work on soliciting further proposals to achieve steps towards X?

The way I see it, the second and third types of proposals in the list above create a to-do list for the HF, from which we can prioritize items based on availability of resources and perceived benefit, but the proposal that I envision coming from this pre-proposal is more likely to suggest changes in a prioritization function for the to-do list than it is to add concrete actions to it.

I very much appreciate the focus on the institutional health of the HF. But at the end of the day, the mission matters more than the organization. I want to preserve the organization in order to carry out the mission, but the mission always comes first. I want to do a lot of useful things, and I hope that continuing to do useful things will help us get support, but the useful things are still the most important.

What about a revision that focuses more on concrete tasks that the HF can do, potentially after getting more resources?

1 Like

Yes perhaps I overemphasized the “good for the HF” parts. I do believe that, but this is still something I would like to see happen HF or no HF. It’s not, for example, intended to just make some otherwise-pointless adventure for the HF to look good undertaking :).

But this proposal is instead one that would give a big overarching project and a long-term vision for the HF. I’m not so sure that the HFTP process is the right way to do these. If we accept a proposal to focus on project X, what do we do then? Work on soliciting further proposals to achieve steps towards X?

The way I see it, the second and third types of proposals in the list above create a to-do list for the HF, from which we can prioritize items based on availability of resources and perceived benefit, but the proposal that I envision coming from this pre-proposal is more likely to suggest changes in a prioritization function for the to-do list than it is to add concrete actions to it.

Hmm, this is interesting to consider. So I guess the first thing to address is that I wouldn’t want to actually submit this proposal until the TODO list associated with it is fleshed out. Right now, it does indeed only have one concrete item, so I understand that it seems like the only possible purpose it has would be the prioritization function. But I indeed don’t want that to be the purpose.[1] I just want to “batch submit” a bunch of items at once.

The “shared story” here isn’t meant to force some items to the front of the queue, but just save me/us from having to motivate each specific TODO list item in isolation.

What about a revision that focuses more on concrete tasks that the HF can do

So yeah, exactly that before submitting, I think?

Any ongoing part would be, again, not prioritizing, but suggesting future proposals can appeal to this motivation to expedite their own enqueuing, so we aren’t just expediting the initial batch submit. But maybe it goes without saying that newer proposals can appeal to older ones.

potentially after getting more resources?

To be clear, I think you mean submit the tasks now, but they might fall behind other urgent items deeper in the queue than we have the budget for at present? That sounds good to me.


[1]: Before you started as ED, I talked a lot about filling out the priority queue vs ordering the priority queue, and I am happy to leave the latter part to the HF executives, especially as resourcing constraints (quantitative and especially qualitative) tip the scales.

1 Like