The ^>= operator in .cabal files

Well, there are many reasons:

  1. It breaks an important invariant that package tarballs are self-contained (this matters for distros, tooling, mirrors, scripts, …). Interestingly, semver demands that this is true: “Once a versioned package has been released, the contents of that version MUST NOT be modified. Any modifications MUST be released as a new version.” Instead, we’ve successfully converted a set of package tarballs into something you can only query correctly via an API that’s specific to the hackage infrastructure (the hackage index format is also ad-hoc and “internal”, btw).
  2. It burns out hackage trustees: this is easy to see from the interaction I’ve had with them, the turnover and the amount of panic my PVP PRs have caused.
  3. It gives hackage trustees “backdoor access” to anyone’s package metadata. Some users might not like that and it requires no rigorous process like the NMU.
  4. It’s a “let it break, then fix it” approach. But the fix is usually tighter constraints, not a code patch that improves compatibility.
  5. Hackage revisions can break freeze files that don’t pin the index. This has happened before, when a revision caused tighter upper bounds than necessary.
  6. There’s no automation about it at all. Everything relies on explicit communication and people doing manual labor.

The front page of Hackage says that packages may opt-out of curation, and provides a link to what curation means, including that trustees may help by revising metadata. The word “backdoor” carries the wrong connotation here, IMHO.


The devil is in the details. The tarball is immutable and self-contained (and cryptographically signed). The hash of ghcup-0.1.19.2 hosted at https://hackage.haskell.org/package/ghcup-0.1.19.2/ghcup-0.1.19.2.tar.gz is

b25a15adaaca30a227ed12560d1d89924d9d3ae17fc5798f20f2f00484866088

and that will not ever change.
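For anyone who wants to check this locally, here is a minimal sketch. It assumes the hash quoted above is a SHA-256 digest of the tarball (it has the right length for one, but the post does not say so explicitly); the function names are made up for illustration.

```python
import hashlib
import io

# Hash quoted in the post; assumed (not confirmed) to be SHA-256 of the tarball.
EXPECTED = "b25a15adaaca30a227ed12560d1d89924d9d3ae17fc5798f20f2f00484866088"


def sha256_of_stream(stream, chunk_size=65536):
    """Return the hex SHA-256 digest of a binary file-like object, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_expected(stream, expected_hex):
    """Compare the stream's digest with an expected hex string, ignoring case."""
    return sha256_of_stream(stream) == expected_hex.lower()
```

To use it, open the downloaded tarball in binary mode and call `matches_expected(f, EXPECTED)`. A revision changes what the index serves, not the bytes of the tarball, so this check says nothing about revised `.cabal` files.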

The “curation” part is a separate “service” built on top of the regular tarballs. Your use of the word “converted” is misleading: the tarballs are there. This is an interesting thread about the role of package metadata:

It is (kinda) documented and there are a few packages to access it.

What’s ad-hoc in your definition? Is ghcup metadata format ad-hoc?

Technologically speaking, I agree the index format is a bit … “not-fancy”. I invite anyone interested in a re-design to open a thread to discuss. Personally I’d like to take some ideas from stackage’s pantry. I once tried to define a canonical conversion to git (to subsume a half dozen ad-hoc options you can find on GitHub) but I never managed to finish running it because there are 171k entries in the index and my Python script could not cope :joy:
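As a rough illustration of walking an index of that size without loading it all at once, here is a sketch that streams a tar archive and counts how often each `.cabal` path occurs. It rests on an assumption not stated in this thread: that the index is a tar archive with paths shaped like `<pkg>/<version>/<pkg>.cabal`, and that a revision shows up as a repeated entry for the same path. The function names are hypothetical.

```python
import io
import tarfile
from collections import Counter


def count_cabal_entries(index_stream):
    """Count occurrences of each .cabal path in a (possibly compressed) tar stream."""
    counts = Counter()
    # "r|*" reads sequentially, so archive members are never held in memory at once.
    with tarfile.open(fileobj=index_stream, mode="r|*") as archive:
        for member in archive:
            if member.isfile() and member.name.endswith(".cabal"):
                counts[member.name] += 1
    return counts


def revised_entries(counts):
    """Map each path seen more than once to its number of extra entries (assumed revisions)."""
    return {path: n - 1 for path, n in counts.items() if n > 1}
```

Only the per-path counter is kept in memory, which stays small even for a few hundred thousand entries.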

Burning out doesn’t seem to be exclusive to trustees, I bet they burn out just like everyone else :see_no_evil:

What breaks? My definition of breaking is when a package stops compiling. Speculative upper bounds prevent that, and relaxing them after manual verification is definitely not a “let it break, then fix it” approach.

We furiously agree here. Better tooling and more automation are sorely needed. I can do little but I know I am doing it.


I think I didn’t get the point across: the curation (which may be badly needed) depends on the infrastructure used. Instead of a new version being uploaded properly, everyone who downloads only the tarball may not get the fixed cabal files.

Uploading a proper new version is what really every other ecosystem does. It works well.

A package. It may stop compiling. With the current model, that’s what happens, and then hackage trustees come along and fix it: not the code, but the bounds.

I’m puzzled how this is deemed a good state of affairs.

Apologies… I dug it out and it’s actually the index cache that’s ad-hoc and implementation-defined.

Yes. There’s no schema.


Just 17 more revisions and cabal install hoogle works now :wink:


Back to the topic, my preference would be to scrap this special semantics of ^>= and make ^>= X.Y a precise equivalent to >= X.Y && < X.Y+1. Version bounds are already hard, and introducing another layer of difficulty does not actually help anyone.

  • If I put a hard bound foo < X.Y and foo-X.Y appears to be incompatible, we are all set already. Otherwise if foo-X.Y appears to be compatible with my project, I can make a revision.

  • If I did not put an upper bound at all (YOLO) and foo-X.Y remains compatible, we can carry on. If it breaks things, alright, let’s slap a revision. It can save some work if foo is extra stable.

  • But if I used a soft bound foo ^>= X.Y-1, I must make a revision in both cases. Either foo-X.Y is compatible, in which case I should write foo ^>= X.Y-1 || ^>= X.Y; or it’s incompatible, in which case I must put foo >= X.Y-1 && < X.Y, because now I know that this is not a speculative bound. Guaranteed busywork in both scenarios!
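To make the argument above concrete, here is a small sketch. The first part expands `^>= X.Y` into the plain range `>= X.Y && < X.(Y+1)` proposed in this post; the second part just encodes the three bullets as a lookup. All function names and the `"hard"` / `"none"` / `"caret"` labels are made up for illustration, and versions are compared as tuples of integers, which is a simplification.

```python
def parse_version(text):
    """Turn '1.2.3' into (1, 2, 3)."""
    return tuple(int(part) for part in text.split("."))


def caret_bounds(text):
    """Expand `^>= v` into (inclusive lower, exclusive upper), i.e. >= v && < X.(Y+1)."""
    lower = parse_version(text)
    major = lower[0]
    minor = lower[1] if len(lower) > 1 else 0
    return lower, (major, minor + 1)


def in_caret_range(candidate, caret):
    """Check whether a candidate version falls inside the expanded caret range."""
    lower, upper = caret_bounds(caret)
    return lower <= parse_version(candidate) < upper


def revision_needed(bound_kind, new_major_is_compatible):
    """Encode the three bullets: does a new major release of foo force a revision?"""
    if bound_kind == "hard":   # foo < X.Y: revise only to relax the bound
        return new_major_is_compatible
    if bound_kind == "none":   # no upper bound: revise only to add one
        return not new_major_is_compatible
    if bound_kind == "caret":  # soft bound: revise either way
        return True
    raise ValueError(f"unknown bound kind: {bound_kind}")
```

For example, `in_caret_range("1.2.9", "1.2")` holds while `in_caret_range("1.3", "1.2")` does not, and `revision_needed("caret", ...)` is true for both outcomes, which is the “guaranteed busywork” point.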

See the similar sentiment in Drop the requirement of specifying upper bounds by hasufell · Pull Request #51 · haskell/pvp · GitHub.


I think that there should be a way to specify the intended meaning of caret such that this does not require a revision. There’s a monotonicity property we can use to guide this.

We are slowly moving in this direction: cabal check has recently started to warn on missing upper bounds.

Those things are fine if there’s automation behind it (e.g. CI) and possibly a user interface that lets me click which bounds to update (not edit a 500LOC cabal file by hand in an online editor from the 90s).

Given that automation and user interface… not specifying defensive upper bounds might work the same way as well.

FWIW cabal should warn about this, but this should never become a hard failure during hackage uploads, if only because it’s trivial to bypass it e.g. by setting missing upper bounds to < 10000.
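A toy illustration of why such a check is easy to satisfy without saying anything meaningful: the sketch below is a deliberately naive, hypothetical lint (not how cabal check works) that only looks for an operator capable of bounding a version from above. It ignores `||` and other subtleties of the constraint syntax.

```python
import re

# Operators that can bound a version from above in this simplified model.
UPPER_BOUNDING = {"<", "<=", "==", "^>="}


def has_upper_bound(constraint):
    """Naive lint: True if the constraint text contains any upper-bounding operator."""
    operators = re.findall(r"\^>=|>=|<=|==|<|>", constraint)
    return any(op in UPPER_BOUNDING for op in operators)
```

Here `has_upper_bound(">= 1.2")` is false, but `has_upper_bound(">= 1.2 && < 10000")` is true, even though the second constraint excludes nothing in practice.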
