Dependency version bounds are a lie

…at least more often than not, if their goal is to say that “my code works with the versions within these bounds”.

How things can go wrong

  1. You start a new project wombat, and add all the currently used versions of its dependencies (here: foo-1.2.3) to the build-depends line of the wombat.cabal file, maybe like this:

    build-depends: foo >= 1.2.3 && < 1.3,
    

    You set up a nice CI system, maybe testing various versions of GHC.

    All is well.

  2. To get extra cookie points you maybe subscribe to a packdeps RSS feed for your package and get notified when foo-1.3 gets released. So you change the dependency to

    build-depends: foo >= 1.2.3 && < 1.4,
    

    test that it still works, and release a new version of wombat.

    Still all is well.

  3. A few weeks later, you add New Shiny Feature to wombat. Your tests all pass, CI is happy, and you release a new version.

    Is all still well? We don’t know!

    Your package claims to work with foo-1.2.3, but if your latest feature happens to use code only available in foo-1.3, this is now a lie, and there are no checks in place to catch that: on CI, cabal prefers building with the newest version possible, and will always build with foo-1.3. (A sketch of an extra CI step that would catch this follows below.)

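One way to catch this – sticking with the running foo example, and assuming foo-1.2.3 still builds with the GHC versions on your CI – is an extra CI step that forces the solver onto the lower end of the declared range:

    cabal build --constraint='foo == 1.2.3'
    cabal test  --constraint='foo == 1.2.3'

This is only a sketch; with more dependencies and wider ranges it quickly calls for something more systematic.
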
Can affect upper bounds too!

Pinged by packdeps, I just updated a few upper bounds on transformers to allow 0.6. But transformers is a boot package (it comes with GHC and is installed by default), so cabal keeps using 0.5.6.2.

This led to an incompatibility between mtl-2.3.1, transformers-0.6 and GHC 9.0 being released without the CI system complaining.

(For the untested-upper-bound problem there is maybe a simpler fix possible; in a way, what I propose below is a generalization of that.)
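
For instance – just a sketch, using the transformers example – a second project file could pin the boot package to the newly allowed major version, so that the fresh upper bound is actually exercised on CI. (Whether the solver can actually rebuild a given boot package against a given GHC is a separate question.)

    -- hypothetical cabal.project.latest-transformers
    packages: .
    constraints: transformers ==0.6.*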

The underlying problem

In both cases, the problem is that the package metadata specifies a (possibly) wide range of compatible versions, but the CI systems only test a small subset of them. As a rule of thumb, if it’s not CI-tested, it breaks, so this is a problem.

The goal of version bounds

Stepping back a bit, I wonder: What is the main purpose of the version bounds in the cabal file?

It seems to be to indicate which dependency versions are expected to build, so that downstream users are not bothered with build failures of their dependencies (instead, cabal tries to pick different versions, or reports the inability to form a build plan early on).

Cabal 3.0’s caret-set syntax for specifying bounds, where the equivalent of the above build-depends would be

build-depends: foo ^>= { 1.2.3, 1.3 }

emphasizes this purpose.
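
Spelled out, the caret set is just the union of two PVP-major ranges, which makes the “these are the major versions I claim to work with” reading explicit:

    build-depends: foo ^>= { 1.2.3, 1.3 }
    -- means: foo (>= 1.2.3 && < 1.3) || (>= 1.3 && < 1.4)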

(There is also the use of bounds to indicate semantic incompatibilities.)

Let’s avoid lies!

So it becomes clear: if we want to keep our cabal files from lying, the package’s CI needs to check every version allowed by the version range. Or, a bit more realistically, those versions that (by the PVP) imply compatibility with all the others. In the case of the caret-set syntax, that is simply the versions in the set.

But how?

What’s not clear to me is how to achieve that.

We could generate one job for each dependency/version pair. Ideally dynamically, because we probably don’t want to edit or regenerate the .github/workflows file whenever we edit the .cabal file.

But some packages that come with GHC are not reinstallable (e.g. base), so when generating the job definitions, we’d also need to know which GHC version can be used to test that package.

And this would lead to highly redundant builds once we have more than one dependency, as the jobs we generate for the versions of dependency foo also build against some version of bar, so having separate jobs for bar is wasteful. (Users of nix with the haskell.nix infrastructure and a good nix-aware CI system like Hercules CI might get away with it, though).

Many freeze files?

Maybe this scheme would work:

  1. Think about why you even want to keep supporting an older version of a dependency. Typical reasons are, I think

    • You want to support building with an older version of GHC, and the dependency comes with GHC.
    • You want to allow users of certain package sets (stackage, nixpkgs, Debian) to use your library without having to also upgrade these packages.
  2. From this, you can derive a policy listing version set targets, e.g.

    • latest stackage releases covering the latest 3 GHC versions
    • nixpkgs unstable and stable,
    • latest versions on hackage with latest released GHC
    • GHC HEAD with head.hackage
  3. Find a source for a cabal freeze file for each of these targets.

    For stackage, such files are provided (e.g. https://www.stackage.org/lts-17.13/cabal.config). For the others, let’s assume similar services exist, or tools that generate them on the fly.

  4. For each of these freeze files, have a CI job that uses it. This guarantees that your code continues to be tested with the older versions of its dependencies. (A small sketch of this setup follows after this list.)

  5. Have a CI job that checks that the bounds in the .cabal file are nothing but lines of the form

    build-depends: foo ^>= { 1.2.3, 1.3 }
    

    where the versions are those in the freeze files, and that complains if they aren’t – or, better, fixes them.

    Alternatively, and more elegantly, don’t keep version bounds in the .cabal file in git at all, and only add them, based on the tested freeze files, upon upload to Hackage.
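
To make steps 3 to 5 concrete, here is a small sketch with made-up versions and a second made-up dependency bar: one freeze file per target, and the .cabal stanza that would be derived from them mechanically.

    -- freeze file for the oldest target (e.g. from a stackage LTS cabal.config)
    constraints: foo ==1.2.3,
                 bar ==0.5.2

    -- freeze file for the “latest on hackage” target
    constraints: foo ==1.3,
                 bar ==0.6.1

    -- bounds in wombat.cabal, generated from the freeze files above
    build-depends: foo ^>= { 1.2.3, 1.3 },
                   bar ^>= { 0.5.2, 0.6.1 }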

Collateral benefit: More automation

It seems that with this scheme, one not only avoids lies in the .cabal file, but the tedious maintenance of this data is also simplified: one now manages these version-set sources as a whole, and the individual entries are derived from them.

I would imagine that the package sets are pinned (either simply vendored to the repo, or referenced in an immutable way) and an automated process regularly updates them as hackage/stackage/nixpkgs progresses.

With that in place, keeping your version bounds up-to-date and fresh may become merely a matter of checking and accepting automated PRs (which could include the relevant changelog entries from the updated dependencies in the PR description!) and making releases (itself automatable).

Does that make sense?

(Yes, there is still the lie that with multiple dependencies, such a .cabal file will promise that all combinations of dependency versions ought to work, despite this not being tested on CI. But one step at a time.)


How about a single CI run with every package pinned to its minimum bound? It should be possible to write tooling to do that generically. I think if something works with the minimum and the maximum it’s fairly likely it’ll work with the intermediate versions too.
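
A sketch of what such a generically generated pin-to-minimum configuration could look like (the versions and the second dependency bar are made up); a CI job would then build with cabal build --project-file=cabal.project.minimal:

    -- hypothetical cabal.project.minimal, derived from the lower bounds in the .cabal file
    packages: .
    constraints: foo ==1.2.3,
                 bar ==0.5.2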

I can imagine there could be theoretical edge cases where you need (A=minA, B=minB+1) OR (A=minA+1, B=minB) but hopefully they’re unlikely in practice.

FWIW I did try to do some of this manually for HTTP for a while, but gave up, it was too much effort to maintain.


Yeah I think a convexity convention for PVP would improve the current situation. Specifically it would say that no feature may be deprecated and later added again under the same name.

But the problem of an exponential amount of CI jobs remains if you want to be completely sure every combination of dependencies works. Because we can never be sure that an old version of one dependency works together with a new version of another dependency.

One way to tackle that is to split unrelated dependencies over a bunch of private libraries in your cabal package, but that sounds like a huge hassle.

That’s an approximation to my idea, yes. Still leaves the lie in the “fairly likely” provision – but definitely better than the status quo. Flag to force the use of the latest allowed package · Issue #8387 · haskell/cabal · GitHub is about one half of that idea.

I’ve worked on non-Haskell projects which attempted to do lots of automation in a CI pipeline. This had the effect that any flakiness in complex tests (which may be less of a problem in Haskell?) was amplified such that a large proportion of the team’s efforts were swallowed up improving tests and making the CI pipeline more robust. Even then, the likely carbon emissions of making a single change was a concern.

So I feel like the solution must involve reducing the number of combinations of dependencies that need to be tested.


Neil Mitchell has an interesting blog post re cabal lower bounds: Neil Mitchell's Blog (Haskell etc): Adding Package Lower-bounds

tl;dr: he hacked the cabal-install solver to generate build plans that favor lower versions, so as to test the lower bounds declared in cabal metadata. I like this idea; it would be nice to support this in upstream cabal-install, and people could test whether their lower bounds are a lie by adding a CI job for it.


I’ve recently implemented this feature in Cabal, --prefer-oldest. It is due to be released in Cabal 3.10.
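
A minimal usage sketch – the flag is just a solver preference, so the rest of the invocation stays the same:

    cabal build --prefer-oldest
    cabal test  --prefer-oldest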


Was just about to post that too 🙂. Thanks for that contribution! I hope haskell-ci picks it up so that many people benefit from it.

Note, though, that it’s --prefer-oldest, not --force-oldest, so it doesn’t guarantee that the lowest bound is actually exercised (just as the default preference for the newest versions doesn’t guarantee that the upper bound is).

Good observation. This reinforces my longstanding belief that bounds don’t belong to packages but to package collections (either personally-curated ones, or pinned ecosystem snapshots like Stackage, or even repositories like Hackage).


I’ve played with --prefer-oldest for a couple of months by now and am generally quite happy: it’s not bulletproof, but the benefit-cost ratio is very good. One caveat is that our ecosystem is indeed very prone to the issue described above, so builds with --prefer-oldest are likely to fail not because of your bounds, but because of transitive dependencies. If you try --prefer-oldest and your build fails in a package outside of your control, I strongly encourage you to report the issue to GitHub - haskell-infra/hackage-trustees: Issue tracker for Hackage maintainance and trustee operations.


One restrictive way to solve this is to use exact major versions, foolib == x.y, so if you and foolib adhere to the PVP, there can be no breakage. The prominent counterexamples are base and other boot packages, which should still have a version range, and that whole range should be tested on CI by selecting different GHC versions.

I used to do that in the past, but switched to foolib >= x.y mostly because I didn’t want to maintain restrictive upper bounds.

That is a solution to this problem (and maybe a good one for leaf packages, although there it’s probably not better than having no bounds and using freeze files or other means of pinning dependencies). But I’d expect it to cause headaches for those building package sets (stackage, nixpkgs, Debian).

Or do you mean == x.y.*?

That’s a bit better, but then better to use ^>= x.y.z so that you get the correct lower bound (after all, maybe you want to use a feature from foo-1.2.3 that’s not in foo-1.2.1). But now again you have to deal with CI likely only testing the upper edge of the range (unless you use --prefer-oldest, and even that is no guarantee).
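
For reference, the two spellings discussed here, with made-up versions:

    build-depends: foo == 1.2.*    -- >= 1.2   && < 1.3
    build-depends: foo ^>= 1.2.3   -- >= 1.2.3 && < 1.3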

I test lower dependency version bounds in my packages using Stack.

My goal is to support five years of GHC, Cabal, and boot libraries in publicly released projects. While this is not always possible, most of my public packages currently test against the latest revision of all major versions of GHC since 8.2.2 (released 2017-07-22). I test using both Cabal and Stack, and my lower dependency version bounds are determined by the versions included in the oldest tested Stack snapshot. The lower bounds are therefore tested in CI with every push.

I just realized that I have inadvertently made an exception for base. I will update my packages to resolve this.

The (official?) policy of supporting three major GHC versions only provides about 2~2.5 years of support. Based on my experience in industry, this is uncomfortably insufficient, IMHO.

The drawback of the “five years” policy is that I am unable to use some new features in projects for which compatibility is a priority. For example, I am still using Cabal 1.24 features. Cabal 3 was released on 2019-08-27, so I will not use those features until late 2024.

Here are some examples. I link to the develop branches because the main branches are stale due to the policy of creating Hackage revisions when changing dependency version bounds instead of creating proper releases. (I am not a fan of this policy because it results in the tagged release in the repo not matching the Hackage state. Sometimes people want to fetch the source using Git, and such discrepancies cause issues.)

Simple example:
https://github.com/ExtremaIS/ttc-haskell/tree/develop

Example with many cabal.project files:
https://github.com/ExtremaIS/queue-sheet-haskell/tree/develop

This second example requires many cabal.project files because an upstream dependency does not properly constrain dependency version bounds. I have to pin appropriate versions for each version of GHC that is supported. This works with my own infrastructure (Makefile and CI), but it does not work in Hackage, where documentation fails to build.


This is nice, but afaik --prefer-oldest will not guarantee a build plan with the oldest bounds. So you’d have to inspect the resolution (plan.json) to understand whether there are unreachable versions.

That’s why I also think we want the ability to drop the lowest bounds of all dependencies into a freeze file and do so with cabal freeze.
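
A rough sketch of such an inspection, assuming the default dist-newstyle layout and jq, with foo as the dependency of interest:

    cabal build --dry-run --prefer-oldest
    jq -r '."install-plan"[] | select(."pkg-name" == "foo") | ."pkg-version"' \
        dist-newstyle/cache/plan.json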

Very nice indeed, and quite close to what I have in mind. I guess what’s missing is a mechanized way of tying the version bounds in the .cabal file to those pinned by the project files.

For example, .cabal allows ginger >=0.7.3 but none of the .project files seem to pin that version. Is it really tested on CI?

By the way, these days you can generate the job matrix dynamically, so you could generate them from the .project files present in the repo and not worry about forgetting to add them to the GitHub actions file.


For example, .cabal allows ginger >=0.7.3 but none of the .project files seem to pin that version. Is it really tested on CI?

That is a mistake! I used to test that version, but the lower bound was not updated when I had to reconfigure to get things building again after Aeson 2 was released. I am accustomed to updating bounds when dropping/adding support, but I failed to do so in this other case. This is a good example of why tooling to help test bounds would be very beneficial!

Thank you for pointing it out. I have started my audit of lower version bounds in all my projects, but have not gotten to this project yet. I shall fix it soon. Thanks to this discussion, I will also be more careful about lower bounds in the future.

By the way, these days you can generate the job matrix dynamically, so you could generate them from the .project files present in the repo and not worry about forgetting to add them to the GitHub actions file.

Interesting! I have not seen this before. Searching, I have found many tutorials online, and I look forward to improving my CI configuration. Thank you!


I think you are almost where I’d like to be. What I may attempt to do here is the following:

  • From each of your CI jobs, collect the versions of packages used (e.g. using the build plan file).

  • In a subsequent CI job, collect all of these plans, and mechanically update the .cabal file so that the bounds are derived from the versions in these plans (a rough sketch of this step follows after this list).

    It’d be like cabal gen-bounds, but taking multiple build plans as input. I reported cabal gen-bounds: Consume (multiple) build plans? · Issue #8654 · haskell/cabal · GitHub to see if cabal gen-bounds could do that.

    Or maybe Oleg Grenrus’s cabal-fmt code could easily be used to transform the .cabal files appropriately (if one is happy with its formatting choices) – ah, not intended for that. But probably doable using the Cabal-syntax package directly.

  • If that changes the .cabal file, automatically push the change. (This way, you never have to manually edit the .cabal files; just edit or add .project files and CI takes care of it.) If it doesn’t change the .cabal file, all is good and CI turns green.
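
A rough sketch of the collection step, with hypothetical paths (it assumes each CI job uploads its plan.json, and that jq and GNU sort are available):

    # which versions of foo did the CI jobs actually build against?
    jq -r '."install-plan"[] | select(."pkg-name" == "foo") | ."pkg-version"' \
        plans/*/plan.json | sort -uV

If this prints, say, 1.2.3 and 1.3, the regenerated bound would be foo ^>= { 1.2.3, 1.3 }.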

Yes, but it has the diamond problem for “non-leaf packages”. (That’s great terminology, thanks.)

A freeze file also pins the minor version, but I’d usually want to have these free to get bugfixes automatically.

I don’t know about Debian, but I thought that in stackage the maintenance burden is mainly on the package maintainer? (And likewise in nixpkgs, which reuses stackage snapshots.) If I remember correctly, my packages got thrown out of stackage every now and then because of the restrictive bounds.

Yes, I guess I mean that. Although in the past the ^>= operator has for some reason never worked for me the way I wanted, but that may be just me.

I guess that happens when the headaches get too big, and your package might stay in if your bounds were more liberal 🙂


How about approaching this from a more programmatic side?
Projects like policeman have shown that we can (in theory) programmatically compare APIs using HIE files.
So why not extend this approach? What I am thinking is the following:

  • Compile your project and produce HIE files
  • Create a table of symbols we use from dependencies together with their respective signatures
  • Compile every dependency version we claim to be compatible with, produce HIE files and extract the public API of each respective package.
  • Now compare the API we use for each package and version and check for inconsistencies.

It feels like a good direction for making Haskell package metadata (like the public API) easier for tooling to consume.

I think the advantages are quite obvious; we could even take strain off CI systems by having public read-only storage of HIE files (or just the extracted APIs) for each package on Hackage, keyed by platform, GHC version, cabal flags, etc.

Some possible disadvantages or even complete blockers for this idea:

  • Only works with recent enough GHC versions (not a blocker, imho).
  • This tool would have to chase HIE file format changes, but I think that should be manageable.
  • Haskell has quite a huge surface syntax; the risk of missing a construct and claiming compatibility where there is none is quite high.
  • Extracting comparable symbol tables might always be an uphill battle against GHC.
  • Maybe straight up not feasible without running the GHC typechecker.

What do you think? It seems like such a straightforward idea that I can’t imagine no one has ever tried it, so I may have missed some blockers.

This doesn’t take care of semantic changes to functions whose API is unchanged, but I don’t think the approaches mentioned above take care of that either.
