GHC and Cabal: the big picture

Friends

You may remember a recent thread on ghc-devs about GHC and Cabal. In it I said that I feel I lack the “big picture” of how GHC and Cabal interact, and that my mental model is probably faulty.

Tom Ellis took pity on me, and together we wrote this big-picture overview about how GHC and Cabal interact. Would you like to:

  • Read it as a consumer.
    • Does it tell you stuff that is useful?
    • What else would you like to know?
    • What is unclear or missing?
  • Read it as an expert.
    • Is it accurate?
    • Are any bits misleading?
    • Do the links go to appropriate places?
    • What other links or resources would be helpful?

It is not intended as a replacement for the GHC user guide, nor the Cabal user guide; it is littered with links to those guides, which give much fuller details. Rather, it is intended to put you (well, me for one!) in a position where you can more easily make sense of those documents.

We’d love to have your help in improving it.

Simon

31 Likes

There appear to be many places in cabal that have intimate knowledge of GHC, leading to poor boundaries between the projects.

Two such examples are:

So what I’d like to know:

  1. why does this coupling exist?
  2. how do we get rid of it?
3 Likes

I concur, and funnily enough was about to open a thread about the boundaries between our core tooling before seeing this topic.

There are things that GHC knows too much about despite their belonging to the area of package management (for which we have cabal) or documentation (for which we have haddock). I understand that GHC was created before cabal, but we ought to enshrine Cabal (not necessarily cabal-install) as the package manager for Haskell projects, with clearly defined roles and interactions that are not driven purely by historical needs.

Edit: While vendoring a data structure from cabal, GHC ended up with a very particular field in the GenericUnitInfo record… See ghc/ghc#25073

7 Likes

Thanks for writing this up!

Where will this document live once it has been finalised?

I feel like this overlaps a bit with the document, but Duncan Coutts’ talk about the packaging ecosystem is a great resource to link: https://www.youtube.com/watch?v=XfTinQPjDQw

7 Likes

One aspect of this investigation that I found interesting is that the -package-db, -package-id and -package flags already form a sort of proto package manager. They’re the kinds of flags you’d integrate into your compiler if you want the raw, basic ability to switch between different versions of packages but don’t have a full package manager, like cabal, available. I suspect that’s how those flags arose originally. I suspect we ought not to use them now (and I think cabal indeed doesn’t, except in the most basic way: to expose a package DB it creates, and to enable the relevant packages from it).
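
For illustration, here is roughly what driving GHC with only those flags looks like, without any package manager in the loop (the path and unit id are invented; -hide-all-packages, which cabal also passes, hides everything first so only the listed units are visible):

ghc -hide-all-packages \
    -package-db /path/to/extra/package.db \
    -package-id text-2.1-abcd \
    -package containers \
    Main.hs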

[I know “cabal is not a package manager” but unfortunately I can’t think of a better description at the moment.]

4 Likes

see also: #25025: GHC and Cabal disagree on the meaning of the term "package" · Issues · Glasgow Haskell Compiler / GHC · GitLab

It is touched on in the document, but I think GHC’s -package <pkg> is currently broken, because it can expose a unit that ghc-pkg list <pkg> does not list. That is because -package <pkg> treats the package-name of a unit (when it exists) as if it were the name of the unit (when, in fact, it is the name of the Cabal package that provided the library that corresponds to the unit). @catachresthetic has just cross-referenced to the related GHC issue.

3 Likes

Perhaps the original Cabal design document can be of some use here:

…both to be a point of comparison to this particular “big picture” and to identify the points where the current Cabal implementation is too GHC-centric.


If you mean cabal, the tool - it was. But apparently the UX was so dreadful that this happened:

So have all the problems with cabal listed there been resolved or reduced?

2 Likes

I’d say: almost. The only problem that’s really still there is problem 1:

that the tool facilitates combining sets of packages to build new applications, not fail without pointing to the solution, just because packages advertize conservative bounds on their dependencies;

Cabal is still bad at telling you what the problem is when it fails to find a valid build plan (let alone suggest how to solve the problem).

Using stackage snapshots is one solution to that problem that is now also possible, but it does not quite match stack’s UX and I still wouldn’t recommend it to beginners (who’d need it most).

3 Likes

Let’s see:

multi-package project support (build all packages in one go, test all packages in one go…)

:white_check_mark:

depend on experimental and unpublished packages directly, stored in Git repositories, not just Hackage and the local filesystem,

:white_check_mark:
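
(For reference, cabal.project supports this with a stanza like the following; the repository URL and tag are invented:)

source-repository-package
  type: git
  location: https://github.com/someuser/somepackage
  tag: 1234567890abcdef1234567890abcdef12345678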

transparently install the correct version of GHC automatically so that you don’t have to (and multiple concurrently installed GHC versions work just fine)

We rely on ghcup to do this.

optionally use Docker for bullet-proof isolation of all system resources and deploying full, self-contained Haskell components as microservices.

Automagic builds in docker containers are nice, perhaps not critical for adoption.

4 Likes

It is not intended as a replacement for the GHC user guide, nor the Cabal user guide

Ok, but is there any hope of a new revision of the Cabal ‘proposal’? The proposal is also neither of those two documents. Pinning down the details of how GHC/Cabal interaction is supposed to work sounds like a specification to me. What is the effective difference between the proposal and ‘the big picture’?

1 Like

May I clarify, is the document intended to describe the relationship between GHC and Cabal (the library) or (and/or?) the relationship between GHC and Cabal (the tool, the executable provided by the cabal-install package)?

As I understand it, both Stack and Cabal (the tool) are built on top of Cabal (the library). By default (it can be changed by using a Custom build type), each version of Stack (the ‘official’ binary distribution) builds using the version of Cabal (the library) that ships with the specified version of GHC as a boot package. I understand (perhaps wrongly) that each version of Cabal (the tool) builds with a single version of Cabal (the library), the one specified as a dependency when the tool is built, irrespective of the choice of GHC version. EDIT: So, for Stack, it assumes that there is a close relationship between (a) a specific version of GHC and (b) the Cabal boot package for that version of GHC.

EDIT2: Historically, Stack has built ‘packages’ (made up of components) rather than ‘components’ (provided by packages) - the latter being known as ‘component-based builds’. People (particularly, in recent years, @theobat) have been trying to adapt Stack to component-based builds. As I understand it, the current sticking point is ‘performance’ - Cabal (the library), accessed in the way set out in Section 2.4 of the original Cabal specification (that is, via a ‘Setup’ executable), performs poorly for ‘everyday users’ with component-based builds. I am wondering if the ability of Cabal (the tool) to avoid the same ‘performance’ problem is something to do with the relationship between GHC and Cabal (the tool).

5 Likes

@simonpj Thanks for opening this thread!

I believe the relationship between the cabal project and GHC is underdiscussed but also critical to the development of the ecosystem. The issues at play are not only technical (those are fun to solve!) but also organisational, and their presence has put stress on the people involved in both projects.

Allow me an academic joke. This is how physicist David Goodstein starts his book “States of Matter”:

Ludwig Boltzmann, who spent much of his life studying statistical mechanics, died in 1906, by his own hand. Paul Ehrenfest, carrying on his work, died similarly in 1933. Now it is our turn to study statistical mechanics. Perhaps it will be wise to approach the subject cautiously.

We do not want cabal to share the same fate.

Note: I was going to say that I would contribute to the document and leave here only some general comments. Then I accidentally poured what I know into a post so big I will have to split it in two parts. A “big picture” is hard to paint. I still plan to contribute parts of this post to the document.

The original proposal is still a good starting point to understand the origin of the many things we call “cabal”.

  1. “Cabal” is a format for distributing Haskell source code: a tarball with a “package description” file containing metadata, commonly known as a “cabal file”.
  2. “Cabal” is a specification of a common interface to build and distribute source packages. This is the ./Setup.hs command line that most of our tooling uses. Note that the word “common” here refers not only to different Haskell compilers but also to different build systems: people used to write Makefiles to build Haskell, and that was something the proposal intended to support.
  3. Sanctioned in the proposal, “Cabal” is a Haskell library to implement such an interface (i.e. to implement ./Setup.hs).
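
For concreteness, the conventional implementation of (3) is a two-line Setup.hs that delegates everything to the Cabal library:

import Distribution.Simple
main = defaultMain

and the interface in (2) is then driven with invocations like runhaskell Setup.hs configure, runhaskell Setup.hs build and runhaskell Setup.hs install.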

Note: Following RFC: Replacing the Cabal Custom build-type by adamgundry · Pull Request #60 · haskellfoundation/tech-proposals · GitHub, there is work in progress to revisit the proposal and abandon the ./Setup.hs interface.

The proposal describes some standardised ways to implement the interface that the Cabal library will support. A “simple” implementation (the build-type: Simple that you find in most cabal files) would perform the build according to a declarative specification entirely contained in the cabal file. Alternatively, build-type: Make would delegate the build to a Makefile. On the other hand, build-type: Custom would indicate a custom implementation of the interface, which might not even use the Cabal library. It is indeed possible to write a compliant Cabal package without relying on the Cabal library.

The proposal also specifies what a compiler has to implement to support all this. This includes: the ghc-pkg tool, the concept of a package database (packagedb) and the -package compiler flag. Running runhaskell ./Setup.hs install will install the compiled files somewhere (by default /usr/local on unix systems) and compose an InstalledPackageInfo file, which ghc-pkg will register as an entry in a packagedb (by default the global package database). The type InstalledPackageInfo, defined in Cabal-syntax, is a concept from Cabal-the-specification and is used by GHC to provide an implementation for ghc-pkg. All this works the same today.
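
A sketch of that registration step, with an invented path for the InstalledPackageInfo file (ghc-pkg register and ghc-pkg list are the real subcommands):

λ ghc-pkg register /path/to/my-pkg-1.0.conf
λ ghc-pkg list

After the register step, the new entry shows up in the ghc-pkg list output for that packagedb.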

The meaning of the word “package” has changed since we started having multiple “components” in a cabal package. I am not sure how and when the following changes appeared, or even whether they were distinct events, but I believe it is still conceptually valid to describe them as separate steps.

  • Adding executables was a somewhat trivial change, except for the fact that executables do not need to be registered in the package database, since the compiler is never going to load them again. Notice how this starts to crack the role of ghc-pkg as a package manager: an installed executable will not be recorded anywhere.
  • Then we had multiple libraries in a package, first only private (only visible inside the package itself), then also public. Cabal uses the term “components” for the set of libraries, executables, tests and benchmarks. The existence of multiple libraries breaks the correspondence between packagedb entries and cabal packages, because now one package can produce multiple “units” (I think from “compilation unit”) for GHC to load. Therefore, the entries in the packagedb are now units, not packages (see the sketch after this list).
  • Up to here a package has multiple components and each component corresponds to at most one unit in the packagedb; but then we had backpack! I admit I am very ignorant of the topic, but I understand that, with backpack, GHC produces “open units” which have import statements that can be resolved later on. This means that a single component can correspond to multiple units when instantiated in different ways[^1].
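
To make the multiple-libraries point concrete, here is a sketch of a package with a public sublibrary (all names invented):

cabal-version: 3.0
name:          acme
version:       0.1

library
  exposed-modules: Acme
  build-depends:   base

library internals
  visibility:       public
  exposed-modules:  Acme.Internals
  build-depends:    base

A dependent package would ask for the sublibrary with build-depends: acme:internals, and building produces two units (named roughly acme-0.1-inplace and acme-0.1-inplace-internals in the in-place packagedb) for the single package acme.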

My tl;dr is: packages and components are concepts about how source code is distributed, while all GHC ever loads are units.

I believe this significant shift in the meaning of “package” left packagedb in a tough spot: the correspondence between packagedb entries and cabal packages is gone; most of the metadata in its entries is irrelevant to GHC (I claim), while Cabal needs a place to store and remember more information about how things were built. It is also true that Cabal is only one build system, while packagedb is part of a common architecture, so it does not seem appropriate to use it for Cabal-specific purposes. The situation is confusing and has been causing real problems, e.g. the issues with text-2.0 on Windows. I tried to summarise the situation around InstalledUnitInfo here: InstalledPackageInfo mega issue · Issue #8967 · haskell/cabal · GitHub. The issue linked by @catachresthetic is also an example of this.

[^1]: To see an example, clone this repository from GitHub, run cabal build all and inspect the in-place package database dist-newstyle/packagedb.

9 Likes

In much the same way as:

did, perhaps a literate Haskell script or some other minimal executable or checkable specification can help here to reduce the appearance of ambiguities. If that works, then it could be used as an advisory document to better ascertain what in Cabal can be improved.

I believe the meaning of the word “install” has also changed over time.

Before we get to that we need to introduce another “cabal”. The proposal defines packages and how to build them, but not exactly how to distribute them. This is where Hackage and the cabal command-line tool come in. To avoid confusion, I always refer to the latter as cabal-install, from the name of its package.

Originally, cabal-install used to mostly wrap the Setup.hs interface but added new powers to its install command:

  1. Running cabal install pkg-name would automatically fetch, build and install pkg-name from Hackage.
  2. When called in a package directory, cabal install would automatically fetch the required dependencies from Hackage, install them into the user package-db (the default is different from Setup.hs) and then build and install the package in the current directory.
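
Both behaviours are still reachable today under the v1- prefix mentioned further down, e.g. (package name invented):

λ cabal v1-install acme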

Satisfying dependencies was tricky, so cabal-install introduced a constraint solver to find a set of compatible packages to use. The solver also consults the package database and prefers to reuse already-installed packages (the dependency solving happens at the package level) when allowed by the constraints. This seems a strange default today, but I assume it was to avoid recompiling and/or to avoid “changing” the packagedb when not strictly necessary.

I think these were the days of the “cabal hell”.

The imperative way to manipulate the packagedb would lead to disasters like this:

  • Install pkg-b that depends on pkg-a.
  • Many days later, you install pkg-c that depends on a newer version of pkg-a (without realising that this breaks pkg-b).
  • Next time you try to use pkg-b, you realise it does not work anymore and, in a rush to get things done, you reinstall it (without realising this will break pkg-c).

Moreover, using Haskell for application development was coming into the picture and cabal-install could not provide any way to assist with reproducibility, since its dependency solving would rely on an ever-changing Hackage index. It also had no support for building multi-package projects, or for controlling precisely how packages are built (cabal install would go all automatic and cabal build would not do any dependency resolution).

This is where stack came into play, bringing in a maintained set of compatible packages and project-level reproducibility.

To catch up and solve these issues, cabal-install introduced a new set of commands (the v2- commands, with the previous behaviour kept under the v1- prefix) that perform “nix-style” builds:

  1. (Almost) all building happens in a project context defined in a cabal.project file (in its absence, cabal-install will use a trivial one), with no shared state between projects.
  2. All dependencies are isolated through the use of hashes, like nix does. The id of an entry in the packagedb includes the hash of its build configuration, which in turn includes the hashes of its dependencies, and so on.
  3. Given the mechanism above, already-built dependencies can safely be shared between projects when their hashes are equal. This is implemented through a special package database called “the store”.
  4. A field in cabal.project can be used to fix the state of Hackage’s index (index-state:), so the set of dependencies you get today is the same as the set of dependencies you will get next month.
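
A minimal cabal.project exercising points 1 and 4 might look like this (timestamp chosen arbitrarily):

packages: .

index-state: 2024-01-01T00:00:00Z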

Nowadays, v2 commands are the default and you need to use the v1- prefix to get the previous behaviour.

The transition between the v1 and v2 commands took a long time and, to many, it is not over at all. It introduced subtle changes in behaviour that v1 users would not expect. I believe most of them are related to the meaning of “install”.

Here are a few examples:

Agda

The focus on reproducible project-based builds means that if you do cabal install inside Agda's source tree, cabal-install will:

  1. make a source distribution
  2. unpack it in a temporary directory
  3. build it and install it from there.

This is to guarantee that the result is the same as if the code had come from Hackage, but it has the side effect of recompiling everything from scratch every time.

Installing libraries

How do we install a library? I believe the best conceptual model is to think that cabal v2 commands do not install anything, ever. Indeed cabal-install does not mutate the global or user package database. If you do cabal install free, cabal-install will warn you:

The command “cabal install [TARGETS]” doesn’t expose libraries.

While, if you do cabal install --lib free, you get told

Warning: The libraries were installed by creating a global GHC environment file at:
/home/andrea/.ghc/x86_64-linux-9.8.2/environments/default

An environment file is not a packagedb! It is a file that GHC can use as a replacement for a sequence of package-related options.

λ cat /home/andrea/.ghc/x86_64-linux-9.8.2/environments/default
clear-package-db
global-package-db
package-db /home/andrea/.local/state/cabal/store/ghc-9.8.2-2c96/package.db
package-id base-4.19.1.0-cbb2
package-id free-5.2-fba9dd97ec45ac275c816eab9abe287a13a9b6edbd30126b68b69dd9a88f8af9

This shows what cabal-install actually did: it “installed” free into its own private packagedb and then wrote down some instructions for GHC to find it. While it may seem the same, this avoids the “reinstall” problems of the v1 commands, while still sharing dependencies and avoiding recompilation.
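
To see the environment file in action (Control.Monad.Free is a module from free):

λ ghci
ghci> import Control.Monad.Free

GHCi picks up the default environment file automatically, and therefore finds the unit in the store; passing -package-env - makes GHC ignore environment files altogether.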

(TBH I am not sure how install --lib behaves: does it read the environment file to make a plan?)

Note that, even though the “installed” library is present in the store packagedb, projects will not see it, because their build environment is always defined by their respective cabal.project files. If the project includes a package which has free as a dependency and, after dependency solving, the free package hash in the build plan matches the one above, only then will that package see that unit from the store.

I had better stop writing now, but ask away if something is not clear. Either I can answer or I can find out where to look.

3 Likes

I’m not sure if you’re saying that’s how cabal-install currently behaves, but if so then it definitely doesn’t seem to! After a cabal update I expect to have to recompile most of my dependencies for any local package I’m building. I wish it had the behaviour you describe instead, although I was informed that the “prefer newest” behaviour has the benefit of coming up with a deterministic install plan once you know the index-state.

1 Like

While I had no intention to be sneaky, I admit I should have seen this confusion coming.

The solver is actually shared between the v1 and v2 commands, and only configured differently. In particular, the v1 commands pass the global and user packagedbs to the solver, while the v2 commands only ever pass the global packagedb.

But there is another confusion: with v2 commands, your dependencies are not going to be installed in the global or user package db. They are “installed” in the store packagedb, which is never part of any constraint solving process.

This is a subtle difference and I found it difficult to explain properly but I can try once more:

The solver takes a list of source packages (the Hackage index), a list of installed packages (the global packagedb) and a list of targets, and finds a coherent set of packages that includes the targets. After this has happened, this set of packages gets elaborated a bit to accommodate backpack and per-component builds, and a precise hash for each of the to-be-built units is calculated. Only then is the store consulted, to check whether any of those units have already been compiled (and in the same way) by virtue of their hashes.
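
As a cartoon of that pipeline in Haskell types (all names invented, not actual cabal-install API):

-- solve against the source packages and the global packagedb only
solvePlan     :: HackageIndex -> GlobalPackageDb -> [Target] -> SolverPlan
-- elaborate into per-component (and backpack-instantiated) units, each with its hash
elaboratePlan :: SolverPlan -> [UnitPlan]
-- only now consult the store: reuse the units whose hashes are already present
improvePlan   :: Store -> [UnitPlan] -> BuildPlan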

The store packagedb is implemented as a packagedb but it is not used as a regular packagedb.

3 Likes

After a cabal update I expect to have to recompile most of my dependencies for any local package I’m building. I wish it had the behaviour you describe instead, although I was informed that the “prefer newest” behaviour has the benefit of coming up with a deterministic install plan once you know the index-state.

I am not sure if I understand you correctly here. Without an index-state specified in your cabal.project (or on the command line), the dependencies will be solved against the newly updated index. This will likely bring new versions into the plan, which will have to be compiled (as well as whatever depends on them).

The build plan is completely deterministic (it’s a pure function) given its inputs. There are a few of them, but not a big number:

  • compiler
  • operating system
  • architecture
  • the list of available source packages (i.e. repositories and local packages)
  • the packages available in a given stack of packagedbs.
  • the list of pkg-config entries in the system
  • settings like allow-newer or allow-older (which are actually implemented by modifying the available source packages, erasing the bounds)
  • user specified constraints
  • a handful of solver parameters which are rarely used, with the exception of prefer-oldest which is a recent addition.
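
Several of these inputs are exposed as flags or cabal.project fields, e.g. (version chosen arbitrarily):

λ cabal build --constraint='text ==2.1' --allow-newer='*:base' --prefer-oldest
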
4 Likes

(Apologies in advance for a very dumb question from a dumb user.)

I at first skimmed this thread expecting it had nothing to do with me. I’m like:

My use case is I’m a hobbyist who writes small programs.
[George C on the ghc-dev thread]

Furthermore I’m a Windows user, so I’m allergic to command lines.

I’ve been using Haskell more than a dozen years. I hear ‘Cabal’ only quickly followed by ‘hell’ [**]. I’ve been feeling increasingly guilty that GHC’s front page says

  • GHC is a breeze to install using ghcup or Stack [***]

And yet if I go to either of those links, I’m confronted by command-line gobbledegook. Much easier to follow the download link at the top of that page, download and unpack the .tar using familiar point-and-click as with other programmer environments I use as a hobbyist. So to Tom/Simon’s driving question:

The Driving Question of this document is this:

  • When GHC sees import Boo.String, how does it find the correct module Boo.String to import?

I don’t know/do I need to care? It seems to just work. What about cabal makes that even a question worth asking?

So long and short: I’ve never installed cabal/stack. I was hoping somewhere in the tangle of documentation I’d find advice on why/why not to go to the bother of cabal. I don’t maintain packages; I don’t try to furiously keep up with GHC releases; I like to keep it simple because my interest is in a practical lambda-calculus, not maintaining some complex development infrastructure.

[**] Yes all the google results for ‘cabal hell’ are old. Except recent links (after saying that) then say “except …” there’s still traps for newbies … dependencies … outdated packages … memory usage …

[***] This intro to Cabal says ghcup is for Linux, then links to the Haskell Platform (presumably for Windows), except that’s deprecated.

2 Likes

Those instructions are indeed outdated. They were already fixed in Update install instructions on landing page by BinderDavid · Pull Request #38 · haskell/cabal-website · GitHub, but it seems that the changes have not been published automatically.

2 Likes