Monorepo: Cabal internal libraries vs multi-package Cabal

What’s your experience when setting up a monorepo with several executables and several shared libraries?

  1. Do you have one top-level .cabal file that declares those executables and libraries (and their inter-dependencies)?
  2. Or do you split the bigger parts into separate Cabal packages (each with its own .cabal file) that you build together via a top-level cabal.project file?

edit: What would you recommend when there’s no plan to distribute separate packages? I understand a Cabal package as a unit of distribution.
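
To make option 2 concrete, I mean something like this (paths are invented); option 1 would instead declare all of these parts as components of a single top-level .cabal file:

```
-- cabal.project at the repository root (hypothetical layout)
packages:
  libs/core
  libs/db
  services/service-a
  services/service-b
```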

4 Likes

Isn’t a monorepo with one package just a… “repo”? (=

I think this option benefits from parallel compilation via cabal build -j, but I don’t know whether the first option also builds multiple components in parallel.

The biggest project by this measure that I ever worked on has 15 components in one Cabal package — 3 test suites, 5 libraries and 7 executables. It works well. Cabal knows how to build any given component efficiently.

Be sure to put each component into its own folder, though — I recall there are some issues with Cabal if components overlap in the file system. I have even written a special tool that handles the folder structure for me.
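
To illustrate, here is a minimal sketch of such a layout (all names invented, and this is not my tool’s actual output): every component points its hs-source-dirs at a distinct folder, so nothing overlaps.

```
-- myproject.cabal (sketch)
cabal-version: 3.0
name:          myproject
version:       0.1.0.0
build-type:    Simple

-- the main library lives in its own folder...
library
  hs-source-dirs:   lib/core
  exposed-modules:  Core
  build-depends:    base
  default-language: Haskell2010

-- ...and so does every additional (internal) library...
library utilities
  hs-source-dirs:   lib/utilities
  exposed-modules:  Utilities
  build-depends:    base, myproject
  default-language: Haskell2010

-- ...and every executable and test suite
executable my-exe
  hs-source-dirs:   app/my-exe
  main-is:          Main.hs
  build-depends:    base, myproject, utilities
  default-language: Haskell2010

test-suite my-test
  type:             exitcode-stdio-1.0
  hs-source-dirs:   test/my-test
  main-is:          Main.hs
  build-depends:    base, utilities
  default-language: Haskell2010
```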

I write a cabal.project when I need to work on a dependency from Hackage or an experimental dependency from some other repository. Say I am working on a package x that depends on y, and I need to make some quick adjustments to y. I should clone y somewhere on my computer and put the path to it into the cabal.project located in the top level folder of x. This means that I never have to leave the top level folder of x — I find this to be ergonomic.
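
Roughly like this, assuming x is the current package and ../y-clone is wherever I put the checkout (paths are illustrative):

```
-- cabal.project in the top-level folder of x
packages:
  .
  ../y-clone

-- for an experimental dependency living in some other repository,
-- a source-repository-package stanza works as well
-- (location and tag below are placeholders):
source-repository-package
  type: git
  location: https://github.com/someone/z.git
  tag: 0123456789abcdef0123456789abcdef01234567
```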

Overall, I see cabal.project as a local and temporary fixture and a Cabal package as an eternal binding agreement.

7 Likes

THIS. Having overlapping components is a highway to hell.

5 Likes

Unfortunately, the docs don’t have enough detail. Reading Reducing Haskell parallel build times using semaphores - Well-Typed: The Haskell Consultants suggests that the parallelization happens at the Cabal component level.

I see, and thanks for the advice. This sounds like a use case that should be mentioned in the Cabal documentation. I added it to a list of improvements at [Initiative] Improve Cabal documentation structure to become more beginner-friendly · Issue #9214 · haskell/cabal · GitHub

edit: For a monorepo setup (several subprojects/microservices inside a single repository), multi-package Cabal looks like the better fit, because static file paths included in a microservice can be relative to that subproject, whereas with a single top-level *.cabal file all relative paths need to be relative to the repository root.
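
For instance, a sub-package’s data-files are resolved relative to that sub-package’s own directory, so each microservice can carry its own assets (all names here are hypothetical):

```
-- services/billing/billing.cabal (hypothetical sub-package)
cabal-version: 3.0
name:          billing
version:       0.1.0.0

-- these paths are relative to services/billing/, not the repo root
data-files:
  config/settings.yaml
  templates/*.html

executable billing
  hs-source-dirs:   app
  main-is:          Main.hs
  build-depends:    base
  default-language: Haskell2010
```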

1 Like

If I understand correctly, the Well-Typed article says that GHC builds in parallel at the module level (which is similar to the component level), whereas cabal builds in parallel at the package level (it isn’t clear from cabal’s documentation whether that’s the case).

I would hope that this -j flag is then passed on to GHC so that modules are built in parallel when there’s only one package. I added this to my list of things to document for Cabal.
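
As far as I can tell, cabal’s own -j only controls how many packages/components are built at once, so for a single-package project GHC’s module-level parallelism may need to be requested explicitly, for example via ghc-options (a sketch; check the field names against the cabal.project documentation):

```
-- cabal.project (sketch)
packages: .

-- let cabal build several packages/components at once
jobs: $ncpus

-- additionally ask GHC to compile modules in parallel within a component
package *
  ghc-options: -j
```

Newer GHC and cabal versions can also coordinate the two levels through a shared job semaphore, which is what the Well-Typed article is about.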

I’m facing this right now, exporting a mono-repo project to cabal. I think cabal is just not designed for multiple libraries / executables, so there is no satisfactory solution as long as cabal is involved (neither my personal nor my work projects use cabal). Still, I chose 1, with the caveat that the components all have to live in independent root directories, or cabal will try to redundantly compile even undeclared and un-imported modules. The reason is that I didn’t want to deal with all the redundant names and versions. However, I’m beginning to have second thoughts: given that generating cabal files is inevitable and every component living in its own directory is required anyway, it’s not much extra work for the generator to make up versions and set them all equal. And 2 is definitely better supported and has a longer history, while 1 is still considered new and likely buggy.

If I do wind up with 2, I’ll just have to document that these are not independent packages, and that it’s your own problem if you try to treat them that way (in other words, versions will always require exact matches). That’s nonstandard, so it’s motivation to stick with 1 unless bugs force me off it.

To answer one question from above: cabal builds components (libraries, executables, test suites and benchmarks in the same .cabal file) in parallel, and it even builds their separate dependencies in parallel with other components proper (not only with those components’ dependencies). I know this for a fact, because I see it a lot when building my packages. Documentation improvements are welcome (and thank you very much for [Initiative] Improve Cabal documentation structure to become more beginner-friendly · Issue #9214 · haskell/cabal · GitHub).

3 Likes

I’m so thankful that I stumbled on this post! I don’t think this key piece of information is in the cabal documentation regarding internal libraries. As a Haskell beginner, I have spent days trying to figure out how to make a project with multiple internal libraries and executables. If it helps with the documentation improvement effort, here is my journey:

  • I wanted to keep my folder structure very simple, with just an app folder and a src folder. Coming from Python, I assumed I could import any module from any other module without conflicts.

  • After messing with the cabal file for a while, I realized I could list both the app and src folders under the hs-source-dirs of each executable. This worked well for a while and I thought I was done.

  • However, as my project grew, HLS in VSCode started complaining about missing imports, even though, as far as I could tell, the modules were in the right place. What’s more, cabal was compiling my project just fine, so I thought it was an HLS issue.

  • When I realized it wasn’t an HLS issue, I still wasn’t sure how to get HLS to behave, and I was about to try the nuclear option: a cabal.project file with a separate package (each with its own cabal file) for each of my modules!

  • But that’s crazy, I thought. Surely there must be a simpler way. Eventually, I found out about internal libraries, which surprised me because the cabal docs say that a package can contain “at most one library.”

  • So I defined separate internal libraries in my cabal file, with all my lib modules still residing in a flat src folder. And then (drum roll) cabal couldn’t compile my project.

  • Bewildered, I finally found this page, and after moving my modules into their own subfolders everything works as expected (and will hopefully continue working). A stripped-down sketch of the setup that finally worked follows this list.
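
Here is that sketch (module and library names invented): each internal library gets its own stanza and its own subfolder.

```
-- myapp.cabal (sketch)
cabal-version: 3.0
name:          myapp
version:       0.1.0.0

library parsing
  hs-source-dirs:   src/parsing
  exposed-modules:  Parsing
  build-depends:    base
  default-language: Haskell2010

library reporting
  hs-source-dirs:   src/reporting
  exposed-modules:  Reporting
  build-depends:    base, parsing
  default-language: Haskell2010

executable myapp
  hs-source-dirs:   app
  main-is:          Main.hs
  build-depends:    base, parsing, reporting
  default-language: Haskell2010
```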

For reference, I’ve attached a module dependency diagram of my project.

5 Likes

I don’t use internal libraries since they don’t solve any real problem and just make tooling more complicated and add another way of doing the same thing.

  1. always expose all your internals, possibly through a foo-internals package (a sketch follows this list); also see Internal convention is a mistake – Functional programming debugs you
  2. create separate packages when possible, e.g. compare with the hasql ecosystem
  3. if you re-use large chunks of cabal file code, generate them with e.g. dhall
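
For point 1, the shape I mean is roughly this (package and module names are hypothetical): foo-internals exposes everything, and foo depends on it and exposes only the curated surface.

```
-- foo-internals/foo-internals.cabal (sketch)
cabal-version: 3.0
name:          foo-internals
version:       0.1.0.0

library
  hs-source-dirs:   src
  -- everything is exposed here, warts and all
  exposed-modules:  Foo.Internal
                    Foo.Internal.Unsafe
  build-depends:    base
  default-language: Haskell2010

-- foo/foo.cabal (sketch)
cabal-version: 3.0
name:          foo
version:       0.1.0.0

library
  hs-source-dirs:   src
  -- only the curated, stable surface
  exposed-modules:  Foo
  build-depends:    base, foo-internals
  default-language: Haskell2010
```
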
3 Likes

FWIW, after discussing the issue with @hasufell here on Discourse, this approach is the one I chose to take for Bluefin. Here’s the repo, for anyone who wants to see the exact approach that I adopted: GitHub - tomjaguarpaw/bluefin

3 Likes

Multiple packages vs multiple libraries in one package is really a question of whether you want to version them separately. Fine-grained versioning is nice in some situations but not always needed. For example, in the common situation where you have a library X and other libraries X-Y for using X with Y, it’s usually not a big deal if the two get versioned together.

2 Likes

I agree with most of that blog post, but I feel like there’s one extra axis that could be covered.

As a library author I wish to have:

  • A clean interface that guarantees safety regardless of input quality. For the core library I prefer foo, for a cross-dependency with bar I call it foo-bar.

  • A dirty interface for datatype internals, basic helper functions and unsafe variants of functions that have undefined behaviors on certain inputs. This should be as well-documented as the clean interface and I’d call it foo-unsafe.

  • An internal interface that is used to construct the previous two interfaces and is as such a box of three hundred undocumented definitions of no interest to anyone but me. This would be foo-internal and I don’t want anyone but me to be able to import modules from this.

All three should have different PVP versions. Though I guess for the internal interface the PVP does not matter.

I assume this would be some variation of a multi-package setup, since modules would be shared between the interfaces.
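
Concretely, I picture a multi-package layout along these lines (names and paths are hypothetical), where foo and foo-unsafe both depend on foo-internal and each carries its own PVP version:

```
-- cabal.project (sketch)
packages:
  foo-internal
  foo-unsafe
  foo
```

foo and foo-unsafe would each list foo-internal in their build-depends and re-export or wrap the pieces they need, while foo-internal stays an implementation detail that only I am meant to touch.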

1 Like

I’m quite curious why you want foo-unsafe and foo-internal to be separate libraries. I don’t know much about your foo, but if I were a user then it’s quite plausible that I would want to import things from foo-internal! That’s definitely happened to me several times in the past, for different foos.

Then those definitions either belong to foo-unsafe or are effectively already in foo-unsafe as some combination of existing definitions.

foo-internal in my mind does a major version bump on every little alteration, so depending on it makes little sense for an external user; they might as well just copy-paste the function definitions. Also, as far as just having things that work goes, foo and foo-unsafe should be more than enough to get something working for the time being, even if it runs suboptimally.

1 Like