How to give libraries optional dependencies?

qqwy · June 17, 2024, 5:03pm

Hi everyone! I’m trying to do something in Haskell which is very simple to do in Rust: Write a library with optional dependencies.

To give you an example, say we want to write a new library called length, whose main feature is a typeclass:

class Length a where
  length :: a -> Word

By default, it ships with an implementation for List a, NonEmpty a, ZipList a, CallStack and ByteArray, all collection types which exist in base.
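A minimal sketch of what the class and two of the base instances might look like. The module layout and the choice of `genericLength` are assumptions made for illustration; the instances for ZipList, CallStack and ByteArray are left out for brevity.

```haskell
module Main (main) where

import Prelude hiding (length)
import qualified Data.List as List
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NonEmpty

-- The class from the post: one method returning an unsigned count.
class Length a where
  length :: a -> Word

-- Lists: count the elements.
instance Length [a] where
  length = List.genericLength

-- Non-empty lists: reuse the existing Int-returning length and convert.
instance Length (NonEmpty a) where
  length = fromIntegral . NonEmpty.length

main :: IO ()
main = do
  print (length "hello")              -- prints 5
  print (length (1 :| [2, 3 :: Int])) -- prints 3
```

Everything here only needs base, which is why these instances can ship unconditionally.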

But there are other commonly-used packages containing datatypes that merit a typeclass instance:

array's Array and IArray
containers’ Map, Set, IntMap, IntSet, Seq (strict and lazy)
unordered-containers' HashMap, HashSet (strict and lazy)
vector's various Vector types
bytestring's ByteString and ShortByteString (strict and lazy)
text's Text and ShortText (strict and lazy)
etc.

Without optional dependencies, the writer of the length library has one of two options:

Unconditionally depend on all these libraries, making the length library very heavyweight for everyone, even for users who only need a small subset of the datatypes.
Depend on none of them and create a family of orphan-instance libraries (length-array, length-containers, length-vector, etc.), asking consumers to manually add the extra orphan packages they need.

What I would like to do instead is indicate in my cabal file which dependencies are optional. Then the extra functionality gets enabled if and only if another package (transitively) depends on both my library and that optional dependency.

Is there a way to do this?

qqwy · June 17, 2024, 5:13pm

It seems that using automatic Cabal flags this is almost possible. As per the example in the Cabal configurations section:



Flag NewDirectory
  description: Whether to build against @directory >= 1.2@
  -- This is an automatic flag which the solver will
  -- assign automatically while searching for a solution

[...]

    if flag(NewDirectory)
        build-depends: directory >= 1.2 && < 1.4
        Build-Depends: time >= 1.0 && < 1.9
    else
        build-depends: directory == 1.1.*
        Build-Depends: old-time >= 1.0 && < 1.2

which will automatically set the NewDirectory flag to True or False based on which version of the directory dependency unifies with the other dependencies of the project that uses the library.

The one thing still required to build optional-dependency support on top of this is a syntax such as build-depends: !directory, meaning: only succeed if no version of the dependency is in the build plan. (Note that this is not the same as a bound like build-depends: directory < 0, since that would always fail.)
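To make the idea concrete, here is a rough sketch of how a flag could gate an instance today, using a CPP macro set from the cabal file. The flag name `containers` and the macro `WITH_CONTAINERS` are hypothetical names chosen for this example. As far as I know, the solver tries a flag's default value first and only flips it when no build plan exists otherwise, so this does not give the "enabled only if the package is already in the plan" behaviour asked for above.

```haskell
{-# LANGUAGE CPP #-}
-- Hypothetical cabal stanzas that would drive the macro below:
--
--   flag containers
--     description: Provide Length instances for containers types
--     default:     True
--     manual:      False
--
--   library
--     build-depends: base
--     if flag(containers)
--       build-depends: containers
--       cpp-options:   -DWITH_CONTAINERS
module Main (main) where

import Prelude hiding (length)
import qualified Data.List as List
#ifdef WITH_CONTAINERS
import qualified Data.Map as Map
#endif

class Length a where
  length :: a -> Word

instance Length [a] where
  length = List.genericLength

-- Compiled only when the build defines the (hypothetical) macro.
#ifdef WITH_CONTAINERS
instance Length (Map.Map k v) where
  length = fromIntegral . Map.size
#endif

main :: IO ()
main = print (length "abc") -- prints 3
```

Without the macro the file still compiles against base alone; with it, the Map instance is added and nothing else changes, which keeps the flag additive.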

jaror · June 17, 2024, 5:27pm

What are the downsides of your second option (the family of orphan-instance libraries)?

Nowadays you can even put them in public sublibraries, I believe.

Optional dependencies are in my opinion a vestige from the old ./configure days.

BurningWitness · June 17, 2024, 6:03pm

There is no good way to do what you want, but it’s more complex than you think.

Let’s forget for a brief moment that type classes exist, so that you only have datatypes and functions. Providing an optional dependency is then quite easy: you have a core library named foo, and for everything involving bar you have a library in the same repository named foo-bar. Most libraries follow this path and it works quite well, see e.g. servant.

Unfortunate caveat even in this scenario: there doesn’t seem to be a way to nicely package internals, so you may end up with some ugly stub like foo-core or foo-internal. Cabal’s public sublibraries are supposed to fix that, but I haven’t seen libraries use them (even truly gargantuan systems like amazonka chose to not use them in the current state), so it’s hard to say if they actually work as intended right now.

Now, type classes don’t work with this because of the orphan instances, but in practice this is only a problem if you abuse type classes. Instances are supposed to be unambiguous, that’s why there are usually laws attached, and making ambiguous instances allows errors to seep through that ambiguity. Your Length example can refer to quite a few different properties of the underlying types (number of bytes, number of items, number of characters, number of chunks), and is as such ambiguous. As @jaror said, you are free to make orphanages if you choose so, this is merely convention.

The Cabal flag idea is used by some libraries (see e.g. semigroupoids), but it’s quite rare.

qqwy · June 17, 2024, 7:13pm

Thank you for sharing! That talk is very interesting.

I would like to note one important distinction of optional dependencies vs the ‘wild west’ of ./configure scripts, and that is that optional dependencies, similar to typeclass instances, are expected to only ever be additive. As such, the surprising behaviour of ‘I opaquely depend on either A or B based on what happens to be installed’ does not come up.

mpilgrem · June 17, 2024, 10:01pm

Is one reason that few published libraries currently use public sublibraries that the Hackage UI for public sublibraries is still a work-in-progress?

BurningWitness · June 17, 2024, 10:34pm

That is most definitely a part of it (at the very least seems to be the reason behind amazonka never moving to it), but it’s also hard to follow anything regarding this topic in general. The original Cabal PR got merged more than five years ago, no libraries I know of use it and there is no one comprehensive document that describes the current limitations and choices.

The Cabal documentation doesn’t help much here either: the sublibraries example does not explicitly feature any visibility tags.

michaelpj · June 18, 2024, 9:29am

We’re using them a bit in the lsp world, e.g. the quickcheck instances for the LSP types are in a public sublibrary. I don’t know if anyone is using those though! Especially since it’s difficult to find them on Hackage currently.

We also kind of do this with HLS: we have many components for the many different plugins you can build HLS with, and we use cabal flags to control which ones get pulled into the final executable. That’s a kind of “local” version of optional dependencies, and it works fine. The libraries aren’t public though, because they’re not of interest to anyone else, but if for some reason we wanted to split things amongst multiple packages we’d be using public sublibraries there.

qqwy · June 20, 2024, 5:50pm

Even if sublibraries were more stable than they currently are, how would they help tackle this problem?

If I understand the sparse documentation about them correctly, they make it slightly easier for the library writer to create one repository containing a ‘main’ library together with a family of ‘orphan-instance’ libraries. But for the library user, the experience would still be the manual work of collecting the needed orphan-instance libraries by hand, wouldn’t it?
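As a rough illustration of that layout, assuming a hypothetical sublibrary named `containers` inside the length package (the stanzas are shown as comments, and the module names are made up for this sketch):

```haskell
-- Hypothetical package layout (cabal-version 3.0 or later):
--
--   library
--     exposed-modules: Length
--     build-depends:   base
--
--   library containers
--     visibility:      public
--     exposed-modules: Length.Containers
--     build-depends:   base, containers, length
--
-- A consumer still has to opt in by hand:
--
--   build-depends: length, length:containers
--
-- In this single-file sketch the class and the instances share one module.
-- In the layout above, the instances would be orphans in Length.Containers.
module Main (main) where

import Prelude hiding (length)
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Would live in module Length of the main library.
class Length a where
  length :: a -> Word

-- Would live in module Length.Containers of the sublibrary.
instance Length (Map.Map k v) where
  length = fromIntegral . Map.size

instance Length (Set.Set a) where
  length = fromIntegral . Set.size

main :: IO ()
main = do
  print (length (Map.fromList [(1 :: Int, "a"), (2, "b")])) -- prints 2
  print (length (Set.fromList "hello"))                     -- prints 4
```

The author gets a single package and repository, but the extra `length:containers` entry in the consumer's build-depends is exactly the manual step described above.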

jaror · June 20, 2024, 6:02pm

I’m coming around to your view. It could be useful to have some “features” be enabled automatically whenever certain packages are included in the build plan, especially instances. However, it does present some new challenges. For example, how do we make sure that the features are purely additive? (Cabal flags are certainly not limited to additive changes.)

And I would ideally want to take that even further. Not only do we want the changes to be additive, but we would also require them to be “relevant”, i.e. the added code cannot be used unless the appropriate package is present in the build plan. For example, you can only use the Length Array instance if the array package is in the build plan. So there is no reason to use the “array feature” of the length library unless you are also using the array package. Otherwise people might be confused that certain unrelated functions are hidden behind a magic “array feature”.

Determining the relevance of instances is easy, but in general it can be quite difficult. For example if we have a function like this:

foo :: Maybe (Array Int) -> Bar

Then some users might only care about foo Nothing and for that they don’t need the array package. In the extreme, users might only care about foo undefined which is another can of worms.
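A small runnable version of that situation, as a sketch only: `Bar` is given a made-up definition here, and the array type is written `Array Int Int` because `Array` from the array package takes both an index type and an element type.

```haskell
module Main (main) where

import Data.Array (Array, elems)

-- Hypothetical result type; the example above leaves Bar unspecified.
newtype Bar = Bar Int
  deriving (Show, Eq)

-- The signature mentions Array, so the defining package must depend on array.
foo :: Maybe (Array Int Int) -> Bar
foo Nothing    = Bar 0
foo (Just arr) = Bar (length (elems arr))

main :: IO ()
main = print (foo Nothing) -- prints Bar 0
```

The call `foo Nothing` never names anything from array, so a caller that only uses it this way has no reason of its own to depend on the array package, even though the package that defines `foo` does.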

I’ve read the Rust documentation and it seems like they also require manual managing of optional features, just like cabal flags. Are they really different?

Edit: As @waivio explains, the main difference between cabal’s flags and Rust/Gentoo-style optional dependencies is that cabal flags cannot be set by the packages themselves. Cabal flags must generally be set by the distributor/user.

However, going back to the example in the original post: presumably the array library wouldn’t be aware of the length library and therefore couldn’t set the appropriate flag to include the array instances in the length package. So, even if we had that style of flags, we still wouldn’t be able to express what we want.

The package could expose a “batteries-included” public sublibrary that depends on all the features, but it indeed cannot dynamically choose a suitable subset.

waivio · June 20, 2024, 11:45pm

That sounds like Gentoo Linux’s feature of Use Flags. Use Flags in Gentoo Linux allow packages to enable optional support for different features. A package can turn on support for gnome and turn off support for qt if that is what it needs.
From what I understand Cabal doesn’t have the capability to tell a dependency to turn on or off a flag.
I think in Haskell it would be like having a feature in the .cabal file like:

criterion >=1.1 && <2 -mtl polysemy

to turn off mtl support and turn on polysemy support.

atravers · June 21, 2024, 3:29am

[…] allow packages to enable optional support for different features.

There it is:

I’m not sure what’s there is exactly the same…but this topic is certainly reminiscent of something from that earlier discussion.

qqwy · June 21, 2024, 2:46pm

The additive nature of the flags is an important difference between the Rust flags and the Gentoo flags:

Quoting the Cargo book’s section on feature unification:

A consequence of this is that features should be additive. That is, enabling a feature should not disable functionality, and it should usually be safe to enable any combination of features. A feature should not introduce a SemVer-incompatible change.

For example, if you want to optionally support no_std environments, do not use a no_std feature. Instead, use a std feature that enables std.

This is not checked anywhere automatically AFAIK; it is a convention, but if you want your library to be used as a Rust developer, you had better stick to this convention because it’s the only way for your library to get traction in the Rust ecosystem.

Yes, they are really different: When a dependency is marked as optional in Rust/Cargo, a feature flag that shares its name is automatically made available, and this flag is automatically enabled once the optional dependency happens to exist during dependency resolution of the top-level package the library (directly or transitively) is used in. (see this section for details)

Thus, if you install both the Rust library uuid and the Rust library serde, you can now serialize/deserialize UUIDs because serde is an optional dependency of uuid. And if you also depend on arbitrary you can now generate arbitrary UUIDs in property-based tests, because arbitrary is also an optional dependency of uuid.
But when you just depend on uuid, you’re not waiting for any of that extra machinery to be compiled.

Compare that to Haskell, where someone can depend on uuid but has to manually add uuid-aeson to support serialization.* And if you want to generate UUIDs for property testing, you need to add the quickcheck-instances library. Good luck finding that if you don’t know where to look beforehand!

* I guess that for this particular example, the desire for UUID support was so large that at some point it was built into aeson itself, meaning that anyone not using UUIDs also pays for the UUID library (specifically: uuid-types) to be compiled, but I digress. Feel free to replace uuid with any other prevalent data structure library, or replace aeson with any other prevalent serialization, pretty-printing, hashing, generation, etc. library.

jaror · June 21, 2024, 2:51pm

I did read that section, but they never say that features get enabled automatically when a package of the same name is available in the build plan. They only say:

the dependency can be enabled just like any feature such as --features gif (see Command-line feature options below).

Which makes it seem like features require manual activation.