[RFC] FFI libraries

Preamble

Haskell’s FFI is rather mission-critical for writing real-world applications. While GHC support for it is fine [1], on the community level there is little guidance on the matter and some rather daunting tooling problems. The result is a fractured ecosystem where everyone has a different approach to FFI and it is far easier to hardcode a C library into your project than to make bindings to one.

[1] Other than perhaps the foreign import unsafe qualifier. The person using the library typically knows better whether they want the safe or unsafe variant.

Definitions

A foreign library is a library written in a programming language that isn’t Haskell.

An FFI library is a Haskell library containing the definitions required to interface with a foreign library.

FFI libraries

At the centre of the proposal lies a concept of an FFI library, which I believe to be sufficiently different from any other kind of a Haskell library to warrant both a separate name and broad ecosystem support.

An FFI library contains an unsafe, loosely typed, minimal set of definitions required to interface with a foreign library. The names of the definitions should match the respective ones in the foreign library as closely as possible to avoid confusion and simplify search. The definitions generally fall into the following categories:

  • Function imports for functions that can be marshalled out of the box.

    The safe qualifier should be used unless the function is unsafe-only (e.g. specialized math algorithms). The unsafe variant, if needed, should be exported as a copy of the function with _unsafe appended to its name (or Unsafe if the function name is in camel case).

  • Types for functions that cannot be marshalled.

    While useless for writing code, these will be useful in documentation. Higher-level libraries are expected to create their own foreign language functions to import such definitions.

  • Types for function callbacks.

    Some of these will also be unmarshallable; the previous point applies to them as well.

  • Type synonyms.

    newtypes should be discouraged, as they tend to clutter code more than help with safety.

  • Constants.

    Integer constants can be embedded nicely with PatternSynonyms (see the gl library); strings can be embedded into pointers with MagicHash (GHC.Ptr.Ptr "honk"#). Constants should be typed as loosely as the type system allows, since foreign libraries may reuse constant names in different contexts.

  • Information for accessing structured data, arranged around an uninhabited type.

    For C structs specifically this is the total size of the struct, its alignment and the offset of every single field. Note that Storable is not a good abstraction here and this will thus require extra type classes in a separate library.
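
As a rough illustration of the function-import, type-synonym and constant items in the list above, here is a minimal sketch against libc rather than any particular foreign library. The Whence synonym and the honk constant are made up for the example, and the SEEK_* values are simply the ones common platforms use (a real FFI library would take them from the header, for instance through hsc2hs).

```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE PatternSynonyms #-}

import Foreign.C.String (CString)
import Foreign.C.Types (CInt (..), CSize (..))
import GHC.Ptr (Ptr (..))

-- Function import named exactly like the C function; `safe` is the default choice.
foreign import ccall safe "string.h strlen"
  strlen :: CString -> IO CSize

-- Copy of the same import with the `unsafe` qualifier and a suffixed name.
foreign import ccall unsafe "string.h strlen"
  strlen_unsafe :: CString -> IO CSize

-- Type synonym instead of a newtype (hypothetical name, for illustration only).
type Whence = CInt

-- Integer constants as loosely typed pattern synonyms.
-- Assumption: these are the values common platforms use for <stdio.h>.
pattern SEEK_SET :: (Eq a, Num a) => a
pattern SEEK_SET = 0

pattern SEEK_END :: (Eq a, Num a) => a
pattern SEEK_END = 2

-- String constant embedded as a pointer to a NUL-terminated literal.
honk :: CString
honk = Ptr "honk"#

main :: IO ()
main = do
  n <- strlen honk
  m <- strlen_unsafe honk
  print (n, m) -- both calls count the 4 bytes of "honk"
  print (SEEK_END :: Whence)
```

Because the pattern synonyms are polymorphic, the same constant can be used at CInt, Word32 or any other numeric type a higher-level library prefers.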

The rules are not set in stone and libraries should be able to break them; the spirit of the approach is far more important here.
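
The structured-data item in the list above could look roughly like the following sketch. Everything here is hypothetical: the Layout and Offset classes only stand in for whatever the "separate library" would provide, and struct point is an invented C declaration whose layout is written out by hand (a real library would presumably compute it from the header, for example with hsc2hs).

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE MultiParamTypeClasses #-}

import Data.Proxy (Proxy (..))
import Foreign.C.Types (CInt)
import Foreign.Marshal.Alloc (allocaBytesAligned)
import Foreign.Ptr (Ptr, plusPtr)
import Foreign.Storable (alignment, peek, poke, sizeOf)
import GHC.TypeLits (Symbol)

-- Hypothetical classes: total size and alignment of a struct...
class Layout a where
  structSize :: Proxy a -> Int
  structAlignment :: Proxy a -> Int

-- ...and the byte offset of a field, looked up by its C name.
class Offset (field :: Symbol) a where
  offsetOf :: Proxy field -> Proxy a -> Int

-- Hypothetical C declaration: struct point { int x; int y; };
data Point -- uninhabited: only ever used behind a Ptr

intSize :: Int
intSize = sizeOf (0 :: CInt)

instance Layout Point where
  structSize _ = 2 * intSize
  structAlignment _ = alignment (0 :: CInt)

instance Offset "x" Point where
  offsetOf _ _ = 0

instance Offset "y" Point where
  offsetOf _ _ = intSize

main :: IO ()
main =
  -- Allocate room for one struct and access its fields through the offsets.
  allocaBytesAligned (structSize point) (structAlignment point) $ \ptr -> do
    let xPtr = (ptr :: Ptr Point) `plusPtr` offsetOf (Proxy :: Proxy "x") point :: Ptr CInt
        yPtr = ptr `plusPtr` offsetOf (Proxy :: Proxy "y") point :: Ptr CInt
    poke xPtr 3
    poke yPtr 4
    x <- peek xPtr
    y <- peek yPtr
    print (x, y) -- prints (3,4)
  where
    point = Proxy :: Proxy Point
```

Unlike a Storable instance, nothing here forces the whole struct to be copied into a Haskell value; callers touch only the fields they need.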

Curiously enough this means an FFI library is also -XSafe, perhaps with the exception of foreign imports that return pure types (if those are even useful, i.e. don’t behave exactly like unsafePerformIO f).
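
To make the exception concrete, here is a small sketch of the two shapes of import, using cos from libc's math library (the Haskell-side names are made up). As far as I understand the Safe Haskell rules in the GHC user guide, only the IO-typed form is accepted under -XSafe; the pure form is the one that gets rejected.

```haskell
import Foreign.C.Types (CDouble (..))

-- IO-typed import: the shape that should remain acceptable under -XSafe.
foreign import ccall unsafe "math.h cos"
  c_cos_io :: CDouble -> IO CDouble

-- Pure import: the compiler has to take the purity claim on trust.
foreign import ccall unsafe "math.h cos"
  c_cos :: CDouble -> CDouble

main :: IO ()
main = do
  a <- c_cos_io 0
  print (a, c_cos 0) -- cos 0 is exactly 1 in both cases
```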

Cabal

.cabal files have a rather curious problem: they don’t properly support foreign libraries. Foreign libraries are not the same as custom foreign code: the user should be tasked with installing a foreign library, not the FFI library maintainer.

See [RFC] Separate linking for an attempt to spec it out from my side.

In short, the goal is to have foreign libraries as a distinct class, which Haskell packages can then depend on alongside regular Haskell packages (and to have dependency trees in a similar way). The user would then be able to specify how to find the installed foreign libraries and their headers in their cabal.project.

The definition of an FFI library itself in a .cabal file should be mostly cosmetic, as such libraries build and run the same as any other Haskell library. Extra metadata fields would make sense for interoperability with other tools.

Haddock

No need for an overhaul here, merely a few custom metadata tags would allow alternative representations for FFI definitions. Structured data, while merely an empty datatype with a bunch of instances, could instead be displayed in proper language form.

Hackage

It would be nice for Hackage to take foreign library dependencies from Cabal files and display them on package webpages. That way users would know the package will not work out of the box unless all the foreign libraries are present.

Note that foreign library dependencies are not the same as regular ones: version ranges here are merely for user convenience and transitive dependencies should propagate.

I don’t know whether FFI libraries warrant a separate namespace, though if any foreign library is expected to have only one FFI library to it, this would be expected. Aforementioned extra Cabal metadata could be used to display the type of the library and to stylize the webpage accordingly if needed.

Automation

As I explained in the precursor comment, I do not believe in automatic interface generation; however, a custom templating engine would make total sense for FFI libraries. I will not bother to speculate as to how it should look until the proposal is agreed upon and implemented.

What about pkgconfig-depends?

How is that different from the existing foreign function interface options?

Hmm, I guess the wanted feature set can be roughly replicated right now with extra-include-dirs and extra-lib-dirs, so the supposed correct solution in .cabal is either providing library names in extra-libraries or not providing anything at all? (The doc on extra-libraries says “when not linking fully static executables”, so I’m still confused.)

This does indeed counter my point on there “not being support for foreign libraries”, though I do have to say I’ve been figuring out FFI libraries for three years and this thread is where I first learn of it.

In my mind there’s a clear line between the .cabal manifest (which is a package declaration) and cabal.project (which is the user’s setup of the project). The user should be able to choose to use pkg-config, and as such invoking it should be done in cabal.project.

Also note that at the time of writing the topic I was not aware of extra-libraries, so in my mind the only way to link a custom-built library statically was to hack around the pkg-config environment.

I’ve had success linking to a custom version of Botan built in a non-standard location $BOTAN_OUT using cabal build thing --extra-include-dirs $BOTAN_OUT/include --extra-lib-dirs $BOTAN_OUT/lib. You can avoid the burden of flags by setting their respective stanzas in a cabal.project.local file.
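
For reference, a cabal.project.local along those lines might look like the sketch below. The package name thing comes from the command above; /opt/botan is only a placeholder for wherever $BOTAN_OUT points, written out in full because, as far as I know, project files do not expand environment variables.

```
-- cabal.project.local (sketch; /opt/botan stands in for $BOTAN_OUT)
package thing
  extra-include-dirs: /opt/botan/include
  extra-lib-dirs: /opt/botan/lib
```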

I’ve spent a fair bit of time binding C libraries into Haskell (or trying to at least, e.g. with PETSc and DuckDB) using c2hs and inline-c. inline-c was super promising but it stopped working at some point, I don’t know why. Anyway I’d love it if there were a single known good way of doing FFI bindings.

Upon further investigation it turned out I just can’t link static libraries at all out of the box: the linker errors out with undefined references.

extra-libraries does indeed work as advertised: GHC picks the first library it finds, both static and dynamic libraries qualify. I thus assume “when not linking fully static executables” simply means “every foreign dependency not provided through source files is treated as a dynamic one”.

pkgconfig-depends is not a user’s setup of a project, and does not prevent a user from choosing to use pkgconfig or not (which can be done by enforcing a dummy pkgconfig executable for now, but can be improved in the future – there’s a ticket on the cabal tracker about this if I recall). It is exactly a package declaration that the package can be built if a given library can be found in pkgconfig.

pkgconfig being a first-class citizen in cabal-install is a nice, lightweight feature imo. It’s a widespread tool and it’s pretty easy to use. And tools like Nix can use it to auto-wire C libraries nicely. I prefer pkgconfig to the normal way libraries are installed in a build system tbh.

There are three separate parts to this:

  • A foreign library may have a pkg-config file and it should be possible to specify the name of that file in a .cabal file;

  • Cabal, when building the library, should be able to use the aforementioned file name to invoke pkg-config;

  • The user should be able to choose how a given foreign library is included and linked on their system.

I’m not proposing any behavior changes with regard to the first two (pkg-config is indeed a solid default choice); my issue is that I should be able to say both “foreign library’s pkg-config name is freetype2” and “for my project I have a libfreetype.a here, it’s static”.

cabal’s usage of pkg-config asks for both dynamic and static libraries, and it will link one or the other (if both are available) depending on if the cabal build in general is a dynamic or static build, which seems appropriate to me.

there’s a ticket discussing this behavior and how to make it slightly more tuneable for broken pkg-config dbs: “Cabal 3.8 expects `pkg-config --libs --static` to always work” (Issue #8455, haskell/cabal on GitHub)

also, there’s a way to both specify a pkg-config name and also a specific library name – which is that pkgconfig-depends gets guarded behind an auto flag, and if the lookup fails, it falls back, flips the flag, and then uses a direct extra-* set instead, whose lookup locations can be set in a cabal project.
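
A sketch of that flag-guarded fallback in a .cabal file, using the freetype2 example from earlier in the thread. The flag name is arbitrary and the library name freetype (for libfreetype) is an assumption about how the library is installed.

```
-- fragment of a .cabal file (sketch)
flag pkg-config
  description: Locate freetype2 through pkg-config
  default: True
  manual: False

library
  -- remaining library fields omitted
  if flag(pkg-config)
    pkgconfig-depends: freetype2
  else
    extra-libraries: freetype
```

With manual set to False the solver is free to flip the flag when the pkg-config lookup fails, at which point the extra-libraries branch applies and the search directories can come from the project file.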

i can imagine more granular options to cabal as well – but in general i’d recommend we try to make cabal more configurable on the command line or in the project file rather than first looking to change the cabal file grammar itself.

Turned out FreeType2 by default dynamically links to three other libraries and those were the missing references.

As such the minimal portable solution as of today is the extra-libraries field in the .cabal file, and extra-include-dirs plus extra-lib-dirs in the cabal.project file (note that the order of library directories matters, see the documentation for GHC’s -l flag).

I’m interested in your Haskell bindings for DuckDB - are they available on GitHub?

No, sadly they never got to a usable point via the inline-c route. I was counting on the C API being fairly close to that of sqlite, so my plan was to replicate e.g. direct-sqlite but via inline-c. Perhaps hand-rolling the C bindings will be more effective.
