Sovereign Tech Fund invests in Cabal as critical Haskell infrastructure

f-a · October 19, 2023, 10:34pm

hasufell · October 20, 2023, 5:02am

I’m a bit confused by the title.

It sounds like this is funding for WT’s work on the Hooks build type proposal rather than general investment in Cabal development.

Vlix · October 20, 2023, 7:30am

I agree that I also read it as “Sovereign Tech Fund will be investing in Cabal from now on”, but it’s a “one time” investment, which still means STF invests in Cabal, so the title is very accurate.

adamgundry · October 20, 2023, 8:14am

The funding is for a combination of general maintenance work (e.g. so @Mikolaj is able to spend more time reviewing and landing contributions) and for work specifically targeted at eventually removing the need for build-type: Custom (because that has long been a bottleneck that restricts Cabal development). The post focuses on the latter, because we’d like to explain the context and get feedback on the design, but the former is also important.

The proposed build-type: Hooks is intended to provide a backwards compatibility story that will allow existing users of build-type: Custom to migrate smoothly to a more future-proof interface. While we believe the design we’ve come up with is solid, we’re open to discussing the proposed approach and exploring alternative solutions with other Cabal devs and the Haskell community. Indeed we would have liked to do so earlier, but a combination of a tight timescale for applying for the funding, and other aspects of the process outside our control, meant that it wasn’t easy to do so.

Thus far the STF has committed to four months of funding, at the end of which we will potentially be able to apply for four months more. Obviously we have some ideas about how additional funding could be used, but we’d welcome suggestions on other aspects of Cabal that could benefit from sustained work.

hasufell · October 20, 2023, 8:46am

Excellent!

Indeed, I think this has caused some confusion. It also poses a problem, since there’s no guarantee cabal developers will agree to the design, but may feel pressured to do something, because there’s funding and expectations behind the proposal.

I would have hoped for more Haskell Foundation involvement here.

adamgundry · October 20, 2023, 11:00am

We’ve been clear to the STF in our funding application that whatever design and implementation we come up with will need community consensus, including agreement from the Cabal developers, and that we can’t guarantee it will be accepted. Obviously we want the funding to yield useful results, but any changes will be subject to the usual review standards and processes.

If anything, the fact that we have some resources behind this work means there is more scope for us to gather requirements and produce a design that satisfies them, even if that takes more effort. The problem you mention arises also for volunteer contributions: the Cabal developers might equally feel pressured to accept poorly designed patches in the interests of keeping contributors happy.

We did discuss this with @david-christiansen, and advertise our plans to the Cabal developers, but we had to get a proposal written very quickly so there unfortunately wasn’t time for a wider discussion at that stage. As the blog post mentions, we’re planning to raise a HF tech proposal to help gather requirements from a wide range of stakeholders and ensure the final design has community consensus.

andreabedini · October 20, 2023, 11:57am

Can you rewrite the parser and deliver the exact-printer?

noinia · October 20, 2023, 6:13pm

Would it be possible to say a bit more about which (type of) packages use the Custom setup procedure, and how much we should care about them? The way all of this is phrased makes it sound like this is a lot of effort for “just a few packages that use this Custom setup” [1]. So, one could then also say “let’s just get rid of the Custom setup, and let those few packages figure out some alternative by themselves”. [2] I’m probably over-simplifying here, so I’m wondering what I’m missing.

[1] After all, if most packages use the “Simple” setup, and the Configure based setup can handle many of the other ones, there are only “few” packages left, right?

[2] Clearly this is not a very nice stance towards those “few” packages, but if there are really only a few “uninteresting” packages then it also seems somewhat senseless to spend lots of effort to keep supporting those.

Bodigrim · October 20, 2023, 7:26pm

A common use case for Custom setup is to run code generators. Cabal can handle a couple of hardcoded options such as alex and happy, but anything custom requires Custom.
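For illustration, here is a minimal sketch of what such a `Setup.hs` can look like with `build-type: Custom`. The generator name `my-codegen` and the paths are hypothetical. The sketch assumes the generator is already on `PATH`, that `gen` is listed in `hs-source-dirs`, and that the package's `custom-setup` stanza lists `base`, `Cabal`, `directory` and `process` in `setup-depends`.

```haskell
-- Setup.hs: a sketch for build-type: Custom, not a definitive recipe.
import Distribution.Simple (UserHooks (..), defaultMainWithHooks, simpleUserHooks)
import System.Directory (createDirectoryIfMissing)
import System.Process (callProcess)

main :: IO ()
main =
  defaultMainWithHooks
    simpleUserHooks
      { preBuild = \args flags -> do
          -- Make sure the output directory exists before generating into it.
          createDirectoryIfMissing True "gen"
          -- Hypothetical generator and paths; callProcess throws if the
          -- generator exits with a non-zero status, which aborts the build.
          callProcess "my-codegen" ["schema/api.idl", "gen/Api.hs"]
          -- Fall through to the default pre-build behaviour.
          preBuild simpleUserHooks args flags
      }
```

This runs the generator on every build; checking whether the inputs actually changed is left out of the sketch.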

Liamzy · October 20, 2023, 9:33pm

Worst case, 100 hours at 150 USD an hour comes out to 15k USD (13.5k EUR, off the top of my head), 200 hours at 150 USD an hour comes out to 30k (27k EUR).

Building an exact printer is affordable, if STF wants to pay for it. You could also just, well, piggy-back off existing external efforts (watch @BurningWitness considering an external printer), then try to see if you can bill STF for cost of conversion and integration.

It’s funny, I’ve been constantly ranting about how governments should support FOSS (it’s a public good and governments should support public goods), and here’s a small effort by the German government to do so. It’s actually fairly incredible how far their 300k EUR funding can go, when you consider the estimated value of many codebases via sloccount and COCOMO.

Thanks to the German government for being farsighted enough to allocate funding, Sovereign Tech Fund for supporting Cabal, Well-Typed for pushing the application, and the Cabal maintainers for doing the work!

andreabedini · October 21, 2023, 12:28pm

Yes, that’s a common use case.
I think in many cases code generation can be done without cabal, before the source distribution rather than during build (e.g. like amazonka does).
There’s of course the chance that the code generator depends on particular aspects of the system it runs on; but IMHO these cases often lead to trouble (e.g. cross-compilation).
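As a rough sketch of the pre-generation approach (not amazonka's actual tooling): a standalone generator that the maintainer runs by hand, with the output committed and shipped in the source distribution so the package itself can stay `build-type: Simple`. The spec, module name and output path below are made up for illustration.

```haskell
-- | Hypothetical input: constant names paired with their values.
type Spec = [(String, Int)]

-- | Render a Haskell module defining one top-level Int constant per entry.
renderModule :: String -> Spec -> String
renderModule modName spec =
  unlines $
    [ "-- Generated file, do not edit by hand."
    , "module " ++ modName ++ " where"
    , ""
    ]
      ++ concatMap decl spec
  where
    -- Each entry becomes a type signature followed by a definition.
    decl (name, value) =
      [ name ++ " :: Int"
      , name ++ " = " ++ show value
      ]

-- Run before packaging; the written file is then committed like any source file.
main :: IO ()
main =
  writeFile "Generated.hs" $
    renderModule "Generated" [("maxRetries", 3), ("timeoutSeconds", 30)]
```

The trade-off is the one discussed in this thread: the build stays simple, but nothing re-runs the generator automatically when its inputs change.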

Bodigrim · October 21, 2023, 12:49pm

Well, that’s similar to alex / happy: yes, you can run them manually and package a million lines of autogenerated deterministic finite automaton instead of a human-readable grammar, but that’s inconvenient and wrong in many aspects.

What is worse is that in industrial setting a code generator is often just another package in the same monorepo. With build-type: Custom it just works, cabal build happily tracks what needs to be rebuilt and what does not. Building the code generator, then invoking it by hand, then building the main package would be slow and increasingly error-prone, causing a knock-on effect on the complexity of CI / CD.

adamgundry · October 21, 2023, 8:20pm

We’ve been investigating this a bit by surveying existing code in the Stackage package set (and are interested in pointers to other packages using Custom). It would indeed be possible to get rid of Custom without offering a replacement, but while packages using it are uncommon they are not extremely rare, so it would significantly inconvenience quite a few package authors. Some use cases would effectively end up needing to implement their own ad hoc custom build system rather than being normal Cabal packages, which would be rather a hard sell.

github.com/well-typed/hooks-build-type

survey.md

main

# Stackage Setup.hs survey

Matthew Pickering, Sam Derbyshire (Well-Typed LLP)

A primary goal of the `Hooks` build-type is to replace the `Custom` build-type.
In order to inform the design of the `Hooks` build-type, we want to analyse
what people use the `Custom` build-type for.

This work has been carried out by Well-Typed LLP thanks to investment from the Sovereign Tech Fund.


# Methodology

For our survey, we took the latest stackage LTS snapshot (`LTS-21.9`), downloaded
all the packages and searched for the ones which contained `Custom` `Setup.hs` scripts.

`LTS-21.9` contains 3010 packages; we searched these packages using the following
regular expression in order to identify the packages with `build-type: Custom`.

This file has been truncated.

andreabedini · October 23, 2023, 5:32am

I don’t disagree on “inconvenient” but I am not sure whether it should be “wrong”. Haskell offers many ways to implement that without code generation (I should say “source-code generation” because there’s also Template Haskell). In fact Cabal has been relying on source-code generation for years, pre-generating its lexer, and has only recently switched to calling alex during build.

What you say is fair and reasonable but I am not sure whether it is true (or when it is true). E.g. amazonka seems to rely on running the code generator manually when it is needed. Of course, there is no upper bound on a project’s complexity; but then my point-of-view is that cabal should strive to integrate well into any other build-system rather than trying to be a build-system for everything.

In any case, I think this is leading us off track; so allow me to clarify my argument w.r.t. code generation (see the proposal linked above for the full discussion).

Cabal’s simple build-type already offers some code-generation features (e.g. some historical ones based on well-known programs like alex; others recently introduced to support doc testing and automatic test discovery). I don’t claim these features are perfect or enough for every use-case; but they show we can do this sort of thing with a declarative interface like the simple build-type. We don’t need to invent a new build-type to achieve something that we could also achieve by improving what we already have.

Bodigrim · October 23, 2023, 7:21pm

…and that’s why all Cabal releases up to 3.10.1.0 have a lexer with out-of-bounds access (Cabal-syntax lexer: out-of-bound array access with the JS backend · Issue #8892 · haskell/cabal · GitHub). If Cabal packaged Lexer.x instead of pre-generated Lexer.hs, it would be enough to re-build after cabal update to pick up a fixed version of alex.

Cabal had a very valid reason to reduce dependency on external programs to facilitate bootstrapping, others do not have such an excuse.

But I told you when: “in industrial setting”, when you control and modify both the inputs to a code generator, the code generator itself and its prospective consumers. Open-source packages tend to keep things simpler, because they don’t have that many moving parts.

If a code generator is written in Haskell and generates Haskell code, it seems very reasonable for Cabal to deal with it without deferring to any third-party tooling.

But we digress indeed.

andreabedini · October 24, 2023, 3:39am

Ouch, you are absolutely right. In fact I wasn’t aware of this problem when I removed the pre-generated Lexer.hs in favour of Lexer.x

I have appreciated the conversation nevertheless

tmcgilchrist · October 25, 2023, 9:46pm

What is worse is that in industrial setting a code generator is often just another package in the same monorepo.

Just to back this up. My main use of Custom for industrial Haskell purposes has been to run code generation for RPC (either Protobuf, gRPC or CapnProto), and embedding version and dependency information into binaries. Think git sha or versions plus the Haskell package versions used in a cli binary that gets distributed. More improvements for that workflow in Cabal itself would be extremely helpful.

atravers · October 25, 2023, 10:35pm

…to the point of Cabal’s automatic-compilation subsystem being Turing complete?

https://okmij.org/ftp/Computation/sendmail-as-turing-machine.txt

tmcgilchrist · October 25, 2023, 10:37pm

I’ll settle for just running some codegen before compile thanks

HeinrichApfelmus · October 26, 2023, 1:37pm

I can second this: At work, I’m currently working with two code generators: FineTypes, a custom IDL focusing on types, and agda2hs, a “subset of Agda” → Haskell transpiler.

While these domain-specific languages are heavily informed by and interoperable with Haskell, they are sufficiently independent to not be a good fit for Template Haskell. In fact, their design is better thanks to being independent of the language Haskell.