Config languages (and Dhall)

I’ve been tinkering with Dhall in GHCup: ghcup-metadata/dhall at installer-dsl · haskell/ghcup-metadata · GitHub

But my feeling was it is not a good fit:

  1. no text comparison
  2. updating deeply nested Maps is very hard
  3. syntax seems very verbose (e.g. all the explicit type parameters for fold/map/combine)
  4. maintenance seems to be on life support?

GHCup metadata is essentially:

type GHCupDownloads = Map Tool ToolVersionSpec
type ToolVersionSpec = Map GHCTargetVersion VersionInfo
-- VersionInfo has a record for ArchitectureSpec
type ArchitectureSpec = Map Architecture PlatformSpec
type PlatformSpec = Map Platform PlatformVersionSpec
type PlatformVersionSpec = Map (Maybe VersionRange) DownloadInfo

But writing this map manually as a human is very tedious. So the idea is to have something flatter and then turn it back into the nested structure, which requires updating the deeply nested maps while processing the data. I was able to write Dhall code that does that, but it’s super convoluted and does not seem idiomatic.
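To make the flat-to-nested step concrete, here is a minimal sketch in Python rather than Dhall. The row shape, field names, and URLs are assumptions made up for illustration, not GHCup's actual schema, and the VersionInfo record layer is skipped for brevity.

```python
from typing import NamedTuple, Optional


class FlatEntry(NamedTuple):
    """One flat row; the field names are hypothetical, not GHCup's schema."""
    tool: str
    version: str
    arch: str
    platform: str
    version_range: Optional[str]  # None stands for "any platform version"
    url: str


def nest(entries):
    """Fold flat rows into tool -> version -> arch -> platform -> range -> url."""
    nested = {}
    for e in entries:
        # setdefault creates each missing level on the way down.
        leaf = (nested
                .setdefault(e.tool, {})
                .setdefault(e.version, {})
                .setdefault(e.arch, {})
                .setdefault(e.platform, {}))
        if e.version_range in leaf:
            raise ValueError(f"duplicate entry: {e}")
        leaf[e.version_range] = e.url
    return nested


# Placeholder data; example.invalid is deliberately not a real host.
entries = [
    FlatEntry("ghc", "9.6.4", "x86_64", "debian", None,
              "https://example.invalid/ghc-9.6.4-debian.tar.xz"),
    FlatEntry("ghc", "9.6.4", "x86_64", "fedora", None,
              "https://example.invalid/ghc-9.6.4-fedora.tar.xz"),
]
nested = nest(entries)
```

The point of the sketch is only that "insert at a deep path, creating levels as needed" is a one-liner with mutable dictionaries, which is the operation that turned out to be awkward in Dhall.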


So I wonder: are there any alternatives?

I also tried CUE, but it’s bad at templating. The only other thing that caught my eye was Nickel from Tweag. But it is Turing complete, and at that point I could just do the whole shebang in Haskell.

I recently found KDL and really like it. It’s not a featureful language like Dhall, but it reads like a decent DSL. I wrote a Haskell parser for it: kdl-hs, and have found it really nice.

I’m not sure that’s interesting for my use case. I want:

  1. templating (which means functions and string interpolation)
  2. good handling of nested maps
  3. some form of schema validation (which is kind of implicit in Dhall)
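As a rough illustration of what "templating" means in the first requirement, the sketch below uses a plain function plus string interpolation to stamp out one download URL per target. It is written in Python only to show the shape of the feature; the URL layout, the function names, and the target names are all hypothetical.

```python
def bindist_url(tool: str, version: str, arch: str, platform: str) -> str:
    """Build a download URL by string interpolation (hypothetical URL layout)."""
    return (f"https://example.invalid/{tool}/{version}/"
            f"{tool}-{version}-{arch}-{platform}.tar.xz")


def downloads_for(tool: str, version: str, targets):
    """Apply the template to every (arch, platform) pair."""
    return {
        (arch, platform): bindist_url(tool, version, arch, platform)
        for arch, platform in targets
    }


# Hypothetical targets; adding a platform means adding one pair here.
targets = [("x86_64", "debian"), ("x86_64", "fedora"), ("aarch64", "debian")]
downloads = downloads_for("ghc", "9.6.4", targets)
```

A configuration language that satisfies the first requirement needs an equivalent of both pieces: a way to define `bindist_url` once and a way to interpolate values into its string.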

I think Gabriella (the Dhall author) created another configuration language named Grace or something like that. It is more similar to JSON (actually, any JSON is a valid Grace program, IIRC). It addresses some of the issues of Dhall. I will link it later today.

That seems to be catered towards LLM prompts or something.

I don’t necessarily endorse it, but maybe Jsonnet could be a consideration. It is partially the same idea as Dhall, i.e. JSON + functions, but without the types part.

I’ve been using Nickel for these kinds of problems and I can highly recommend it. I think it hits a sweet spot: it is readable and easy to use, but still has powerful features like static typing, contracts, and an extensive standard library.

Ah, I see: you don’t want to write a config, you want to generate a config (e.g. write Dhall to output JSON to use as config).

+1 to Jsonnet. You do lose out on types/schema validation, but I think it’s nice to use.

Surprising. I guess the project has moved towards a structured JSON interface for LLMs (which sounds cool to me), but the original idea was this:

The original is from 2021, so it predates the AI slop.

This matches my experience with Dhall, I’m afraid. Banning text comparison (IIRC to stop people writing languages within languages) and recursion (in service of Turing-incompleteness) really limits the language’s power. The type system also seems both too strong (it had to give up type inference, so you pass type parameters for basic FP idioms) and too weak (you can’t manipulate record types in the way a config language probably needs to). On top of all that, the language spec requires a parser that can keep multiple candidate parses around for quite some time, which means that a syntax error in a large document presents as unexpected "," at the other end of the file once the last candidate parse fails. Turing-incompleteness is a clever goal, but presumably one could write a busy beaver that won’t run forever but will take longer than the user is prepared to wait.

This is all a big shame because a lot of the language is genuinely good. Web-based imports, the record-completion-with-defaults syntax, the enforced lack of side-effects are all great choices for a configuration language.

My general recommendation for tools is to just consume JSON and let the user choose whatever “configuration language” he prefers. Even if it’s something horrifically cursed like m4, it doesn’t become anyone else’s problem. There’s currently no clear winner in the configuration language space, and sadly there probably won’t be one unless it has a well-used Go implementation.

Why aren’t you doing that, by the way? Just curious; I don’t grasp the architecture, but maybe this question is an easy clarifier.

Why is Turing-incompleteness important to you? Turing-completeness means that Nickel can enter an infinite loop, yes, but that’s practically not that different from a Dhall program that takes 10^99 years, is it?

The advantage Nickel still has over Haskell is that it cannot perform IO so you don’t need to sandbox it.

I guess the idea was mainly for upstream projects (like GHC) to more conveniently maintain their own metadata, such as GHC nightlies.

Their YAML is currently generated with some sad Python script and there isn’t much I can do to communicate to projects “hey, the metadata format changed, here’s how you update your scripts”. Also, YAML is super loose and the yaml/aeson parsers accept unknown keys. So your metadata might be slightly wrong (e.g. a mistyped optional field), but still parse.
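The "mistyped optional field still parses" failure mode can be caught with a strict key check before the data is used. The following is a minimal sketch in Python; the key names `url`, `hash`, and `subdir` are placeholders and not the real GHCup field names.

```python
# Hypothetical schema for one download entry.
REQUIRED = {"url", "hash"}
OPTIONAL = {"subdir"}


def check_keys(entry, required, optional=frozenset()):
    """Reject entries with missing required keys or any unknown key."""
    keys = set(entry)
    missing = sorted(required - keys)
    unknown = sorted(keys - required - optional)
    if missing or unknown:
        raise ValueError(f"missing keys: {missing}; unknown keys: {unknown}")
    return entry


# "subdr" is a typo for the optional "subdir". A lenient parser would
# silently drop it; the strict check reports it instead.
entry = {"url": "https://example.invalid/ghc.tar.xz",
         "hash": "abc123",
         "subdr": "ghc-9.6.4"}
try:
    check_keys(entry, REQUIRED, OPTIONAL)
except ValueError as err:
    print(err)
```

The same idea applies on the consuming side: a parser that treats unknown keys as errors turns a silent misconfiguration into an immediate failure.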

Yes, they could simply use GHCup as a library, which I also do here, but my hunch is that is not very appealing, since they didn’t do that.

So the criteria are:

  • validation
  • assisting with generation of the metadata from some input
  • ergonomics (no one wants to learn a new complicated language to do this boring work)

There are a number of such languages. This list, Survey of Config Languages · oils-for-unix/oils Wiki · GitHub, contains quite a few, but it is not complete. For example, it does not include KCL. However, I don’t have experience with any of those apart from Dhall.

In this Reddit thread one project seems to be using the Nix expression language for configuration.

One of my requirements is ergonomics, so I’m afraid anything remotely related to Nix is probably not an option.

Perhaps a smaller change is to add typing to that Python script. I don’t see any type annotations, and quite a few nested dicts.

Throw some dataclasses at it, or even pydantic for good parsing/encoding, and you’ll be in a much, much better position to communicate metadata format changes. It would really just be changing the types and running your favourite Python type checker until it stops complaining. The past couple of years have seen quite a large push for typing in Python. It’s there for the taking. The script doesn’t have to be sad.
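A minimal sketch of the dataclass approach, using only the standard library. The class and field names are invented for illustration and are not taken from the actual GHC release script or the GHCup schema.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class DownloadInfo:
    """Hypothetical leaf record for one downloadable artifact."""
    url: str
    sha256: str
    subdir: Optional[str] = None


@dataclass(frozen=True)
class VersionInfo:
    """Hypothetical per-version record: arch -> platform -> download."""
    tags: list[str]
    downloads: dict[str, dict[str, DownloadInfo]]


info = VersionInfo(
    tags=["Latest"],
    downloads={
        "x86_64": {
            "debian": DownloadInfo(
                url="https://example.invalid/ghc.tar.xz",
                sha256="abc123",
            ),
        },
    },
)

# asdict recurses through nested dataclasses, dicts and lists, so the
# result is plain data that a YAML or JSON encoder can serialise.
plain = asdict(info)
```

With this shape, a format change becomes a change to the class definitions: a type checker such as mypy or pyright then flags every call site that still uses the old fields, and a misspelled keyword argument fails at construction time with a `TypeError`.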

Thanks a lot for the tip. I’ve replaced my Dockerfile generator suite (a bunch of YAML files + a Jinja (Ginger) template + a templating executable in Haskell) with a bunch of Nickel files and a trivial shell script. That resulted in a way smaller footprint, instant feedback, and types out of the box instead of JSON Schema, all with a single configuration language. Couldn’t recommend it enough. Seems like the sweet spot to me as well.

Update on this: I put my money where my mouth is and made the MR: rel_eng: Give the ghcup yaml generation script some Python (and Nix) love (!15966) · Merge requests · Glasgow Haskell Compiler / GHC · GitLab

Specifically, the declared type of the Ghcup yaml schema can be found here:

This is already quite an improvement, if I say so myself. Note that I did cap my time on this, and the viArch: dict[ViArch, dict[Distro, dict[VersionStr, DlInfo]]] part can be made nicer by making more classes (instead of dictionaries).

Edit: I decided to come back and fix that nested dict.

I realize that this entire script would be superfluous if it were indeed replaced by e.g. Nickel. I did want to demonstrate the typing capabilities of modern Python, though.

The script is just 600 LOC and runs through a nix shebang, so there is no problem with delivering Haskell dependencies. There are libraries for parsing JSON and handling command-line args.