(pre proposal) cabal exact print

I’ve written about cabal exact print,
I’d love to have some feedback on this:


I’ll also work on this at ZuriHac :slight_smile:


Or they revolved around creating a separate AST3, which was against maintainer recommendation, and were then abandoned.

It’s not per se “abandoned”; it’s just that I received zero feedback on any of the decisions I made (I worked on this for two months, and another half a year has passed since). Maintainer recommendation is merely “we don’t want rogue actors rewriting half the codebase on their own”, which is fair maintenance-wise, but doesn’t advance the discussion meaningfully.

I still stand by the idea that a generalized Cabal format is better, but it’s probably too much work to make this happen (not least because the only parsers allowed, binary and parsec, both suck for this).


thank you for attacking this! this really needs to happen! <3

I wonder if anyone has asked @alanz about this, since he did most of the work in making this possible in GHC.

There was some discussion on the GH issue when it originally started, IIRC, but it was not particularly fruitful.

I will also be in Zurich, so maybe we can talk more there.

@sclv followed the process for a cabal exact printer, I am sure he will have insight about the possible solutions.


I recall liking the draft PR, and I think a tech proposal is a good way to force yourself to outline the game plan in a way that others can participate as a group, in order to spread the work and make it sustainable. Thanks for keeping at it!


5 posts were split to a new topic: Module auto-discovery in Cabal

Thank you for writing this up @Jappie !

I have spent a lot of time thinking about this while also refraining from making any proposal or announcement until I felt more confident. Nevertheless I hope you won’t mind if I jump behind your lead :joy:.

Being familiar with cabal’s codebase (as much as one can be), I think that starting from GenericPackageDescription is not the best approach. Put simply, many types used in the codebase are stretched between different domains and end up being not quite right for any of them. GenericPackageDescription is a good example: we parse right into it but then we also use it for all the operations related to package metadata. On one side it loses relevant aspects of the parsing domain, on the other it carries information that might either be irrelevant or not in the right form. Additionally, the way GenericPackageDescription is constructed during parsing is very imperative and complicated.

Of course all this can be fixed but I have a smaller first step in mind: decoupling the syntax from what we do with it. The syntactic structure of “cabal-like” files (package description files but also cabal project files) is very simple and already decently captured by the Field type.

Using Field to capture whitespace and comments is simple and also nothing new: many have already done it with different hacks[^1] and it is something Cabal-syntax could do itself. We ““only”” have to modify the fields parser to keep whitespace and comments. I was able to change the lexer but I got a bit stuck understanding how the parsing was done (and even more stuck thinking about how I would re-write it from scratch :joy: :man_facepalming:).

Once that is done, building combinators to operate on Field is very simple. I have written a simple PoC using optics and a couple of hand-written traversals; in it I have written a prism between ByteString and a slightly modified Field. This is enough to operate on any field in its textual form without affecting anything else (e.g. change the version number, change the source directory of a component with a given name).
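To make this kind of edit concrete, here is a minimal sketch using only `base`. It is not the PoC described above: `Line`, `parseLine`, `overField` and `editFile` are hypothetical names, `Line` is a much simplified stand-in for Cabal's `Field`, and plain functions replace the optics-based prism. It assumes the input ends with a newline and only recognises unindented `name: value` lines; everything else is carried through verbatim.

```haskell
import Data.Char (isAlpha, isAlphaNum, toLower)

-- | Hypothetical, simplified stand-in for a whitespace-preserving 'Field'.
-- Each constructor keeps the exact source text, so printing is concatenation.
data Line
  = FieldLine String String String -- ^ name, separator (colon plus spacing), value
  | Other String                   -- ^ comments, blank lines, sections, indented lines
  deriving (Eq, Show)

-- | Split an unindented @name: value@ line; keep anything else untouched.
parseLine :: String -> Line
parseLine l =
  case break (== ':') l of
    (name@(c : _), ':' : rest)
      | isAlpha c, all isNameChar name ->
          let (gap, value) = span (== ' ') rest
          in FieldLine name (':' : gap) value
    _ -> Other l
  where
    isNameChar ch = ch == '-' || isAlphaNum ch

-- | Inverse of 'parseLine': reproduces the original text exactly.
printLine :: Line -> String
printLine (FieldLine name sep value) = name ++ sep ++ value
printLine (Other l) = l

-- | Apply a function to the textual value of every field with the given name
-- (compared case-insensitively), leaving all other lines alone.
overField :: String -> (String -> String) -> [Line] -> [Line]
overField target f = map step
  where
    step (FieldLine n sep v)
      | map toLower n == map toLower target = FieldLine n sep (f v)
    step other = other

-- | Parse, edit one field, and print again.
editFile :: String -> (String -> String) -> String -> String
editFile name f = unlines . map printLine . overField name f . map parseLine . lines

example :: String
example = unlines
  [ "cabal-version: 3.0"
  , "-- the package name"
  , "name:          demo"
  , "version:       0.1.0.0"
  , ""
  , "library"
  , "  hs-source-dirs: src"
  ]

-- | Same file with only the version value replaced.
bumped :: String
bumped = editFile "version" (const "0.2.0.0") example
```

Because the separator and every non-field line are stored as text, `editFile "version" id` returns its input unchanged, and `bumped` differs from `example` only in the version value; the comment and the column alignment survive.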

Now, this is only a syntactical manipulation (the content of all fields is textual) and there is a bit more work to figure out[^2] but I am optimistic about it: it is a relatively small, not necessarily breaking change and it already supports basic use cases like bumping version numbers.

I tried to push this approach further by building prisms out of the Pretty and Parsec instances and trying to change the bound on one of the dependencies. The good thing is that it actually worked! The bad thing is that by using Pretty and Parsec I had thrown away the concrete syntax of the dependency (all of them or only the one I changed? I don’t remember). The way those fields are parsed is not bad (it is based on FieldGrammar) and I think it is possible to adapt it to support our use case.
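The loss of concrete syntax can be illustrated with a toy parse/pretty pair. This is only an imitation under stated assumptions: `Dep`, `parseDep`, `prettyDep` and `editDep` are hypothetical and far simpler than the real `Parsec` and `Pretty` instances in Cabal-syntax, and the only operator handled between constraints is `&&`.

```haskell
import Data.Char (isSpace)
import Data.List (intercalate)

-- | Hypothetical, simplified dependency: package name plus a list of
-- constraints that were joined with @&&@ in the source.
data Dep = Dep String [String]
  deriving (Eq, Show)

-- | Split a string on every occurrence of @&&@.
splitAnd :: String -> [String]
splitAnd s =
  case breakAnd s of
    (chunk, Nothing) -> [chunk]
    (chunk, Just rest) -> chunk : splitAnd rest
  where
    breakAnd ('&' : '&' : rest) = ("", Just rest)
    breakAnd (c : rest) = let (chunk, more) = breakAnd rest in (c : chunk, more)
    breakAnd [] = ("", Nothing)

-- | Parsing discards all whitespace: the abstract value has no room for it.
parseDep :: String -> Dep
parseDep s =
  let (name, rest) = break isSpace (dropWhile isSpace s)
      compact = filter (not . isSpace) rest
  in Dep name (if null compact then [] else splitAnd compact)

-- | Pretty-printing always emits one canonical layout.
prettyDep :: Dep -> String
prettyDep (Dep name []) = name
prettyDep (Dep name cs) = name ++ " " ++ intercalate " && " cs

-- | Edit a dependency by going through the abstract value.
editDep :: (Dep -> Dep) -> String -> String
editDep f = prettyDep . f . parseDep

original :: String
original = "base   >=4.14   &&   <5"

-- | Even the identity edit rewrites the author's spacing.
roundTripped :: String
roundTripped = editDep id original
```

In this toy, `parseDep . prettyDep` is the identity on `Dep`, but `prettyDep . parseDep` is not the identity on text, which is the same shape of problem as described above: the edit succeeds, yet the original layout is gone.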

If you are interested in hacking on this stuff at ZuriHac, I will be there with bells on! In the meantime, happy to chat more if you like.

[^1]: by using the position information to recover the text between the parsed parts.

[^2]: off the top of my head, I think capturing the text between the parsed content is not enough to do things like inserting new fields or sections. We would need to find a representation of the indentation structure. The lexer knows this, so there’s hope.


If I remember correctly you could make it even simpler than that: merely remove the ability to escape section names and outlaw curly-bracket syntax (replacing it with something like Bird-syntax, but for indentation), and then the entire format can be sliced up by simply reading the whitespace before the lines. Both of these format features are used extremely rarely and are far more trouble than good parsing-wise.
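A rough sketch of what slicing by leading whitespace could look like, assuming a file with no curly-bracket syntax, no escaped section names, and spaces (not tabs) for indentation. `Block`, `indentOf`, `blocks` and `slice` are hypothetical names, and blank lines are dropped here for brevity, which an exact printer would of course have to keep.

```haskell
import Data.Char (isSpace)

-- | A line at some indentation, plus the more-indented lines nested under it.
data Block = Block Int String [Block]
  deriving (Eq, Show)

-- | Measure the leading spaces of a line and strip them.
indentOf :: String -> (Int, String)
indentOf l = let (ws, rest) = span (== ' ') l in (length ws, rest)

-- | Nest each line under the closest preceding line with smaller indentation.
blocks :: [(Int, String)] -> [Block]
blocks [] = []
blocks ((n, t) : rest) =
  let (children, siblings) = span ((> n) . fst) rest
  in Block n t (blocks children) : blocks siblings

-- | Slice a whole file into nested blocks, ignoring blank lines.
slice :: String -> [Block]
slice = blocks . map indentOf . filter (not . all isSpace) . lines

-- | Text of a block's first line.
header :: Block -> String
header (Block _ t _) = t

-- | Direct children of a block.
children :: Block -> [Block]
children (Block _ _ cs) = cs

sample :: String
sample = unlines
  [ "name: demo"
  , ""
  , "library"
  , "  build-depends:"
  , "    base,"
  , "    text"
  , "  hs-source-dirs: src"
  , "executable demo"
  , "  main-is: Main.hs"
  ]
```

For `sample`, `map header (slice sample)` yields the three top-level lines, and the `library` block has two children, one of which (`build-depends:`) nests the two dependency lines.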

While you could, my goal is to do this without changing the syntax of cabal files :slight_smile:


There’s quite a difference between syntax as in “what people use” and syntax as in “how the parser currently behaves”. I don’t think either of the two features above is documented, and the number of Hackage packages that use them is most probably in double digits.

Cabal already has a whole separate module for file patches, so this isn’t breaking any new ground.

Though yes, my point is not that you should explicitly deviate from the existing format, just that the goal is to support the current userbase, not to maintain hacky side cases implemented 15 years ago.