Pre-HFTP: Fault-tolerant GHC compiler pipeline

michaelpj · January 2, 2024, 10:40am

I have drafted a HF proposal for work on making GHC’s compiler pipeline more fault-tolerant, that is:

Capable of producing partial output if there is an error (rather than stopping immediately)
Capable of consuming partial output and still producing useful diagnostics

This would solve the problem where errors from earlier stages (such as parsing) mask errors from later stages (such as typechecking); as well as benefiting tools that use the compiler, which could then work on partially broken programs.

Full proposal here

ocramz · January 3, 2024, 5:24am

Very interesting, thank you for writing this down!
Just a couple thoughts for now:

Step 1: handle fully-recoverable parse errors

A “fully-recoverable” parse error is one where we can know (or make a very good guess) what the user intended. This usually means guessing a few missing literal tokens, such as a trailing bracket.

tup :: (Int, Int)
tup = (1, 2

The example you give is only ambiguous because of the type signature (which is often not even present).
Without such hints, it becomes a probabilistic sort of prediction, which is where corpus-based approaches (or language models as suggested by @maxigit) can help, but do we really want to 10x-50x the size of GHC to put CodeLLaMa or friends inside it?

Step 2: handle other parse errors by producing partial output

I think GLR parsers like tree-sitter do this and it would perhaps make sense to leverage them.

Step 3 etc.

Accepting distributions of tokens/types/ASTs is even harder: the work for each stage multiplies by the number of predictions, and the subsequent output in turn.

michaelpj · January 3, 2024, 12:12pm

Would there be tasks open to GHC beginner ?

To be honest, I’m unsure at this point. I also don’t know enough about GHC to say how hard this stuff is (and I’m hoping to get feedback on that…). I expect that once the infrastructure is in place then it shouldn’t be too hard to contribute by making the various stages more fault-tolerant in one way or another.

The example you give is only ambiguous because of the type signature (which is often not even present).
Without such hints, it becomes a probabilistic sort of prediction

Error recovery by doing token insertion (or deletion) is not really new technology, it’s been around for a long time. So I’m not too worried about this problem, I think we can probably pick good heuristics in most cases.

I think GLR parsers like tree-sitter do this and it would perhaps make sense to leverage them.

Happy does in fact have a GLR mode, I have no idea how good it is, though: Generalized LR Parsing — Happy documentation

I think using the ambiguity-handling of GLR parsers for error recovery might go badly with files that have many errors, since I think the number of alternatives would grow exponentially. Whereas for well-formed examples of more typical ambiguous grammars you hopefully have a smaller number of choice points. Better I think to do local error recovery and continue with a single parse?

michaelpj · January 5, 2024, 10:54am

I posted an updated proposal as PR now: Fault-tolerant GHC compilation pipeline by michaelpj · Pull Request #63 · haskellfoundation/tech-proposals · GitHub

LaurentRDC · May 28, 2024, 12:22pm

Looks like the Gleam project has recently implemented this

Topic		Replies	Views
Modern parser generator Learn	7	1374	January 15, 2024
Pre-HFTP: Coordination for finishing Trees That Grow	15	1534	August 28, 2023
Adjusting Haskell98's layout to GHC behaviour Learn	4	305	April 9, 2025
Proposal: An issue tracker for better error messages	24	2683	June 13, 2021
Blog post: Why I Support the Haskell Foundation Links	4	799	June 22, 2021

Pre-HFTP: Fault-tolerant GHC compiler pipeline

Related topics