Pre-HFTP: Fault-tolerant GHC compiler pipeline

I have drafted a HF proposal for work on making GHC’s compiler pipeline more fault-tolerant, that is:

  • Capable of producing partial output if there is an error (rather than stopping immediately)
  • Capable of consuming partial output and still producing useful diagnostics

This would solve the problem where errors from earlier stages (such as parsing) mask errors from later stages (such as typechecking); as well as benefiting tools that use the compiler, which could then work on partially broken programs.

Full proposal here

21 Likes

Very interesting, thank you for writing this down!
Just a couple thoughts for now:

Step 1: handle fully-recoverable parse errors

A “fully-recoverable” parse error is one where we can know (or make a very good guess) what the user intended. This usually means guessing a few missing literal tokens, such as a trailing bracket.

tup :: (Int, Int)
tup = (1, 2

The example you give is only ambiguous because of the type signature (which is often not even present).
Without such hints, it becomes a probabilistic sort of prediction, which is where corpus-based approaches (or language models as suggested by @maxigit) can help, but do we really want to 10x-50x the size of GHC to put CodeLLaMa or friends inside it?

Step 2: handle other parse errors by producing partial output

I think GLR parsers like tree-sitter do this and it would perhaps make sense to leverage them.

Step 3 etc.

Accepting distributions of tokens/types/ASTs is even harder: the work for each stage multiplies by the number of predictions, and the subsequent output in turn.

1 Like

Would there be tasks open to GHC beginner ?

To be honest, I’m unsure at this point. I also don’t know enough about GHC to say how hard this stuff is (and I’m hoping to get feedback on that…). I expect that once the infrastructure is in place then it shouldn’t be too hard to contribute by making the various stages more fault-tolerant in one way or another.

The example you give is only ambiguous because of the type signature (which is often not even present).
Without such hints, it becomes a probabilistic sort of prediction

Error recovery by doing token insertion (or deletion) is not really new technology, it’s been around for a long time. So I’m not too worried about this problem, I think we can probably pick good heuristics in most cases.

I think GLR parsers like tree-sitter do this and it would perhaps make sense to leverage them.

Happy does in fact have a GLR mode, I have no idea how good it is, though: Generalized LR Parsing — Happy documentation

I think using the ambiguity-handling of GLR parsers for error recovery might go badly with files that have many errors, since I think the number of alternatives would grow exponentially. Whereas for well-formed examples of more typical ambiguous grammars you hopefully have a smaller number of choice points. Better I think to do local error recovery and continue with a single parse?

1 Like

I posted an updated proposal as PR now: Fault-tolerant GHC compilation pipeline by michaelpj · Pull Request #63 · haskellfoundation/tech-proposals · GitHub

4 Likes

Looks like the Gleam project has recently implemented this