Ghc pipeline: type checking/elaboration and translation to Core

My superficial understanding of the ghc compiler pipeline is that at some point it does (1) type checking/elaboration and (2) translation to Core.

Do these two things happen simultaneously or sequentially? In other words, is the output of type checking still in the same language as its input, just with more information added in? Or is it Core? More generally, what would be the pros and cons of doing things either way?

Thanks!

Hi @dnalor!

These steps happen sequentially in GHC. The output of type checking/type elaboration (“elaboration” on its own might be interpreted by some as “translation to an internal language”) is still phrased in terms of Haskell syntax. You can see this output by passing GHC the -ddump-tc option. It probably won’t look all that type-elaborated though, so you may want to add -fprint-typechecker-elaboration as well, which adds lots of extra output.

Pros of this approach (type checking before desugaring to Core):

  • The type-checked code is Haskell code. This makes it easier to present errors in a form that users can relate to what they wrote.
  • Type-checking can inform desugaring. If we desugared first, our Core language would have to be more complicated to allow for ambiguity among forms disambiguated by type-checking.
  • We can write out the rules of type checking for Haskell. (Though, somehow, we haven’t really done this. I started this repo as an attempt to do so, but never finished.)

Cons of this approach:

  • For forms that could be desugared without type information, the type checker is maybe more complex than it needs to be. That is, if we do a syntactic transformation to simpler code before type checking, that makes type checking easier. (Example: we can imagine expanding out do notation to uses of >>= and other monadic operators. Then we could drop type-checking of do altogether.)
  • It might be less performant, because we have to reconstruct the entire abstract syntax tree (AST) after type checking.

There is a start of adding some simplification of AST before type-checking while retaining the original AST for error reporting, but more work remains to be done on that front.

An alternative answer is to think about the idea of doing type-checking and desugaring all at once, with no intermediate AST. This would likely be more performant, because we don’t have to rebuild a type-checked AST. It would also make the code more complicated, because doing two things at once seems quite a bit harder than doing them separately. That said, it might be worthwhile – it’s hard to know for sure.

3 Likes

Thanks for the answer, this sort of insight is very much what I was hoping for.

This point seems particularly interesting. Is there a simple example of a syntactic form that desugars to two different Core constructs depending on the result of type elaboration?

The expression f x desugars differently depending on whether x's type is unlifted. If x is lifted (that’s a normal type that evaluates lazily), the core is f x. If it’s unlifted (a strict type), the core is case x of y -> f y. Core’s case construct is strict, unlike Haskell’s.