Elaboration after type inference?

Hello there,

I’m in the midst of figuring out how type checking and inference can be implemented. Everything so far is based on the OutsideIn(X) paper (which is a wonderful paper by the way!) and two talks by Simon Peyton Jones (the “100 Types” and type inference as constraint solving).

This wall of text boils down to a profane question to which I probably guessed the answer already, but I want to be sure …

The language I’m toying around with is a “first simple fragment”, it contains application and lambda abstraction as well as top-level definitions which can be type annotated or not. This sets the local-let aside. (Eventually I want to experiment with and understand records and rows, so I try to reduce the friction and keep it simple.)

On the typing side nothing special as well, just type classes are added on top of everything which is needed for the syntax.

Now, in the talks I’ve mentioned the process of elaboration (that is, adding type and type class dictionary arguments into the AST) happens while generating the constraints. Jones mentions that there are details that this is not actually the case for the hole program – and I guess its these details I stumble upon.

The underlying question is: In which sequence happens typing and elaboration when several top-level definitions exist?

Considering typing alone for a moment, as far as I can see, the OutsideIn paper does one at a time – when one is checked and inferred I’m allowed to use it as part of the context, hence I would probably implement this with some dependency tree (not graph! no (mutually) recursive definitions) to determine the order. When it is typed already, I have all the info to elaborate.

Consider for example something like this in Haskell:

g :: Int -> [Char]
g x = f x ++ "<END>"

f x = "<START>" ++ show x

I would need to start with f, then instantiate its polymorphic type and elaborate f x to f @a d x.

While this is probably just the answer, I still don’t want to believe that I cannot just go the way of the big picture: constraints and elaboration for everything! Simplify everything! Done! ; ). Hence I’m here asking:

(1) Before elaboration the type of the identifier which gets expanded has to be known, either because it was given via annotation or the inference actually has to happen for this identifier first. There is no magic I’m overseeing here, I can only elaborate after type inference if the type is not given, not while constraints are generated?
(2) Constraints are not generated for the hole program (or say: module) but rather a dependency tree is created first and this tree determines an order of type inference and elaboration?
(3) Does each top-level definition get its own typing pass? Or are several independent ones are put together?
(4) I guess this is then Pandora’s box: What about (mutual) recursive definitions?

Thanks in advance,

aphorisme

3 Likes

Interesting questions!

Elaboration is a fairly general concept. In principle, there is nothing preventing an elaboration strategy whereby a call f x to a function f of unknown type gets elaborated to a program with a hole (metavariable), such that the hole will later be filled with f applied to some implicit (type/constraint) arguments, once the type of f becomes known. So one could have an elaborator that traversed the AST in source code order, producing a well-typed intermediate representation containing holes, then dynamically figured out an order to fill those holes based on information discovered later. Another way to look at this: one can imagine a multi-threaded elaborator where each thread is trying to elaborate a piece of source syntax, some threads block waiting for other threads, and hopefully the whole thing doesn’t deadlock.

But in practice, a simpler and more predictable strategy to resolving the issues you raise, as I think you have guessed, is to topologically sort the dependency graph of top-level definitions produced by name resolution, then elaborate them in dependency order. I believe this is what GHC does. Of course, definitions with type signatures can be assumed to have correct types when checking other definitions that reference them (at least in Haskell; things get more complicated in dependently-typed languages).

This does mean that for mutually-recursive definitions (aka strongly-connected components in the dependency graph) without type signatures, the type-checker has to create type metavariables for the types of the definitions, elaborate them all at once, and generalise afterwards. Since type metavariables cannot be filled with polytypes, there’s no need to insert implicit arguments when elaborating the calls. But as a consequence, one cannot have polymorphic recursion amongst a mutually-recursive group of definitions without type signatures, just as Haskell requires top-level type signatures for individual definitions using polymorphic recursion.

7 Likes

What GHC does is this:

  • Dependency analysis, so that it processes the defn of f before that of g
  • Generate constraints from f’s defn.
  • Solve those constraints, to give a minimal set of constraints, say (Show a).
  • Abstract over them to give f a type, f :: forall a. Show a => [a] -> String
  • Extend the environment with that type for f
  • Now generate constraints etc for g

So yes, when it comes to inferring the most general type for a let-bound identifier that lacks a type signature then yes, we must solve constraints from f’s rhs now rather than deferring them till the end. (If the function has a type signature, we can defer till later.)

In GHC’s code base this “solve constraints and infer type” bit is called GHC.Tc.Solver.simplifyInfer. It sadly has many rather ad-hoc decisions, but that is fundamentally in the nature of generalising let-bound identifiers.

11 Likes

Thanks for your answers. This helps a lot!

1 Like

In TypeScript, there are several places where type inference is used to provide type information when there is no explicit type annotation.