Type-Indexed Expression Evaluation

Finley · February 18, 2022, 4:13am

I’m trying to write an evaluator for type-indexed expressions in a very simple imperative language. What I’ve found is evaluating identifier expressions is difficult, and I’m wondering what the best representation of the environment is. Currently, my environment is a map from identifiers to existentially typed values. When I then look up an identifier, it’s difficult (impossible?) to prove that the type of the existential value is in fact equivalent to the type of the identifier expression. Do I need to change the representation of the environment or is this somehow possible with the existentials? For clarity, here is the type of the evaluator I’m writing, and the case I’m struggling with:

eval :: TExp ty -> Map String SomeVal -> ty
eval (TEId ident) env = case lookup ident env of
  Just someVal -> ???
  Nothing -> error ...

Here is the SomeVal type:

data SomeVal where
  SomeVal :: Val a -> SomeVal

And Val just has boolean and integer constructors:

data Val a where
  VBool :: Bool -> Val Bool
  VInt :: Integer -> Val Integer

Any pointers or tips would be very much appreciated! Happy to provide more context if necessary.

santiweight · February 18, 2022, 6:14am

Heya

So there’s quite a particular reason you’re having trouble… In Haskell terms depend on their types - meaning that types constrain our terms:

data Ty = TInt | TBool

data Val (ty :: Ty) where
  VInt :: Int -> Val TInt
  VBool :: Bool -> Val TBool

-- Now the following term depends on the type, 
-- because the terms that are valid are constrained 
-- by the types assigned to them
intVal :: Val TInt
intVal = VInt 1
-- The following would be invalid because terms depend on their types:
-- intVal = VBool True

So I expect you know all that already - that’s not news, but now let’s look at what you asked to do.

case Map.lookup (ident :: String) env of
  Just (SomeVal (someVal :: Val someTy)) -> ???
  Nothing -> ...

What you are asking the compiler to do is to realize that the someTy in the parameter to Val is the same as the type ty in the expected argument type. In other words, we are basically trying to write the following function:

tryProof :: Ty ty -> SomeVal -> Maybe (Val ty)
tryProof _ty (SomeVal (someVal :: Val SomeTy)) = ...

Try as you might - this function is impossible to write. Why you might ask? Because you are asking for the runtime value in SomeVal to reflect its type to the compiler. In some sense you are asking for the type of the Val someTy in SomeVal to tell something to the type system - namely whether or not it is the same as the ty in the type signature. You are asking for the types to depend on the terms. This is a little hard to see, but the reason is because remember that Haskell erases its types at runtime, so we cannot recover the type at runtime because Haskell does not provide that ability.

Well, you can do it… Here are your options (I recommend options 4, 1, 2, then worst 3 in that order) :

De Bruijn indices - this options leaks the types out from the Map environment using some fancy GADTs. This is a very important step in understanding, but is generally not the best, because variable lookups now become O(n) in the number of variables. But note that this representation removes the existential around values in the environment, hence why those types can be inspected by the compiler. I couldn’t find a tutorial for de bruijn indices that wasn’t in Idris, but the following example will work when rewritten in Haskell, and I believe there is a stackoverflow out there with Haskell code. Example: The Well-Typed Interpreter — Idris2 0.0 documentation
Use a dependently typed language (Idris, Agda are the closest to Haskell). One thing that dependently typed languages allow you to do is optionally inspect the types of terms (at the cost of a small runtime penalty). In your case this would allow you to write the equivalent of if someTy == ty then someVal else error, using pattern matching on the actual types themselves. Note that the “dependent types” here mean that the types depend on terms - since the terms can tell the compiler their types (where there are potentially many possibilities).
Use Typeable and associated type classes. I recommend doing this only after you understand the rest, because it’s just a hack version of the final option…
Use Haskell but fake dependent types, which I show below. This is how I do this nowadays (when I do). Basically you keep a GADT STy standing for "singleton of Ty" that allows you to recover types from a term, and then keep that with SomeVal, so the compiler can check the terms of existential types. This would be much more obvious in Agda/Idris, but it’s still possible to grok in Haskell.

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE GADTs #-}

module MyLib (someFunc) where

import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)
import Data.Type.Equality (TestEquality(testEquality), (:~:)(Refl))

data Ty = TInt | TBool

data STy (ty :: Ty) where
  STInt :: STy TInt
  STBool :: STy TInt

instance TestEquality STy where
  testEquality STInt STInt = Just Refl
  testEquality STInt STBool = Nothing
  testEquality STBool STInt = Nothing
  testEquality STBool STBool = Just Refl

data Val (ty :: Ty) where
  VInt :: Int -> Val TInt
  VBool :: Bool -> Val TBool

data TExp (ty :: Ty) where
  TEPlus :: TExp TInt -> TExp TInt -> TExp TInt
  TEAnd :: TExp TBool -> TExp TBool -> TExp TBool
  TEIfThenElse :: TExp TBool -> TExp a -> TExp a -> TExp a
  TEId :: String -> STy ty -> TExp ty

data SomeVal where
  SomeVal :: Val ty -> STy ty -> SomeVal

eval :: TExp ty -> Map String SomeVal -> Val ty
eval exp env = case exp of
  TEPlus lExp rExp -> let (VInt lInt) = eval lExp env
                          (VInt rInt) = eval rExp env
                      in VInt (lInt + rInt)
  TEAnd lExp rExp -> let (VBool lBool) = eval lExp env
                         (VBool rBool) = eval rExp env
                     in VBool (lBool && rBool)

  TEIfThenElse condExp thenExp elseExp -> let VBool condBool = eval condExp env
                                    in eval (if condBool then thenExp else elseExp) env
  TEId varName sExpectedTy ->  case Map.lookup varName env of
        Just (SomeVal varVal sVarTy) -> case testEquality sExpectedTy sVarTy of
          Nothing -> error "Type mismatch between expected and found type"
          -- NOTE: You have to pattern match on the actual Refl - because GHC isn't smart enough
          -- if you write "Just myRefl" to realise that there is only one constructor that contains the
          -- proof you want
          Just Refl -> varVal
        Nothing -> error "Oopsie!"

  -- The same as the above but with types exposed for clarity
  -- TEId varName (sExpectedTy :: STy tyExpected) ->  case Map.lookup varName env of
  --       Just (SomeVal (varVal :: Val tyFound) (sVarTy :: STy tyFound)) -> case testEquality sExpectedTy sVarTy of
  --         Nothing -> error "Type mismatch between expected and found type"
  --         Just (Refl :: tyExpected :~: tyFound ) -> varVal
  --       Nothing -> error "Oopsie!"

I won’t dig into Refl too much here - I recommend the first chapter of PLFA by Phil Wadler if you would like to understand that better, but it basically proves that two types are equal, because the only constructor in Refl requires that its two type parameters be equal.

Further reading:

Justin Le’s 4 part tutorial series on singletons, but I recommend this as the first step in your journey, because it covers everything in great detail. Seriously, this blog series is amazing - Introduction to Singletons (Part 1) · in Code. It will teach you about everything discussed in my response. He currently does a lot of work on type-safe neural nets with dependent types, which is nifty!
The singletons Haskell library which removes the STy and TestEquality boilerplate. This library is “unergonomic dependent types in Haskell”, written by Richard Eisenberg, who is currently trying to add dependent types to GHC. The readme linked to from package is very detailed… singletons: Basic singleton types and definitions
Idris 2’s docs are really really good: A Crash Course in Idris 2 — Idris2 0.0 documentation. They explain dependent types darned well.
PLFA by Phil Wadler is great if you’re interested in understanding some of the theory behind all this: https://plfa.github.io
Philip Wadler – Propositions as Types - YouTube “Propositions as types” gets to the beauty of types in general, but isn’t directly tied to the rest of my recommendations

Please do ask any more questions

Finley · February 18, 2022, 9:17pm

What an incredible answer. This was so helpful! I had a feeling that singletons could help me solve this somehow but couldn’t work it out myself. After doing it, I can understand why they’re referred to as “unergonomic” but it has helped my understanding quite a lot. All of these resources will be invaluable to me as I continue looking into this stuff. Thank you so much!

I do have another question! Using singletons, we still have to handle the case where the testEquality function does not return Refl in the evaluator, which is undesirable since the type checker could ensure that that can’t happen. Am I correct in my understanding that De Bruijn indices are how we could create a “type-level environment” index on the expression type which encodes not just the well-typed but also well-scoped nature of expressions?

jaror · February 18, 2022, 9:33pm

I think what you need for that is an environment which stores all variable bindings which is indexed by the type of all the bindings that are contained in that environment.

De Bruijn indices are the simplest way to do that, because then the environment is basically a heterogeneous linked list indexed by a list of types for each stored value. However, I think you can also use more complicated types like a dependent map which maps variable names to values, indexed by the keys and types of the values stored at each key.

For more info see this blog post: Jesper Cockx - 1001 Representations of Syntax with Binding. It uses Agda, but that should be kind of readable if you are comfortable with Haskell. (Edit: This doesn’t really consider writing interpreters or evaluators, so it is not immediately obvious how to use all these representations).

santiweight · February 18, 2022, 10:41pm

Very cool blog! I’d never seen this blog before

byorgey · February 22, 2022, 2:09am

For representing type-indexed heterogeneous environments with logarithmic access time, check out Wouter Swierstra’s really cool functional pearl from ICFP 2020: Heterogeneous binary random-access lists | Journal of Functional Programming | Cambridge Core (open access).