Enforce stricter type-level guarantee in an AST

As part of a process to learn programming language design and compilers, I’m designing an AST for a Chinese programming language, but while doing so I often feel “guilty” for how “loose” my type constraints are. Allow me to explain with an example:

My basis for the AST is inspired by Serokell’s blog post (which is why there is a polymorphic field used to store the location) and looks like this:

data Type a = Num a | List a | String a | Bool a
  deriving (Foldable, Show, Eq)

data Exp a
  = EInt a Integer String
  | EFloat a Double String
  | EList a String
  | EVar a String
  | EString a String
  | EBool a Bool
  deriving (Foldable, Show, Eq)

This is fine, but part of my language grammar sounds like this: “I have [one] [number], which is [3].”* What troubles me here is that while I know that I will always have a certain “number” of variables (i.e. [one], [two], [three]…, but not [true], [“abcdefg”], or [x]), in my type constructor I still have to use Exp a:

data Statement a
  = DeclareStmt a (Exp a) (Type a) [Exp a]
  | ...

Ideally I want to be able to write the first (Exp a) as number (EInt), and I am able to enforce this guarantee on the grammar level using Happy, but I don’t know how to do this on the type level. Therefore, I have to constantly remind myself that the first (Exp a) is actually EInt, and I fear that this lack of type-level guarantee will backfire on me down the road.

I’m also aware that I can just extract that node and store a number directly into the AST, like this:

data Statement a
  = DeclareStmt a Integer (Type a) [Exp a]
  | ...

However, this doesn’t feel right to me since I’m transforming certain AST nodes earlier than others (if this concern is completely unnecessary, please let me know!). In addition, I wish to keep how the number was originally typed since there can be multiple ways to type a number in Chinese and I hope to utilize my AST as a formatter in the future.

Therefore, my question is, is it okay to store an Integer plainly inside the AST like this? If not, are there any tricks or alternative AST designs where I can tell myself on the type level that a certain node is guaranteed to be a certain type (or do I just have to live with the current status quo)? Other comments on implementing ASTs are also appreciated!

Thank you in advance for your time!

* Note: if you’re wondering why this weird syntax, it’s because the language allows declaring multiple variables together like this: “I have [three] [number]s, which are [2], [3], [x], called [a], [b], and [c], respectively.” ([x] is a variable defined earlier), which is equivalent to int a = 2, b = 3, c = x;

You can look into GADTs! They’re the primary mechanism for solving this problem

4 Likes

One possible method is to add another data type for integers:

data LiteralInt a
  = LiteralInt a Integer String
  (deriving Foldable, Show, Eq)

data Exp a
  = EInt a (LiteralInt a)
  | ...

data Statement a
  = DeclareStmt a (LiteralInt a) (Type a) [Exp a]
  | ...

You say you’re concerned that storing Integer directly in the AST “doesn’t feel right”. I don’t think that itself is a problem. But, as you say, if you don’t store the original string representing the integer, you may not be able to recreate the parsed text exactly in a formatter.

1 Like