Parsing in Haskell is usually done with parser combinators. However these will get stuck in an infinite loop if you write left recursive parsers, e.g.:
expr = Plus <$> expr <* char '+' <*> expr
<|> ...
This means we can never define natural left recursive grammars in Haskell if we use this approach to embedding the grammars.
Instead we can use custom combinators and add a manually written string to identify grammar rules (as the gll
package does):
expr = "Expr" <::=> Plus <$$> expr <** char '+' <**> expr
<||> ...
However, now we must ensure manually that we use a different string in every place. The compiler won’t help us catch name conflicts.
After a bit of experimentation I found that there is a way to use generic programming to define grammars without having to use strings as names:
data Nat = Nat
deriving (GHC.Generic)
deriving anyclass (Generic, HasDatatypeInfo)
nat :: CFG
nat = grammar Nat \nat -> [
nat :-> [nat, "0"],
nat :-> [nat, "1"],
nat :-> ["1"]
]
data Expr = Expr
deriving (GHC.Generic)
deriving anyclass (Generic, HasDatatypeInfo)
expr :: CFG
expr = grammar Expr \expr -> [
expr :-> ["(", expr, ")"],
expr :-> [expr, "+", expr],
expr :-> [nat]
]
The grammar
function uses generic programming to generate a unique name based on the module and data type name. This way the Haskell compiler ensures we have no name conflicts. If there are two nonterminals with the same name we can use qualified imports to disambiguate them.
Full source code:
#!/usr/bin/env cabal
{- cabal:
build-depends: base, generics-sop
-}
{-# LANGUAGE GHC2021 #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DerivingStrategies #-}
{-# OPTIONS_GHC -Wno-name-shadowing #-}
import Data.String ( IsString(..) )
import qualified GHC.Generics as GHC
import Generics.SOP
data CFG = CFG Symbol [Production] deriving Show
data Production = CFG :-> [CFG] deriving Show
data Symbol = SymVar ModuleName DatatypeName | SymLit String deriving Show
instance IsString CFG where
fromString x = CFG (SymLit x) []
grammar :: forall a. (HasDatatypeInfo a, Generic a, Code a ~ '[ '[]]) => a -> (CFG -> [Production]) -> CFG
grammar _ f = CFG s (f (CFG s []))
where
di = datatypeInfo (Proxy @a)
s = SymVar (moduleName di) (datatypeName di)
data Nat = Nat
deriving (GHC.Generic)
deriving anyclass (Generic, HasDatatypeInfo)
nat :: CFG
nat = grammar Nat \nat -> [
nat :-> [nat, "0"],
nat :-> [nat, "1"],
nat :-> ["1"]
]
data Expr = Expr
deriving (GHC.Generic)
deriving anyclass (Generic, HasDatatypeInfo)
expr :: CFG
expr = grammar Expr \expr -> [
expr :-> ["(", expr, ")"],
expr :-> [expr, "+", expr],
expr :-> [nat]
]
main :: IO ()
main = print expr