I am looking for the name of this ‘pattern’ where you mirror a Type Class’ methods with an ADT/GADT (whichever fits), then you make that type an instance of the Type Class itself. Because these are 1:1, it is possible to translate back and forth between them.
Anyway, the final result is that you may write abstract programs in terms of the Type Class methods, then either concretize it to an actual implementation, or “AST-ize” with this ‘mirrored type’ in order to manipulate the program.
For example:
class ParserClass p where
string :: String -> p String
eof :: p ()
... etc ...
data ParserData a where
PString :: String -> ParserData String
PEof :: ParserData ()
... etc ...
-- I call this the "AST implementation"
instance ParserClass ParserData where
string = PString
eof = PEof
... etc ...
-- can also give a concrete implementation
instance ParserClass Parsec where -- I know, not accurate types
... bla ..
What I’d do with this is:
-- then I can write an abstract program
data JSON = ...
json :: ParserClass p => p JSON
json = ...
-- and I can observe 'the AST of the program'
-- (in ghci)
>>> json :: ParserData JSON
PString "{" ...
-- I can write an optimizer
optimize :: ParserData a -> ParserData a
optimize = ...
-- because the GADT is 1:1 with the class, I can go from data to class
dataToClass :: ParserClass p => ParserData a -> p a
dataToClass p =
case p of
PString s -> string s
... etc ...
-- then I can optimize the JSON parser and turn it into a Parsec parser
optimizedParsecJson :: Parsec JSON
optimizedParsecJson = dataToClass (optimize json)
It sounds a bit like Finally Tagless, but there you would deforest the AST and directly write the optimizations in the instances.
{-# LANGUAGE RankNTypes #-}
class ParserClass p where
string :: String -> p String
eof :: p ()
newtype O a = O { optimize :: forall p. ParserClass p => p a }
-- The no-op optimization :P
instance ParserClass O where
string str = O (string str)
eof = O eof
optimizedParsecJson :: Parsec JSON
optimizedParsecJson = optimize json
But now it is not really clear how you would get information from the children or the parent, but it is possible by extending O:
data FromParent = Root | ...
data FromChild = ...
newtype O a = O { unO :: forall p. ParserClass p => FromParent -> (p a, FromChild) }
optimize :: ParserClass p => O a -> p a
optimize p = fst $ unO p Root
(I have not tested this)
I just found out that pushing information down is discussed in section 2.4 starting on page 13 of those lecture notes, he actually takes an even more direct approach skipping the O newtype:
instance ParserClass p => ParserClass (FromParent -> p) where
string str fromParent = ...
eof fromParent = ...
I didn’t take a look into it yet, but I wonder if it would allow me to easily pattern match on the AST structure in a non-trivial manner. For example, with the data approach, I could do something like:
-- in `optimize`
case p of
PCat (PString s) (PString s') -> PString (s ++ s')
PCat (PString s) (PCat (PString s') p) -> optimize $ PCat (PString s ++ s') p
...
At the end of section 2, Oleg Kiselyov writes this:
The push_neg transformation then in the final style is obtained from the
corresponding transformation on the data type values by composing with the
conversion interpreters:
push_neg = finalize ◦ I.push_neg ◦ initialize
Thus if we forget about extensibility, any processing on data type values, how-
ever non-compositional, can be performed on the corresponding finally-encoded
terms. Using the intermediate data type to implement a transformation on
finally-encoded terms is inefficient, and destroys the extensibility. It is an open
question whether the intermediate data type values can be deforested or fused
in.
That equation is basically the transformation that you propose in this post: first convert to a datatype, then run your transformation, and finally convert back to type classes. He writes that it is an open problem whether every such transformation can be deforested, i.e. done without transforming to a datatype. But I think it is possible for your example.
Perhaps related to the technique described in Unembedding Domain-Specific Languages(paper)(video). The paper also covers how to convert DSLs with binding.
Here is a version that optimizes without converting to an intermediate data type:
{-# LANGUAGE LambdaCase, FlexibleInstances #-}
data ExpD
= Cat ExpD ExpD
| Str String
deriving Show
optimizeD :: ExpD -> ExpD
optimizeD = \case
Cat (Str x) (Str y) -> Str (x ++ y)
Cat (Str x) (Cat (Str y) z) -> Cat (Str (x ++ y)) z
x -> x
class ExpC repr where
cat :: repr -> repr -> repr
str :: String -> repr
data Ctx e = L e | E
instance ExpC repr => ExpC (Ctx (String -> repr) -> repr) where
str s E = str s
str s (L f) = f s
cat e1 e2 ctx = e1 (L (\s1 -> e2 (L (\s2 -> str (s1 ++ s2) ctx))))
optimizeC :: ExpC repr => (Ctx (String -> repr) -> repr) -> repr
optimizeC exp = exp E
instance ExpC String where
cat x y = "(cat " ++ x ++ " " ++ y ++ ")"
str x = "(str " ++ show x ++ ")"
I do think that the pattern matching solution is easier, but I also think you can get used to the tagless approach.
Edit: I now also notice that I’m perhaps using a bit too much continuation passing style, perhaps a direct implementation would be easier to read and write.
Update: Here is a more direct version, but it doesn’t look that great either:
instance ExpC repr => ExpC (Maybe String -> Either String repr) where
str s Nothing = Left s
str s2 (Just s1) = Left $ s1 ++ s2
cat e1 e2 ctx = case e1 ctx of
Left s -> e2 (Just s)
Right e1' -> Right $ cat e1' (either str id (e2 Nothing))
optimizeC' :: ExpC repr => (Maybe String -> Either String repr) -> repr
optimizeC' exp = either str id $ exp Nothing
Eventually I got to the point where I wanted to write recursive parser definitions. Of course, that makes it impossible (or at least hard?) to write pure program transformations.
So I introduced the fix operation to the DSL. I got it working, but I’m not fully content or convinced that I achieved an appropriate typing for this construct. Also, I am not sure why the ‘obvious’ typing didn’t work.
I tried this at first:
(Note: this is a reduced DSL)
class Rec f where
rec :: (f a -> f a) -> f a
data DataRec a where
DRec :: (DataRec a -> DataRec a) -> DataRec a
instance Rec DataRec where
rec = DRec
However, I was not able to implement this function:
recDataToClass :: forall a. DataRec a -> (forall f. Rec f => f a)
recDataToClass = \case
DRec f -> ???
So I went with this:
class Rec f where
rec :: (forall g. Rec g => g a -> g a) -> f a
data DataRec a where
DRec :: (forall g. Rec g => g a -> g a) -> DataRec a
instance Rec DataRec where
rec = DRec
recDataToClass :: forall a. DataRec a -> (forall f. Rec f => f a)
recDataToClass = \case
DRec f -> rec f
That seems to be working for me, for now. However, I am not sure if this second typing will impose some kind of limitation down the line.
Sections 2.3 (notably the Wrapped newtype seems similar to your solution) and 3.5 from Oleg’s lecture notes seem related. 3.5 refers to TTIF.hs where he implements a full language with a fixed point operator in Tagless Final style. Here is the line where he converts from initial to final encoding: https://github.com/michaelt/tagless/blob/05ca14710d0704fb1e5a52ee6887f0d52a3c0372/TTIF.hs#L230. He seems to need a Var constructor in the initial encoding.