What is the name of this pattern?

I am looking for the name of this ‘pattern’ where you mirror a Type Class’s methods with an ADT/GADT (whichever fits), and then make that type an instance of the Type Class itself. Because these are 1:1, it is possible to translate back and forth between them.

Anyway, the final result is that you may write abstract programs in terms of the Type Class methods, then either concretize them to an actual implementation, or “AST-ize” them with this ‘mirrored type’ in order to manipulate the program.

For example:

class ParserClass p where
  string :: String -> p String
  eof :: p ()
  ... etc ...

data ParserData a where
  PString :: String -> ParserData String
  PEof :: ParserData ()
  ... etc ...

-- I call this the "AST implementation"
instance ParserClass ParserData where
  string = PString
  eof = PEof
  ... etc ...

-- can also give a concrete implementation
instance ParserClass Parsec where -- I know, not accurate types
  ... bla ..

What I’d do with this is:

-- then I can write an abstract program
data JSON = ...

json :: ParserClass p => p JSON
json = ...

-- and I can observe 'the AST of the program'
-- (in ghci)
>>> json :: ParserData JSON
PString "{" ...

-- I can write an optimizer
optimize :: ParserData a -> ParserData a
optimize = ...

-- because the GADT is 1:1 with the class, I can go from data to class
dataToClass :: ParserClass p => ParserData a -> p a
dataToClass p =
  case p of
    PString s -> string s
    ... etc ...

-- then I can optimize the JSON parser and turn it into a Parsec parser
optimizedParsecJson :: Parsec JSON
optimizedParsecJson = dataToClass (optimize json)

It sounds a bit like Finally Tagless, but there you would deforest the AST and directly write the optimizations in the instances.

{-# LANGUAGE RankNTypes #-}

class ParserClass p where
  string :: String -> p String
  eof :: p ()

newtype O a = O { optimize :: forall p. ParserClass p => p a }

-- The no-op optimization :P
instance ParserClass O where
  string str = O (string str)
  eof = O eof

optimizedParsecJson :: Parsec JSON
optimizedParsecJson = optimize json

See section 3.4 (page 28) of Oleg’s course (he gives examples of an interpreter and a pretty-printer, but they are pretty analogous I think).

Now it is not really clear how you would get information from the children or the parent, but it is possible by extending O:

data FromParent = Root | ...
data FromChild = ...

newtype O a = O { unO :: forall p. ParserClass p => FromParent -> (p a, FromChild) }

optimize :: ParserClass p => O a -> p a
optimize p = fst $ unO p Root

(I have not tested this)

I just found out that pushing information down is discussed in section 2.4, starting on page 13 of those lecture notes; he actually takes an even more direct approach, skipping the O newtype:

instance ParserClass p => ParserClass (FromParent -> p) where
  string str fromParent = ...
  eof fromParent = ...
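
For the parser class above, where p takes a type parameter, FromParent -> p doesn’t kind-check as written, so I suppose some small wrapper is still needed. An untested sketch, letting Data.Functor.Compose play the role of that arrow:

{-# LANGUAGE FlexibleInstances #-}

import Data.Functor.Compose (Compose (..))

-- Compose ((->) FromParent) p a  ~  FromParent -> p a
-- With only string/eof there is nothing to push down yet, so the context is
-- simply ignored; combinators with children would pass a modified FromParent
-- down to them.
instance ParserClass p => ParserClass (Compose ((->) FromParent) p) where
  string s = Compose (\_fromParent -> string s)
  eof = Compose (\_fromParent -> eof)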

Thank you for the reading material!

I haven’t taken a look at it yet, but I wonder whether it would allow me to easily pattern match on the AST structure in a non-trivial manner. For example, with the data approach, I could do something like:

-- in `optimize`
case p of
  PCat (PString s) (PString s') -> PString (s ++ s')
  PCat (PString s) (PCat (PString s') p) -> optimize $ PCat (PString (s ++ s')) p
  ...

At the end of section 2, Oleg Kiselyov writes this:

The push_neg transformation then in the final style is obtained from the corresponding transformation on the data type values by composing with the conversion interpreters:

push_neg = finalize ◦ I.push_neg ◦ initialize

Thus if we forget about extensibility, any processing on data type values, however non-compositional, can be performed on the corresponding finally-encoded terms. Using the intermediate data type to implement a transformation on finally-encoded terms is inefficient, and destroys the extensibility. It is an open question whether the intermediate data type values can be deforested or fused in.

That equation is basically the transformation that you propose in this post: first convert to a datatype, then run your transformation, and finally convert back to type classes. He writes that it is an open problem whether every such transformation can be deforested, i.e. done without transforming to a datatype. But I think it is possible for your example.
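
Spelled with the names from this thread, that composition might look something like this (a sketch; optimizeViaData is just a made-up name, and it assumes RankNTypes and ScopedTypeVariables):

-- 'initialize' is instantiating the polymorphic program at ParserData,
-- 'finalize' is the dataToClass function from the original post.
optimizeViaData :: forall p a. ParserClass p
                => (forall q. ParserClass q => q a) -> p a
optimizeViaData prog = dataToClass (optimize (prog :: ParserData a))

With that, optimizedParsecJson = optimizeViaData json is exactly the definition from the original post.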


Perhaps related to the technique described in Unembedding Domain-Specific Languages (paper) (video). The paper also covers how to convert DSLs with binding.


Here is a version that optimizes without converting to an intermediate data type:

{-# LANGUAGE LambdaCase, FlexibleInstances #-}
data ExpD
  = Cat ExpD ExpD
  | Str String
  deriving Show

optimizeD :: ExpD -> ExpD
optimizeD = \case
  Cat (Str x) (Str y) -> Str (x ++ y) 
  Cat (Str x) (Cat (Str y) z) -> Cat (Str (x ++ y)) z
  x -> x

class ExpC repr where
  cat :: repr -> repr -> repr
  str :: String -> repr

data Ctx e = L e | E

instance ExpC repr => ExpC (Ctx (String -> repr) -> repr) where
  str s E = str s
  str s (L f) = f s
  cat e1 e2 ctx = e1 (L (\s1 -> e2 (L (\s2 -> str (s1 ++ s2) ctx))))

optimizeC :: ExpC repr => (Ctx (String -> repr) -> repr) -> repr
optimizeC exp = exp E

instance ExpC String where
  cat x y = "(cat " ++ x ++ " " ++ y ++ ")"
  str x = "(str " ++ show x ++ ")"

I do think that the pattern matching solution is easier, but I also think you can get used to the tagless approach.

Edit: I now also notice that I’m perhaps using a bit too much continuation-passing style; a more direct implementation would probably be easier to read and write.

Update: Here is a more direct version, but it doesn’t look that great either:

instance ExpC repr => ExpC (Maybe String -> Either String repr) where
  str s Nothing = Left s
  str s2 (Just s1) = Left $ s1 ++ s2
  cat e1 e2 ctx = case e1 ctx of
                    Left s -> e2 (Just s)
                    Right e1' -> Right $ cat e1' (either str id (e2 Nothing))

optimizeC' :: ExpC repr => (Maybe String -> Either String repr) -> repr
optimizeC' exp = either str id $ exp Nothing
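
It should optimize the same way as the first version, e.g.:

>>> optimizeC' (cat (str "a") (cat (str "b") (str "c"))) :: String
"(str \"abc\")"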

I think it’s either a free Parser or an initial Parser, perhaps each in different contexts.

Thanks all!

Eventually I got to the point where I wanted to write recursive parser definitions. Of course, that makes it impossible (or at least hard?) to write pure program transformations.

So I introduced a fix operation into the DSL. I got it working, but I’m not fully convinced that the typing I ended up with for this construct is appropriate. Also, I am not sure why the ‘obvious’ typing didn’t work.

I tried this at first:
(Note: this is a reduced DSL)

class Rec f where
  rec :: (f a -> f a) -> f a

data DataRec a where
  DRec :: (DataRec a -> DataRec a) -> DataRec a

instance Rec DataRec where
  rec = DRec

However, I was not able to implement this function:

recDataToClass :: forall a. DataRec a -> (forall f. Rec f => f a)
recDataToClass = \case
  DRec f -> ???

So I went with this:

class Rec f where
  rec :: (forall g. Rec g => g a -> g a) -> f a

data DataRec a where
  DRec :: (forall g. Rec g => g a -> g a) -> DataRec a

instance Rec DataRec where
  rec = DRec

recDataToClass :: forall a. DataRec a -> (forall f. Rec f => f a)
recDataToClass = \case
  DRec f -> rec f

That seems to be working for me, for now. However, I am not sure if this second typing will impose some kind of limitation down the line.
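
One small check (untested sketch): a concrete instance can still tie the knot at its own type with this second signature, by instantiating the polymorphic body at itself and taking a fixed point. Here I is a hypothetical carrier newtype, only to see that the types work out:

import Data.Function (fix)

-- Hypothetical carrier type; the point is only that 'rec' can be implemented
-- by instantiating f at g ~ I and taking its fixed point.
newtype I a = I { runI :: a }

instance Rec I where
  rec f = fix f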

FWIW, the full source code for this thing I’m playing around with is here: https://gist.github.com/Garciat/5ea9d5a66bc5db95f13ddc36e70e7c99

Sections 2.3 (notably the Wrapped newtype, which seems similar to your solution) and 3.5 of Oleg’s lecture notes seem related. Section 3.5 refers to TTIF.hs, where he implements a full language with a fixed-point operator in Tagless Final style. Here is the line where he converts from the initial to the final encoding: https://github.com/michaelt/tagless/blob/05ca14710d0704fb1e5a52ee6887f0d52a3c0372/TTIF.hs#L230. He seems to need a Var constructor in the initial encoding.