Noob Help with GADTs and Arrows

BebeSparkelSparkel · June 19, 2024, 9:20pm

This is the first time I have ever found a use for GADTs and Arrows and I am having trouble formulating the constructors I want.

I want to create a type that allows for record fields to be parsed in different orders and does not throw away work that has been successfully completed. For example

data Record = Record Int String

parseRecord :: String -> Maybe Record
parseRecord = runStateT parseRecord'

parseRecord "123\nabc\n" -- Just (Record 123 "abc")
parseRecord "abc\n123\n" -- Just (Record 123 "abc")

This is where I start having trouble

parseRecord' :: StateT String Maybe Record
parseRecord' = runUnordered (lift parseInt . lift parseString)

parseInt :: StateT String Maybe Int
parseString :: StateT String Maybe String

data Unordered m a b where
   -- I think these constructors are wrong
  Incomplete :: Unordered m (a -> b) c -> m a -> Unorderd m b c
  Success    :: Unordered m (a -> b) c ->   a -> Unorderd m b c
  Arr :: (a -> b) -> Unorderd m a b

-- lift doesn't seem possible with the defined constructors
lift :: m a -> Unordered m ? ?

runUnordered :: Unordered m () b -> m b

instance Category (Unordered m) where
  id = Arr id
  (.) = ?

rhendric · June 20, 2024, 12:08am

It’s not at all clear to me why you want arrows for this.

It seems to me that what you want is a data type that can represent either that parsing has resolved and you have the desired value, or that parsing is pending and what you have is a function for getting the desired value plus the parsers that would be used to generate the input to that function. Parsing would then be a matter of alternately trying a parser or resolving ‘the rest’ first, where resolving ‘the rest’ involves returning a value of this data type no longer containing any parsers that ate state.

Something like this might work for you:

{-# LANGUAGE BlockArguments, LambdaCase #-}

-- You could write this GADT-style, but I'm using the other notation to
-- make the point that you don't need full GADTs, only an existential.
data Unordered f result
  = forall a. Pending (Unordered f (a -> result)) (f a)
  | Resolved result

deriving instance Functor (Unordered f)

runUnordered :: forall f a. (Monad f, Alternative f) => Unordered f a -> f a
runUnordered = go >=> \case Resolved a -> pure a; _ -> empty
  where
  go :: Unordered f b -> f (Unordered f b)
  go = \case
    Pending inner p -> do
      inner' <- go inner
      optional p >>= \case
        -- If p succeeds, retry everything still pending in inner'
        Just a -> go $ fmap ($ a) inner'
        -- If p fails, pass it to caller to be retried later
        Nothing -> pure $ Pending inner' p
    other -> pure other


data Record = Record Int String

parseRecord :: Parser Record
parseRecord = runUnordered $
  Resolved Record `Pending` parseInt `Pending` parseString

Leary · June 20, 2024, 1:33am

Spoiler warning: Text.Parser.Permutation probably does what you want.

p.s. @rhendric’s version of Unordered is the free Applicative.

Leary · June 20, 2024, 5:38am

It so happens I wrote something similar for a coroutine implementation a few days ago. Adapted, it becomes:

newtype Unordered f a = Un{ ordered :: Ap f a }
  deriving (Functor, Applicative)

un :: Applicative f => f a -> Unordered f a
un = Un . liftAp

runUnordered :: (Alternative f, Monad f) => Unordered f a -> f a
runUnordered Un{ordered} = case ordered of
  Pure y -> pure y
  other  -> witherAp' other >>= runUnordered . Un

witherAp :: Alternative f => Ap f a -> f (Ap f a)
witherAp = \case
  Pure     y -> pure (Pure y)
  Ap fx afxy -> prepend <$> optional fx <*> witherAp afxy
   where
    prepend = \case
      Nothing -> Ap fx
      Just x  -> fmap ($ x)

-- Not needed for coroutines.
-- Added so `runUnordered` will fail when it can't make progress.
witherAp' :: (Alternative f, Monad f) => Ap f a -> f (Ap f a)
witherAp' afx = do
  afx' <- witherAp afx
  guard (apLength afx' < apLength afx) $> afx'
 where
  apLength :: Ap f a -> Int
  apLength = \case
    Pure _  -> 0
    Ap _ ap -> 1 + apLength ap

(untested)

It’s true that you might want to guard the implementation behind an opaque newtype, but I think the Applicative instance is nice to have, interface-wise.

BebeSparkelSparkel · June 20, 2024, 12:53pm

Thanks I will examine the free package more. I have been needing to do this for a while.

BebeSparkelSparkel · June 20, 2024, 12:55pm

Thank you for removing the complexity of my jumbled thoughts on this. Where did you learn about this concept?

rhendric · June 20, 2024, 5:05pm

I didn’t really use a ‘concept’; I didn’t know ahead of time that I was going to end up with a structure that was equivalent to the free Applicative. I started with what you wanted to achieve—having a list of things to do in any order, and wanting to remove anything that succeeded from that list—and from there worked out what data structure I’d need to represent a computation like that.

Would Either (NonEmpty todo) result work as a type? No, because the things to be done have different types. But they aren’t arbitrary types, so we don’t want to erase the type information entirely in the list. The types of the work to do are related to each other by the type of the constructor function they will be feeding. How to encode that? Well, the type of the constructor function is a nested function type a -> (b -> (c -> ...)), and we want to carry an f a, f b, f c, etc. Ah, there’s a recursive relationship here: if it’s valid to relate a list f a, f b with a function a -> (b -> result0), then it’s valid to relate a list f a, f b, f c with a function a -> (b -> (c -> result1)), making result0 ~ c -> result1. And the base case is the final result of the function, related to no fs to be done at all. We can build a data structure out of those two rules. The base case is Resolved result, and the recursive case wants something that holds a -> result, all the hidden fs that the previous case needs, and f a. The first two inputs are covered by the data type we’re creating, so we have Pending :: forall a. Unordered f (a -> result) -> f a -> Unordered f result. And that’s the definition of Unordered.

From there, you just have to follow your common sense to implementing a runUnordered that does what you want.

We should try left things before right things, so the order of go inner and p is determined.
go can’t fail—it always has the option of returning its argument if nothing can be done. p however can, so it should be wrapped in an optional.
The only moderately interesting bits are the two commented lines, and to develop those I started with those thoughts and type-tetrised my way to lines that implement them.

I guess that’s my advice, if advice is what you’re looking for: get a clear idea of how you want your code to work in the abstract, then let that determine your types, instead of imagining some types based on some loose concepts and trying to wedge an implementation into that shape.

BebeSparkelSparkel · June 20, 2024, 7:47pm

@Leary I have been looking at free more. I remember looking at the package commandert with the type

data CommanderT state f a
  = Action (state -> f (CommanderT state f a, state))
  | Defeat
  | Victory a
  deriving Functor

This seems like isomorphic to

type CommanderT state f a = Free (StateT state f) (Maybe a)

Did I convert that correctly?

olf · June 20, 2024, 8:09pm

If you don’t care that much for the type-safety of the free Applicative (it tells you whether you have provided a parser for every field), another conceptually simpler approach is using commutative monoids and optionally higher-kinded records. Suppose P is your favourite parsing monad. Also suppose that all the fields of Record which do not have a sensible default are lazy.

import Data.Monoid (Endo(...))
updateInt :: Int -> Endo Record
updateInt i = Endo $ \(Record _ s) -> Record i s
updateString :: String -> Endo Record
updateString s = Endo $ \(Record i _) -> Record i s
-- can obtain these automatically with lens 'set' and Template Haskell

Now, using a generic function of type

(Applicative m, Monoid r, Traversable f) => f (m r) -> m r

You can combine your field parsers

[fmap updateInt parseInt,
 fmap updateString parseString]

and appEndo the parsed Endo to an undefined value like Record undefined undefined. Essentially, the unorderedness arises from the fact that the record update functions

\(Record _ s) -> Record i s
\(Record i _) -> Record i s

commute. If you don’t like the undefined and want to make sure no field is forgotten, a HKD representation of partial results comes handy:

newtype ParsedRecord f = ParsedRecord (f Int) (f String)
instance Alternative f => Semigroup (ParsedRecord f) where
   ParsedRecord x y <> ParsedRecord x' y' = 
     ParsedRecord (x <|> x') (y <|> y')
parseIntMaybe :: P (ParsedRecord Maybe)
parseStringMaybe :: P (ParsedRecord Maybe)
fieldParsers :: ParsedRecord P -- makes sure every field has a parser
fieldParsers = ParseRecord parseInt parseString
complete :: Applicative f => ParsedRecord f -> f Record
complete (ParsedRecord x y) = Record <$> x <*> y

Remark: One could even encapsulate this process of field parser combining in this neat function signature:

ParsedRecord P -> P (ParsedRecord Identity)

rhendric · June 20, 2024, 8:16pm

Part of the problem statement was

does not throw away work that has been successfully completed

So wouldn’t you need to shrink your list of field parsers each time one of them succeeds? I don’t see how that would work in this approach.

olf · June 20, 2024, 8:49pm

Since the parsed Endo functions never overwrite the fields that the other parsers are responsible for, I don’t see how we’re throwing away completed work. The disadvantage over your approach is that the Endo Record type carries no guarantee that you get a complete record out of a successful parse, since you could have omitted a field parser. This defect is only avoided by moving to the HKD variant, which is arguably of similar complexity as free Applicatives, plus requires an extra data declaration.

olf · June 20, 2024, 8:59pm

Oh, now I see: The Endo does not guard you against using the same parser twice, thus overwriting an already parsed field with a new value. The behaviour in presence of multiple field values in the parsed text was not specified, was it?
So what is the record that is parsed from "123\n234\nabc"? Should this fail? Is it Record 234 "abc" or Record 123 "234"?

Leary · June 21, 2024, 12:24am

I’m not familiar with the commandert package, but that does appear to be the case.

BebeSparkelSparkel · June 21, 2024, 12:43pm

@olf Completely depends on the implementation of parseInt and parseString so currently it is undefined. on what "123\n234\nabc" parseRecord produces.

blamario · June 28, 2024, 9:33am

The HKD approach you envision here has been implemented by

Depending on the parser, the successfully completed work from the fields is not thrown away because the entire record of results is memoized at every input location. This ends up equivalent to the Earley parsing algorithm.