This is the first time I have ever found a use for GADTs and Arrows and I am having trouble formulating the constructors I want.
I want to create a type that allows for record fields to be parsed in different orders and does not throw away work that has been successfully completed. For example
data Record = Record Int String
parseRecord :: String -> Maybe Record
parseRecord = runStateT parseRecord'
parseRecord "123\nabc\n" -- Just (Record 123 "abc")
parseRecord "abc\n123\n" -- Just (Record 123 "abc")
This is where I start having trouble
parseRecord' :: StateT String Maybe Record
parseRecord' = runUnordered (lift parseInt . lift parseString)
parseInt :: StateT String Maybe Int
parseString :: StateT String Maybe String
data Unordered m a b where
-- I think these constructors are wrong
Incomplete :: Unordered m (a -> b) c -> m a -> Unorderd m b c
Success :: Unordered m (a -> b) c -> a -> Unorderd m b c
Arr :: (a -> b) -> Unorderd m a b
-- lift doesn't seem possible with the defined constructors
lift :: m a -> Unordered m ? ?
runUnordered :: Unordered m () b -> m b
instance Category (Unordered m) where
id = Arr id
(.) = ?
It’s not at all clear to me why you want arrows for this.
It seems to me that what you want is a data type that can represent either that parsing has resolved and you have the desired value, or that parsing is pending and what you have is a function for getting the desired value plus the parsers that would be used to generate the input to that function. Parsing would then be a matter of alternately trying a parser or resolving ‘the rest’ first, where resolving ‘the rest’ involves returning a value of this data type no longer containing any parsers that ate state.
Something like this might work for you:
{-# LANGUAGE BlockArguments, LambdaCase #-}
-- You could write this GADT-style, but I'm using the other notation to
-- make the point that you don't need full GADTs, only an existential.
data Unordered f result
= forall a. Pending (Unordered f (a -> result)) (f a)
| Resolved result
deriving instance Functor (Unordered f)
runUnordered :: forall f a. (Monad f, Alternative f) => Unordered f a -> f a
runUnordered = go >=> \case Resolved a -> pure a; _ -> empty
where
go :: Unordered f b -> f (Unordered f b)
go = \case
Pending inner p -> do
inner' <- go inner
optional p >>= \case
-- If p succeeds, retry everything still pending in inner'
Just a -> go $ fmap ($ a) inner'
-- If p fails, pass it to caller to be retried later
Nothing -> pure $ Pending inner' p
other -> pure other
data Record = Record Int String
parseRecord :: Parser Record
parseRecord = runUnordered $
Resolved Record `Pending` parseInt `Pending` parseString
It so happens I wrote something similar for a coroutine implementation a few days ago. Adapted, it becomes:
newtype Unordered f a = Un{ ordered :: Ap f a }
deriving (Functor, Applicative)
un :: Applicative f => f a -> Unordered f a
un = Un . liftAp
runUnordered :: (Alternative f, Monad f) => Unordered f a -> f a
runUnordered Un{ordered} = case ordered of
Pure y -> pure y
other -> witherAp' other >>= runUnordered . Un
witherAp :: Alternative f => Ap f a -> f (Ap f a)
witherAp = \case
Pure y -> pure (Pure y)
Ap fx afxy -> prepend <$> optional fx <*> witherAp afxy
where
prepend = \case
Nothing -> Ap fx
Just x -> fmap ($ x)
-- Not needed for coroutines.
-- Added so `runUnordered` will fail when it can't make progress.
witherAp' :: (Alternative f, Monad f) => Ap f a -> f (Ap f a)
witherAp' afx = do
afx' <- witherAp afx
guard (apLength afx' < apLength afx) $> afx'
where
apLength :: Ap f a -> Int
apLength = \case
Pure _ -> 0
Ap _ ap -> 1 + apLength ap
(untested)
It’s true that you might want to guard the implementation behind an opaque newtype, but I think the Applicative instance is nice to have, interface-wise.
I didn’t really use a ‘concept’; I didn’t know ahead of time that I was going to end up with a structure that was equivalent to the free Applicative. I started with what you wanted to achieve—having a list of things to do in any order, and wanting to remove anything that succeeded from that list—and from there worked out what data structure I’d need to represent a computation like that.
Would Either (NonEmpty todo) result work as a type? No, because the things to be done have different types. But they aren’t arbitrary types, so we don’t want to erase the type information entirely in the list. The types of the work to do are related to each other by the type of the constructor function they will be feeding. How to encode that? Well, the type of the constructor function is a nested function type a -> (b -> (c -> ...)), and we want to carry an f a, f b, f c, etc. Ah, there’s a recursive relationship here: if it’s valid to relate a list f a, f b with a function a -> (b -> result0), then it’s valid to relate a list f a, f b, f c with a function a -> (b -> (c -> result1)), making result0 ~ c -> result1. And the base case is the final result of the function, related to no fs to be done at all. We can build a data structure out of those two rules. The base case is Resolved result, and the recursive case wants something that holds a -> result, all the hidden fs that the previous case needs, and f a. The first two inputs are covered by the data type we’re creating, so we have Pending :: forall a. Unordered f (a -> result) -> f a -> Unordered f result. And that’s the definition of Unordered.
From there, you just have to follow your common sense to implementing a runUnordered that does what you want.
We should try left things before right things, so the order of go inner and p is determined.
go can’t fail—it always has the option of returning its argument if nothing can be done. p however can, so it should be wrapped in an optional.
The only moderately interesting bits are the two commented lines, and to develop those I started with those thoughts and type-tetrised my way to lines that implement them.
I guess that’s my advice, if advice is what you’re looking for: get a clear idea of how you want your code to work in the abstract, then let that determine your types, instead of imagining some types based on some loose concepts and trying to wedge an implementation into that shape.
If you don’t care that much for the type-safety of the free Applicative (it tells you whether you have provided a parser for every field), another conceptually simpler approach is using commutative monoids and optionally higher-kinded records. Suppose P is your favourite parsing monad. Also suppose that all the fields of Record which do not have a sensible default are lazy.
import Data.Monoid (Endo(...))
updateInt :: Int -> Endo Record
updateInt i = Endo $ \(Record _ s) -> Record i s
updateString :: String -> Endo Record
updateString s = Endo $ \(Record i _) -> Record i s
-- can obtain these automatically with lens 'set' and Template Haskell
Now, using a generic function of type
(Applicative m, Monoid r, Traversable f) => f (m r) -> m r
and appEndo the parsed Endo to an undefined value like Record undefined undefined. Essentially, the unorderedness arises from the fact that the record update functions
\(Record _ s) -> Record i s
\(Record i _) -> Record i s
commute. If you don’t like the undefined and want to make sure no field is forgotten, a HKD representation of partial results comes handy:
newtype ParsedRecord f = ParsedRecord (f Int) (f String)
instance Alternative f => Semigroup (ParsedRecord f) where
ParsedRecord x y <> ParsedRecord x' y' =
ParsedRecord (x <|> x') (y <|> y')
parseIntMaybe :: P (ParsedRecord Maybe)
parseStringMaybe :: P (ParsedRecord Maybe)
fieldParsers :: ParsedRecord P -- makes sure every field has a parser
fieldParsers = ParseRecord parseInt parseString
complete :: Applicative f => ParsedRecord f -> f Record
complete (ParsedRecord x y) = Record <$> x <*> y
Remark: One could even encapsulate this process of field parser combining in this neat function signature:
Since the parsed Endo functions never overwrite the fields that the other parsers are responsible for, I don’t see how we’re throwing away completed work. The disadvantage over your approach is that the Endo Record type carries no guarantee that you get a complete record out of a successful parse, since you could have omitted a field parser. This defect is only avoided by moving to the HKD variant, which is arguably of similar complexity as free Applicatives, plus requires an extra data declaration.
Oh, now I see: The Endo does not guard you against using the same parser twice, thus overwriting an already parsed field with a new value. The behaviour in presence of multiple field values in the parsed text was not specified, was it?
So what is the record that is parsed from "123\n234\nabc"? Should this fail? Is it Record 234 "abc" or Record 123 "234"?
The HKD approach you envision here has been implemented by
Depending on the parser, the successfully completed work from the fields is not thrown away because the entire record of results is memoized at every input location. This ends up equivalent to the Earley parsing algorithm.