Parsing Recipe Pattern

Recently I figured out a pattern for quickly creating record parsers that utilises generics and HKTs: the Parsing Recipe Pattern. I haven't seen it described before, and I think it might be useful and interesting to people, so I'm sharing it.

TL;DR: I didn't like how Cassava/Aeson forced me to proliferate newtype wrappers for different kinds of date formats. The recipe pattern lets me specify a different date-parsing format for each record type, without newtype wrappers, while keeping generic derivation of record parsers.
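
Roughly, the pattern looks like this. This is a simplified sketch, not the exact code from the post: FieldParser, userRecipe, and cookUser are illustrative names. The record is parameterised by a functor, the "recipe" is the same record with a parser in every field, and "cooking" the recipe yields a parser for the whole record.

import Data.Functor.Identity (Identity (..))
import Data.Time (Day, defaultTimeLocale, parseTimeM)

-- One field parser turns a raw CSV field into a value:
newtype FieldParser a = FieldParser (String -> Maybe a)

-- The record, parameterised by a functor ("higher-kinded data"):
data User f = User { name :: f String, birthday :: f Day }

-- The recipe: the same record with a parser in every field. The
-- date format is picked here, per record, with no newtype wrapper
-- around Day itself:
userRecipe :: User FieldParser
userRecipe = User
  { name     = FieldParser Just
  , birthday = FieldParser (parseTimeM True defaultTimeLocale "%d.%m.%Y")
  }

-- Cooking the recipe: apply each field parser to the matching
-- column of a CSV row (a real library derives this generically):
cookUser :: User FieldParser -> [String] -> Maybe (User Identity)
cookUser r [n, b] = User <$> run (name r) n <*> run (birthday r) b
  where run (FieldParser p) s = Identity <$> p s
cookUser _ _ = Nothing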

3 Likes

This Haskell parsing pattern addresses the newtype-proliferation problem that arises with typeclass-based generic parsing libraries like Cassava or Aeson.

You are absolutely correct that typeclasses are bad for (de)serialization, because most instances are ambiguous.

Megaparsec is a Swiss Army knife. … That's tedious and boilerplate-y. If our record type's structure already follows its CSV row structure, then couldn't we somehow autogenerate a parser?

Okay, let's zoom out a bit. The two approaches you use as examples are a token parser and automatic deriving. That's enough to get the point across, but in my opinion it's a myopic comparison.

Automatic deriving is a sought-after feature because it aligns the inputs and outputs of the two serialization functions symmetrically. You don't need to test this; it works out of the box, and that's cool.
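
For instance, with aeson's generic instances the symmetry amounts to a round-trip property that holds by construction (Person here is just a stand-in type for illustration):

{-# LANGUAGE DeriveAnyClass, DeriveGeneric #-}
import Data.Aeson (FromJSON, ToJSON, decode, encode)
import GHC.Generics (Generic)

data Person = Person { personName :: String, personAge :: Int }
  deriving (Eq, Show, Generic, FromJSON, ToJSON)

-- Encoder and decoder are derived from the same structure, so the
-- round trip needs no hand-written alignment (or testing):
prop_roundtrip :: Person -> Bool
prop_roundtrip p = decode (encode p) == Just p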

However:

  • For datatypes that only need to be converted in one direction (only encoded, or only decoded), it doesn't do anything. Handrolling a parser would take just as many lines of code, and RecordWildCards would tell you if you missed anything;

  • There is a myriad of cases that do not map onto automatic deriving. Fundamentally, the problem is that decoding is a collection of choices, while encoding has only one correct solution. You can try to mush the two together (tomland did), but that can't cover every single edge case (tomland's parser is Applicative, so you can't make choices based on existing entries; see the sketch after this list);

  • In certain cases, even if you can derive, you shouldn’t. One such example is external production APIs, where slightly adjusting the external format shouldn’t require you to realign your Haskell datatypes.
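
To illustrate the Applicative limitation from the second point in aeson terms (Payload is a made-up type, and tomland's API differs): choosing a decoder based on an already-parsed field needs Monad, which a purely Applicative parser rules out.

{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (FromJSON (..), withObject, (.:))

data Payload = V1 String | V2 Int

instance FromJSON Payload where
  parseJSON = withObject "Payload" $ \o -> do
    version <- o .: "version"     -- inspect an existing entry...
    case version :: Int of        -- ...then choose how to decode
      1 -> V1 <$> o .: "payload"
      2 -> V2 <$> o .: "payload"
      _ -> fail "unsupported version"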

The answer to all of this is not automatic deriving; it's ergonomic handrolling. Libraries, for whatever reason, completely forgo any due diligence on this, and as a result using them in production sucks.

I spent two months writing a JSON parser just to prove to myself that I'm correct on this, and I believe the answer is a resounding "yes". If you're ever going to write your own parser for any format, please do better than the libraries you mentioned in your post.

3 Likes

Some time ago I considered building a library for configuration-file parsing based on this idea. In theory one could generate an .ini file parser as well as a command-line option parser from the same HKD record of field parsers. However, for the particular problem of configuration there is little to be gained here that cannot already be done in Dhall.
I expect that in practice one would end up with multiple cook functions to perform the record Parser -> Parser (record f) traversal, for whatever f is suitable for the specific problem domain. Why? Because, as noted in the post, one always needs some extra "glue" to join the parser fragments.

Formats such as CSV or JSON are straightforward in the sense that, once all the field parsers are given, there is pretty much only one choice of how to glue them together. That is why type classes such as FromJSON1 can exist. (But the straightforwardness falls apart on a second look; see Cassava and Aeson.) If one could come up with a good, generic glue data type that can be mixed into the recipe, like

cook :: record Parser -> glue Parser -> Parser (record f)

then I’d be interested.

One could easily solve the field-name mismatch by using Compose ((,) String) Parser instead of plain Parser, similar to what the (.:) function in Aeson does. Which goes to show that, almost always, a custom cook function for combining the field parsers is needed.
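
For the record, a small sketch of that Compose trick. The names are hypothetical, and Parser is replaced by a toy newtype over String -> Maybe a so that it can be partially applied: each field parser is paired with the external name it should be looked up under, and a custom cook function consumes the pair.

import Data.Functor.Compose (Compose (..))
import Data.Functor.Identity (Identity (..))
import Text.Read (readMaybe)

newtype FieldParser a = FieldParser (String -> Maybe a)

data User f = User { userName :: f String, userAge :: f Int }

-- Each field parser carries the external name to look up, much
-- like what (.:) does in Aeson:
userRecipe :: User (Compose ((,) String) FieldParser)
userRecipe = User
  { userName = Compose ("user_name", FieldParser Just)
  , userAge  = Compose ("age",       FieldParser readMaybe)
  }

-- The custom cook function looks fields up by name in a row of
-- key/value pairs instead of relying on field positions:
cookUser :: User (Compose ((,) String) FieldParser)
         -> [(String, String)] -> Maybe (User Identity)
cookUser r kvs = User <$> field (userName r) <*> field (userAge r)
  where
    field (Compose (k, FieldParser p)) = Identity <$> (p =<< lookup k kvs)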

Good ideas tend to get re-invented over and over.

1 Like

I've experimented in a similar direction in my by-other-names library, in particular the ByOtherNamesH and ByOtherNamesH.Aeson modules, although I haven't used them much in practice.

I wanted to achieve two things:

  • Be able to provide aliases to the record’s fields.
  • Start with a “vanilla” record, without having to implement the “fields-wrapped-in-functor” version manually.

Usage looks like this:

data Foo = Foo {aa :: Int, bb :: Bool}
  deriving stock (Read, Show, Eq, Generic)
  deriving (FromJSON, ToJSON) via (JSONRecord "obj" Foo)

instance Aliased JSON Foo where
  aliases =
    aliasListBegin
      . alias @"aa" "aax" (singleSlot fromToJSON)
      . alias @"bb" "bbx" (singleSlot fromToJSON)
      $ aliasListEnd

Here fromToJSON is a user-supplied function that specifies how to serialize/parse a single field.

1 Like

You mean the generic JSON instance of each field? Well, singleSlot fromToJSON doesn't really defeat the purpose there; it's just unnecessarily verbose if you always want to use the standard JSON instances (the ByOtherNames and ByOtherNames.Aeson modules don't require singleSlot fromToJSON, unlike the -H versions of the modules).

However, being able to manually specify the serializing/parsing function for a field solves an issue mentioned in the OP:

The big drawback to these high-level libraries is that they are based on type classes. We tie a type to how it's parsed. For record types that's fine, but not so much for primitives.

We are polluting core business logic with different day types that differ only in how they are parsed. That's neither ergonomic nor elegant.

I believe one disadvantage of my ByOtherNamesH solution compared to the OP's solution is that I don't allow modifying a "higher-order" field after construction: the fields must all be given in one go. With the OP's solution, we can take a higher-order record and easily set a field because, at the end of the day, it's just another record, not a weird Aliases construct. That might make reuse easier.
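
To make that concrete, a hedged sketch with hypothetical names (mirroring the OP's idea, not any actual library API): since the recipe is a plain record, adapting it is ordinary record-update syntax.

import Data.Time (Day, defaultTimeLocale, parseTimeM)

newtype FieldParser a = FieldParser (String -> Maybe a)

data User f = User { name :: f String, birthday :: f Day }

euRecipe :: User FieldParser
euRecipe = User
  { name     = FieldParser Just
  , birthday = FieldParser (parseTimeM True defaultTimeLocale "%d.%m.%Y")
  }

-- Reusing the recipe with one field changed is plain record
-- update, because the recipe is just a record of parsers:
usRecipe :: User FieldParser
usRecipe = euRecipe
  { birthday = FieldParser (parseTimeM True defaultTimeLocale "%m/%d/%Y") }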

1 Like

I believe it’s used to optimize the traversal using an “applicative difference list”: Lysxia - Generic traversals with applicative difference lists

2 Likes

I suck at advertising: construct: Haskell version of the Construct library for easy specification of file formats

3 Likes