Why Doesn't Haskell Have a 'Serde'-Like Library for Multi-Format Serialization?

Hello Haskell developers,

Coming from a Rust background, I’ve been wondering why Haskell doesn’t seem to have a library similar to Rust’s Serde, which provides automatic serialization and deserialization to and from multiple data formats such as JSON, CBOR, and XML.

In Haskell, serialization seems fragmented across various libraries:

  • aeson handles JSON,

  • cborg handles CBOR,

  • Other formats like XML have their own specialized libraries (if any).

This became more relevant to me after looking into GHC issue #23989, “Encoding intermediate representations as S expressions”, where GHC developers express a need for S-expression parsing and pretty-printing (what I would call serialization and deserialization) for various internal representations (IRs). They suggest S-expressions for their syntax benefits, but I think it would be great if the same serialization infrastructure could support multiple formats, such as:

  • JSON (widely available across most programming languages),

  • CBOR (for fast serialization and deserialization),

  • S-expressions (for readability).

In Haskell, the cborg repository also provides the serialise package, which might offer a foundation for a more general-purpose serialization framework.

Would it be feasible or desirable to create a Haskell library that abstracts over multiple serialization formats—like Serde does in Rust? Has anything like this been attempted or discussed before?

Looking forward to your thoughts!


Those things don’t actually seem related? There isn’t much reuse under the hood. I guess you could design some intermediate representation that accommodates all three, but in each case you usually want to take advantage of the format instead of using some least-common-denominator representation.

Alternatively, Generic is already this IR and all serialization libraries provide obligatory support for sensible Generic serialization.
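To make the “Generic is already this IR” point concrete, here is a minimal sketch that uses only base. It walks the GHC.Generics representation of a single-constructor record once and renders it in two toy formats. The names Format, Leaf, GFields, GRecord, layout and encode are made up for this illustration and do not come from aeson, cborg or any other library, and the “JSON” output relies on Haskell's show for string escaping, which is only an approximation of real JSON escaping.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeOperators #-}

import Data.List (intercalate)
import GHC.Generics

-- Hypothetical: the two toy output formats supported by this sketch.
data Format = Json | SExpr

-- Hypothetical: how a primitive field value is rendered in a given format.
class Leaf a where
  leaf :: Format -> a -> String

instance Leaf Int where
  leaf _ = show

instance Leaf [Char] where
  leaf _ = show -- Haskell-style escaping, close enough for simple strings

-- Collect (field name, rendered value) pairs from a product of record fields.
class GFields f where
  gfields :: Format -> f p -> [(String, String)]

instance GFields U1 where
  gfields _ U1 = []

instance (GFields f, GFields g) => GFields (f :*: g) where
  gfields fmt (x :*: y) = gfields fmt x ++ gfields fmt y

instance (Selector s, Leaf a) => GFields (M1 S s (K1 i a)) where
  gfields fmt m@(M1 (K1 x)) = [(selName m, leaf fmt x)]

-- Render a datatype that has exactly one record constructor.
class GRecord f where
  grecord :: Format -> f p -> String

instance GRecord f => GRecord (M1 D d f) where
  grecord fmt (M1 x) = grecord fmt x

instance (Constructor c, GFields f) => GRecord (M1 C c f) where
  grecord fmt m@(M1 x) = layout fmt (conName m) (gfields fmt x)

-- The only format-specific part: how a constructor and its fields are laid out.
layout :: Format -> String -> [(String, String)] -> String
layout Json _ fs =
  "{" ++ intercalate "," [show k ++ ":" ++ v | (k, v) <- fs] ++ "}"
layout SExpr con fs =
  "(" ++ unwords (con : ["(" ++ k ++ " " ++ v ++ ")" | (k, v) <- fs]) ++ ")"

encode :: (Generic a, GRecord (Rep a)) => Format -> a -> String
encode fmt = grecord fmt . from

data Person = Person { name :: String, age :: Int }
  deriving (Generic, Show)

main :: IO ()
main = do
  putStrLn (encode Json (Person "Ada" 36))  -- {"name":"Ada","age":36}
  putStrLn (encode SExpr (Person "Ada" 36)) -- (Person (name "Ada") (age 36))
```

The sketch deliberately ignores sum types, nesting and decoding. Real libraries differ precisely in how they handle those cases, which is the format-specific part that is hard to share.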

Well, I guess there are several benefits to having a single library that supports multiple formats. For one, it would likely mean less code overall. You could include support for multiple formats with a smaller overall dependency footprint.

Also, something I remember from when I used Rust is that Serde lets you customize how each format is used. For example, you can do it in a very automatic way: say “I want this struct to serialize to JSON, deserialize from XML, deserialize from CBOR,” etc., and that just works automatically.

But beyond that, you can also customize how individual fields appear in different formats. Like, you can make the age field serialize as "age" or as "time", depending on your needs. And you can control that mapping per format. All that customization requires some implementation effort.

If you want that same level of control across multiple formats in Haskell, you usually have to duplicate the definitions—how a structure maps to a given format, field names, etc. I believe both aeson and cborg let you customize things like field names and structure, but if you want to support both, you end up repeating those mappings. That duplication can get tedious if you’re maintaining multiple format implementations.
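As a rough illustration of the mapping in question, here is a sketch of the aeson side only. It keeps the field renaming in one plain function (wireName, a name invented for this example) and hands it to aeson through fieldLabelModifier. A second format would have to reuse wireName through whatever customization hooks its own library offers, and that second set of instances is the part that ends up duplicated. The sketch assumes the aeson and bytestring packages are available.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Data.Aeson
import Data.Char (toLower)
import Data.List (stripPrefix)
import qualified Data.ByteString.Lazy.Char8 as BL
import GHC.Generics (Generic)

data Person = Person
  { personName :: String
  , personAge  :: Int
  } deriving (Generic, Show, Eq)

-- Hypothetical shared mapping from Haskell field names to wire names:
-- drop the "person" prefix and lower-case the first remaining letter.
wireName :: String -> String
wireName field =
  case stripPrefix "person" field of
    Just (c : rest) -> toLower c : rest
    _               -> field

-- aeson-specific glue: the mapping is plugged into aeson's generic options.
jsonOptions :: Options
jsonOptions = defaultOptions { fieldLabelModifier = wireName }

instance ToJSON Person where
  toJSON     = genericToJSON jsonOptions
  toEncoding = genericToEncoding jsonOptions

instance FromJSON Person where
  parseJSON = genericParseJSON jsonOptions

main :: IO ()
main = do
  let bytes = encode (Person "Ada" 36)
  BL.putStrLn bytes                     -- fields appear as "name" and "age"
  print (decode bytes :: Maybe Person)  -- round-trips back to the record
```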

I have wondered the same; however, I’m not sure what such a library would look like in terms of usability.

In the serde ecosystem, there’s a core library (serde itself) providing the common data model, the derive macros and the ways to implement the Deserialize/Serialize traits, and then a bunch of libraries for supporting the actual format, like serde_json, serde_yaml (now deprecated, but with alternatives in the works) and so on.

The above might be doable, but consider how a great deal of working with serde is the way you can decorate your types and fields with #[serde(_)] attributes to direct the library’s behaviour, with things such as #[serde(default)] for generating the Default value of a type if it’s missing from the input, and others like rename, flatten, skip, serialize_with and so on.

I don’t know how an equivalent usability could be added to a hypothetical serde-hs. Is Template Haskell this powerful?

In any case I agree, it would be awesome to have this ecosystem in place for Haskell. It would certainly help with adoption!

serde-hs would at least be intriguing! Especially if it interops with aeson et al. seamlessly.

The closest thing we have to Rust/C# style attributes is source annotations which you can access from TH.

Popular libraries in several popular languages serialise XML to JSON incorrectly (they produce objects with duplicate keys). There is little benefit in using these libraries: the code is fast to write, but the result is unusable.

Often serialisation to a specific format requires customised tweaks.

On to Haskell:

Aeson is an excellent library; however, it would be good if there were more libraries for JSON serialisation.

For XML there are several libraries. Some of them are for parsing only, others are for serialisation to XML only. This choice seems highly advantageous. Writing code to convert between types is easy and fast.

Smaller, focused libraries depend on a smaller number of packages. They also offer a wider choice.

From my perspective, there is no single way to parse or emit JSON. That’s why deriving-aeson exists. If parsing cannot be fully shared, then there is not much left to share. Different formats don’t have the same streaming serialisation/deserialisation properties, so we can’t really share those either.

It looks convincing because Rust has adopted it, but if it’s just a matter of ergonomics for “fast adoption”, then I wonder what the story looks like in the long term.

I’m not familiar enough with serde (nor with Rust in general) to comment on the topic. But I like the idea of using TemplateHaskell with annotations! This would allow, for example, overriding the default options on a specific constructor. Something like this:

{-# ANN Baz (JsonTag "buzz") #-}
data Foo = Bar | Baz | Quux

deriveJSON defaultOptions ''Foo

Unfortunately annotations are not supported on record fields, otherwise they would be even more useful. I wonder how hard it would be to add support for this feature in GHC.

Related to the above comment: why aren’t annotations visible in Generic representations too?

I agree, I think the lack of a “multi-format” library is an artifact of the Haskell culture of making types meaningful. Early on, as formats proliferated, there were attempts to use a shared AST for several of them, e.g. JSON piggybacking on XML or YAML, or the latter piggybacking on JSON, or even later some CBOR stuff that used an existing aeson AST for convenience. What people inevitably discovered was that nothing matched up quite right, and either some things had to be parsed into something that was “semantically dubious” or there would be ASTs that couldn’t be emitted without arbitrary decisions in certain formats, or both. Since the rule that bad states should be unrepresentable by types goes pretty deep in Haskell culture, none of these attempts got much traction and over time they tended to be abandoned.

`deriving-aeson` has a similar mechanism for customizing the FromJSON/ToJSON instances with phantom type parameters (deriving-aeson: Type driven generic aeson instance customisation). I think it could be taken as inspiration.
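For readers who have not seen it, a small sketch of that style follows. The options live in a type-level list passed to CustomJSON, which is the closest existing analogue to serde's per-type attributes. The User type and its fields are invented for this example, and the sketch assumes the aeson, deriving-aeson and bytestring packages are installed.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DerivingVia #-}

import Data.Aeson (FromJSON, ToJSON, decode, encode)
import qualified Data.ByteString.Lazy.Char8 as BL
import Deriving.Aeson
  (CamelToSnake, CustomJSON (..), FieldLabelModifier, OmitNothingFields)
import GHC.Generics (Generic)

-- Hypothetical record; the JSON behaviour is described entirely in the
-- type-level option list given to CustomJSON.
data User = User
  { userName  :: String
  , userEmail :: Maybe String
  }
  deriving (Generic, Show, Eq)
  deriving (FromJSON, ToJSON)
    via CustomJSON '[OmitNothingFields, FieldLabelModifier CamelToSnake] User

main :: IO ()
main = do
  let bytes = encode (User "Ada" Nothing)
  BL.putStrLn bytes                   -- snake_case keys, Nothing fields omitted
  print (decode bytes :: Maybe User)  -- round-trips back to the record
```

Note that these options are still specific to aeson: a second format would need its own via-type, so this addresses the ergonomics of customization rather than sharing across formats.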

Oh, I should mention that we already have a universal data type in Haskell that can represent the shapes of different data structures and be emitted to multiple formats: datatype generics! So to get the same sort of convenience, but with more safety, our existing libraries are all able to leverage that.

I wonder whether, even with GHC.Generics, a single (de)serialization library with support for multiple formats would still make each format easier to implement than a standalone library for each one.

From my personal experience, trying to share underlying implementations and interfaces between widely different formats locks you in. You’d also have to provide extension points for tweaking, and then you end up with the most inane API:

encode :: (Encodable a, Decodable b, DecodeError err) => a -> Either err b

and vice-versa.

(Edit: I’d be happy to be proven wrong, but even autodocodec makes some decisions about how your format must relate to JSON. This gives you json-schema, swagger & openapi, but still.)

I’ve used serde, but as other people mentioned, isn’t it (the core package) really just implementing Generic/GenericSOP, just in a different way? (nb. it has this annoying feature that encode can throw at runtime rather than compile time).

I’d say sum-of-products IS the generic format for all Haskell types, and almost all serialization libraries provide a default implementation for generic types.
