Which high-performance parser library to use?

I’m thinking about building an Aeson competitor, mainly as an easy-to-use, easy-to-understand library for learning purposes, and one thing I’ve noticed is that Aeson still uses Attoparsec. It seems that these days Attoparsec has been superseded by Flatparse and Cereal, with the former claiming at least a 10x performance advantage over Attoparsec. Amazingly, extrapolating from its benchmarks, Flatparse should be able to beat comparable libraries in C and Rust.
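For a sense of what a learning-oriented library might look like, here is a minimal sketch of a JSON value type and decoder written with attoparsec. The `Value` type, `decode`, and the helper names are hypothetical, and the sketch deliberately cuts corners: numbers are always `Double`, objects are association lists, and string escapes are not handled.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Control.Applicative ((<|>))
import Data.Attoparsec.ByteString.Char8
  (Parser, char, double, endOfInput, parseOnly, sepBy, skipSpace, string, takeWhile)
import qualified Data.ByteString.Char8 as BC
import Prelude hiding (takeWhile)

-- Hypothetical value type for a teaching-oriented JSON library.
-- Simplifications: numbers are Double, objects are association lists,
-- and string escapes are not supported.
data Value
  = Null
  | Bool Bool
  | Number Double
  | String BC.ByteString
  | Array [Value]
  | Object [(BC.ByteString, Value)]
  deriving (Eq, Show)

-- Run a parser, then skip any trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* skipSpace

-- Match a single punctuation character as a token.
symbol :: Char -> Parser Char
symbol = lexeme . char

-- Try each JSON form in turn; attoparsec backtracks on failure.
value :: Parser Value
value =
      Null <$ lexeme (string "null")
  <|> Bool True <$ lexeme (string "true")
  <|> Bool False <$ lexeme (string "false")
  <|> String <$> stringLit
  <|> Array <$> (symbol '[' *> (value `sepBy` symbol ',') <* symbol ']')
  <|> Object <$> (symbol '{' *> (pair `sepBy` symbol ',') <* symbol '}')
  <|> Number <$> lexeme double
  where
    pair = (,) <$> stringLit <* symbol ':' <*> value

-- A string literal without escape handling (simplification).
stringLit :: Parser BC.ByteString
stringLit = lexeme (char '"' *> takeWhile (/= '"') <* char '"')

-- Decode a whole document, rejecting trailing garbage.
decode :: BC.ByteString -> Either String Value
decode = parseOnly (skipSpace *> value <* endOfInput)

main :: IO ()
main = print (decode "{\"xs\": [1, 2.5, true, null], \"name\": \"hs\"}")
```

Because attoparsec's `<|>` backtracks automatically on failure, the alternatives in `value` can simply be listed in order without any explicit `try`.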

What are standard Haskell parser combinator libraries these days?

Flatparse is great; András Kovács is an expert at high-performance Haskell. I believe its error messages are not as good as those of some other libraries, though.

I’d say the standard general-purpose parsing library is megaparsec, and I think it can have decent performance if you use it right. So maybe it is not all that bad for “an easy-to-use and understand library”.
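As a rough illustration of megaparsec's style, here is a sketch that parses a JSON-style array of numbers from `Text` and renders failures with `errorBundlePretty`. It assumes megaparsec 9.x plus the `text` and `scientific` packages; `parseDoubles` and the helper names are made up for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Bifunctor (first)
import Data.Scientific (toRealFloat)
import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec (Parsec, between, eof, errorBundlePretty, parse, sepBy)
import Text.Megaparsec.Char (space)
import qualified Text.Megaparsec.Char.Lexer as L

-- A parser over strict Text with no custom error component.
type Parser = Parsec Void Text

-- Consume trailing whitespace after each token.
lexeme :: Parser a -> Parser a
lexeme = L.lexeme space

symbol :: Text -> Parser Text
symbol = L.symbol space

-- An optionally signed number such as 1, 2.5 or -3e2, converted to Double.
number :: Parser Double
number = lexeme (toRealFloat <$> L.signed (pure ()) L.scientific)

array :: Parser [Double]
array = between (symbol "[") (symbol "]") (number `sepBy` symbol ",")

-- Render parse errors as megaparsec's human-readable report.
parseDoubles :: Text -> Either String [Double]
parseDoubles = first errorBundlePretty . parse (space *> array <* eof) "<input>"

main :: IO ()
main = do
  print (parseDoubles " [1, 2.5, -3e2] ")
  either putStr print (parseDoubles "[1, 2,, 3]")
```

The second call in `main` exercises megaparsec's error rendering, which reports the line and column of the failure along with what was expected there.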

I usually use attoparsec, as it is very, very easy (and I am used to it). I have to admit, though, that megaparsec's ability to work with input other than ByteString (including custom token types) is a killer feature. In general I don’t care that much about performance.
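To show what custom token types mean in practice, the following sketch runs megaparsec over a list of tokens produced by a separate (omitted) lexing pass, instead of over characters. The `Tok` type and `parseTokens` are hypothetical; the code relies on megaparsec's `Stream` instance for lists and on the `token` and `single` primitives as they exist in megaparsec 9.x.

```haskell
module Main (main) where

import Data.Void (Void)
import Text.Megaparsec (Parsec, between, eof, parse, sepBy, single, token)

-- Hypothetical token type, assumed to come from a separate lexer.
data Tok = TOpen | TClose | TComma | TNum Double
  deriving (Eq, Ord, Show)

-- The input stream is a plain list of tokens rather than characters.
type Parser = Parsec Void [Tok]

-- Accept a numeric token and extract its payload; no extra expected items.
num :: Parser Double
num = token match mempty
  where
    match (TNum d) = Just d
    match _ = Nothing

array :: Parser [Double]
array = between (single TOpen) (single TClose) (num `sepBy` single TComma)

-- Errors are collapsed to Nothing to keep the sketch short.
parseTokens :: [Tok] -> Maybe [Double]
parseTokens toks = either (const Nothing) Just (parse (array <* eof) "<tokens>" toks)

main :: IO ()
main = do
  print (parseTokens [TOpen, TNum 1, TComma, TNum 2.5, TClose])
  print (parseTokens [TOpen, TComma, TClose])
```

As far as I know, pretty-printing errors for a custom stream like `[Tok]` would additionally require `VisualStream` and `TraversableStream` instances, which is why this sketch discards the error bundle.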

I wonder whether flatparse will still be 10x faster outside of microbenchmarks. I guess not, but I hope someone proves me wrong.

FWIW, in terms of raw performance, bindings to simdjson should be the fastest: hermes-json: Fast JSON decoding via simdjson C++ bindings

Right, simdjson should lex faster than flatparse, but the key question is probably which data structures are allocated, and how they are allocated and accessed; in that design space, lexing speed may fade into the background.
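One way to see the point about allocation: instead of building a generic value tree and converting it afterwards, a decoder can target the final representation directly. The sketch below (hypothetical names, attoparsec again) decodes an array of 3-element arrays straight into a strict `V3` type with unpacked `Double` fields, so each triple becomes one small heap object plus a list cell, rather than a boxed array of boxed numbers. A real implementation might go further, e.g. by decoding into an unboxed vector.

```haskell
module Main (main) where

import Data.Attoparsec.ByteString.Char8
  (Parser, char, double, endOfInput, parseOnly, sepBy, skipSpace)
import qualified Data.ByteString.Char8 as BC

-- Strict, unpacked triple: the three Doubles are stored inline.
data V3 = V3 {-# UNPACK #-} !Double {-# UNPACK #-} !Double {-# UNPACK #-} !Double
  deriving (Eq, Show)

lexeme :: Parser a -> Parser a
lexeme p = p <* skipSpace

symbol :: Char -> Parser Char
symbol = lexeme . char

num :: Parser Double
num = lexeme double

-- Parse "[a, b, c]" directly into a V3, with no intermediate Value.
v3 :: Parser V3
v3 = V3 <$> (symbol '[' *> num) <*> (symbol ',' *> num) <*> (symbol ',' *> num <* symbol ']')

-- Parse "[[...], [...], ...]" into a list of V3.
decodeTriples :: BC.ByteString -> Either String [V3]
decodeTriples = parseOnly (skipSpace *> symbol '[' *> (v3 `sepBy` symbol ',') <* symbol ']' <* endOfInput)

main :: IO ()
main = do
  print (decodeTriples (BC.pack "[[1, 2, 3], [4.5, 5, -6]]"))
  print (decodeTriples (BC.pack "[[1, 2]]"))
```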

Staying a bit off-topic, let me put in a word for hw-json, which uses both very clever succinct indexing and SIMD-driven parsing. The API is rough to figure out, but it can efficiently parse and traverse files too large to fit into memory all at once, including consuming them incrementally (e.g. as streams derived from the piped output of Unix processes that generate gigabytes of JSON): hw-json: Memory efficient JSON parser

For the “decode an array of 1 million 3-element arrays of doubles” result, which shows a 10x speedup over Aeson, do you think lexing is the main bottleneck in Aeson, or does the problem lie elsewhere? You mention flatparse, but it is hard to benchmark, since flatparse is a generic parsing framework rather than a JSON library. How can we know that keeping track of the JSON parser state in flatparse wouldn’t slow it down too much?

It would be interesting to construct a benchmark of all three. Currently the Aeson test suite doesn’t include flatparse, but it does include hermes-json.
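A starting point for such a benchmark could be a criterion harness like the one below. It compares aeson's `decodeStrict` into `[[Double]]` against a hand-written attoparsec decoder that targets a strict triple type, on synthetic input. The input size, names, and the decoder itself are assumptions for illustration; hermes-json or flatparse entries would be added as further `bench` cases.

```haskell
module Main (main) where

import Control.DeepSeq (NFData (..))
import Criterion.Main (bench, bgroup, defaultMain, env, nf)
import Data.Aeson (decodeStrict)
import Data.Attoparsec.ByteString.Char8
  (Parser, char, double, endOfInput, parseOnly, sepBy, skipSpace)
import qualified Data.ByteString.Char8 as BC
import Data.List (intercalate)

-- Hypothetical strict triple used as the direct decoding target.
data V3 = V3 {-# UNPACK #-} !Double {-# UNPACK #-} !Double {-# UNPACK #-} !Double
  deriving (Eq, Show)

-- All fields are strict, so WHNF already means fully evaluated.
instance NFData V3 where
  rnf v = v `seq` ()

lexeme :: Parser a -> Parser a
lexeme p = p <* skipSpace

symbol :: Char -> Parser Char
symbol = lexeme . char

num :: Parser Double
num = lexeme double

v3 :: Parser V3
v3 = V3 <$> (symbol '[' *> num) <*> (symbol ',' *> num) <*> (symbol ',' *> num <* symbol ']')

decodeTriples :: BC.ByteString -> Either String [V3]
decodeTriples = parseOnly (skipSpace *> symbol '[' *> (v3 `sepBy` symbol ',') <* symbol ']' <* endOfInput)

-- Baseline: aeson's generic decoder into nested lists.
decodeAeson :: BC.ByteString -> Maybe [[Double]]
decodeAeson = decodeStrict

-- Synthetic input: n copies of the same 3-element array.
mkInput :: Int -> BC.ByteString
mkInput n = BC.pack ("[" ++ intercalate "," (replicate n "[1.5,2.25,-3.0]") ++ "]")

main :: IO ()
main = defaultMain
  [ env (pure (mkInput 100000)) $ \input ->
      bgroup "100k 3-element arrays"
        [ bench "aeson decodeStrict :: [[Double]]" (nf decodeAeson input)
        , bench "attoparsec direct to V3" (nf decodeTriples input)
        ]
  ]
```

Using `nf` forces the entire result, so lazily deferred work is not hidden from the measurement. The 100,000-row input only keeps runs short; scaling `mkInput` to 1 million would match the benchmark mentioned above.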
