Which high-performance parser library to use?

Liamzy · August 5, 2023, 8:36am

I’m thinking about building an Aeson competitor, with the goal of an easy-to-use, easy-to-understand library for learning purposes, and one thing I’ve noticed is that Aeson is still using Attoparsec. It seems that these days Attoparsec has been superseded by Flatparse and Cereal, with the former claiming at least a 10x performance advantage over Attoparsec. Amazingly, extrapolating from benchmarks, Flatparse should be able to beat comparable libraries in C and Rust.

What are standard Haskell parser combinator libraries these days?

jaror · August 5, 2023, 8:55am

Flatparse is great. András Kovács is an expert at high-performance Haskell. That said, I believe its error messages are not as good as those of some other libraries.

I’d say the standard general parsing library is megaparsec. I think that can have decent performance if you use it right. So maybe it is not all that bad for “an easy-to-use and understand library”.

Lsmor · August 5, 2023, 8:58am

I usually use attoparsec, as it is very easy (and I am used to it). I have to admit that megaparsec's ability to work with input other than ByteString (including custom Token types) is a killer feature. In general I don’t care that much about performance.

I wonder if flatparse will be 10x faster on anything other than microbenchmarks. I guess not, but I hope someone proves me wrong.

Bodigrim · August 5, 2023, 9:07am

FWIW, in terms of raw performance, bindings to simdjson should be the fastest: hermes-json: Fast JSON decoding via simdjson C++ bindings

chrisdone · August 5, 2023, 8:39pm

Right, simdjson should lex faster than flatparse, but the key question is probably which data structures are allocated and how they are accessed, and in that design space the lexing speed may fade into the background.

sclv · August 5, 2023, 11:42pm

Staying a bit off-topic, let me put in a word for hw-json, which uses both very clever succinct indexing and SIMD-driven parsing. The API is rough to figure out, but it can efficiently parse and traverse files too large to fit into memory all at once, including consuming them incrementally (e.g. as streams piped from Unix processes that generate gigabytes of JSON): hw-json: Memory efficient JSON parser

janus · August 6, 2023, 2:55pm

For the “decode an array of 1 million 3-element arrays of doubles” result, which shows a 10x speedup over Aeson, do you think that lexing is the main problem with Aeson, or is the problem somewhere else? You mention flatparse, but it’s hard to benchmark when flatparse is a generic parsing framework not focused on JSON. How can we know that keeping track of the JSON parser state with flatparse wouldn’t slow it down too much?

It would be interesting if a benchmark of all three could be constructed. Currently the Aeson test suite doesn’t include flatparse, but includes hermes-json.
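
For anyone who wants to try such a comparison, here is a minimal sketch of the “array of 3-element arrays of doubles” workload written against attoparsec. It is an illustration only, not how Aeson or hermes-json decode this input: the names `jsonArray` and `triples` are made up for this example, the inner arrays are not checked to have exactly three elements, and attoparsec's `double` is more lenient than the JSON number grammar.

```haskell
module Main (main) where

import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString.Char8 as B

-- Hypothetical helper: a bracketed, comma-separated list of elements,
-- tolerating whitespace around the brackets and commas.
jsonArray :: A.Parser a -> A.Parser [a]
jsonArray p =
  A.char '[' *> A.skipSpace *> (p `A.sepBy` comma) <* A.skipSpace <* A.char ']'
  where
    comma = A.skipSpace *> A.char ',' *> A.skipSpace

-- Hypothetical name: an array of arrays of doubles.
-- Assumption: the length of the inner arrays is not validated here.
triples :: A.Parser [[Double]]
triples = jsonArray (jsonArray A.double)

main :: IO ()
main =
  print (A.parseOnly (A.skipSpace *> triples <* A.skipSpace <* A.endOfInput)
                     (B.pack "[[1.0, 2.5, 3], [4, 5e1, -6]]"))
```

The same grammar could then be ported to flatparse and timed on a large input next to Aeson and hermes-json, which would separate lexing cost from the cost of building the result.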
