Hi everyone,
I was given the honour and the responsibility of maintaining the base16, base32 and base64 packages. Don’t hesitate to ping me (@kleidukos) on GitHub if you have outstanding tickets or PRs.
Moreover, I’m personally interested in ways to accelerate the encoding/decoding through SWAR, SIMD, or techniques involving unlifted types. This is certainly an area where performance-oriented Haskell should shine, and I would be very happy to welcome such contributions.
Cheers,
Hécate
Good luck, have fun, enjoy it, and I’m around for any questions regarding historical implementation details or work to be done.
Thanks, Emily, for having maintained those packages so diligently. I will try to follow in your footsteps.
There are some great reads by Jared Tobin on building fast Base16 encoding and decoding:
But if one does want to wield this particular Ring of Power, it can be done like so (this is basically what Bryan O’Sullivan’s base16-bytestring or Emily Pillmore’s base16 package do, for example):
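The core of that approach, as I understand it, is precomputing a hex alphabet and indexing it with each nibble of the input. Here is a minimal, unoptimized sketch of that idea (my own simplified code, not the actual library internals, which use unboxed tables and unsafe reads rather than `BS.index` and lists):

```haskell
module Main where

import Data.Bits (shiftR, (.&.))
import Data.Word (Word8)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as C8

-- The 16-byte hex alphabet; every output character is one lookup
-- into this table.
hexAlphabet :: BS.ByteString
hexAlphabet = C8.pack "0123456789abcdef"

-- Encode one byte into its two hex characters, one per nibble.
encodeByte :: Word8 -> [Word8]
encodeByte w =
  [ BS.index hexAlphabet (fromIntegral (w `shiftR` 4))
  , BS.index hexAlphabet (fromIntegral (w .&. 0x0f))
  ]

encodeHex :: BS.ByteString -> BS.ByteString
encodeHex = BS.pack . concatMap encodeByte . BS.unpack

main :: IO ()
main = C8.putStrLn (encodeHex (C8.pack "hi"))  -- prints "6869"
```

The real libraries get their speed from doing exactly this loop over raw pointers with no intermediate lists, but the table-lookup structure is the same.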
It’s funny, because I wrote both implementations.
I wield the ring of power, and I’ve just given one of the rings of men to @Kleidukos.
I’m guessing this is probably about as fast as one can make this function via “normal” means. Other techniques that I tried while preparing this article don’t seem to move the needle much on this problem, if at all. I’d be pretty impressed by anything that produced another order-of-magnitude performance boost, though – if anyone can achieve that, I’d love to hear about it!
This is actually the conclusion I came to as well, about 4 years ago, when I exhausted the baseN encoding optimizations that could reasonably be accomplished with raw Haskell and C: while for base64 one could feasibly write 32-bit lookup tables, Haskell hits a memory barrier in the FFI load that makes it load much more slowly than I’d expect, and 16-bit tables were superior for lookup (you can see the historical PRs I’ve made to the libraries).
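For readers unfamiliar with the 16-bit table trick: instead of two lookups per input byte (one per nibble), you precompute one `Word16` per possible byte that packs both hex characters, so the encode loop does a single table read per byte. A rough sketch of the idea (my own illustration, not the libraries’ actual code):

```haskell
module Main where

import Data.Array.Unboxed (UArray, listArray, (!))
import Data.Bits (shiftR, (.&.))
import Data.Word (Word8, Word16)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as C8

-- One Word16 per possible input byte: the high half holds the first
-- hex character, the low half the second.
hexTable :: UArray Word8 Word16
hexTable = listArray (0, 255)
  [ hexChar (w `shiftR` 4) * 256 + hexChar (w .&. 0x0f) | w <- [0 .. 255] ]
  where
    hexChar :: Word8 -> Word16
    hexChar n | n < 10    = 48 + fromIntegral n  -- '0'..'9'
              | otherwise = 87 + fromIntegral n  -- 'a'..'f'

-- One table read per byte instead of two.
encodeByte :: Word8 -> [Word8]
encodeByte w =
  let pair = hexTable ! w
  in [fromIntegral (pair `shiftR` 8), fromIntegral (pair .&. 0xff)]

encodeHex :: BS.ByteString -> BS.ByteString
encodeHex = BS.pack . concatMap encodeByte . BS.unpack

main :: IO ()
main = C8.putStrLn (encodeHex (BS.pack [0xde, 0xad]))  -- prints "dead"
```

In a real implementation the `Word16` is written to the output buffer in one store, which is where the win over two independent byte lookups comes from.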
The “order of magnitude” optimizations that push the needle in the right direction for all of these libraries come down to one thing: SIMD. Pipeline the reads and writes, because they are embarrassingly pipelineable at each step of the encode and decode loops. Otherwise, there is no useful optimization beyond what I have. Whoever writes the SIMD support gets my gratitude; I haven’t had the time.
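The scalar building block that SWAR and SIMD encoders lean on is branch elimination: computing a hex digit from a nibble with pure arithmetic, so the loop body has no data-dependent branches to stall the pipeline. A small sketch of one such branch-free formula (my own derivation, offered as an illustration; it relies on bit 4 of `n + 6` being set exactly when `n >= 10`):

```haskell
module Main where

import Data.Bits (shiftR)
import Data.Word (Word8)

-- Branch-free lower-case hex digit for a nibble n in 0..15:
--   n <= 9  gives '0' + n        (codes 48..57)
--   n >= 10 gives 'a' + (n - 10) (codes 97..102)
-- The +39 adjustment for the letter range is selected arithmetically,
-- since (n + 6) `shiftR` 4 is 1 precisely when n >= 10.
nibbleToHex :: Word8 -> Word8
nibbleToHex n = n + 48 + 39 * ((n + 6) `shiftR` 4)

main :: IO ()
main = print (map nibbleToHex [0 .. 15])
```

A SWAR implementation applies this same arithmetic to many nibbles packed in a `Word64` at once, and a SIMD one to a whole vector register per iteration; the formula is the per-lane kernel either way.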
Edit: actually, that’s not quite right. There was one optimization I found worthwhile because it could prune branches off the decode: using a newtype to establish the provenance of the bytes. If we know the bytes were validated at some point as base(16|32|64), and toss that responsibility to the user, then the library can assume more structure and prune the validity check at each loop iteration, so GHC can hot-loop the decode more easily. The fun part is that this prunes a significantly larger number of branches as the base encoding grows in bit size.
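A minimal sketch of that provenance trick for base16 (the names `Base16`, `assertBase16` and `decodeUnchecked` are mine for illustration, not any library’s actual API; a real library would hide the constructor so the one-time check is the only way in):

```haskell
module Main where

import Data.Bits (shiftL, (.|.))
import Data.Char (isHexDigit, digitToInt)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as C8

-- The newtype records that the wrapped bytes have already been
-- validated; with the constructor unexported, that invariant holds
-- by construction.
newtype Base16 = Base16 BS.ByteString

-- Validate once, up front.
assertBase16 :: BS.ByteString -> Maybe Base16
assertBase16 bs
  | even (BS.length bs) && C8.all isHexDigit bs = Just (Base16 bs)
  | otherwise                                   = Nothing

-- The decode loop can now assume well-formed input and skip the
-- per-byte validity branch entirely.
decodeUnchecked :: Base16 -> BS.ByteString
decodeUnchecked (Base16 bs) = BS.pack (go (C8.unpack bs))
  where
    go (hi : lo : rest) =
      fromIntegral (digitToInt hi `shiftL` 4 .|. digitToInt lo) : go rest
    go _ = []

main :: IO ()
main = print (decodeUnchecked <$> assertBase16 (C8.pack "6869"))
```

The type system, rather than a runtime check inside the hot loop, is what guarantees the input is well-formed at decode time.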