Hi everyone,
I was given the honour and the responsibility of maintaining the base16, base32 and base64 packages. Don’t hesitate to ping me (@kleidukos) on GitHub if you have outstanding tickets or PRs.
Moreover, I’m personally interested in ways to accelerate the encoding/decoding through SWAR, SIMD, or techniques involving unlifted types. This is certainly an area where performance-oriented Haskell should shine, and I would be very happy to welcome such contributions.
Cheers,
Hécate
Good luck, have fun, enjoy it, and I’m around for any questions regarding historical implementation details or work to be done.
Thanks, Emily, for having maintained those packages so diligently. I will try to follow in your footsteps.
There are some great reads by Jared Tobin on building fast Base16 encoding and decoding:
But if one does want to wield this particular Ring of Power, it can be done like so (this is basically what Bryan O’Sullivan’s base16-bytestring or Emily Pillmore’s base16 package do, for example):
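The core of that approach, as I understand it, is precomputing a hex alphabet and indexing it with each nibble of the input. Here is a minimal, unoptimized sketch of that idea (my own simplified code, not the actual library internals, which use unboxed tables and unsafe reads rather than `BS.index` and lists):

```haskell
module Main where

import Data.Bits (shiftR, (.&.))
import Data.Word (Word8)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as C8

-- The 16-byte hex alphabet; every output character is one lookup
-- into this table.
hexAlphabet :: BS.ByteString
hexAlphabet = C8.pack "0123456789abcdef"

-- Encode one byte into its two hex characters, one per nibble.
encodeByte :: Word8 -> [Word8]
encodeByte w =
  [ BS.index hexAlphabet (fromIntegral (w `shiftR` 4))
  , BS.index hexAlphabet (fromIntegral (w .&. 0x0f))
  ]

encodeHex :: BS.ByteString -> BS.ByteString
encodeHex = BS.pack . concatMap encodeByte . BS.unpack

main :: IO ()
main = C8.putStrLn (encodeHex (C8.pack "hi"))  -- prints "6869"
```

The real libraries get their speed from doing exactly this loop over raw pointers with no intermediate lists, but the table-lookup structure is the same.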
It’s funny, because I wrote both implementations.
I wield the ring of power, and I’ve just given one of the rings of men to @Kleidukos.
I’m guessing this is probably about as fast as one can make this function via “normal” means. Other techniques that I tried while preparing this article don’t seem to move the needle much on this problem, if at all. I’d be pretty impressed by anything that produced another order-of-magnitude performance boost, though – if anyone can achieve that, I’d love to hear about it!
This is actually the conclusion I came to as well, about 4 years ago, when I exhausted the baseN encoding optimizations that could reasonably be accomplished with raw Haskell and C: while for base64 one could feasibly write 32-bit lookup tables, Haskell hits a memory barrier in the FFI load that makes it load much more slowly than I’d expect, and 16-bit tables were superior for lookup (you can see the historical PRs I’ve made to the libraries).
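For readers unfamiliar with the 16-bit table trick: instead of two lookups per input byte (one per nibble), you precompute one `Word16` per possible byte that packs both hex characters, so the encode loop does a single table read per byte. A rough sketch of the idea (my own illustration, not the libraries’ actual code):

```haskell
module Main where

import Data.Array.Unboxed (UArray, listArray, (!))
import Data.Bits (shiftR, (.&.))
import Data.Word (Word8, Word16)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as C8

-- One Word16 per possible input byte: the high half holds the first
-- hex character, the low half the second.
hexTable :: UArray Word8 Word16
hexTable = listArray (0, 255)
  [ hexChar (w `shiftR` 4) * 256 + hexChar (w .&. 0x0f) | w <- [0 .. 255] ]
  where
    hexChar :: Word8 -> Word16
    hexChar n | n < 10    = 48 + fromIntegral n  -- '0'..'9'
              | otherwise = 87 + fromIntegral n  -- 'a'..'f'

-- One table read per byte instead of two.
encodeByte :: Word8 -> [Word8]
encodeByte w =
  let pair = hexTable ! w
  in [fromIntegral (pair `shiftR` 8), fromIntegral (pair .&. 0xff)]

encodeHex :: BS.ByteString -> BS.ByteString
encodeHex = BS.pack . concatMap encodeByte . BS.unpack

main :: IO ()
main = C8.putStrLn (encodeHex (BS.pack [0xde, 0xad]))  -- prints "dead"
```

In a real implementation the `Word16` is written to the output buffer in one store, which is where the win over two independent byte lookups comes from.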
The “order of magnitude” optimizations that push the needle in the right direction for all of these libraries come down to one thing: SIMD. Pipeline the reads and writes, because they are embarrassingly pipelineable at each step of the encode and decode loops. Otherwise, there is no useful optimization beyond what I have. Whoever writes the SIMD support gets my gratitude; I haven’t had the time.
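The scalar building block that SWAR and SIMD encoders lean on is branch elimination: computing a hex digit from a nibble with pure arithmetic, so the loop body has no data-dependent branches to stall the pipeline. A small sketch of one such branch-free formula (my own derivation, offered as an illustration; it relies on bit 4 of `n + 6` being set exactly when `n >= 10`):

```haskell
module Main where

import Data.Bits (shiftR)
import Data.Word (Word8)

-- Branch-free lower-case hex digit for a nibble n in 0..15:
--   n <= 9  gives '0' + n        (codes 48..57)
--   n >= 10 gives 'a' + (n - 10) (codes 97..102)
-- The +39 adjustment for the letter range is selected arithmetically,
-- since (n + 6) `shiftR` 4 is 1 precisely when n >= 10.
nibbleToHex :: Word8 -> Word8
nibbleToHex n = n + 48 + 39 * ((n + 6) `shiftR` 4)

main :: IO ()
main = print (map nibbleToHex [0 .. 15])
```

A SWAR implementation applies this same arithmetic to many nibbles packed in a `Word64` at once, and a SIMD one to a whole vector register per iteration; the formula is the per-lane kernel either way.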
Edit: actually, that’s not quite right. There was one optimization I found worthwhile because it could prune branches off the decode: using a newtype to establish the provenance of the bytes. If we know the bytes were validated at some point as base(16|32|64), and toss that responsibility to the user, then the library can assume more structure and prune the validity check at each loop iteration, so GHC can hot-loop the decode more easily. The fun part is that this prunes a significantly larger number of branches as the base encoding grows in bit size.
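A minimal sketch of that provenance trick for base16 (the names `Base16`, `assertBase16` and `decodeUnchecked` are mine for illustration, not any library’s actual API; a real library would hide the constructor so the one-time check is the only way in):

```haskell
module Main where

import Data.Bits (shiftL, (.|.))
import Data.Char (isHexDigit, digitToInt)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as C8

-- The newtype records that the wrapped bytes have already been
-- validated; with the constructor unexported, that invariant holds
-- by construction.
newtype Base16 = Base16 BS.ByteString

-- Validate once, up front.
assertBase16 :: BS.ByteString -> Maybe Base16
assertBase16 bs
  | even (BS.length bs) && C8.all isHexDigit bs = Just (Base16 bs)
  | otherwise                                   = Nothing

-- The decode loop can now assume well-formed input and skip the
-- per-byte validity branch entirely.
decodeUnchecked :: Base16 -> BS.ByteString
decodeUnchecked (Base16 bs) = BS.pack (go (C8.unpack bs))
  where
    go (hi : lo : rest) =
      fromIntegral (digitToInt hi `shiftL` 4 .|. digitToInt lo) : go rest
    go _ = []

main :: IO ()
main = print (decodeUnchecked <$> assertBase16 (C8.pack "6869"))
```

The type system, rather than a runtime check inside the hot loop, is what guarantees the input is well-formed at decode time.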