Radix-tree-1.0: trees that radix

BurningWitness · April 14, 2024, 7:26pm

I have previously voiced my dissatisfaction with the lack of both radix trees and lazy data structures, so after a half a year of figuring radix trees out I’m proud to present this monstrosity of a package:

github.com/sergv/radix-tree

Overhaul

sergv:master ← BurningWitness:overhaul

opened 06:33PM - 14 Apr 24 UTC

BurningWitness

+32222 -1055

This is a full rewrite of the library, featuring, in order of implementation: … - `Data.Patricia.Word.Strict`: this is the same definitionally as `IntMap`, the only meaningful differences are that it uses `Word`s as keys directly and that the API is way more frugal than in `containers` (also has a few cool functions, like range operations). This and every single other tree in the library shamelessly reuses the findings of the recent https://github.com/haskell/containers/pull/995 PR for performance; - `Data.Patricia.Word.Lazy`: slight variation on `IntMap`, spine-lazy. Almost all the functions are the same as for the strict version though; - `Data.Zebra.Word`: `IntSet`, but instead of storing packed keys it stores space partitions. Technically this is a `2^64`-wide bit array. - `Data.Radix1Tree.Word8.Strict`: a spine-strict radix tree with non-empty byte sequences as keys. The internal implementation is similar to `bytestring-trie`, but without "non-trivial invariants". The tree does not have a concrete key type; instead, there are two types of keys: `Feed` for breaking an input key down into bytes (inlines for performance) and `Build` to reshape stored keys into new formats. Implements all the same functions as `Data.Patricia.Word.Strict` plus some more, with the exception of `Range` ones because those functions are gargantuan for radix trees, so they're both a pain in the butt to write and probably would not inline either way. - `Data.RadixTree.Word8.Strict`: an extension to `Radix1Tree` that allows storing a value at an empty byte sequence. The __entire__ interface is duplicated from the `Radix1Tree`, with the help of inlining. - `Data.RadixTree.Word8.Lazy` and `Data.RadixTree.Word8.Strict`: spine-lazy radix trees. Same stuff as with `Patricia` trees: lots of code duplication, different asymptotics. These can store values at infinitely long keys, provided you never access them, how droll. __All__ of this is tested (for result-correctness, but no introspection testing), __all__ of this is documented, __none__ of this is benchmarked (I don't know what case to target, so I didn't bother). Note that while this theoretically should support GHC 8.4 through 8.10, this is only reachable using a custom build due to the `template-haskell >= 2.17` dependency, and as such that has not been tested.

Feedback is appreciated.

Ambrose · April 15, 2024, 2:07am

Congrats! The amount of optimization and care you put into this is out of this world

ocramz · April 15, 2024, 8:47pm

Cool! Is it like a specialized “trie” data structure, for ByteString key types?

Also I think there could be a natural Semigroup and Monoid instances in this implementation (using empty and union).

BurningWitness · April 16, 2024, 6:51am

Not for StrictByteStrings, but for anything that can be split into continuous memory chunks. StrictText is too a continuous memory allocation internally, so it and StrictByteString should perform about the same.

Two reasons why I’m against this:

foldMap (\(k, a) -> insert k a empty) looks the same as foldr (uncurry insert) empty, but they’re quite different asymptotically (O(n⋅k²) vs O(n⋅k) in this case). A person unfamiliar with the internals wouldn’t know this.
The Semigroup instance would have to be biased in some way (containers uses left-biased unions), and that choice would need to be documented. Ambiguous instances are an easy way to sneak hard-to-detect runtime errors into code, so I’m heavily against them.

ocramz · April 16, 2024, 3:31pm

Or you could use the same convention containers uses, which is about as standard as anything in Haskell gets.

BurningWitness · April 16, 2024, 5:35pm

There are plenty of library approaches in the language I disagree with that can be considered standard, I don’t see this as a good argument. I can even argue in the opposite direction: it costs me nothing to add garbage to the library, but removing it from any broadly used library takes years (if indeed it ever happens).

Ultimately I expect library users to meet me in the middle and to glue together whatever they need themselves, instead of providing conveniences that both confuse people and make the library harder to maintain.

Ambrose · April 17, 2024, 1:51am

For the bias, can’t you have Semigroup values and <> on collision? The containers Map semigroup is a blunder - an artifact of history. From scratch, there’s no good reason to not do recursive semigroup union (you can recover bias with First and Last, proving it’s strictly more general.)

The fact that foldMap with singletons is asymptotically worse than a fold of inserts is a compelling reason to not bother with the instance, for sure.

Map union semigroup is also less efficient than folding and inserting, but not asymptotically iiuc. So different situations.

BurningWitness · April 17, 2024, 7:43am

Now that is a proper instance, yielding

instance Semigroup m => Semigroup (RadixTree m) where
  (<>) = unionWith (<>)

instance Semigroup m => Monoid (RadixTree m) where
  mempty = empty

Curiously enough this suffers from the same problems as Functor in strict modules, as the instance function does not specify whether the resulting values are evaluated to WHNF. Since I mark all such functions with a single quote, this would have to be a lazy unionWith, which the library doesn’t even provide currently.

Combined with the fact that I don’t expect this instance to be actively used, I still lean against adding it.

BurningWitness · April 20, 2024, 1:06pm

The package made it to Hackage unaltered, happy bashing.

tomsmeding · April 20, 2024, 3:56pm

The overview of the modules in your initial post on the linked PR is great! It tells me what I’m looking at, without having to know what patricia trees or zebra trees are (or looking at the module reference documentation and inferring things from the API). Please add that overview to the package description or readme on hackage!

EDIT: it’s especially useful because even after scanning the API of the patricia trees, I was still confused how this was different from IntMap. That’s an important question that potential users will have.

Bodigrim · April 20, 2024, 7:34pm

To be frank, looking at the list of modules at radix-tree: Radix trees., I feel quite overwhelmed. Which one am I supposed to use by default? What’s the difference? How does it compare to Map / HashMap? What is Zebra about? It would be great for README to answer such questions.

mikeplus64 · April 21, 2024, 2:15am

Looks like a great library, great work.

I’m happy to rename radixtree which is a library for using compact prefix trees to do parsing, to something more appropriate - maybe parsers-radixtree? It’s probably going to be confusing for people to have both radix-tree and radixtree. But I have no idea if this is possible at all, as packages cannot be deleted, only deprecated.

OTOH if radix-tree got some compact radix trees in that would make radixtree almost redundant. That is trees where branchless edges are merged together, so ["apple","application"] is the tree [("appl", [("e",accept "apple"), ("ication", accept "application")])] if that makes sense.

BurningWitness · April 21, 2024, 10:03am

Indeed, but unfortunately there’s an art to this, so I’ll have to spend a few hours on it.

My goal with this library is parsing as well, but my goals are far wider:

Since the default strategy in parsing is evaluating the radix tree at runtime, spine-lazy trees can be used to skip evaluation of the spine on unused optional keys;
If all the keys are known at compilation time, you can precompile a spine-strict tree (hence why I added the .TH modules to each data structure). If only some are known, you can precompile a spine-lazy tree and only add the unknowns at runtime.

This library is proof that both of these goals are achievable in Haskell. However to utilize this power properly we’ll also need better parsing libraries (see my quirky prototype of json) and lazier underlying parsers (see the incomplete PR in binary).

The issue here is that Haskell data structures are persistent, so any “compacting” you do will directly result in more copying on operations that are now more complex, and therefore higher space usage if you’re sharing a lot of nodes. If memory is a massive concern for your use-case then you should look into an [ephemeral] adaptive radix tree implementation in C/C++/Rust, and then deal with all the restrictions that come from using that.

A possible middle ground here would be making a specialized compressed radix tree that only supports lookups and using that for the precompilated scenario. (though I imagine the actual performance gains from that are going to be meager)

Bodigrim · April 26, 2024, 11:24pm

@BurningWitness thanks for updating README, much clearer now.

I whipped up a quick benchmark against the previous version of the package, but the results are somewhat off. Am I missing some important pragmas / patterns? The source code is at GitHub - Bodigrim/radix-tree at benchmarks. We take a listing of GHC sources (~5000 files), create a radix tree and then lookup every key in it. On my machine numbers are

Old:
  603  μs ±  28 μs
New:
  2.31 ms ± 221 μs

It feels expensive to lookup for Feed, which presumably emits byte by byte, but I cannot find any other way to do it.

BurningWitness · April 27, 2024, 6:57am

See the “Inlining” section at the top of the module.

Hence you’re looking for

foldl'
  (\acc k -> acc + New.find 0 (New.feedShortByteString k) newRadixTree)
  0
  keys

Bodigrim · April 27, 2024, 7:29am

Thanks, that’s better, but still slower than the previous implementation:

  Old: OK
    602  μs ±  54 μs
  New: OK
    1.27 ms ± 108 μs

BurningWitness · April 27, 2024, 8:31am

Okay, I forgot to put strictness annotations (three bang patterns in each function), so the changes on my machine look like

  Old/fromList: OK
    5.54 ms ± 226 μs   ->   5.58 ms ± 228 μs
  New/fromList: OK
    4.93 ms ± 224 μs   ->   3.27 ms ± 290 μs
  Old/lookups:  OK
    869  μs ±  61 μs   ->   868  μs ±  49 μs
  New/lookups:  OK
    2.31 ms ± 208 μs   ->   693  μs ±  43 μs

I’ll apply this to all relevant functions and ship later today.

BurningWitness · April 27, 2024, 2:12pm

See the tighter-loops branch for the updated version.

Bodigrim · April 27, 2024, 2:43pm

On my machine the tighter-loops branch runs just a bit slower than the old implementation:

  Old: OK
    600  μs ± 7.2 μs
  New: OK
    651  μs ± 9.5 μs

Now if I make common prefixes 10x longer, the difference gets more pronounced:

  Old: OK
    1.99 ms ±  29 μs
  New: OK
    3.12 ms ±  23 μs

I think it is very challenging to beat memcmp with a bytewise loop. Would it be possible to provide
lookupBA :: ByteArray -> RadixTree a -> Maybe a? Or, if it fits better with the rest of the API, lookupBuild :: Build -> RadixTree a -> Maybe a?..

BurningWitness · April 27, 2024, 3:21pm

If you feel the need to squeeze out every last drop of performance, consider importing the relevant .Unsafe module and building a function yourself. The PVP applies to .Unsafe modules, since I never claimed otherwise.

I’m against adding any functions that duplicate existing functionality, so “30% faster in 30% of the cases” is not a compelling reason in my view.

Also consider the fact that you can gain performance in other places: you can precompile trees, perform chunked lookups and even use lazy trees. Unless you have an actual industrial use-case, I doubt this will be a bottleneck.