I have previously voiced my dissatisfaction with the lack of both radix trees and lazy data structures, so after a half a year of figuring radix trees out I’m proud to present this monstrosity of a package:
Feedback is appreciated.
I have previously voiced my dissatisfaction with the lack of both radix trees and lazy data structures, so after a half a year of figuring radix trees out I’m proud to present this monstrosity of a package:
Feedback is appreciated.
Congrats! The amount of optimization and care you put into this is out of this world
Cool! Is it like a specialized “trie” data structure, for ByteString key types?
Also I think there could be a natural Semigroup and Monoid instances in this implementation (using empty and union).
Not for StrictByteString
s, but for anything that can be split into continuous memory chunks. StrictText
is too a continuous memory allocation internally, so it and StrictByteString
should perform about the same.
Two reasons why I’m against this:
foldMap (\(k, a) -> insert k a empty)
looks the same as foldr (uncurry insert) empty
, but they’re quite different asymptotically (O(n⋅k²)
vs O(n⋅k)
in this case). A person unfamiliar with the internals wouldn’t know this.
The Semigroup
instance would have to be biased in some way (containers
uses left-biased unions), and that choice would need to be documented. Ambiguous instances are an easy way to sneak hard-to-detect runtime errors into code, so I’m heavily against them.
Or you could use the same convention containers
uses, which is about as standard as anything in Haskell gets.
There are plenty of library approaches in the language I disagree with that can be considered standard, I don’t see this as a good argument. I can even argue in the opposite direction: it costs me nothing to add garbage to the library, but removing it from any broadly used library takes years (if indeed it ever happens).
Ultimately I expect library users to meet me in the middle and to glue together whatever they need themselves, instead of providing conveniences that both confuse people and make the library harder to maintain.
For the bias, can’t you have Semigroup values and <>
on collision? The containers
Map
semigroup is a blunder - an artifact of history. From scratch, there’s no good reason to not do recursive semigroup union (you can recover bias with First and Last, proving it’s strictly more general.)
The fact that foldMap
with singletons is asymptotically worse than a fold of inserts is a compelling reason to not bother with the instance, for sure.
Map
union semigroup is also less efficient than folding and inserting, but not asymptotically iiuc. So different situations.
Now that is a proper instance, yielding
instance Semigroup m => Semigroup (RadixTree m) where
(<>) = unionWith (<>)
instance Semigroup m => Monoid (RadixTree m) where
mempty = empty
Curiously enough this suffers from the same problems as Functor
in strict modules, as the instance function does not specify whether the resulting values are evaluated to WHNF. Since I mark all such functions with a single quote, this would have to be a lazy unionWith
, which the library doesn’t even provide currently.
Combined with the fact that I don’t expect this instance to be actively used, I still lean against adding it.
The package made it to Hackage unaltered, happy bashing.
The overview of the modules in your initial post on the linked PR is great! It tells me what I’m looking at, without having to know what patricia trees or zebra trees are (or looking at the module reference documentation and inferring things from the API). Please add that overview to the package description or readme on hackage!
EDIT: it’s especially useful because even after scanning the API of the patricia trees, I was still confused how this was different from IntMap
. That’s an important question that potential users will have.
To be frank, looking at the list of modules at radix-tree: Radix trees, I feel quite overwhelmed. Which one am I supposed to use by default? What’s the difference? How does it compare to Map
/ HashMap
? What is Zebra
about? It would be great for README
to answer such questions.
Looks like a great library, great work.
I’m happy to rename radixtree
which is a library for using compact prefix trees to do parsing, to something more appropriate - maybe parsers-radixtree
? It’s probably going to be confusing for people to have both radix-tree
and radixtree
. But I have no idea if this is possible at all, as packages cannot be deleted, only deprecated.
OTOH if radix-tree
got some compact radix trees in that would make radixtree
almost redundant. That is trees where branchless edges are merged together, so ["apple","application"]
is the tree [("appl", [("e",accept "apple"), ("ication", accept "application")])]
if that makes sense.
Indeed, but unfortunately there’s an art to this, so I’ll have to spend a few hours on it.
My goal with this library is parsing as well, but my goals are far wider:
Since the default strategy in parsing is evaluating the radix tree at runtime, spine-lazy trees can be used to skip evaluation of the spine on unused optional keys;
If all the keys are known at compilation time, you can precompile a spine-strict tree (hence why I added the .TH
modules to each data structure). If only some are known, you can precompile a spine-lazy tree and only add the unknowns at runtime.
This library is proof that both of these goals are achievable in Haskell. However to utilize this power properly we’ll also need better parsing libraries (see my quirky prototype of json) and lazier underlying parsers (see the incomplete PR in binary).
The issue here is that Haskell data structures are persistent, so any “compacting” you do will directly result in more copying on operations that are now more complex, and therefore higher space usage if you’re sharing a lot of nodes. If memory is a massive concern for your use-case then you should look into an [ephemeral] adaptive radix tree implementation in C/C++/Rust, and then deal with all the restrictions that come from using that.
A possible middle ground here would be making a specialized compressed radix tree that only supports lookups and using that for the precompilated scenario. (though I imagine the actual performance gains from that are going to be meager)
@BurningWitness thanks for updating README
, much clearer now.
I whipped up a quick benchmark against the previous version of the package, but the results are somewhat off. Am I missing some important pragmas / patterns? The source code is at GitHub - Bodigrim/radix-tree at benchmarks. We take a listing of GHC sources (~5000 files), create a radix tree and then lookup every key in it. On my machine numbers are
Old:
603 μs ± 28 μs
New:
2.31 ms ± 221 μs
It feels expensive to lookup for Feed
, which presumably emits byte by byte, but I cannot find any other way to do it.
See the “Inlining” section at the top of the module.
Hence you’re looking for
foldl'
(\acc k -> acc + New.find 0 (New.feedShortByteString k) newRadixTree)
0
keys
Thanks, that’s better, but still slower than the previous implementation:
Old: OK
602 μs ± 54 μs
New: OK
1.27 ms ± 108 μs
Okay, I forgot to put strictness annotations (three bang patterns in each function), so the changes on my machine look like
Old/fromList: OK
5.54 ms ± 226 μs -> 5.58 ms ± 228 μs
New/fromList: OK
4.93 ms ± 224 μs -> 3.27 ms ± 290 μs
Old/lookups: OK
869 μs ± 61 μs -> 868 μs ± 49 μs
New/lookups: OK
2.31 ms ± 208 μs -> 693 μs ± 43 μs
I’ll apply this to all relevant functions and ship later today.
See the tighter-loops
branch for the updated version.
On my machine the tighter-loops
branch runs just a bit slower than the old implementation:
Old: OK
600 μs ± 7.2 μs
New: OK
651 μs ± 9.5 μs
Now if I make common prefixes 10x longer, the difference gets more pronounced:
Old: OK
1.99 ms ± 29 μs
New: OK
3.12 ms ± 23 μs
I think it is very challenging to beat memcmp
with a bytewise loop. Would it be possible to provide
lookupBA :: ByteArray -> RadixTree a -> Maybe a
? Or, if it fits better with the rest of the API, lookupBuild :: Build -> RadixTree a -> Maybe a
?..
If you feel the need to squeeze out every last drop of performance, consider importing the relevant .Unsafe
module and building a function yourself. The PVP applies to .Unsafe
modules, since I never claimed otherwise.
I’m against adding any functions that duplicate existing functionality, so “30% faster in 30% of the cases” is not a compelling reason in my view.
Also consider the fact that you can gain performance in other places: you can precompile trees, perform chunked lookups and even use lazy trees. Unless you have an actual industrial use-case, I doubt this will be a bottleneck.