Radix-tree-1.0: trees that radix

Bodigrim · April 27, 2024, 8:40pm

Thanks, indeed the exposed bits are enough to implement

findBA :: a -> ByteArray -> New.RadixTree a -> a
findBA d key = \(New.RadixTree maybeX tree) -> case keyLen of
  0 -> fromMaybe d maybeX
  _ -> go 0 tree
  where
    keyLen = sizeofByteArray key
    go !n = \case
      New.Bin p l r -> go n (if indexByteArray key n < p then l else r)
      New.Tip arr@(sizeofByteArray -> arrLen) mx dx
        | arrLen + n <= keyLen
        , EQ <- compareByteArrays key n arr 0 arrLen
        -> if arrLen + n == keyLen then fromMaybe d mx else go (n + arrLen) dx
        | otherwise
        -> d
      New.Nil -> d

The benchmarks for long (tenfold) common prefixes are quite impressive:

  Old:    OK
    1.99 ms ± 109 μs
  New:    OK
    3.03 ms ± 216 μs
  findBA: OK
    980  μs ±  54 μs

However, with shorter common prefixes findBA is measurably slower than other implementations:

  Old:    OK
    600  μs ±  14 μs
  New:    OK
    656  μs ±  15 μs
  findBA: OK
    693  μs ±  26 μs

Upd.: It is possible though to use compareByteArrays only if prefix length is, say, 8 or more, in which case we would reap the best of both worlds.

BurningWitness · April 28, 2024, 8:24am

There definitely exists some optimization route where Feed works over chunks and not individual bytes, perhaps comparing Words instead of Word8s if both buffers allow it, but it would be quite a bit more complex than the current solution.

My general point is that this is already asymptotically optimal (and will get slightly better in the future as it gets tweaked), so I wouldn’t bother with a custom solution unless I know every other part of the system around it is about as well-optimized. The two common examples are unnecessary key type conversions and serializing the tree as a list of key-value pairs: either of these most probably introduces way more overhead than any insertion/lookup routine you can handroll, and solving them is way easier.

Bodigrim · April 28, 2024, 9:59am

If I may, I’d suggest to mention this in documentation for lookup directly. Otherwise plenty of users would expect that converting their keys to Feed upfront is a faster way of doing things. You probably want to emphasize that clients should never hold anything of type Feed but rather stick to ByteString / ByteArray / Text / String or whatever they have and convert it to Feed just in time.

BurningWitness · April 28, 2024, 10:53am

This is not a lookup quirk, it applies equally to all functions that consume Feed. Furthermore it’s not critical that the users get this right, their programs will merely run slower. As such this is currently addressed with the “Inlining” documentation section on top of every relevant module. The section could probably be worded better, I admit that.

It’s similar to spine-laziness in that the mechanics are rather simple and obvious to anyone in the loop, while being quite confusing to anyone outside. I would argue addressing this isn’t even the library’s responsibility: I should just be able to say “this is spine-lazy” and “this is short cut fused”, and point to educational pages that explain these concepts in as much depth as may be needed.

Bodigrim · April 28, 2024, 11:14am

True, but lookup is the most basic one. Users can extrapolate further.

What I can tell you very verily is that I read that section and yet made a mistake. I figured out to benchmark saturated applications to get it inlining, but did not connect dots that feedByteString is the one supposed to fuse.

The thing is that radix-tree is already quite a departure from containers / unordered-containers. If I give it a try, do not get inlining right and results are slow, I would not look twice, I would just stick to unordered-containers. After all, Hackage is full of libraries which are asymptotically optimal, but slow as hell.

That said, I feel that I exhausted the bandwidth of unsolicited advices for today Thanks for your enormous amount of work here; I’m eagerly awaiting for a new release with bangs.

Bodigrim · April 29, 2024, 11:18pm

I’m afraid that this allows for ambiguous reading. To be clear: I did not mean that radix-tree is one of such libraries.

mikeplus64 · May 1, 2024, 2:34am

The use-case of radixtree was for parsing thousands-ish dictionaries of English-language terms where the dictionary wasn’t known at compile-time, being 100% user supplied, but would also only rarely change. So it was fine to have very bad (in fact, much worse than it needs to be) insert – it just had to traverse quickly. It could definitely still somewhat easily be better, like by binary search on “large” nodes. Combined with a packrat parser it worked well for parsing English sentences quick enough for autosuggestions to feel snappy as you typed.

But yeah you could have both fast traversal/minimal memory, and fast insertion, with a mutable API, and different node types for different cases. https://db.in.tum.de/~leis/papers/ART.pdf this seems to be that. I don’t see an intrinsic reason a Haskell impl couldn’t be fast and elegant. Might be a nice project to throw linear types at too

Anyway back to the fancy new rewrite. What are the use-cases you have in mind for taking advantage of laziness? The *.Lazy modules would probably be a great backbone for memoisation a la memoize or MemoTrie for some algorithms.

BurningWitness · May 1, 2024, 9:00am

For parsing an extreme example would be runtime-evaluated command-line options, where there may be 200 accessible options and users never touch more than a dozen at a time. The cost of accessing said dozen and pushing the rest around should be meager relative to constructing the entire tree. This same reasoning applies generally to parsing every format with dictionaries in it. It’s obviously less performant than directly precompiling a strict tree, but still it’s a nice default option that a language should have.

Regarding memoization… I don’t know, probably? I’ve never properly used it, so my understanding of it is blurry at best.

BurningWitness · May 1, 2024, 11:33am

You can definitely do memoization efficiently with spine-lazy PATRICIA trees, that is defining

-- | Forms a tree by applying the given function to every value in the
--   closed interval between 0 and the given key.
fillL :: Word -> (Word -> a) -> Patricia a

And then running something similar to

-- Tree that recursively looks up the number of bits in the number.
let tree =
      fillL 35 $ \i ->
        if i == 0
          then 0 :: Int
          else (if odd i then 1 else 0)
             + Patricia.find (error "impossible") (i `unsafeShiftR` 1) tree

> Patricia.lookup 31 tree
Just 5
> Patricia.lookup 26 tree
Just 3

> :sprint tree
tree =
  Patricia.Bin 32
    (Patricia.Bin 16
       (Patricia.Bin 8
          (Patricia.Bin 4
             (Patricia.Bin 2
                (Patricia.Bin 1  (Patricia.Tip 0  0) (Patricia.Tip 1  1))
                (Patricia.Bin 3  _                   (Patricia.Tip 3  2)))
             (Patricia.Bin 6
                _
                (Patricia.Bin 7  (Patricia.Tip 6  2) (Patricia.Tip 7  3))))
          (Patricia.Bin 12
             _
             (Patricia.Bin 14
                (Patricia.Bin 13 _                   (Patricia.Tip 13 3))
                (Patricia.Bin 15 _                   (Patricia.Tip 15 4)))))
       (Patricia.Bin 24
          _
          (Patricia.Bin 28
             (Patricia.Bin 26
                _
                (Patricia.Bin 27 (Patricia.Tip 26 3) _))
             (Patricia.Bin 30
                _
                (Patricia.Bin 31 _                   (Patricia.Tip 31 5))))))

So you can probably do something similar with a radix tree, though you’ll have to pay more to construct a tree (fillL knows the entire range beforehand).