Benchmarks of various trie implementations

ocramz · May 5, 2019, 8:00am

While studying various approaches to prefix trees (“tries”), I wrote a small memory and time benchmark of four implementations. Long story short, generic-trie seems to be the best choice, at least for a fromList-then-lookup workload. I was also curious to see how very different implementation techniques (notably one based on recursion schemes and another using an arrow type internally) lead to different space and time scaling behaviours.
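To make the fromList/lookup pair concrete, here is a minimal sketch of a trie as nested Data.Map nodes (from containers). The type and the names Trie, emptyT, insertT, fromListT and lookupT are hypothetical. They are not the API of generic-trie or of any of the benchmarked packages. The sketch only shows the kind of work a fromList-then-lookup benchmark exercises.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- | Hypothetical minimal trie: an optional value stored at this node,
-- plus one child sub-trie per next key symbol.
data Trie k v = Node (Maybe v) (M.Map k (Trie k v))

emptyT :: Trie k v
emptyT = Node Nothing M.empty

-- | Insert a key (a list of symbols), overwriting any previous value.
insertT :: Ord k => [k] -> v -> Trie k v -> Trie k v
insertT [] v (Node _ children) = Node (Just v) children
insertT (c:rest) v (Node mv children) =
  let child = M.findWithDefault emptyT c children
  in Node mv (M.insert c (insertT rest v child) children)

-- | Build a trie from key/value pairs; later pairs win on duplicate keys.
fromListT :: Ord k => [([k], v)] -> Trie k v
fromListT = foldl' (\t (k, v) -> insertT k v t) emptyT

-- | Follow one edge per key symbol; a missing edge means the key is absent.
lookupT :: Ord k => [k] -> Trie k v -> Maybe v
lookupT [] (Node mv _) = mv
lookupT (c:rest) (Node _ children) = M.lookup c children >>= lookupT rest
```

In GHCi, `lookupT "ten" (fromListT [("tea",1),("ten",2)])` evaluates to `Just 2`. Each lookup walks one edge per key character, with a Map lookup at every node, so both key length and the branching factor at each node enter the cost. That is one reason the shape of the random inputs matters.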

I would love to hear any feedback, for example on the use of randomized inputs (trie-perf/Time.hs at master · ocramz/trie-perf · GitHub), and any other suggestions for improvement.

Drezil · May 6, 2019, 6:28am

Small note: you use discreteUniform letters for data generation. Have you considered other distributions? Depending on the algorithm used for insert/lookup, they could impact runtime a LOT.
For most applications the distribution is closely related to https://en.wikipedia.org/wiki/Zipf's_law, e.g. in natural languages (word frequency, letter frequency) and in informatics/statistics/banking: the distribution of digits in an ID (in any base!), of the first/last/any digit of wire transfers, etc.
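A minimal sketch of the suggested alternative: drawing letters from a Zipf distribution (the weight of the k-th symbol is proportional to 1/k^s) by inverse-transform sampling, instead of uniformly. Everything here is hypothetical: the names, the exponent s, the alphabet, and the toy 64-bit LCG standing in for a real generator such as the one in the random package. It is not how trie-perf currently generates its inputs.

```haskell
import Data.Bits (shiftR)
import Data.List (findIndex)
import Data.Word (Word64)

-- | Zipf weights for ranks 1..n with exponent s: w_k is proportional to
-- 1 / k^s, normalised so the weights sum to 1. s = 0 gives uniform weights.
zipfWeights :: Int -> Double -> [Double]
zipfWeights n s = map (/ total) raw
  where
    raw   = [1 / fromIntegral k ** s | k <- [1 .. n]]
    total = sum raw

-- | Inverse-transform sampling: the k-th symbol of the alphabet gets the
-- k-th Zipf weight; a uniform u in [0,1) selects the first symbol whose
-- cumulative weight exceeds u. The CDF is computed once per sampler.
zipfSampler :: [a] -> Double -> (Double -> a)
zipfSampler alphabet s = pick
  where
    cdf = scanl1 (+) (zipfWeights (length alphabet) s)
    pick u = case findIndex (> u) cdf of
      Just i  -> alphabet !! i
      Nothing -> last alphabet -- only reachable if rounding leaves the CDF below u

-- | Hypothetical toy uniform source: a 64-bit LCG (Knuth's MMIX constants),
-- keeping the top 53 bits of each state as a Double in [0,1).
-- Fine for a demo; use a proper generator (e.g. the random package) for real runs.
lcgUniforms :: Word64 -> [Double]
lcgUniforms = map toUnit . drop 1 . iterate step
  where
    step x   = 6364136223846793005 * x + 1442695040888963407
    toUnit x = fromIntegral (x `shiftR` 11) / 9007199254740992 -- 2^53

-- | n Zipf-distributed letters from 'a'..'z' for a given seed and exponent s.
zipfLetters :: Word64 -> Double -> Int -> String
zipfLetters seed s n = map (zipfSampler ['a' .. 'z'] s) (take n (lcgUniforms seed))
```

Since s = 0 makes every weight equal, the same generator covers the current uniform case and can be swept towards increasingly skewed inputs to see how each trie implementation reacts.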

Could you also add HashMaps? A week ago I ran some experiments comparing HashMap Text Text from unordered-containers with Data.Trie.Text, and noticed no difference in performance in my application, which just uses it as a big dictionary.

Is there also an implementation of a trie in terms of a finger-tree-like structure, with laziness and all? I could imagine that such a thing might exist and offer better amortized access to the “edges”, yielding better performance for left- or right-biased data.

ocramz · May 6, 2019, 6:47am

> For most applications the distribution is closely related to https://en.wikipedia.org/wiki/Zipf's_law, e.g. in natural languages (word frequency, letter frequency) and in informatics/statistics/banking: the distribution of digits in an ID (in any base!), of the first/last/any digit of wire transfers, etc.

This was initially built to test raw performance; it isn't tied to any particular application for now.

> Is there also an implementation of a trie in terms of a finger-tree-like structure, with laziness and all? I could imagine that such a thing might exist and offer better amortized access to the “edges”, yielding better performance for left- or right-biased data.

PRs open!
