Benchmarks of various trie implementations

While studying various approaches to prefix trees (“tries”), I wrote a small memory and time benchmark of four implementations. Long story short, generic-trie seems to be the best choice, at least for a lookup - fromList pair, but I was curious to see how very diverse implementation techniques, notably one based on recursion schemes and another using an arrow type internally, lead to different space and time scaling behaviours.

I would love to hear any feedback regarding for example the use of randomized inputs (trie-perf/Time.hs at master · ocramz/trie-perf · GitHub), and any other improvements.

7 Likes

Small note: You use discreteUniform letters for data-generation. Have you considered other distributions? They could impact runtime a LOT - depending on the algorithm used to insert/lookup.
For most applications the distribution is closely related to https://en.wikipedia.org/wiki/Zipf's_law … i.e. in natural languages (word-frequency, letter-frequency), informatics/statistics/banking (i.e. distribution of digits in an id - in any base!; distribution of the first/last/any digit of wire-transfers, etc.)

Could you also add HashMaps? I made some experiments a week ago (between HashMap Text Text from unordered-containers and Data.Trie.Text and noticed no difference in performance in my application which just uses this as a big dictionary).

Is there also some implementation of a trie in terms of a finger-tree-like structure with laziness and all? I could imagine that such a thing might exist and offer better armortizes access to the “edges” yielding better performance for left/right-biased data.

1 Like

For most applications the distribution is closely related to https://en.wikipedia.org/wiki/Zipf’s_law … i.e. in natural languages (word-frequency, letter-frequency), informatics/statistics/banking (i.e. distribution of digits in an id - in any base!; distribution of the first/last/any digit of wire-transfers, etc.)

This was initially built to test raw performance, not related to any application for now.

Is there also some implementation of a trie in terms of a finger-tree-like structure with laziness and all? I could imagine that such a thing might exist and offer better armortizes access to the “edges” yielding better performance for left/right-biased data.

PRs open! :wink:

1 Like