While studying various approaches to prefix trees (“tries”), I wrote a small memory and time benchmark of four implementations. Long story short, generic-trie
seems to be the best choice, at least for a lookup
- fromList
pair, but I was curious to see how very diverse implementation techniques, notably one based on recursion schemes and another using an arrow type internally, lead to different space and time scaling behaviours.
I would love to hear any feedback regarding for example the use of randomized inputs (trie-perf/Time.hs at master · ocramz/trie-perf · GitHub), and any other improvements.
7 Likes
Small note: You use discreteUniform letters
for data-generation. Have you considered other distributions? They could impact runtime a LOT - depending on the algorithm used to insert/lookup.
For most applications the distribution is closely related to https://en.wikipedia.org/wiki/Zipf's_law … i.e. in natural languages (word-frequency, letter-frequency), informatics/statistics/banking (i.e. distribution of digits in an id - in any base!; distribution of the first/last/any digit of wire-transfers, etc.)
Could you also add HashMaps? I made some experiments a week ago (between HashMap Text Text
from unordered-containers and Data.Trie.Text and noticed no difference in performance in my application which just uses this as a big dictionary).
Is there also some implementation of a trie in terms of a finger-tree-like structure with laziness and all? I could imagine that such a thing might exist and offer better armortizes access to the “edges” yielding better performance for left/right-biased data.
1 Like
For most applications the distribution is closely related to https://en.wikipedia.org/wiki/Zipf’s_law … i.e. in natural languages (word-frequency, letter-frequency), informatics/statistics/banking (i.e. distribution of digits in an id - in any base!; distribution of the first/last/any digit of wire-transfers, etc.)
This was initially built to test raw performance, not related to any application for now.
Is there also some implementation of a trie in terms of a finger-tree-like structure with laziness and all? I could imagine that such a thing might exist and offer better armortizes access to the “edges” yielding better performance for left/right-biased data.
PRs open! 
1 Like