I’m using import Data.Map.Strict as M and then doing M.fromList xs where xs is a list of 15000 items of the type [((b, b, b), (b, b, c))] where b is an Int and c is a Char. The list is fully materialised when M.fromList is called, yet it takes about 30 seconds to create the map! I verified this by creating the list without the map and checking its length (it takes about 1 second).
Are you sure the list is fully evaluated before converting to a map? Testing the length of the list is not always enough. It’s better to use force or rnf from the deepseq package.
Ok, I think I know what it may be. I think these shallow queries (last or length) do not require evaluation of nested structure, and so pass quickly. Meanwhile there is a O(log(N)) operation which ends up executed per item only when M.fromList hits.
Perhaps, but then the question won’t be about dictionaries, it will be about constructing the intermediate data structure (currently list) in such a way that it doesn’t take 30 seconds to evaluate.
Why does it take 30 seconds to create the map, while creating the list only takes 1 second?
I’m guessing that these measurements may have been due to wrong assumptions. I’m wondering if the list is really fully evaluated in that 1 second or if that is only the time it takes to construct the spine of the list.
One way to test that is to compare the running time of these two main functions:
Also keep in mind that creation of a Data.Map from an unordered list is at least as expensive as ordering the list, while comparisons on 3-tuples itself is not terribly efficient. Building such a map also requires re-arranging the internal tree, which entails garbage collections. There are other container types that are more efficient. Data.Map should only be used if having the keys sorted is a must, e.g. for min/max queries, splitting the map or traversal in order.