Map.fromList slow for 6-tuple with 15000 items

I’m using import Data.Map.Strict as M and then doing M.fromList xs where xs is a list of 15000 items of the type [((b, b, b), (b, b, c))] where b is an Int and c is a Char. The list is fully materialised when M.fromList is called, yet it takes about 30 seconds to create the map! I verified this by creating the list without the map and checking its length (it takes about 1 second).

Is this expected?

Are you sure the list is fully evaluated before converting to a map? Testing the length of the list is not always enough. It’s better to use force or rnf from the deepseq package.

Is last enough ? If so, yes its the same result as when using length.

No, last is not enough. You have to force the contents of each tuples element in the list.

Ok, I think I know what it may be. I think these shallow queries (last or length) do not require evaluation of nested structure, and so pass quickly. Meanwhile there is a O(log(N)) operation which ends up executed per item only when M.fromList hits.

If you don’t want to use the deepseq package you can also use this function in your case:

myseq :: [((Int,Int,Int),(Int,Int,Char))] -> a -> a
myseq [] a = a
myseq (((!x1,!x2,!x3),(!y1,!y2,!y3)):xs) a = myseq xs a
2 Likes

Could also build the dictionary while at it:

f :: Ord b => [((b, b, b), (b, b, c))] -> Map (b, b, b) (b, b, c)
f = foldl' (\z (k@(!_, !_, !_), a@(!_, !_, !_)) -> Map.insert k a z) Map.empty

If the list is fed into f and never mentioned again, the garbage collector won’t be obligated to keep the entire list in memory.

But aren’t we trying to distinguish the time take to build the list from the time taken to build the map?

Perhaps, but then the question won’t be about dictionaries, it will be about constructing the intermediate data structure (currently list) in such a way that it doesn’t take 30 seconds to evaluate.

My understanding is that the question is:

Why does it take 30 seconds to create the map, while creating the list only takes 1 second?

I’m guessing that these measurements may have been due to wrong assumptions. I’m wondering if the list is really fully evaluated in that 1 second or if that is only the time it takes to construct the spine of the list.

One way to test that is to compare the running time of these two main functions:

main1 = M.toList (M.fromList xs) `myseq` putStrLn "Done!"

main2 = xs `myseq` putStrLn "Done!"

That’s how I suggest you should use myseq.

1 Like

Also keep in mind that creation of a Data.Map from an unordered list is at least as expensive as ordering the list, while comparisons on 3-tuples itself is not terribly efficient. Building such a map also requires re-arranging the internal tree, which entails garbage collections. There are other container types that are more efficient. Data.Map should only be used if having the keys sorted is a must, e.g. for min/max queries, splitting the map or traversal in order.

3 Likes

You were quite right. The issue is that the list items had a nested structure and they were only partly evaluated.

3 Likes