Map.fromList slow for 6-tuple with 15000 items

emiruz · February 18, 2024, 7:41pm

I’m using import Data.Map.Strict as M and then doing M.fromList xs where xs is a list of 15000 items of the type [((b, b, b), (b, b, c))] where b is an Int and c is a Char. The list is fully materialised when M.fromList is called, yet it takes about 30 seconds to create the map! I verified this by creating the list without the map and checking its length (it takes about 1 second).

Is this expected?

jaror · February 18, 2024, 8:15pm

Are you sure the list is fully evaluated before converting to a map? Testing the length of the list is not always enough. It’s better to use force or rnf from the deepseq package.
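
As a minimal sketch of the difference, assuming the deepseq package is available (it ships with GHC): the list xs and the helper expensive below are hypothetical stand-ins, not the data from the question. length only walks the spine of the list, while force evaluates every tuple component.

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)

-- Hypothetical stand-in for a costly per-element computation.
expensive :: Int -> Int
expensive n = sum [1 .. n * 100]

-- Hypothetical list shaped like the one in the question.
xs :: [((Int, Int, Int), (Int, Int, Char))]
xs = [((expensive i, i, i), (i, i, 'a')) | i <- [1 .. 1000]]

main :: IO ()
main = do
  -- Walks the spine only; the tuple components stay unevaluated.
  print (length xs)
  -- Evaluates every element to normal form, so all 'expensive' calls run here.
  _ <- evaluate (force xs)
  putStrLn "fully evaluated"
```

Timing the two steps separately should show whether the cost sits in the list elements rather than in the spine.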

emiruz · February 18, 2024, 8:45pm

Is last enough? If so, yes, it's the same result as when using length.

tomjaguarpaw · February 18, 2024, 8:53pm

No, last is not enough. You have to force every component of each tuple in the list.

emiruz · February 18, 2024, 8:53pm

OK, I think I know what it may be. These shallow queries (last or length) do not require evaluating the nested structure, so they finish quickly. Meanwhile, there is an O(log N) operation per item that only gets executed once M.fromList demands the values.
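
This effect can be observed directly. The sketch below is an illustration with a made-up two-element list whose key components are deliberately unevaluated errors: length and last succeed without touching them, while M.fromList has to compare keys and therefore forces them.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import qualified Data.Map.Strict as M

type Key = (Int, Int, Int)
type Val = (Int, Int, Char)

-- Hypothetical list: the first component of each key is an unevaluated error.
xs :: [(Key, Val)]
xs =
  [ ((error "key forced", 0, 0), (1, 2, 'a'))
  , ((error "key forced", 0, 0), (3, 4, 'b'))
  ]

main :: IO ()
main = do
  -- Only the spine is walked, so the error thunks are never touched.
  print (length xs)
  -- Only the outer pair of the last element is forced.
  last xs `seq` putStrLn "last is fine"
  -- Building the map compares keys, which forces their components.
  r <- try (evaluate (M.fromList xs)) :: IO (Either ErrorCall (M.Map Key Val))
  putStrLn (either (const "fromList forced a key") (const "no key forced") r)
```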

jaror · February 18, 2024, 9:56pm

If you don’t want to use the deepseq package you can also use this function in your case:

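{-# LANGUAGE BangPatterns #-}
-- Forces every component of every tuple in the list before returning the second argument.
myseq :: [((Int,Int,Int),(Int,Int,Char))] -> a -> a
myseq [] a = a
myseq (((!_,!_,!_),(!_,!_,!_)):rest) a = myseq rest a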

BurningWitness · February 19, 2024, 6:00am

Could also build the dictionary while at it:

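-- Assumes BangPatterns, foldl' from Data.List, and Data.Map.Strict imported qualified as Map (with the Map type in scope).
f :: Ord b => [((b, b, b), (b, b, c))] -> Map (b, b, b) (b, b, c)
f = foldl' (\z (k@(!_, !_, !_), a@(!_, !_, !_)) -> Map.insert k a z) Map.empty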

If the list is fed into f and never mentioned again, the garbage collector won’t be obligated to keep the entire list in memory.

tomjaguarpaw · February 19, 2024, 7:31am

But aren’t we trying to distinguish the time taken to build the list from the time taken to build the map?

BurningWitness · February 19, 2024, 8:32am

Perhaps, but then the question won’t be about dictionaries, it will be about constructing the intermediate data structure (currently list) in such a way that it doesn’t take 30 seconds to evaluate.

jaror · February 19, 2024, 9:08am

My understanding is that the question is:

Why does it take 30 seconds to create the map, while creating the list only takes 1 second?

I’m guessing that these measurements may have been due to wrong assumptions. I’m wondering if the list is really fully evaluated in that 1 second or if that is only the time it takes to construct the spine of the list.

One way to test that is to compare the running time of these two main functions:

main1 = M.toList (M.fromList xs) `myseq` putStrLn "Done!"

main2 = xs `myseq` putStrLn "Done!"

That’s how I suggest you should use myseq.
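
A self-contained way to run that comparison might look like the sketch below. The input list xs and the helper timed are hypothetical and only there for illustration; getCPUTime comes from base and reports picoseconds. Because xs is shared between the two measurements, the second one covers only the map construction on an already evaluated list.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.Map.Strict as M
import System.CPUTime (getCPUTime)

type Item = ((Int, Int, Int), (Int, Int, Char))

-- The helper from earlier in the thread: forces all six components of every item.
myseq :: [Item] -> a -> a
myseq [] a = a
myseq (((!_, !_, !_), (!_, !_, !_)) : rest) a = myseq rest a

-- Hypothetical input; replace with the real list.
xs :: [Item]
xs = [((i `mod` 7, i `mod` 11, i), (i, i * 2, 'x')) | i <- [1 .. 15000]]

-- Hypothetical helper: runs an action and prints the CPU time it took in milliseconds.
timed :: String -> IO () -> IO ()
timed label act = do
  t0 <- getCPUTime
  act
  t1 <- getCPUTime
  -- getCPUTime is in picoseconds; 10^9 ps = 1 ms.
  putStrLn (label ++ ": " ++ show ((t1 - t0) `div` 1000000000) ++ " ms")

main :: IO ()
main = do
  -- Evaluates the list elements (the equivalent of main2).
  timed "list only" (xs `myseq` putStrLn "Done!")
  -- xs is already evaluated here, so this measures the map round trip alone.
  timed "map only" (M.toList (M.fromList xs) `myseq` putStrLn "Done!")
```

If "list only" dominates, the time is being spent evaluating the list elements rather than building the map.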

olf · February 19, 2024, 9:57am

Also keep in mind that creating a Data.Map from an unordered list is at least as expensive as sorting the list, and comparing 3-tuples is itself not terribly efficient. Building such a map also requires rebalancing the internal tree, which entails garbage collection. There are other container types that are more efficient. Data.Map should only be used if having the keys sorted is a must, e.g. for min/max queries, splitting the map, or in-order traversal.
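
As one possible illustration of such an alternative, here is a sketch using Data.HashMap.Strict from the unordered-containers package. That choice is an assumption made for this example, not something named above, and the package has to be installed separately. The input list is hypothetical. Whether this is actually faster for a given workload should be measured.

```haskell
import qualified Data.HashMap.Strict as HM

type Key = (Int, Int, Int)
type Val = (Int, Int, Char)

-- Hypothetical input shaped like the list in the question.
xs :: [(Key, Val)]
xs = [((i, i + 1, i + 2), (i, i * 2, 'x')) | i <- [1 .. 15000]]

-- Hash-based map: keys are hashed rather than kept in sorted order,
-- so ordered operations such as min/max queries are not available.
table :: HM.HashMap Key Val
table = HM.fromList xs

main :: IO ()
main = do
  print (HM.size table)
  print (HM.lookup (1, 2, 3) table)
```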

emiruz · February 22, 2024, 11:39am

You were quite right. The issue is that the list items had a nested structure and they were only partly evaluated.
