Why was Data.Set re-implemented?

I don’t know if anyone can answer this, since it goes back at least 13 years.

Is there a specific reason why much of Data.Set duplicates much of the implementation of Data.Map, instead of being based on Map e ()? Is it performance? Elegance?

I am currently writing a library based on maps and sets and wondering if there are cons to basing my set implementation on top of map.

3 Likes

There is a small performance difference. Keeping track of the () requires an extra pointer which means it will take in more space. Many of the functions also need to be changed anyway to make the interface nicer.

And now that I’m thinking about it, you could also implement Map in terms of Set:

data KeyValue k v = KeyValue !k v

instance Eq k => Eq (KeyValue k v) where
  KeyValue k1 _ == KeyValue k2 _ = k1 == k2

instance Ord k => Ord (KeyValue k v) where
  compare (KeyValue k1 _) (KeyValue k2 _) = compare k1 k2

type Map k v = Set (KeyValue k v)

I think either way has performance implications, untill something like unpacking polymorphic fields will be implemented.

Edit: I think we can actually already do something like that with backpack, see: unpacked-containers. Now I have to go try this out…

And the implementations of Set and Map are not that large. On the other hand I think a lot of bugs are caused by this kind of code duplication (e.g. this bytestring issue I found).

10 Likes

There is a small performance difference.

Do you know of any study/data that quantifies the effect on the performance? I have been (passively) looking for one for one of my ideas.

Thanks!

Logically, having to store an extra pointer takes some extra space which needs to be allocated and deallocated and it causes more cache-misses. That’s the kind of performance impact I was thinking about.

I think it should be pretty easy to make a set implementation based on the maps in the current containers package and then just run the existing benchmarks for sets to see how it compares to the current implementation of sets. I don’t think anybody has done that yet.

2 Likes