Why doesn't Data.Set (containers) have Functor and Monad instances?

I hope that they make the “customary” properties of Eq and Ord proper laws in Haskell2020.

5 Likes

If you know the consequences you can write your own orphan instances for Functor etc.,
but be warned that they may explode in your face if the optimizer looks at them the wrong way, as @f-a has already posted.

But most of the time you want to use mapMonotonic anyway, as it is O(n) instead of O(n log n).
Your call, but in any case there should be no “default”, because there is no unique “correct” instance.
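
For concreteness, here is a minimal sketch (the names Mod2, f and g are made up for this illustration) of how an fmap defined as Set.map breaks the composition law:

import qualified Data.Set as Set

newtype Mod2 = Mod2 Int

-- Eq and Ord only look at the value modulo 2
instance Eq Mod2 where
  Mod2 a == Mod2 b = a `mod` 2 == b `mod` 2

instance Ord Mod2 where
  compare (Mod2 a) (Mod2 b) = compare (a `mod` 2) (b `mod` 2)

g :: Int -> Mod2
g = Mod2

f :: Mod2 -> Int
f (Mod2 x) = x

-- With fmap = Set.map the composition law fails:
--   Set.size (Set.map (f . g) (Set.fromList [1, 2, 3]))       == 3
--   Set.size (Set.map f (Set.map g (Set.fromList [1, 2, 3]))) == 2
-- because Mod2 1 and Mod2 3 collapse into one element in the middle set.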

2 Likes

Often when we are programming we forget about the laws. This example shows why laws are important and why one should keep them in mind while programming. It’s not fmap that makes lists functors, it’s the laws that fmap satisfies. Yes, sets are mappable and flattenable (if that’s a word), but that alone doesn’t make them Functors or Monads. Sets simply do not satisfy the laws.
This was AWESOME :star_struck:

2 Likes

I would also like to mention that there is a pretty neat solution to this problem described in the paper “The Constrained-Monad Problem” by Sculthorpe, Bracker, Giorgidze, and Gill. They add a wrapper around Set called SetM which also contains a suspended monadic operation:

data SetM a where
  Return :: a                      -> SetM a
  Bind   :: Set x -> (x -> SetM a) -> SetM a

This allows you to write a proper Functor instance:

instance Functor SetM where
  fmap f (Return x) = Return (f x)
  fmap f (Bind x k) = Bind x (fmap f . k)

Note that this avoids the fmap (f . g) = fmap f . fmap g problem, e.g.:

  fmap f . fmap g $ Bind s Return
= fmap f (Bind s (fmap g . Return))
= Bind s (fmap f . Return . g)
= Bind s (Return . f . g)
= Bind s (fmap (f . g) . Return)
= fmap (f . g) (Bind s Return)

You can think of the functions we map over the structure as being buffered in some sense so that they are always run together.

And you can define two conversion functions between the two forms:

liftSet :: Set a -> SetM a
liftSet s = Bind s Return

lowerSet :: Ord a => SetM a -> Set a
lowerSet (Return a) = Set.singleton a
lowerSet (Bind s k) = Set.unions (map (lowerSet . k) (toList s))
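
As a quick usage sketch (the name example is mine): deduplication only happens at the point where we lower back to a real Set:

example :: Set Int
example = lowerSet (fmap (`div` 10) (liftSet (Set.fromList [11, 12, 21])))
-- example == Set.fromList [1, 2]: the two elements that map to the same
-- value are only merged when we go back to an ordinary Set.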

An implementation of this technique for Set can be found in the set-monad package. And a generalized version of this technique, which lets you apply it to any type you want, is implemented in the constrained-normal package.

Now you might think that it will be a pain to manually lift and lower sets all the time, but you can expose this SetM as the public Set type and define all the normal set operations on it, e.g.:

insert :: Ord a => a -> SetM a -> SetM a
insert x = liftSet . Set.insert x . lowerSet
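
Other operations would presumably follow the same lift/lower pattern, for example (a sketch, not necessarily the actual set-monad API):

member :: Ord a => a -> SetM a -> Bool
member x = Set.member x . lowerSet

union :: Ord a => SetM a -> SetM a -> SetM a
union xs ys = liftSet (Set.union (lowerSet xs) (lowerSet ys))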

Perhaps the only reason that this technique is not used is that it complicates the underlying representation and maybe it also degrades performance. I would like to know if there are other reasons for not implementing it like this.

Edit: I actually think a problem is that you can’t easily force the evaluation of the monadic operations with the standard seq function; that would require a special function:

normalise :: Ord a => SetM a -> SetM a
normalise = liftSet . lowerSet

That said, you cannot do any monadic operations at all with the current implementation, so is this really a disadvantage?

2 Likes

Not just degrades, the Bind operation degrades performance exponentially, doesn’t it?

The current implementation doesn’t have a bind, so I don’t see how it can degrade that performance.

It “has” (at least in principle) the constrained equivalent, of type Ord b => Set a -> (a -> Set b) -> Set b. The version above has performance equivalent to using the list monad, I believe, which defeats the point of using sets entirely.

I guess you could use bind x k = Set.unions $ Set.map k x, which I guess can be exponentially faster if used several times in a row, but that is not a lawful bind as @f-a shows above.
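
Spelled out with its type, that constrained bind is roughly (a sketch):

bindSet :: Ord b => Set a -> (a -> Set b) -> Set b
bindSet x k = Set.unions (map k (Set.toList x))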

And you can still manually run normalise in between to get the better asymptotics.

1 Like

I guess can be exponentially faster if used several times in a row

Yes, I think so.

but that is not a lawful bind

Perhaps, but personally I’m not too worried about violating laws due to mismatched Eq and Ord instances.

you can still manually run normalise in between to get the better asymptotics

Yes, but then just using the list monad is an equivalent solution, isn’t it?

It’s more than just a mismatch between the Eq and Ord instances. You have the same problem with these instances:

instance Eq A where
    (A a) == (A b) = round a == round b

instance Ord A where
    (A a) <= (A b) = A a == A b || a <= b

The problem is that the f and g functions in main break the representation:

A 11  == A 12  =/= g (A 11)  == g (A 12)
A 1.1 == A 1.2 =/= f (A 1.1) == f (A 1.2)

Since these are public functions, they should obey the customary property “Substitutivity”: x == y <=> f x == f y. If that were a proper law then this example would not work anymore, but it is a very difficult law to formulate and check in your code, because you have to check every public function.

(Edit: actually in this example the lines are a bit blurred: you could also say that the constructor A is the part of the public interface that breaks the substitutivity law: 1.1 == 1.2 =/= A 1.1 == A 1.2)
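
For concreteness, here is one hypothetical reconstruction of A, f and g that exhibits exactly this failure with the instances above (the real definitions are in @f-a’s earlier post, which isn’t quoted here):

newtype A = A Double

f, g :: A -> A
f (A x) = A (x * 10)  -- A 1.1 == A 1.2, but f (A 1.1) /= f (A 1.2)
g (A x) = A (x / 10)  -- A 11 /= A 12,  but g (A 11) == g (A 12)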

Yes, but now you don’t need to keep writing conversion functions everywhere unless you care particularly about performance.

It’s more than just a mismatch between the Eq and Ord instances. You have the same problem with these instances:

SetM has exactly the same problem, doesn’t it? It technically ducks it by not having an Eq instance, but I don’t think that makes the problem any less bad in practice.

No, SetM buffers the applications of fmap such that they are always run together as I show above:

  fmap f . fmap g $ Bind s Return
= fmap f (Bind s (fmap g . Return))
= Bind s (fmap f . Return . g)
= Bind s (Return . f . g)
= Bind s (fmap (f . g) . Return)
= fmap (f . g) (Bind s Return)

You do get the same behaviour if you insert a normalise in between, but that doesn’t violate the functor laws.

1 Like

Actually, @tomjaguarpaw, I think involving lists is not necessary; this works just as well:

lowerSet :: Ord a => SetM a -> Set a
lowerSet (Return a) = Set.singleton a
lowerSet (Bind s k) = Set.unions (Set.map (lowerSet . k) s)

I think the authors chose to use a list because comparing sets is costly, so the Set.map version is usually slower.

And I think the exponential performance degradation also does not happen because when evaluating multiple successive binds the lowerSet function does use a Set in between each bind: the result of Set.unions.

Here is a Monad instance for this SetM, by the way:

instance Monad SetM where
  Return x >>= k = k x
  Bind x k >>= k' = Bind x (k >=> k')

As you can see, this only involves the wrapper and thunks; no evaluation is happening outside the lowerSet function.
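
Note that on a modern GHC this Monad instance also needs an Applicative instance (and the >=> above comes from Control.Monad); a possible sketch in the same buffering style:

instance Applicative SetM where
  pure = Return
  Return f <*> mx = fmap f mx
  Bind s k <*> mx = Bind s (\x -> k x <*> mx)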

Edit: reasoning about performance is difficult.

I think the exponential performance degradation also does not happen because when evaluating multiple successive binds the lowerSet function does use a Set in between each bind

I haven’t worked through it fully, but it can’t possibly use a Set between each bind. It doesn’t have an Ord instance for the intermediate types. (That’s exactly why I’m skeptical of the whole idea. “Solving” the problem just introduces a whole new problem.)

1 Like

Here are some examples which show the performance differences:

fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n - 2) + fib (n - 1)

test1 :: SetM Integer -> SetM Integer
test1 = fmap fib . fmap (const 10)

test2 :: SetM Integer -> SetM Integer
test2 = fmap fib . normalise . fmap (const 10)

test3 :: Int -> SetM Int
test3 x = do
  x <- 10 <$ fromList [1..x]
  x <- 10 <$ fromList [1..x]
  x <- 10 <$ fromList [1..x]
  fromList [1..x]

test4 :: Int -> SetM Int
test4 x = do
  x <- normalise $ 10 <$ fromList [1..x]
  x <- normalise $ 10 <$ fromList [1..x]
  x <- normalise $ 10 <$ fromList [1..x]
  normalise $ fromList [1..x]

main :: IO ()
main = defaultMain
  [ bench "test1" $ nf (lowerSet . test1) (fromList [1..100])
  , bench "test2" $ nf (lowerSet . test2) (fromList [1..100])
  , bench "test3" $ nf (lowerSet . test3) 10
  , bench "test4" $ nf (lowerSet . test4) 10
  ]
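
(These benchmarks assume criterion’s defaultMain, bench and nf, and a lifted fromList for SetM roughly like the one below; lowerSet produces an ordinary Set, which already has an NFData instance.)

fromList :: Ord a => [a] -> SetM a
fromList = liftSet . Set.fromList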

Results:

test1                                    time                 323.1 μs  
test2                                    time                 10.84 μs  
test3                                    time                 1.563 ms  
test4                                    time                 5.944 μs  

Indeed looks exponential.

Edit: Those were the times with the Set.map implementation of lowerSet; here are the results using a list instead:

test1                                    time                 344.8 μs  
test2                                    time                 7.261 μs  
test3                                    time                 969.7 μs  
test4                                    time                 3.808 μs  

You could also use lists directly:

test5 :: Int -> [Int]
test5 x = do
  x <- nubOrd $ 10 <$ [1..x]
  x <- nubOrd $ 10 <$ [1..x]
  x <- nubOrd $ 10 <$ [1..x]
  nubOrd [1..x]

test5                                    time                 996.3 ns

I think the reason this is even faster is the special fusion rules for lists.

I wonder if we could get some kind of Mappable typeclass which would parametrize over any arbitrary constraint (typeclass) for its fmap-equivalent function (maybe called mmap?).
I don’t know if it’s possible, though; something like:

class Mappable m where
   mmap :: (forall c. (c) => a -> b) -> m a -> m b

Where c would be any arbitrary constraint specified in the instance, something like:

instance Mappable Set where
   mmap :: ((Ord b) => a -> b) -> Set a -> Set b

I mean, I’m not particularly dissatisfied with the absence of a Functor instance for sets, because I don’t really see Functors as Mappable, but I frequently stumble upon the need for “bring whatever constraint you need”.
I suspect it’s possible through multi-parameter typeclasses, by parametrizing on the constraint kind, something like:

class Mappable m (c :: Constraint) where
   mmap :: ((c) => a -> b) -> m a -> m b

but I didn’t try, and the ergonomics would be pretty bad regarding various arities (is there a Void constraint? or a list of constraints we can turn into a single constraint?). Besides, I guess I’d expect such a lawless typeclass to exist in base for its combination of ergonomics and performance, which could explain my reluctance to try to materialize the idea into yet another library…

Still, if someone sees how to do the thing I’ve tried to sketch here, I’d be interested :slight_smile:

You could do it like this:

{-# LANGUAGE ConstraintKinds, TypeFamilies, FlexibleInstances #-}

import qualified Data.Set as Set
import Data.Set (Set)
import qualified Data.List as List

import Data.Kind

class Mappable m where
  type C m a :: Constraint
  type C m a = () -- no constraint by default
  mmap :: (C m a, C m b) => (a -> b) -> m a -> m b

instance Mappable Set where
  type C Set a = Ord a -- overwrite default with the Ord constraint
  mmap = Set.map

instance Mappable [] where
  mmap = List.map
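
A quick usage sketch (the names example1 and example2 are mine):

example1 :: Set Int
example1 = mmap (+ 1) (Set.fromList [1, 2, 3])  -- picks up the Ord Int constraint

example2 :: [String]
example2 = mmap show [1, 2, 3 :: Int]           -- no constraint needed for lists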

@rae also did a talk about partial type constructors which is another stab at this problem using constraints (solving the constrained functor problem is actually more of a useful side effect of solving a more general problem in that talk).

I can’t tell you anything about the usability of either approach; I don’t have experience with them.

3 Likes

Thanks, that seems like a very nice approach! The talk is amazing too, I don’t know how I missed it :slight_smile:

Interesting! So it seems like whenever we use an Eq constraint we can’t implement a proper Functor? Do I get that right?

1 Like

Discussion continues here.

fmap is a “proper function” – that is, it is parametrically polymorphic. If you want a version with an Eq or Ord or any other constraint, that makes it ad-hoc polymorphic, which is “improper” – in the opinion of the purists here.