It has map, singleton and unions, which seem like they should be enough to form a monad.
For efficiency reasons, Set has an Ord constraint, but Functor does not! On top of that, even if the Ord constraint were not a problem, Set would still not satisfy fmap (f . g) == fmap f . fmap g for some peculiar Eq instances. E.g.:
import Data.Set as S

newtype A = A Double
  deriving Show

-- Eq compares the rounded values, but Ord compares the raw Doubles,
-- so the two instances disagree with each other.
instance Eq A where
  (A a) == (A b) = round a == round b

instance Ord A where
  (A a) <= (A b) = a <= b

prova :: Set A
prova = fromList [A 11, A 12]

main :: IO ()
main = do
  print (S.map f . S.map g $ prova)
  print (S.map (f . g) prova)
  putStrLn "Ooops!"
  where
    f (A n) = A (n * 10)
    g (A n) = A (n / 10)
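To see concretely why the two prints differ, here is a small check (a sketch, not part of the original post; the name sizes is made up) that only compares set sizes, which are stable under the floating-point noise in the shown values:

sizes :: (Int, Int)
sizes = ( S.size (S.map f . S.map g $ prova)  -- 1: g collapses A 11 and A 12 into one element
        , S.size (S.map (f . g) prova) )      -- 2: the composed map keeps both
  where
    f (A n) = A (n * 10)
    g (A n) = A (n / 10)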
If you know the consequences you can write your own orphan instances for Functor etc., but be warned that they may explode in your face if the optimizer looks at them wrong, as @f-a has already posted. Most of the time you want to use mapMonotonic anyway, as it is O(n) instead of O(n*log(n)).
Your call - but in any case there should be no “default”, because there is no unique “correct” instance.
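For reference, a minimal sketch (not from the post above) of what such an orphan instance might look like using mapMonotonic; it is only valid when every mapped function preserves the ordering, otherwise the internal Set invariant is silently broken:

import qualified Data.Set as Set

-- Orphan instance: only safe when every function you fmap is strictly increasing.
instance Functor Set.Set where
  fmap = Set.mapMonotonic

ok, broken :: Set.Set Int
ok     = fmap (+ 1)  (Set.fromList [1, 2, 3])  -- fine: (+ 1) is monotonic
broken = fmap negate (Set.fromList [1, 2, 3])  -- not monotonic: invariant silently broken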
Often when we are programming we forget about the laws. This example shows why laws are important and why one should keep them in mind while programming. It’s not fmap that makes lists functors, it’s the laws that fmap follows. Yes, sets are mappable and flattenable (if that’s a word), but that alone doesn’t make them Functors or Monads. Sets simply do not follow the laws.
This was AWESOME
I would also like to mention that there is a pretty neat solution to this problem described in the paper “The Constrained-Monad Problem” by Sculthorpe, Bracker, Giorgidze, and Gill. They add a wrapper around Set called SetM which also contains a suspended monadic operation:
{-# LANGUAGE GADTs #-}

data SetM a where
  Return :: a -> SetM a
  Bind   :: Set x -> (x -> SetM a) -> SetM a
This allows you to write a proper Functor instance:
instance Functor SetM where
  fmap f (Return x) = Return (f x)
  fmap f (Bind x k) = Bind x (fmap f . k)
Note that this avoids the fmap (f . g) = fmap f . fmap g problem, e.g.:
fmap f . fmap g $ Bind s Return
  = fmap f (Bind s (fmap g . Return))
  = Bind s (fmap f . Return . g)
  = Bind s (Return . f . g)
  = Bind s (fmap (f . g) . Return)
  = fmap (f . g) (Bind s Return)
You can think of the functions we map over the structure as being buffered in some sense so that they are always run together.
And you can define two conversion functions between the two forms:
liftSet :: Set a -> SetM a
liftSet s = Bind s Return

lowerSet :: Ord a => SetM a -> Set a
lowerSet (Return a) = Set.singleton a
lowerSet (Bind s k) = Set.unions (map (lowerSet . k) (Set.toList s))
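As a small usage sketch (not from the original post; the name example is made up): lifting, mapping, and lowering only deduplicates at the final lowerSet, where the Ord constraint is available:

example :: Set Int
example = lowerSet (fmap (`div` 2) (liftSet (Set.fromList [1, 2, 3, 4])))
-- == Set.fromList [0, 1, 2]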
An implementation of this technique for Set can be found in the set-monad package. And a generalized version of this technique which lets you apply it to any type you want is implemented in the constrained-normal package.
Now you might think that it will be a pain to manually lift and lower sets all the time, but you can expose this SetM as the public Set type and define all the normal set operations on it, e.g.:
insert :: Ord a => a -> SetM a -> SetM a
insert x = liftSet . Set.insert x . lowerSet
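In the same spirit, a couple more wrapped operations (a sketch with hypothetical names, not from the original post; a fromList like this is what the benchmarks further down assume):

-- fromList builds a SetM directly from a list
fromList :: Ord a => [a] -> SetM a
fromList = liftSet . Set.fromList

-- member has to lower first, since it needs the actual Set
member :: Ord a => a -> SetM a -> Bool
member x = Set.member x . lowerSet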
Perhaps the only reason that this technique is not used is that it complicates the underlying representation and maybe it also degrades performance. I would like to know if there are other reasons for not implementing it like this.
Edit: I actually think a problem is that you can’t easily force the evaluation of the monadic operations with the standard seq function; that would require a special function:
normalise :: Ord a => SetM a -> SetM a
normalise = liftSet . lowerSet
That said, you cannot do any monadic operations at all with the current implementation, so is this really a disadvantage?
Not just degrades: the Bind operation degrades performance exponentially, doesn’t it?
The current implementation doesn’t have a bind, so I don’t see how it can degrade that performance.
It “has” (at least in principle) the constrained equivalent, of type Ord b => Set a -> (a -> Set b) -> Set b. The version above has performance equivalent to using the list monad, I believe, which defeats the point of using sets entirely.
I guess you could use bind x k = Set.unions $ Set.map k x, which I guess can be exponentially faster if used several times in a row, but that is not a lawful bind as @f-a shows above. And you can still manually run normalise in between to get the better asymptotics.
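Spelled out with the type from the previous reply (a sketch, not from the thread; the name bindSet is made up), note the Ord constraint on the result that an unconstrained Monad instance could never supply:

bindSet :: Ord b => Set a -> (a -> Set b) -> Set b
bindSet x k = Set.unions (Set.toList (Set.map k x))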
I guess can be exponentially faster if used several times in a row
Yes, I think so.
but that is not a lawful bind
Perhaps, but personally I’m not too worried about violating laws due to mismatched Eq and Ord instances.
you can still manually run normalise in between to get the better asymptotics
Yes, but then just using the list monad is an equivalent solution, isn’t it?
It’s more than just a mismatch between the Eq and Ord instances. You have the same problem with these instances:
instance Eq A where
  (A a) == (A b) = round a == round b

instance Ord A where
  (A a) <= (A b) = A a == A b || a <= b
The problem is that the f and g functions in main break the representation:
A 11 == A 12 =/= g (A 11) == g (A 12)
A 1.1 == A 1.2 =/= f (A 1.1) == f (A 1.2)
Since these are public functions they should obey the customary “substitutivity” property: x == y <=> f x == f y. If that were a proper law then this example would not work anymore, but it is a very difficult law to formulate and check in your code, because you have to check every public function.
(Edit: actually in this example the lines are a bit blurred: you could also say that the constructor A is the part of the public interface that breaks the substitutivity law: 1.1 == 1.2 =/= A 1.1 == A 1.2)
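For concreteness, a small sketch (not from the post; the helper name substitutive is made up) of that property as a testable predicate, using the f and g from the earlier example:

-- the property as stated: x == y should hold exactly when f x == f y
substitutive :: Eq b => (A -> b) -> A -> A -> Bool
substitutive f x y = (x == y) == (f x == f y)

-- substitutive g (A 11)  (A 12)   == False  (the first violation above)
-- substitutive f (A 1.1) (A 1.2)  == False  (the second violation above)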
Yes, but now you don’t need to keep writing conversion functions everywhere unless you care particularly about performance.
It’s more than just a mismatch between the Eq and Ord instances. You have the same problem with these instances:
SetM has exactly the same problem, doesn’t it? It technically ducks it by not having an Eq instance, but I don’t think that makes the problem any less bad in practice.
No, SetM buffers the applications of fmap such that they are always run together, as I show above:
fmap f . fmap g $ Bind s Return
  = fmap f (Bind s (fmap g . Return))
  = Bind s (fmap f . Return . g)
  = Bind s (Return . f . g)
  = Bind s (fmap (f . g) . Return)
  = fmap (f . g) (Bind s Return)
You do get the same behaviour if you insert a normalise in between, but that doesn’t violate the functor laws.
Actually, @tomjaguarpaw, I think involving lists is not necessary; this works just as well:
lowerSet :: Ord a => SetM a -> Set a
lowerSet (Return a) = Set.singleton a
lowerSet (Bind s k) = Set.unions (Set.map (lowerSet . k) s)
I think the authors chose to use a list because comparing sets is costly and usually slower.
And I think the exponential performance degradation also does not happen, because when evaluating multiple successive binds the lowerSet function does use a Set in between each bind: the result of Set.unions.
Here is a monad instance for this SetM, by the way:
-- (>=>) comes from Control.Monad
instance Monad SetM where
  Return x >>= k = k x
  Bind x k >>= k' = Bind x (k >=> k')
As you can see, this only involves the wrapper and thunks; no evaluation is happening outside the lowerSet function.
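GHC also wants an Applicative instance as a superclass of Monad; here is a minimal sketch in the same style (not from the original post):

instance Applicative SetM where
  pure = Return
  Return f <*> mx = fmap f mx
  Bind x k <*> mx = Bind x (\y -> k y <*> mx)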
Edit: reasoning about performance is difficult.
I think the exponential performance degradation also does not happen because when evaluating multiple successive binds the lowerSet function does use a Set in between each bind
I haven’t worked through it fully, but it can’t possibly use a Set between each bind. It doesn’t have an Ord instance for the intermediate types. (That’s exactly why I’m skeptical of the whole idea. “Solving” the problem just introduces a whole new problem.)
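For concreteness, a small sketch (not from the thread; the name noOrdInBetween is made up) of a SetM pipeline whose intermediate type has no Ord instance, so no Set can be formed between the binds; only the final lowerSet needs Ord:

noOrdInBetween :: Set Int
noOrdInBetween = lowerSet $ do
  n <- liftSet (Set.fromList [1, 2, 3])
  p <- Return (even :: Int -> Bool)  -- functions have no Ord instance
  Return (if p n then 0 else n)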
Here are some examples which show the performance differences:
import Criterion.Main (bench, defaultMain, nf)

-- SetM, liftSet, lowerSet and normalise are as defined earlier in the thread;
-- fromList here is the SetM version, i.e. liftSet . Set.fromList.

fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n - 2) + fib (n - 1)

test1 :: SetM Integer -> SetM Integer
test1 = fmap fib . fmap (const 10)

test2 :: SetM Integer -> SetM Integer
test2 = fmap fib . normalise . fmap (const 10)

test3 :: Int -> SetM Int
test3 x = do
  x <- 10 <$ fromList [1..x]
  x <- 10 <$ fromList [1..x]
  x <- 10 <$ fromList [1..x]
  fromList [1..x]

test4 :: Int -> SetM Int
test4 x = do
  x <- normalise $ 10 <$ fromList [1..x]
  x <- normalise $ 10 <$ fromList [1..x]
  x <- normalise $ 10 <$ fromList [1..x]
  normalise $ fromList [1..x]

main :: IO ()
main = defaultMain
  [ bench "test1" $ nf (lowerSet . test1) (fromList [1..100])
  , bench "test2" $ nf (lowerSet . test2) (fromList [1..100])
  , bench "test3" $ nf (lowerSet . test3) 10
  , bench "test4" $ nf (lowerSet . test4) 10
  ]
Results:
test1 time 323.1 μs
test2 time 10.84 μs
test3 time 1.563 ms
test4 time 5.944 μs
Indeed looks exponential.
Edit: Those were the times with the Set.map implementation of lowerSet; here are the results using a list instead:
test1 time 344.8 μs
test2 time 7.261 μs
test3 time 969.7 μs
test4 time 3.808 μs
You could also use lists directly:
-- nubOrd comes from Data.Containers.ListUtils (in containers)
test5 :: Int -> [Int]
test5 x = do
  x <- nubOrd $ 10 <$ [1..x]
  x <- nubOrd $ 10 <$ [1..x]
  x <- nubOrd $ 10 <$ [1..x]
  nubOrd [1..x]
test5 time 996.3 ns
I think the reason that this is even faster is because of the special fusion rules for lists.
I wonder if we could get some kind of Mappable typeclass which would parametrize over an arbitrary constraint (typeclass) for its fmap-equivalent function (maybe called mmap?). I don’t know if it’s possible though, something like:
class Mappable m where
  mmap :: (forall c. (c) => a -> b) -> m a -> m b
Where c would be any arbitrary constraint specified in the instance, something like:

instance Mappable Set where
  mmap :: ((Ord b) => a -> b) -> Set a -> Set b
I mean, I’m not particularly dissatisfied with the absence of a Functor instance for sets, because I don’t really see Functors as Mappable, but I frequently stumble upon the need for “bring whatever constraint you need”.
I suspect it’s possible through multi-parameter typeclasses, by parametrizing on the constraint kind, something like:

class Mappable m (c :: Constraint) where
  mmap :: ((c) => a -> b) -> m a -> m b

but I didn’t try, and the ergonomics would be pretty bad regarding various arities (is there a Void constraint? or a list of constraints we can turn into a single constraint?). Besides, I guess I’d expect such a lawless typeclass to exist in base for its combination of ergonomics and performance, which could explain my reluctance to try to materialize the idea into yet another library…
Still, if someone sees how to do the thing I’ve tried to sketch here, I’d be interested.
You could do it like this:
{-# LANGUAGE ConstraintKinds, TypeFamilies, FlexibleInstances #-}

import qualified Data.Set as Set
import Data.Set (Set)
import qualified Data.List as List
import Data.Kind

class Mappable m where
  type C m a :: Constraint
  type C m a = () -- no constraint by default
  mmap :: (C m a, C m b) => (a -> b) -> m a -> m b

instance Mappable Set where
  type C Set a = Ord a -- overwrite default with the Ord constraint
  mmap = Set.map

instance Mappable [] where
  mmap = List.map
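A tiny usage sketch (not from the original reply; the names are made up), showing that the same mmap works for both instances:

doubledSet :: Set Int
doubledSet = mmap (* 2) (Set.fromList [1, 2, 3])

doubledList :: [Int]
doubledList = mmap (* 2) [1, 2, 3]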
@rae also did a talk about partial type constructors which is another stab at this problem using constraints (solving the constrained functor problem is actually more of a useful side effect of solving a more general problem in that talk).
I can’t tell you anything about the usability of either approach, I don’t have experience with them.
Thanks, that seems like a very nice approach! The talk is amazing too, I don’t know how I missed it.