(... cont) Data.Set no Functor/Monad instance

This begs the question: What is the structure of a set? In the containers implementation it is a search tree, which can only be preserved using an Ord instance, but that is just an implementation detail.
The mathematical powerset functor P is the monad of complete semilattices, so in that context one would expect fmap to preserve arbitrary set unions:

-- prop> fmap f (union x y) = union (fmap f x) (fmap f y)

unions :: Ord a => Set (Set a) -> Set a
unions = foldl union mempty

-- prop> fmap f (unions sets) = unions (fmap (fmap f) sets)

Which is, by the way, a property that Escardo’s searchable sets possess.

I suppose there does not exist an agreed-upon abstract interface for sets in Haskell or functional programming in general. Which is okay, since Haskell packages cater for the programmer’s needs, not the mathematician’s. The interface exposed by containers is, naturally, influenced by its implementation, hence the lack of Functor instance. Data.Set breaks the requirement stated by @AntC by exposing mapMonotonic and functions like lookupMin without an Ord constraint.

Set theory does not mandate the existence of a member function that can decide membership, while an elementary topos requires such a function for the power set object. But instead of Bool the result can be another type of truth values Omega.

For example, suppose there is a Set type than can define the Cantor set as a subset of the type [Int]. It is the set of all infinite lists containing only 0 and 1. Here membership can only be semi-decided: Let Omega = (). Given any list of integers, return () whenever the list is finite or does contain an integer other than 0 or 1, otherwise diverge.

That’s an interesting question indeed.

Personally, I like to think of data structures in terms of their “morally ideal forms”, and for Set, that is something like “zero or one instances of each distinct element value; order is not preserved”. Structure-preserving, then, would mean that for each input element, the output set contains exactly one output element. So if the input set has 6 elements, the output set should also have 6 elements - they can be in any order, but there must be 6 of them, and each must correspond to one of the 6 input elements. That, at least, is my intuition for what fmap on a Set should do.

But this is only the case if the fmapped function is injective, otherwise we two different input elements might map to the same output element, so the output set would be smaller than the input set, and “structure” would be lost.

Interestingly, though, the usual functor laws do not forbid this - fmap id === id and fmap (g . f) === fmap f . fmap g still hold - but for sets, this isn’t enough to comply with the intuition of “preserving structure” - we can, in fact, do something like fmap (const ()) on any set, and we’ll get a singleton set containing just the unit value for any non-empty input set, regardless of what’s in it, which I would hardly say is “structure preserving” in any useful sense.

Then again, maybe other people have different intuitions about what “structure preserving” is supposed to mean, in which case the Functor instance would be fine.

You contradict yourself. That there is zero or exactly one of each distinct value does not entail that the number of elements must be preserved. Structure preservation should mean that if x is an element of a set S and f(x) = y then there should be exactly one value y in the set fmap f S.

It does not, but that is exactly the problem I am trying to outline here.

My intuition for “structure preserving” in a set is that the number of elements should remain unchanged, just like in other Functor instances (like trees, lists, maybes, etc.). But the properties of a set make this impossible to guarantee, unless we constrain the fmapped function to be injective.

If your intuition for “structure preserving” allows for reducing the number of elements, then the Functor instance would be fine.

It just seems wrong to allow an fmap that can “remove” elements from a set.

Elements are never removed, they are conflated due to the function being non-injective. Suppose the only thing we can do with a set is existential quantification.

newtype Set a = Set {exists :: (a -> Bool) -> Bool}

member :: Eq a => a -> Set a -> Bool
member x s = exists s (x==)

instance Functor Set where
    fmap f s = Set (\p -> exists s (p.f))

Now it is perfectly fine to write fmap (const ()) and obtain a singleton set. Why would you object?

A non-injective function is all right for functor laws. It is the lack of extensionality (a=b → f(a)=f(b)) for Eq that messes with us.

Oh, right. If f :: x -> y and g :: y -> z are extensionally injective functions, but the Eq and Ord instances of y identify some distinguishable elements, then fmap g . fmap f will be observably distinct from fmap (g.f) because the former may conflate elements that g.f distinguishes. That is at least true for Data.Set that relies on Ord. Other implementations that do not rely on Ord may not suffer from this problem, for example my quantifier-based implementation above.

This discussion is lacking the notable mention of the subcategories library. The Constrained and CFunctor classes are defined like so:

class Constrained (f :: Type -> Type) where
  type Dom f (a :: Type) :: Constraint
  -- overwriteable
  type Dom f a = ()

class Constrained f => CFunctor f where
  cmap :: (Dom f a, Dom f b) => (a -> b) -> f a -> f b
  default cmap :: Functor f => (a -> b) -> f a -> f b
  cmap = fmap

  (<$:) :: (Dom f a, Dom f b) => a -> f b -> f a
  (<$:) = cmap . const

There is an instance CFunctor Set because Set is Constrained so that type Dom Set a = Ord a, and we have cmap = Set.map.

Of course, Ord is not the only constraint over which we can parametrize functors: the library also makes use of Hashable (for HashSets) and Unbox/Storable/Prim for the various Vector flavours.

Thanks mixphix. I think the discussion has anticipated there might be such a thing as a Constrained Functor. (There’ll be people not happy with calling that a “Functor”, see above.) I think there might be a few fishooks with subcategories: see the first post at

Why would CFunctor want to constrain the incoming structure? Especially why would it be exactly the same constraint as applied in building that structure? I see the docos say

Set.map doesn’t require a to be Ord -instance and therefore the implementation of cmap discards the dictionary for Ord a to call Set.map .

That’s not accurate, is it? cmap for Set can’t avoid requiring a have an Ord instance (or whatever constraint applies for type constructor Set); because it must require b have an Ord instance; and it can’t ask for one without the other. OK it then discards the a instance. Exactly what Data.Set.map achieves is to not require then discard a useless dictionary. I’d prefer (as my Functorb achieves, though rather clunkily):

    cmap :: (Dom f b) => (a -> b) -> f a -> f b

Looks like I need to recalibrate my intuitions…

You are correct that the domain’s dictionary is irrelevant for the purposes of fmapping a function, in the case of a Set. In other cases, for example Data.Vector.Storable.map, the constraint is required on both the source and target. Not all constraints are structural, and one must have Storable values in a Storable vector in order to map over them in the first place.

Thanks. I tried chasing down whether that map actually needs a constraint on the incoming Vector. But my module-chasing stack overflowed.

That some maps need a constraint on the source doesn’t entail that all maps have to pay the overhead.

In similar vein, using a more 2006-era Haskell

    class FunctorCb f  where                         -- Constrain f b inside the class
      fmapCb :: FunctorConstr f b => (a -> b) -> f a -> f b
--    fmapCb = fmap                                  -- can't make default, because needs Functor f => 
    class FunctorConstr f b  where
      fromListCb :: [b] -> f b 
      fromListCb = undefined
--  -- defaults
    instance {-# OVERLAPPABLE #-} Functor f => FunctorCb f  where
      fmapCb = fmap                                  -- catch-all where fmap is good enough
    instance {-# OVERLAPPABLE #-} FunctorConstr f b  -- then no constraint needed
    
    instance FunctorCb Set  where
      fmapCb f xs = fromListCb (map f $ toList xs)
    instance Ord b => FunctorConstr Set b  where fromListCb = fromList

  • So this might be more acceptable to the Category theorists: it at first looks like the instances depend only on the constructor [**], so there’d be the more familiar hierarchy of constructor classes.
  • It exploits that Functorious is cousin-german to List-like/fmap to map.
  • It could be extended to put a (different) constraint on the argument Functor as well as the result – which might be needed to wrangle from the Functor to a List.
  • OTOH it’s clunky/ugly code: needing two instances for each type. (But not for constructors that are happy with bare fmap.)

[**] Don’t tell anyone, but either mixfix’s Assoc Types approach, or the double-class here could apply different constraints depending on the content’s type as well as constructor’s.

As several pointed out, my wished-for instance Functor Set doesn’t break any of Functor's Laws. Perhaps you want some Laws like:

  • length $ toList (fmap f xs) == length $ fmap f (toList xs); or even
  • toList (fmap f xs) == fmap f (toList xs)

But Functors are not in general Foldable; length is a poor proxy for the structure to be preserved; fmap is not required to deliver the same before/after l-to-r sequence for toList.