(... cont) Data.Set no Functor/Monad instance

It’s not that ‘set’ fails to meet the definition of container; it’s that containery things include all functors (IO is State# under the hood, and that link includes a treatment of the state monad, if you are open-minded enough for it) but functors don’t include Set.

You can define and use non-monotonic Set.map all day; it’s a perfectly valid function! It’s just not evidence that Functor needs to be reengineered, because it doesn’t represent the action of a functor on an arrow of its source category.

The mathy sets can be thought of as the archetype of functor, even the archetype of container-like monad. It is the requirement that a set contains each element only once, that imposes the Ord constraint in the world of real programming on us. Because unlike the ZF set theory, Haskell has no intrinsic notion of equality (that can be leveraged operationally, that is).

However, there is a set that is a Haskell functor and even a monad: The searchable sets of Escardo. Every set from Data.Set can be embedded in this type, and it can be shown that whenever the base type x has a lawful Eq instance, then the finite sets are the only searchable ones.

in GHC. There are other ways to describe I/O actions - for example (and from the same author):

More can be found in:

… which enables my Functor over a datatype with context being:

    newtype Ord b => OrdSet b = MkOrdSet {unOrdSet :: (Set b)}
                    deriving (Show)

    mapOrdSet :: (Ord b)  => (a -> b) -> OrdSet a -> OrdSet b      -- H98 needs Ord a from appearance of MkOrdSet
    mapOrdSet f (MkOrdSet s) = MkOrdSet $ mapSet f s          
    

    class Functorb f b  where                     -- f :: Type -> Type
        fmapb :: (a -> b) -> f a -> f b 
    
    instance {-# OVERLAPPABLE #-} Functor f =>  Functorb f b  where           -- with Functor f accepted
        fmapb = fmap
    
    instance Ord b => Functorb OrdSet b  where             -- with Ord b only
        fmapb = mapOrdSet      
    

Am I missing something blatantly obvious here, or has this been discussed and I just scrolled past it?

It should be pretty obvious that Set cannot have a reliably lawful Functor instance as long as we insist that fmap is “structure-preserving”. This holds for Maybe (fmap cannot turn a Just into a Nothing), lists (fmap can never discard list elements or otherwise change the length of the list), and also Map (fmap will never add or remove map keys, only change the values stored at each key); but we cannot make it hold for an arbitrary fmapped over a Set.

Pathological example: fmap (const 1) (Set.fromList [1,2,3]) would, for any reasonable implementation of fmap, return Set.fromList [1]. Structure will be preserved for a lot of possible values of f, but the Functor typeclass does not put any additional constraints on f, so we cannot depend on any such properties of f.

3 Likes

That makes sense, but why should we insist that fmap be “structure-preserving”? In mathematics, the powerset functor, P (a close relative of our Set), is not structure-preserving in this sense. For example for

f : N -> N
f n = 0

We have

P(f) : P(N) -> P(N)

and

P(f)( {1, 2, 3} ) = P( {0} )

(for example) so it’s not “structure preserving”.

This begs the question: What is the structure of a set? In the containers implementation it is a search tree, which can only be preserved using an Ord instance, but that is just an implementation detail.
The mathematical powerset functor P is the monad of complete semilattices, so in that context one would expect fmap to preserve arbitrary set unions:

-- prop> fmap f (union x y) = union (fmap f x) (fmap f y)

unions :: Ord a => Set (Set a) -> Set a
unions = foldl union mempty

-- prop> fmap f (unions sets) = unions (fmap (fmap f) sets)

Which is, by the way, a property that Escardo’s searchable sets possess.

This post was flagged by the community and is temporarily hidden.

I suppose there does not exist an agreed-upon abstract interface for sets in Haskell or functional programming in general. Which is okay, since Haskell packages cater for the programmer’s needs, not the mathematician’s. The interface exposed by containers is, naturally, influenced by its implementation, hence the lack of Functor instance. Data.Set breaks the requirement stated by @AntC by exposing mapMonotonic and functions like lookupMin without an Ord constraint.

Set theory does not mandate the existence of a member function that can decide membership, while an elementary topos requires such a function for the power set object. But instead of Bool the result can be another type of truth values Omega.

For example, suppose there is a Set type than can define the Cantor set as a subset of the type [Int]. It is the set of all infinite lists containing only 0 and 1. Here membership can only be semi-decided: Let Omega = (). Given any list of integers, return () whenever the list is finite or does contain an integer other than 0 or 1, otherwise diverge.

That’s an interesting question indeed.

Personally, I like to think of data structures in terms of their “morally ideal forms”, and for Set, that is something like “zero or one instances of each distinct element value; order is not preserved”. Structure-preserving, then, would mean that for each input element, the output set contains exactly one output element. So if the input set has 6 elements, the output set should also have 6 elements - they can be in any order, but there must be 6 of them, and each must correspond to one of the 6 input elements. That, at least, is my intuition for what fmap on a Set should do.

But this is only the case if the fmapped function is injective, otherwise we two different input elements might map to the same output element, so the output set would be smaller than the input set, and “structure” would be lost.

Interestingly, though, the usual functor laws do not forbid this - fmap id === id and fmap (g . f) === fmap f . fmap g still hold - but for sets, this isn’t enough to comply with the intuition of “preserving structure” - we can, in fact, do something like fmap (const ()) on any set, and we’ll get a singleton set containing just the unit value for any non-empty input set, regardless of what’s in it, which I would hardly say is “structure preserving” in any useful sense.

Then again, maybe other people have different intuitions about what “structure preserving” is supposed to mean, in which case the Functor instance would be fine.

You contradict yourself. That there is zero or exactly one of each distinct value does not entail that the number of elements must be preserved. Structure preservation should mean that if x is an element of a set S and f(x) = y then there should be exactly one value y in the set fmap f S.

It does not, but that is exactly the problem I am trying to outline here.

My intuition for “structure preserving” in a set is that the number of elements should remain unchanged, just like in other Functor instances (like trees, lists, maybes, etc.). But the properties of a set make this impossible to guarantee, unless we constrain the fmapped function to be injective.

If your intuition for “structure preserving” allows for reducing the number of elements, then the Functor instance would be fine.

It just seems wrong to allow an fmap that can “remove” elements from a set.

Elements are never removed, they are conflated due to the function being non-injective. Suppose the only thing we can do with a set is existential quantification.

newtype Set a = Set {exists :: (a -> Bool) -> Bool}

member :: Eq a => a -> Set a -> Bool
member x s = exists s (x==)

instance Functor Set where
    fmap f s = Set (\p -> exists s (p.f))

Now it is perfectly fine to write fmap (const ()) and obtain a singleton set. Why would you object?

A non-injective function is all right for functor laws. It is the lack of extensionality (a=b → f(a)=f(b)) for Eq that messes with us.

Oh, right. If f :: x -> y and g :: y -> z are extensionally injective functions, but the Eq and Ord instances of y identify some distinguishable elements, then fmap g . fmap f will be observably distinct from fmap (g.f) because the former may conflate elements that g.f distinguishes. That is at least true for Data.Set that relies on Ord. Other implementations that do not rely on Ord may not suffer from this problem, for example my quantifier-based implementation above.

This discussion is lacking the notable mention of the subcategories library. The Constrained and CFunctor classes are defined like so:

class Constrained (f :: Type -> Type) where
  type Dom f (a :: Type) :: Constraint
  -- overwriteable
  type Dom f a = ()

class Constrained f => CFunctor f where
  cmap :: (Dom f a, Dom f b) => (a -> b) -> f a -> f b
  default cmap :: Functor f => (a -> b) -> f a -> f b
  cmap = fmap

  (<$:) :: (Dom f a, Dom f b) => a -> f b -> f a
  (<$:) = cmap . const

There is an instance CFunctor Set because Set is Constrained so that type Dom Set a = Ord a, and we have cmap = Set.map.

Of course, Ord is not the only constraint over which we can parametrize functors: the library also makes use of Hashable (for HashSets) and Unbox/Storable/Prim for the various Vector flavours.

Thanks mixphix. I think the discussion has anticipated there might be such a thing as a Constrained Functor. (There’ll be people not happy with calling that a “Functor”, see above.) I think there might be a few fishooks with subcategories: see the first post at

Why would CFunctor want to constrain the incoming structure? Especially why would it be exactly the same constraint as applied in building that structure? I see the docos say

Set.map doesn’t require a to be Ord -instance and therefore the implementation of cmap discards the dictionary for Ord a to call Set.map .

That’s not accurate, is it? cmap for Set can’t avoid requiring a have an Ord instance (or whatever constraint applies for type constructor Set); because it must require b have an Ord instance; and it can’t ask for one without the other. OK it then discards the a instance. Exactly what Data.Set.map achieves is to not require then discard a useless dictionary. I’d prefer (as my Functorb achieves, though rather clunkily):

    cmap :: (Dom f b) => (a -> b) -> f a -> f b

Looks like I need to recalibrate my intuitions…

You are correct that the domain’s dictionary is irrelevant for the purposes of fmapping a function, in the case of a Set. In other cases, for example Data.Vector.Storable.map, the constraint is required on both the source and target. Not all constraints are structural, and one must have Storable values in a Storable vector in order to map over them in the first place.

Thanks. I tried chasing down whether that map actually needs a constraint on the incoming Vector. But my module-chasing stack overflowed.

That some maps need a constraint on the source doesn’t entail that all maps have to pay the overhead.