(... cont) Data.Set no Functor/Monad instance

‘container’ – including a bunch of definitions applying specifically for computing. I don’t see any that stretch to IO. Of course you can use a word to mean anything you like (Humpty Dumpty); that’s the point at which you’ve ceased to communicate/explain and are talking to only yourself. Please find a definition of ‘container’ that includes the sense you want for IO but fails to include Set.

Oh, while I’m browsing the dictionary ‘set’ " 8. (set theory) A collection of zero or more objects, possibly infinite in size, and disregarding any order or repetition of the objects which may be contained within it." – that would be “contained within” a container, methinks.

…and comments like these:

now have me thinking about:

and:

along with my response to the latter:

Haskell has a serious drawback that it fails to detect numeric overflow or divide by zero or running off the end of a List/partial functions or … or …

That’s why I started at the “the mucky world of programming languages”. The practicality is that programming is mostly about implementation details. Would you rather avoid programs that model sets?

I’m being facetious – just in case that wasn’t clear. It seems to me the purism argument here is trying to both have its cake and eat it. I see no law for Functor that operations must update ‘in place’ (like pigeon-holes in a sorting office?) nor even that it must preserve the count of elements. And yet the criminally unlawful instance Eq Double is to be allowed.

Yes Functorb leaks implementation details. Not as bad as horrors like NaN. That there are so many attempts to provide Functor-like ways to operate on Set, and that’s just the poster-child, tells me constructor classes have a “serious drawback”; and that vaunting the wonders of Monad is a thimblerig.

You know what? I think there’s now enough people dissatisfied with the IEEE 754 standard here to build a time machine to go back to 1985 and tell those who would be writing up that standard what they got wrong

…and there are approximately as many attempts to provide “streaming”-like ways to operate on IO (...). There’s also all the attempts to provide asynchronous ways to operate on I/O in Scala, for example:

  • so it isn’t just Haskell that has to contend with these problems;

  • and I am still wondering why parametric overloading was added to an existing language which already supports function procedure overloading…

It’s not that ‘set’ fails to meet the definition of container; it’s that containery things include all functors (IO is State# under the hood, and that link includes a treatment of the state monad, if you are open-minded enough for it) but functors don’t include Set.

You can define and use non-monotonic Set.map all day; it’s a perfectly valid function! It’s just not evidence that Functor needs to be reengineered, because it doesn’t represent the action of a functor on an arrow of its source category.

The mathy sets can be thought of as the archetype of functor, even the archetype of container-like monad. It is the requirement that a set contains each element only once, that imposes the Ord constraint in the world of real programming on us. Because unlike the ZF set theory, Haskell has no intrinsic notion of equality (that can be leveraged operationally, that is).

However, there is a set that is a Haskell functor and even a monad: The searchable sets of Escardo. Every set from Data.Set can be embedded in this type, and it can be shown that whenever the base type x has a lawful Eq instance, then the finite sets are the only searchable ones.

in GHC. There are other ways to describe I/O actions - for example (and from the same author):

More can be found in:

… which enables my Functor over a datatype with context being:

    newtype Ord b => OrdSet b = MkOrdSet {unOrdSet :: (Set b)}
                    deriving (Show)

    mapOrdSet :: (Ord b)  => (a -> b) -> OrdSet a -> OrdSet b      -- H98 needs Ord a from appearance of MkOrdSet
    mapOrdSet f (MkOrdSet s) = MkOrdSet $ mapSet f s          
    

    class Functorb f b  where                     -- f :: Type -> Type
        fmapb :: (a -> b) -> f a -> f b 
    
    instance {-# OVERLAPPABLE #-} Functor f =>  Functorb f b  where           -- with Functor f accepted
        fmapb = fmap
    
    instance Ord b => Functorb OrdSet b  where             -- with Ord b only
        fmapb = mapOrdSet      
    

Am I missing something blatantly obvious here, or has this been discussed and I just scrolled past it?

It should be pretty obvious that Set cannot have a reliably lawful Functor instance as long as we insist that fmap is “structure-preserving”. This holds for Maybe (fmap cannot turn a Just into a Nothing), lists (fmap can never discard list elements or otherwise change the length of the list), and also Map (fmap will never add or remove map keys, only change the values stored at each key); but we cannot make it hold for an arbitrary fmapped over a Set.

Pathological example: fmap (const 1) (Set.fromList [1,2,3]) would, for any reasonable implementation of fmap, return Set.fromList [1]. Structure will be preserved for a lot of possible values of f, but the Functor typeclass does not put any additional constraints on f, so we cannot depend on any such properties of f.

3 Likes

That makes sense, but why should we insist that fmap be “structure-preserving”? In mathematics, the powerset functor, P (a close relative of our Set), is not structure-preserving in this sense. For example for

f : N -> N
f n = 0

We have

P(f) : P(N) -> P(N)

and

P(f)( {1, 2, 3} ) = P( {0} )

(for example) so it’s not “structure preserving”.

This begs the question: What is the structure of a set? In the containers implementation it is a search tree, which can only be preserved using an Ord instance, but that is just an implementation detail.
The mathematical powerset functor P is the monad of complete semilattices, so in that context one would expect fmap to preserve arbitrary set unions:

-- prop> fmap f (union x y) = union (fmap f x) (fmap f y)

unions :: Ord a => Set (Set a) -> Set a
unions = foldl union mempty

-- prop> fmap f (unions sets) = unions (fmap (fmap f) sets)

Which is, by the way, a property that Escardo’s searchable sets possess.

I suppose there does not exist an agreed-upon abstract interface for sets in Haskell or functional programming in general. Which is okay, since Haskell packages cater for the programmer’s needs, not the mathematician’s. The interface exposed by containers is, naturally, influenced by its implementation, hence the lack of Functor instance. Data.Set breaks the requirement stated by @AntC by exposing mapMonotonic and functions like lookupMin without an Ord constraint.

Set theory does not mandate the existence of a member function that can decide membership, while an elementary topos requires such a function for the power set object. But instead of Bool the result can be another type of truth values Omega.

For example, suppose there is a Set type than can define the Cantor set as a subset of the type [Int]. It is the set of all infinite lists containing only 0 and 1. Here membership can only be semi-decided: Let Omega = (). Given any list of integers, return () whenever the list is finite or does contain an integer other than 0 or 1, otherwise diverge.

That’s an interesting question indeed.

Personally, I like to think of data structures in terms of their “morally ideal forms”, and for Set, that is something like “zero or one instances of each distinct element value; order is not preserved”. Structure-preserving, then, would mean that for each input element, the output set contains exactly one output element. So if the input set has 6 elements, the output set should also have 6 elements - they can be in any order, but there must be 6 of them, and each must correspond to one of the 6 input elements. That, at least, is my intuition for what fmap on a Set should do.

But this is only the case if the fmapped function is injective, otherwise we two different input elements might map to the same output element, so the output set would be smaller than the input set, and “structure” would be lost.

Interestingly, though, the usual functor laws do not forbid this - fmap id === id and fmap (g . f) === fmap f . fmap g still hold - but for sets, this isn’t enough to comply with the intuition of “preserving structure” - we can, in fact, do something like fmap (const ()) on any set, and we’ll get a singleton set containing just the unit value for any non-empty input set, regardless of what’s in it, which I would hardly say is “structure preserving” in any useful sense.

Then again, maybe other people have different intuitions about what “structure preserving” is supposed to mean, in which case the Functor instance would be fine.

You contradict yourself. That there is zero or exactly one of each distinct value does not entail that the number of elements must be preserved. Structure preservation should mean that if x is an element of a set S and f(x) = y then there should be exactly one value y in the set fmap f S.

It does not, but that is exactly the problem I am trying to outline here.

My intuition for “structure preserving” in a set is that the number of elements should remain unchanged, just like in other Functor instances (like trees, lists, maybes, etc.). But the properties of a set make this impossible to guarantee, unless we constrain the fmapped function to be injective.

If your intuition for “structure preserving” allows for reducing the number of elements, then the Functor instance would be fine.

It just seems wrong to allow an fmap that can “remove” elements from a set.

Elements are never removed, they are conflated due to the function being non-injective. Suppose the only thing we can do with a set is existential quantification.

newtype Set a = Set {exists :: (a -> Bool) -> Bool}

member :: Eq a => a -> Set a -> Bool
member x s = exists s (x==)

instance Functor Set where
    fmap f s = Set (\p -> exists s (p.f))

Now it is perfectly fine to write fmap (const ()) and obtain a singleton set. Why would you object?

A non-injective function is all right for functor laws. It is the lack of extensionality (a=b → f(a)=f(b)) for Eq that messes with us.

Oh, right. If f :: x -> y and g :: y -> z are extensionally injective functions, but the Eq and Ord instances of y identify some distinguishable elements, then fmap g . fmap f will be observably distinct from fmap (g.f) because the former may conflate elements that g.f distinguishes. That is at least true for Data.Set that relies on Ord. Other implementations that do not rely on Ord may not suffer from this problem, for example my quantifier-based implementation above.