(... cont) Data.Set no Functor/Monad instance

Continuing the discussion from Why Data.Set (containers) doesn't have Functor and Monad instance?:

Is that a performance consideration? Why would Functor – imagining we re-engineer Functor to be able to cope with constraints – for an instance with no constraint (say List) be less performant than what we have today? Of course if there is a constraint (say Ord for Set) that would impose an overhead, but I guess no worse than any of the myriads of work-rounds/specialist libraries we now have.

I suspect the real explanations are more like:

  • The abstract computational theory [Moggi] lives in the pure world of unrestricted parametric typing; not the mucky world of programming languages.
  • As at when constructor classes were developed in Haskell [Mark Jones 1994], Haskell’s class/instance/constraint/type mangling abilities were not up to the job. (IOW I think we could do better today – or even as of vintage ~2006.)

And the second Functor law came up in RichardE’s talk, where I think he disposed of the objection convincingly. It’s your Eq instance that’s not lawful. Don’t go blaming Functor.

Sure if your function is monotonic. “The precondition is not checked.” say the docos – just as the lawfulness of your Eq/Ord instances aren’t checked.

Yes it’s a very clear presentation. What worries me about that approach is a) the amount of machinery that’s needed, and machinery all over every type application (see below); b) that the machinery then gets hidden (because otherwise source code would be a forest of @s), so

[a] challenge of our approach is that our elaboration step introduces numerous additional constraints in the types of polymorphic functions. For total type constructors, these additional arguments provide no value, but can still make optimization more difficult. In our current implementation, this causes significant performance loss in some test cases.
[from the … in Practice experience report]

And c) that the approach imposes a bunch of unnecessary constraints:

A bit of History: the type of Data.Set.map

  • ~2002 Data.Set had mapSet :: Ord b => (a -> b) -> Set a -> Set b.
  • ~2006 replaced by Data.Set.map :: (Ord a, Ord b) => (a -> b) -> Set a -> Set b
    (mapSet marked " Obsolete equivalent of map." – but those aren’t equivalent: they’re different signatures.)
  • ~2015 somebody came to their senses Data.Set.map :: Ord b => (a -> b) -> Set a -> Set b

So I say (at least for Functor) that the ‘Partial Type Constructors’ approach would impose a constraint on map's Set a application, where it’s not needed: carrying Ord a => is superfluous because Set a is already built, we’d only throw away the dictionary. We do need the Ord b => in order (hah!) to (re)build Set b with b's ordering.

A bit more history: Datatype contexts

(RichardE mentions positively Haskell Report v1.0 1990, moving along …)

GHC as of 1999 had a sensible interpretation for Contexts – comparable to today’s Data.Set.map. But the H98 report text was deemed ambiguous, and the ‘Haskell committee’ of the time (chiefly Wadler – see next message in the thread) voted down SPJ’s/GHC’s approach.

What if …

    class Functorb (fb :: Type)  where
        fmapb :: fb ~ (f b) => (a -> b) -> f a -> f b 
    
    instance Ord b => Functorb (Set b)  where           -- with Ord b accepted
        fmapb = mapSet                                  -- Data.Set.map post-2015 version

    -- and likewise for Applicative, Monad, and friends
  • Firstly note those class and instance headers are H98 compliant with a single type parameter, so don’t need constructor classes nor MPTCs/FunDeps/type functions.
  • The (~) type equality constraint on fmapb wasn’t a thing until ~2008 (?or earlier, arrived with GADTs/Type Families); the idea is long-standing in type theory.
  • The (f b), lacking a concrete type constructor, needs FlexibleContexts; which has been available since ~1998.
  • data Ord b => Set b = ... (under the GHC pre-1999 rules) wouldn’t ask for an Ord a constraint, but only Ord b – which the instance decl satisfies.

Question to the floor: are there examples where Functor and friends needs constraints (actual methods/overloadings) on both the incoming and outgoing constructed type? (That is, on both f a's a and f b's b.) The constrained-monads paper seems to have one at 1.3 Vectors; OTOH the Vizzotto et al paper it comes from seems to decide they need a more general approach: Arrows rather than Monads.

From a mathematical perspective, I would argue that it has to be if you want to think of Set as a functor. Adding the constraint makes it a functor into Hask from OrdHask, the (pseudo-)category with the types having Ord instances as objects and with order homomorphisms as arrows.

Note that this is precisely the requirement you’d need for fmap to be always able to operate on every element of the set ‘in place’, without recomputing the tree structure. That’s the intuition one should have about functors, I’d argue—when considered as containers, their structure is independent of their contents. Non-monotonic mapping over Set is not that.

I think ‘performance reasons’ enter into it because the mathy way to make Set a general Hask-to-Hask functor would be to use Coyoneda Set, but that can duplicate work relative to using Set directly if some of those maps aren’t injective or if intermediate results could be shared.

I was careful not to claim Functorb is a mathy functor (nor Monadb is a mathy monad).

So a fmap over a Set Int that negates each element is not allowed?

I find that an unhelpful way to explain anything/it’s not how I intuit Functors, and gives no help when we get on to Monad. (What is IO a ‘container’ of?) Haskell’s semantics don’t leave anything ‘in place’ – as if we’re updating in situ: that’s procedural/OOP thinking. Refer to RichardE’s response during the presentation. If you look inside the code for Data.Set.map, it’s actually unloading the Set to a List; mapping over the List; reloading to a fresh Set.

map f = fromList . List.map f . toList

Presumably (given the effort that’s gone into polishing Data.Set) that’s as efficient as anything avoiding “recomputing the tree structure”. Note even map id will do that, and will tell you (morally) s == map id s even though map might have rebalanced the structure.

Heh heh, I’m fairly astonished this works:

    instance {-# OVERLAPPABLE #-} Functor f =>  Functorb (f b)  where           -- with Functor f accepted
        fmapb = fmap

So there’s a hope I could steal do notation/RebindableSyntax for Monadb and keep backwards compatibility for the purists.

Can I ask for a link to the talk (or a transcription/article)? That might not be optimal, but as now extensionality is only “suggested”. Making it a requirement would mean to cast Double out of the Eq class.

Double doesn’t even satisfy reflexivity nor substitutivity, let alone extensionality. I’m not seeing good reasons why it’s in Eq. The core libraries have several less-than-fully Eq classes for comparisons. Making a Set Double would be very hazardous.

This Functorb, and more generally, not relying on constructor classes, has a serious drawback that it forces functions that use it to mention the types that f is applied to (except the first one), leaking implementation details.

‘container’ – including a bunch of definitions applying specifically for computing. I don’t see any that stretch to IO. Of course you can use a word to mean anything you like (Humpty Dumpty); that’s the point at which you’ve ceased to communicate/explain and are talking to only yourself. Please find a definition of ‘container’ that includes the sense you want for IO but fails to include Set.

Oh, while I’m browsing the dictionary ‘set’ " 8. (set theory) A collection of zero or more objects, possibly infinite in size, and disregarding any order or repetition of the objects which may be contained within it." – that would be “contained within” a container, methinks.

…and comments like these:

now have me thinking about:

and:

along with my response to the latter:

Haskell has a serious drawback that it fails to detect numeric overflow or divide by zero or running off the end of a List/partial functions or … or …

That’s why I started at the “the mucky world of programming languages”. The practicality is that programming is mostly about implementation details. Would you rather avoid programs that model sets?

I’m being facetious – just in case that wasn’t clear. It seems to me the purism argument here is trying to both have its cake and eat it. I see no law for Functor that operations must update ‘in place’ (like pigeon-holes in a sorting office?) nor even that it must preserve the count of elements. And yet the criminally unlawful instance Eq Double is to be allowed.

Yes Functorb leaks implementation details. Not as bad as horrors like NaN. That there are so many attempts to provide Functor-like ways to operate on Set, and that’s just the poster-child, tells me constructor classes have a “serious drawback”; and that vaunting the wonders of Monad is a thimblerig.

You know what? I think there’s now enough people dissatisfied with the IEEE 754 standard here to build a time machine to go back to 1985 and tell those who would be writing up that standard what they got wrong

…and there are approximately as many attempts to provide “streaming”-like ways to operate on IO (...). There’s also all the attempts to provide asynchronous ways to operate on I/O in Scala, for example:

  • so it isn’t just Haskell that has to contend with these problems;

  • and I am still wondering why parametric overloading was added to an existing language which already supports function procedure overloading…

It’s not that ‘set’ fails to meet the definition of container; it’s that containery things include all functors (IO is State# under the hood, and that link includes a treatment of the state monad, if you are open-minded enough for it) but functors don’t include Set.

You can define and use non-monotonic Set.map all day; it’s a perfectly valid function! It’s just not evidence that Functor needs to be reengineered, because it doesn’t represent the action of a functor on an arrow of its source category.

The mathy sets can be thought of as the archetype of functor, even the archetype of container-like monad. It is the requirement that a set contains each element only once, that imposes the Ord constraint in the world of real programming on us. Because unlike the ZF set theory, Haskell has no intrinsic notion of equality (that can be leveraged operationally, that is).

However, there is a set that is a Haskell functor and even a monad: The searchable sets of Escardo. Every set from Data.Set can be embedded in this type, and it can be shown that whenever the base type x has a lawful Eq instance, then the finite sets are the only searchable ones.

in GHC. There are other ways to describe I/O actions - for example (and from the same author):

More can be found in:

… which enables my Functor over a datatype with context being:

    newtype Ord b => OrdSet b = MkOrdSet {unOrdSet :: (Set b)}
                    deriving (Show)

    mapOrdSet :: (Ord b)  => (a -> b) -> OrdSet a -> OrdSet b      -- H98 needs Ord a from appearance of MkOrdSet
    mapOrdSet f (MkOrdSet s) = MkOrdSet $ mapSet f s          
    

    class Functorb f b  where                     -- f :: Type -> Type
        fmapb :: (a -> b) -> f a -> f b 
    
    instance {-# OVERLAPPABLE #-} Functor f =>  Functorb f b  where           -- with Functor f accepted
        fmapb = fmap
    
    instance Ord b => Functorb OrdSet b  where             -- with Ord b only
        fmapb = mapOrdSet      
    

Am I missing something blatantly obvious here, or has this been discussed and I just scrolled past it?

It should be pretty obvious that Set cannot have a reliably lawful Functor instance as long as we insist that fmap is “structure-preserving”. This holds for Maybe (fmap cannot turn a Just into a Nothing), lists (fmap can never discard list elements or otherwise change the length of the list), and also Map (fmap will never add or remove map keys, only change the values stored at each key); but we cannot make it hold for an arbitrary fmapped over a Set.

Pathological example: fmap (const 1) (Set.fromList [1,2,3]) would, for any reasonable implementation of fmap, return Set.fromList [1]. Structure will be preserved for a lot of possible values of f, but the Functor typeclass does not put any additional constraints on f, so we cannot depend on any such properties of f.

3 Likes

That makes sense, but why should we insist that fmap be “structure-preserving”? In mathematics, the powerset functor, P (a close relative of our Set), is not structure-preserving in this sense. For example for

f : N -> N
f n = 0

We have

P(f) : P(N) -> P(N)

and

P(f)( {1, 2, 3} ) = P( {0} )

(for example) so it’s not “structure preserving”.