Monad instance trouble: Any way to implement unionWith on some kind of map type, without an Ord constraint?

And previously on Discourse: Why Data.Set (containers) doesn’t have Functor and Monad instance?; (… cont) Data.Set no Functor/Monad instance.

Thanks. I did look in the (rather cursory) documentation, but “(…)” seems to be unGoogleable.

Hehe, typical Oleg: a couple of sentences of description, then screeds of dense code. The critical decl is:

data SetMonad a where
    SMOrd :: Ord a => S.Set a -> SetMonad a
    SMAny :: [a] -> SetMonad a

-- with bind
    m >>= f = collect . map f $ toList m

[Edited] So this is a GADT, with the map f producing a List of SetMonads, each possibly hiding an Ord dictionary – the same Ord dictionary, in fact. But the compiler can’t figure that out (because the dictionary can’t be unrolled until we enter each cons node), so walking the tree will suffer the polymorphic dictionary-lookup penalty. Or, in the worst case, collect, while scanning the List, can’t find a dictionary at all, so it falls back on notoriously poor-performing append, possibly allowing duplicates in its result.

The nodes not hiding an Ord dictionary might contain duplicates; they’re lists, not necessarily in ascending order; so S.union-ing them with the nicely-organised nodes will amortise that performance hit across the accesses.

The SetMonad GADT is not nested. It is just a sum of Set and [].

GHC will point it out to you : ) But it’s 2 dots/periods as per

The idea that makes it efficient is that although it is possible for fmap to create duplicates, by mapping multiple inputs to the same output, (>>=) will not create duplicates as long as the function argument returns a set with a known Ord, which is not terribly difficult to arrange. The MonadPlus instance Oleg added is unfortunate, though, since it makes it tempting to construct sets by using it with a bunch of returns.
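It helps to see what “returns a set with a known Ord” looks like on the ground. A hypothetical Kleisli arrow (my example, not from Oleg’s post) that builds its result with SMOrd hands the merging code a real Ord dictionary to union with:

-- Hypothetical example: the result carries an Ord dictionary,
-- so (>>=) can merge with S.union and duplicates are eliminated.
neighbours :: Int -> SetMonad Int
neighbours x = SMOrd (S.fromList [x - 1, x + 1])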

That’s not @TheoH’s fault. Discourse renders .. as … unless you put it in a code block.

That is good to know.

Thank you, you’re quite correct. I’ve edited my post. (Note to self: don’t try grokking Oleg-iness just before bedtime.) I still see Bad Things nested. What threw me was:

collect ((SMOrd x):t) = ...
collect ((SMAny x):t) = ...

collect is (trying to) clean up the mess made by the map f; that’s a List of SMOrds/SMAnys, with some in the tail repeating the same Ord dictionary and/or repeating duplicate elements.
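For the record, here is a sketch of how those clauses plausibly continue – reconstructed from the two quoted heads and the GADT above, not copied verbatim from Oleg’s post. Whenever either side of the merge carries an Ord dictionary, collect can deduplicate with S.union; only the SMAny/SMAny case is stuck with append:

collect :: [SetMonad a] -> SetMonad a
collect []      = SMAny []
collect (h : t) = merge h (collect t)
  where
  -- Pattern-matching on SMOrd brings its Ord dictionary into scope,
  -- so three of the four cases can deduplicate via S.union.
  merge (SMOrd x) (SMOrd y) = SMOrd (S.union x y)
  merge (SMOrd x) (SMAny y) = SMOrd (S.union x (S.fromList y))
  merge (SMAny x) (SMOrd y) = SMOrd (S.union (S.fromList x) y)
  merge (SMAny x) (SMAny y) = SMAny (x ++ y)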

Hmm? These days, Monad is supposed to have Applicative .. Functor superclasses. (I think that wasn’t the case in 2013.) What would a Functor instance look like? And how would it be lawful?

fmap f xs  =  xs >>= return . f

I get that an Eq is necessary in general, but in this situation I think I can do without it (or, in suitable circumstances, with something like what Python calls an OrderedDict). Here’s the code I ended up with, using zipWith. I’m not 100% sure that the monad laws are obeyed, but it seems to be working.

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Applicative
import Control.Monad (liftM, ap)

newtype Probability = P {unP :: Float}
  deriving (Fractional, Num, Show, Eq, Ord)

newtype Dist a = D {unD :: [(a, Probability)]}
  deriving (Eq, Show)

instance Functor Dist where
  fmap = liftM

instance Applicative Dist where
  pure x = D [(x,P 1.0)]
  (<*>) = ap

instance Monad Dist where
  return      = pure
  d >>= f = dist_join $ dist_fmap f d

onD :: ([(a,Probability)] -> [(a,Probability)]) -> Dist a -> Dist a
onD f  = D . f . unD

onDP :: ([(a,Probability)] -> [(b,Probability)]) -> Dist a -> Dist b
onDP f  = D . f . unD

a :: Dist String
a = D [("a", 1.0)]

-- Weight each inner distribution by its outer probability.
normalize :: [(Dist a, Probability)] -> [[(a, Probability)]]
normalize dl = map adjust dl
  where
  adjust (first, second) = map (fmap (* second)) $ unD first

-- Pointwise sum of two distributions; assumes both list the same
-- outcomes in the same order (no Eq or Ord needed).
zipAndSum dist1 dist2 = zipWith (\(a, prob1) (_, prob2) -> (a, prob1 + prob2)) dist1 dist2

collect xs = foldl zipAndSum (head xs) (tail xs)

dist_join :: Dist (Dist a) -> Dist a
dist_join dd = onDP (collect . normalize) dd

dist_fmap :: (a -> Dist b) -> Dist a -> Dist (Dist b)
dist_fmap f (D xs) = D [(f a, p) | (a, p) <- xs]

type Trans a = a -> Dist a

trans :: Trans String
trans "a" = D [("a", P 0.75),("b", P 0.05),("c", P 0.2)]
trans "b" = D [("a", P 0.25),("b", P 0.70),("c", P 0.05)]
trans "c" = D [("a", P 0.25),("b", P 0.70),("c", P 0.05)]

dist_bind t m = dist_join $ dist_fmap t m

fixpoint :: Eq a => (a -> a) -> a -> a
fixpoint f x
  | x' == x = x'
  | otherwise = fixpoint f x'
  where x' = f x
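For what it’s worth, the intended use of fixpoint with trans is presumably something like the following (my hypothetical usage, not in the original code; note that, as pointed out downthread, Float arithmetic may never reach an exact fixed point even when it converges numerically):

-- Hypothetical usage: iterate the transition kernel from a point mass
-- until two successive distributions compare exactly equal.
stationary :: Dist String
stationary = fixpoint (dist_bind trans) (pure "a")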

I’m not sure I understand what you’re asking. With the caveat I already mentioned about how fmap can cause duplicates, there is no problem implementing it. It’s lawful as long as you limit the ways that these values can be observed. Basically you have to use a function that converts it to a Set in order to observe it.

I tried several possible implementations of a finite distribution monad and found none to be superior to Erwig’s naive definition. The reason is that structures like Map or Seq waste time on allocations and garbage collection while normalizing, whereas the list-based distribution just happily (and lazily) concatenates, which is not a problem if at the end you fold the distribution once.
Of course computing fixed points is another thing. But you may be out of luck here, since functions involving Float rarely have exact fixed points even when they converge numerically. The same holds true for Maps, although Joachim Breitner has made some amazing progress in this direction.

It seems a maximally free implementation can be leveraged to compute fixed points.

class Monoid r => Cone r where
   scale :: Probability -> r -> r
   -- Monoid to be understood additively

type Dist a = forall r. Cone r => (a -> r) -> r

pure :: a -> Dist a
pure a = \b -> b a

bind :: (a -> Dist b) -> Dist a -> Dist b
bind k m = \b -> m (flip k b)
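To make this concrete (an illustration of mine, not part of the scheme above): the simplest Cone is masses under addition, and integrating the constant-one function against a distribution recovers its total mass:

-- An assumed newtype for illustration: masses form a Cone additively.
newtype Mass = Mass Probability

instance Semigroup Mass where Mass x <> Mass y = Mass (x + y)
instance Monoid Mass where mempty = Mass 0
instance Cone Mass where scale p (Mass x) = Mass (p * x)

-- Integrating const (Mass 1) yields the total mass: 1 for a
-- normalized probability distribution.
totalMass :: Dist a -> Mass
totalMass m = m (const (Mass 1))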

Now the fixed point of bind k at m is given by

\m -> \b -> m (fix (flip k))

The above is an instance of a more general scheme. Consider this functor:

{-# LANGUAGE DeriveFunctor #-}

data DistF x = Mempty | MPlus x x | Score Probability x deriving (Functor)

It embodies the basic operations that a cone (as well as distributions, which form a cone) supports: the MonadPlus operations (i.e. choice) and scoring. Now any finite distribution can be embedded into this type:

import Control.Monad.Free.Church (F)
type Dist = F DistF

A value of type F DistF a is isomorphic to

forall r. (a -> r) -> (DistF r -> r) -> r

which says: Given the mass of every basic element in the event space, and a way to add and score masses, we can compute the mass of a distribution.
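To illustrate that reading (a sketch under my own names – fromList and mass are not from the free package), one can build a finite distribution out of the DistF operations and integrate against it with runF:

{-# LANGUAGE DeriveFunctor #-}

import Control.Monad.Free.Church (F, runF)
import Control.Monad.Free.Class (wrap)

-- Build a finite distribution from choice (MPlus) and scoring (Score).
fromList :: [(Probability, a)] -> F DistF a
fromList = foldr step (wrap Mempty)
  where
    step (p, a) rest = wrap (MPlus (wrap (Score p (return a))) rest)

-- Integrate: supply the mass of each point and interpret the
-- operations additively.
mass :: F DistF a -> (a -> Probability) -> Probability
mass m b = runF m b alg
  where
    alg Mempty      = 0
    alg (MPlus x y) = x + y
    alg (Score p x) = p * x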

Suppose k :: a -> F f a is the Kleisli map you want to iteratively bind.
Further suppose f :: f r -> r. Then the term

k' = \b. \a -> k a b f

has type (a -> r) -> (a -> r) and it can be shown that the fixed point of iteratively binding k to any m :: F f a is

\b f -> m (fix k') f

Thus we have reduced the problem of computing fix (k =<<) to the problem of computing the fixed point of an element-scoring operator k'.
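Rendered through the F newtype (my transcription of the formulas above, under the same assumptions), the claim looks like this:

import Data.Function (fix)
import Control.Monad.Free.Church (F, runF)

-- k' = \b. \a -> k a b f; the fixed point of iterated binding ignores
-- the initial observation b and uses fix k' instead.
bindFix :: (a -> F g a) -> F g a -> (a -> r) -> (g r -> r) -> r
bindFix k m _b f = runF m (fix k') f
  where
    k' b' a = runF (k a) b' f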


Thanks for this. It’s not clear to me how you go from

bind k m = \b -> m (flip k b)

to

\m -> \b -> m (fix (flip k))

Maybe if you could just spell that out in more detail, or perhaps point out that I am missing something obvious.

I have to say I am struggling to implement this and I think it’s all too abstract for me at my current level of knowledge.

Although I’ve explored further with this, the minimal problem is reproduced by this code, which is almost verbatim from the parent post:

type Probability = Float

class Monoid r => Cone r where
   scale :: Probability -> r -> r
   -- Monoid to be understood additively

type Dist a = forall r. Cone r => (a -> r) -> r

pure :: a -> Dist a
pure a = \b -> b a

bind :: (a -> Dist b) -> Dist a -> Dist b
bind k m = \b -> m (flip k b)

This gives the following error:

olf.hs:13:26: error: [GHC-83865]
    • Couldn't match type: forall r1. Cone r1 => (b -> r1) -> r1
                     with: (b -> r) -> r
      Expected: a -> (b -> r) -> r
        Actual: a -> Dist b
    • In the first argument of ‘flip’, namely ‘k’
      In the first argument of ‘m’, namely ‘(flip k b)’
      In the expression: m (flip k b)
    • Relevant bindings include
        b :: b -> r (bound at olf.hs:13:13)
        k :: a -> Dist b (bound at olf.hs:13:6)
        bind :: (a -> Dist b) -> Dist a -> Dist b (bound at olf.hs:13:1)
   |
13 | bind k m = \b -> m (flip k b)
   |                          ^

Maybe “maximally free” translates to “not actually typeable”, I don’t know!

It’s unclear to me how the Dist type is supposed to work, frankly, even the pure function.

As I say, I’ve explored further and keep failing, presumably due my misunderstanding of the model that’s being proposed.

Thanks for any help anyone can offer!

Apologies, you need the RankNTypes language extension, which I neglected to include in the post. Apart from that, there was a change in type inference between GHC 8 and 9 that stops the code from type checking.
You can paste the code into the Haskell Playground and choose the compiler version. It will compile with GHC 8.
To satisfy GHC 9, we need the more involved:

{-# LANGUAGE RankNTypes, ScopedTypeVariables #-}

type Dist a = forall r. Cone r => (a -> r) -> r

bind :: forall a b. (a -> Dist b) -> Dist a -> Dist b
bind k m = let
  flipk :: forall r. Cone r => (b -> r) -> a -> r
  flipk b a = k a b
  in m . flipk

Okay, observe that bind k m = m . flip k where

flip k :: forall r. Cone r => (b -> r) -> (a -> r)

Now we specialize b=a since we want k :: a -> Dist a. This makes

flip k :: forall r. Cone r => (a -> r) -> (a -> r)

Now it is not hard to see that

bind k (bind k m) = (m . flip k) . flip k

or in terms of function composition

(bind k . bind k) m = m . (flip k . flip k)

By induction, iterating the bind k operator on any starting distribution m is the same as pre-composing the m functional with iterated compositions of flip k.
We may then ask whether there is a limit to this process. Apparently we need to compute the infinite composition of a function with itself. Suppose f :: x -> x (where in our case x = (a -> r), but that does not matter for now). We need a function f_iterated with the property that

f_iterated = f . f_iterated

Hence

f_iterated x = f (f_iterated x)

Observe that f_iterated x obeys the same equation as fix f, namely

fix f = f (fix f)

whence we conclude that f_iterated x = fix f and thereby f_iterated = const (fix f).
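Or, in code (a direct transcription of the conclusion):

import Data.Function (fix)

-- Iterating f forever forgets the starting point entirely.
fIterated :: (x -> x) -> (x -> x)
fIterated f = const (fix f)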

EDIT: There is a problem with the above calculations. The Riesz embedding of distributions into continuations always produces linear continuations. If r is a cone, then the function space a -> r has (point-wise) cone structure. Thus we can ask whether a functional phi :: (a -> r) -> r preserves cone structure, in the sense that

  1. phi mempty = mempty
  2. phi (f <> g) = phi f <> phi g
  3. phi (scale p f) = scale p (phi f)

But even if flip k is linear by construction, its iterated limit const (fix (flip k)) need not be linear, unless it is a trivial function like constant bottom or constant mempty.
What we are missing here is a proof that the function bind k does possess a sensible fixed point at all. Recall that fix always computes the least fixed point, whereas e.g. bind pure (by monad laws) holds every distribution fixed.

OK, thanks for this.

So, if I want to define a Dist String, say – I am perplexed that pure is so simple. I would expect any instance of Dist to involve a representation of a distribution, say [(String, Probability)]. I get that your definition of Dist uses a kind of CPS interface, but I’m wondering where, if anywhere, code something like this would appear:

\m -> mconcat $ Prelude.map (\(x, p) -> scale p (m x)) d

where d is the [(String, Probability)].

It doesn’t seem to make sense that Dist a should be isomorphic to a single value of type a.

We’re using the Riesz-Markov-Kakutani representation theorem here.
My implementation of Dist represents the distribution by the way one can integrate functions against it. Consider the types in the mathematical expression ∫ f dμ. Integration takes a scalar-valued function f :: a -> r and a distribution μ and produces a scalar value. If we fix the distribution μ then this yields a continuation (a -> r) -> r.

The nice thing is that distributions themselves form a cone (and Dist an Alternative functor), whence we can use them as the universally-quantified r.

-- <|> of the Alternative instance
addDists :: Dist a -> Dist a -> Dist a
addDists dx dy = \b -> dx b <> dy b

-- empty of the Alternative instance
zeroDist :: Dist a
zeroDist = const mempty

-- scale of the Cone instance
scaleDist :: Probability -> Dist a -> Dist a
scaleDist p dx = \b -> scale p (dx b)

This gives you all the tools to define an embedding of finite distributions:

fromFinDist :: (Cone (dist a), Alternative dist) => 
   [(Probability,a)] -> dist a
fromFinDist = foldr f empty where
    f (p, a) dx = scale p (pure a) <|> dx
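To connect this back to the [(String, Probability)] representation asked about above (a sketch; fromList is my name for it): the integration functional you wrote is itself the embedding into the continuation form.

-- Embed a finite list directly into the continuation representation;
-- this is exactly the functional written upthread, with b for m.
fromList :: [(a, Probability)] -> Dist a
fromList d = \b -> mconcat [ scale p (b x) | (x, p) <- d ]

-- Observed through a concrete Cone such as the Mass newtype
-- sketched upthread:
--   fromList [("a", 0.3), ("b", 0.7)]
--            (\x -> if x == "a" then Mass 1 else Mass 0)
--   ==> Mass 0.3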

That is because integration against a Dirac distribution is just function evaluation. If μ is the Dirac distribution with all mass (probability) concentrated at x, then ∫ f dμ = f(x).
