Monad instance trouble: Any way to implement unionWith on some kind of map type, without an Ord constraint?

TheoH · December 9, 2024, 6:27pm

All of the implementations of unionWith seem to require the key type to be ordered. This seems to mean that I can’t use such maps as monads.

The concrete example I’m trying to get working is the distribution monad, as implemented at https://web.engr.oregonstate.edu/~erwig/pfp/:

newtype Probability = P {unP :: Float}
newtype Dist a = D {unD :: [(a,Probability)]}

instance Monad Dist where
  return x    = D [(x,P 1.0)]
  (D d) >>= f = D [(y, q * p) | (x, p) <- d, (y, q) <- unD (f x)]

This (pretty old) code has the feature that values of type Dist get “fragmented” every time bind is used.
A norm function is supplied, that coalesces/accumulates the probability values for each key.

-- normalization = grouping
-- 
normBy :: Ord a => (a -> a -> Bool) -> Dist a -> Dist a
normBy f = onD $ accumBy f . sort

accumBy :: (Ord b, Num b) => (a -> a -> Bool) -> [(a,b)] -> [(a,b)]
accumBy f ((x,p):ys@((y,q):xs)) | f x y     = accumBy f ((x,p+q):xs)
                                | otherwise = (x,p):accumBy f ys
accumBy _ xs = xs

norm ::  Ord a => Dist a -> Dist a
norm = normBy (==)

norm' :: Ord a => [(a,Probability)] -> [(a,Probability)]
norm' = accumBy (==) . sort

I want to find the fixed point resulting from binding a function onto a particular initial distribution. In order to do this, I tried to redefine the bind function in the Dist monad instance:

instance Monad Dist where
  return x    = D [(x,P 1.0)]
  (D d) >>= f = norm $ D [(y, q * p) | (x, p) <- d, (y, q) <- unD (f x)]

My idea was to normalise after every bind, so that fix could be used to find the fixed point. This doesn’t work because the monad is unconstrained, and norm needs an Ord constraint.

• No instance for ‘Ord b’ arising from a use of ‘norm’
      Possible fix:
        add (Ord b) to the context of
          the type signature for:
            (>>=) :: forall a b. Dist a -> (a -> Dist b) -> Dist b
    • In the first argument of ‘($)’, namely ‘norm’
      In the expression:
        norm $ D [(y, q * p) | (x, p) <- d, (y, q) <- unD (f x)]
      In an equation for ‘>>=’:
          (D d) >>= f
            = norm $ D [(y, q * p) | (x, p) <- d, (y, q) <- unD (f x)]

So I’m trying to think of ways around this problem, and I don’t really know what to do. One approach would be to define fmap and join instead of defining >>= directly, and use some kind of unionWith operation to implement join. But, as noted, this doesn’t seem to get around the Ord requirement. Maybe I can use zipWith to merge values of type Dist that I know contain the same “keys” in the same order?

Please let me know if I’m going about this the wrong way, and should be pursuing some other approach, such as using a constrained monad.

jaror · December 9, 2024, 6:35pm

Why not just define your own bind, for example:

import Prelude hiding (Monad (..))
(>>=) :: Ord b => Dist a -> (a -> Dist b) -> Dist b
(D d) >>= f = norm $ D [(y, q * p) | (x, p) <- d, (y, q) <- unD (f x)]

Do you want people other than you to actually use this bind in their programs?

TheoH · December 9, 2024, 7:05pm

Ah, thanks. That seems like a good idea. I’m looking for docs on what the “(…)” in the import line means. Is it just a wildcard? I can’t seem to find a formal definition of that syntax anywhere.

jaror · December 9, 2024, 7:08pm

It’s defined in the Haskell 2010 report here: https://www.haskell.org/onlinereport/haskell2010/haskellch5.html#x11-100013x4

Entities in an export list may be named as follows:
…
4. A class C with operations f1,…,fn declared in a class declaration may be named in one of three ways:

The form C names the class but not the class methods.

The form C(f1,…,fn), names the class and some or all of its methods.

The abbreviated form C(..) names the class and all its methods that are in scope (whether qualified or not).

The third bullet point there applies.

jmcarthur · December 9, 2024, 7:43pm

You might be interested in reading this email from Oleg about defining a Set monad. It doesn’t normalize 100% aggressively, such as when using fmap, but it does alright for certain patterns of use.

treblacy · December 9, 2024, 7:58pm

Coalescing requires at least Eq.

(You don’t normally see it in math because math assumes implicit pervasive ubitiquous universal built-in magical (not required to be computable) support of equality.)

AntC2 · December 9, 2024, 8:10pm

And previously on Discourse: Why Data.Set (containers) doesn’t have Functor and Monad instance?; (… cont) Data.Set no Functor/Monad instance.

TheoH · December 9, 2024, 11:20pm

Thanks. I did look in the more cursory documentation, but “(…)” seems to be unGoogleable.

AntC2 · December 10, 2024, 8:42am

Hehe, typical Oleg: a couple of sentences of description, then screeds of dense code. The critical decl is:

data SetMonad a where
    SMOrd :: Ord a => S.Set a -> SetMonad a
    SMAny :: [a] -> SetMonad a

-- with bind
    m >>= f = collect . map f $ toList m

[Editted] So this is a GADT with the map f producing a List of SetMonads, ~~each~~some hiding an Ord dictionary – exactly the same Ord dictionary, in fact. But the compiler can’t figure that out (because the dictionary can’t get unrolled until we enter each cons node), so walking the tree will suffer the polymorphic dictionary-lookup penalty. Or in the worst case, collect in scanning the List can’t find a dictionary, so uses notoriously poor-performing append to possibly allow duplicates in its result.

The nodes not hiding an Ord dictionary might contain duplicates; they’re lists in not necessarily ascending sequence; so S.unioning them with the nicely-organised nodes will amortise that performance hit across the accesses.

jaror · December 10, 2024, 9:26am

The SetMonad GADT is not nested. It is just a sum of Set and [].

arpl · December 10, 2024, 12:21pm

GHC will point it out to you : ) But it’s 2 dots/periods as per

jmcarthur · December 10, 2024, 12:54pm

The idea that makes it efficient is that although it is possible for fmap to create duplicates, by mapping multiple inputs to the same output, (>>=) will not create duplicates as long as the function argument returns a set with a known Ord, which is not terribly difficult to arrange. The MonadPlus instance Oleg added is unfortunate, though, since it makes it tempting to construct sets by using it with a bunch of returns.

jaror · December 10, 2024, 1:13pm

That’s not @TheoH’s fault. DIscourse renders .. as … unless you put it in a code block.

arpl · December 10, 2024, 2:39pm

Dat is goed om te weten.

AntC2 · December 10, 2024, 10:24pm

Thank you, you’re quite correct. I’ve editted my post. (Note to self: don’t try grokking Oleg-iness just before bedtime.) I still see Bad Things nested. What threw me was:

collect ((SMOrd x):t) = ...
collect ((SMAny x):t) = ...

collect is (trying to) clean up the mess made by the map f; that’s a List of SMOrd/SManys with some in the tail repeating the same Ord dictionary and/or repeating duplicate elements.

AntC2 · December 11, 2024, 12:30am

Hmm? These days, Monad is supposed to have Applicative .. Functor superclasses. (I think that wasn’t the case in 2013.) What would a Functor instance look like? And how would it be lawful?

fmap f xs  =  xs >>= return . f

TheoH · December 11, 2024, 12:53am

I get that an Eq is necessary in general, but in this situation I think I can do without it (also, in suitable circumstances, with anything like what Python calls an OrderedDict). Here’s the code I ended up with, using zipWith. I’m not 100% sure that my the monad laws are obeyed, but it seems to be working.

import Control.Applicative
import Control.Monad (liftM, ap)

newtype Probability = P {unP :: Float}
  deriving (Fractional, Num, Show, Eq, Ord)

newtype Dist a = D {unD :: [(a,Probability)]}
 deriving (Eq, Show)

instance Functor Dist where
  fmap = liftM

instance Applicative Dist where
  pure x = D [(x,P 1.0)]
  (<*>) = ap

instance Monad Dist where
  return      = pure
  d >>= f = dist_join $ dist_fmap f d

onD :: ([(a,Probability)] -> [(a,Probability)]) -> Dist a -> Dist a
onD f  = D . f . unD

onDP :: ([(a,Probability)] -> [(b,Probability)]) -> Dist a -> Dist b
onDP f  = D . f . unD

a = D [("a",1.0)]

normalize dl = map adjust dl
  where
  adjust (first, second) = map (fmap (*second)) $ unD first

zipAndSum dist1 dist2 = zipWith (\(a, prob1) (_, prob2) -> (a, prob1 + prob2)) dist1 dist2
collect xs = foldl (zipAndSum) (head xs) (tail xs)

dist_join :: Dist (Dist a) -> Dist a
dist_join dd = onDP (collect . normalize) dd

dist_fmap :: (a -> Dist b) -> Dist a -> Dist (Dist b)
dist_fmap f (D xs) = D [(f a, p) | (a, p) <- xs]

type Trans a = a -> Dist a

trans :: Trans String
trans "a" = D [("a", P 0.75),("b", P 0.05),("c", P 0.2)]
trans "b" = D [("a", P 0.25),("b", P 0.70),("c", P 0.05)]
trans "c" = D [("a", P 0.25),("b", P 0.70),("c", P 0.05)]

dist_bind t m = dist_join $ dist_fmap t m

fixpoint :: Eq a => (a -> a) -> a -> a
fixpoint f x
  | x' == x = x'
  | otherwise = fixpoint f x'
  where x' = f x

jmcarthur · December 11, 2024, 3:58am

I’m not sure I understand what you’re asking. With the caveat that I already mention about how fmap can cause duplicates, there is no problem implementing it. It’s lawful as long as you limit the ways that these values can be observed. Basically you have to use a function that converts it to a Set in order to observe it.

olf · December 12, 2024, 11:06pm

I used several possible implementations of a finite distribution monad and found no one to be superior to Erwig’s naive definition. The reason is that usage of structures like Map or Seq will waste time on allocations and garbage collection while normalizing, while the list-based distribution just happily (and lazily) concatenates, which does not appear to be a problem if at the end you just fold the distribution.
Of course computing fixed points is another thing. But you may be out of luck here since functions involving Float rarely have fixed points, even if they numerically converge. Same holds true for Maps, although Joachim Breitner has made some amazing progress in this direction.

It seems a maximally free implementation can be leveraged to compute fixed points.

class Monoid r => Cone r where
   scale :: Probability -> r -> r
   -- Monoid to be understood additively

type Dist a = forall r. Cone r => (a -> r) -> r

pure :: a -> Dist a
pure a = \b -> b a

bind :: (a -> Dist b) -> Dist a -> Dist b
bind k m = \b -> m (flip k b)

Now the fixed point of bind k at m is given by

\m -> \b -> m (fix (flip k))

The above is an instance of a more general scheme. Consider this functor:

data DistF x = Mempty | MPlus x x | Score Probability x deriving (Functor)

It embodies the basic operations a cone (as well as distributions, which form a cone) support: the MonadPlus operations (i.e. choice) and scoring. Now any finite distribution can be embedded into this type:

import Control.Monad.Free.Church (F)
type Dist = F DistF

A value of type F DistF a is isomorphic to

forall r. (a -> r) -> (DistF r -> r) -> r

which says: Given the mass of every basic element in the event space, and a way to add and score masses, we can compute the mass of a distribution.

Suppose k :: a -> F f a is the Kleisli map you want to iteratively bind.
Further suppose f :: f r -> r. Then the term

k' = \b. \a -> k a b f

has type (a -> r) -> (a -> r) and it can be shown that the fixed point of iteratively binding k to any m :: F f a is

\b f -> m (fix k') f

Thus we have reduced the problem of computing fix (k =<<) to the problem of computing the fixed point of an element-scoring operator k'.

TheoH · December 16, 2024, 8:25pm

Thanks for this. It’s not clear to me how you go from

bind k m = \b -> m (flip k b)

to

\m -> \b -> m (fix (flip k))

Maybe if you could just spell that out in more detail, or perhaps point out that I am missing something obvious.