And previously on Discourse: Why Data.Set (containers) doesn’t have Functor and Monad instance?; (… cont) Data.Set no Functor/Monad instance.
Thanks. I did look in the more cursory documentation, but “(…)” seems to be unGoogleable.
Hehe, typical Oleg: a couple of sentences of description, then screeds of dense code. The critical decl is:
data SetMonad a where SMOrd :: Ord a => S.Set a -> SetMonad a SMAny :: [a] -> SetMonad a -- with bind m >>= f = collect . map f $ toList m
[Editted] So this is a GADT with the map f
producing a List
of SetMonad
s, eachsome hiding an Ord
dictionary – exactly the same Ord
dictionary, in fact. But the compiler can’t figure that out (because the dictionary can’t get unrolled until we enter each cons
node), so walking the tree will suffer the polymorphic dictionary-lookup penalty. Or in the worst case, collect
in scanning the List
can’t find a dictionary, so uses notoriously poor-performing append to possibly allow duplicates in its result.
The nodes not hiding an Ord
dictionary might contain duplicates; they’re lists in not necessarily ascending sequence; so S.union
ing them with the nicely-organised nodes will amortise that performance hit across the accesses.
The SetMonad
GADT is not nested. It is just a sum of Set
and []
.
GHC will point it out to you : ) But it’s 2 dots/periods as per
The idea that makes it efficient is that although it is possible for fmap
to create duplicates, by mapping multiple inputs to the same output, (>>=)
will not create duplicates as long as the function argument returns a set with a known Ord
, which is not terribly difficult to arrange. The MonadPlus
instance Oleg added is unfortunate, though, since it makes it tempting to construct sets by using it with a bunch of return
s.
Dat is goed
om te weten.
Thank you, you’re quite correct. I’ve editted my post. (Note to self: don’t try grokking Oleg-iness just before bedtime.) I still see Bad Things nested. What threw me was:
collect ((SMOrd x):t) = ... collect ((SMAny x):t) = ...
collect
is (trying to) clean up the mess made by the map f
; that’s a List
of SMOrd/SMany
s with some in the t
ail repeating the same Ord
dictionary and/or repeating duplicate elements.
Hmm? These days, Monad
is supposed to have Applicative .. Functor
superclasses. (I think that wasn’t the case in 2013.) What would a Functor
instance look like? And how would it be lawful?
fmap f xs = xs >>= return . f
I get that an Eq
is necessary in general, but in this situation I think I can do without it (also, in suitable circumstances, with anything like what Python calls an OrderedDict
). Here’s the code I ended up with, using zipWith
. I’m not 100% sure that my the monad laws are obeyed, but it seems to be working.
import Control.Applicative
import Control.Monad (liftM, ap)
newtype Probability = P {unP :: Float}
deriving (Fractional, Num, Show, Eq, Ord)
newtype Dist a = D {unD :: [(a,Probability)]}
deriving (Eq, Show)
instance Functor Dist where
fmap = liftM
instance Applicative Dist where
pure x = D [(x,P 1.0)]
(<*>) = ap
instance Monad Dist where
return = pure
d >>= f = dist_join $ dist_fmap f d
onD :: ([(a,Probability)] -> [(a,Probability)]) -> Dist a -> Dist a
onD f = D . f . unD
onDP :: ([(a,Probability)] -> [(b,Probability)]) -> Dist a -> Dist b
onDP f = D . f . unD
a = D [("a",1.0)]
normalize dl = map adjust dl
where
adjust (first, second) = map (fmap (*second)) $ unD first
zipAndSum dist1 dist2 = zipWith (\(a, prob1) (_, prob2) -> (a, prob1 + prob2)) dist1 dist2
collect xs = foldl (zipAndSum) (head xs) (tail xs)
dist_join :: Dist (Dist a) -> Dist a
dist_join dd = onDP (collect . normalize) dd
dist_fmap :: (a -> Dist b) -> Dist a -> Dist (Dist b)
dist_fmap f (D xs) = D [(f a, p) | (a, p) <- xs]
type Trans a = a -> Dist a
trans :: Trans String
trans "a" = D [("a", P 0.75),("b", P 0.05),("c", P 0.2)]
trans "b" = D [("a", P 0.25),("b", P 0.70),("c", P 0.05)]
trans "c" = D [("a", P 0.25),("b", P 0.70),("c", P 0.05)]
dist_bind t m = dist_join $ dist_fmap t m
fixpoint :: Eq a => (a -> a) -> a -> a
fixpoint f x
| x' == x = x'
| otherwise = fixpoint f x'
where x' = f x
I’m not sure I understand what you’re asking. With the caveat that I already mention about how fmap
can cause duplicates, there is no problem implementing it. It’s lawful as long as you limit the ways that these values can be observed. Basically you have to use a function that converts it to a Set
in order to observe it.
I used several possible implementations of a finite distribution monad and found no one to be superior to Erwig’s naive definition. The reason is that usage of structures like Map
or Seq
will waste time on allocations and garbage collection while normalizing, while the list-based distribution just happily (and lazily) concatenates, which does not appear to be a problem if at the end you just fold the distribution.
Of course computing fixed points is another thing. But you may be out of luck here since functions involving Float
rarely have fixed points, even if they numerically converge. Same holds true for Maps, although Joachim Breitner has made some amazing progress in this direction.
It seems a maximally free implementation can be leveraged to compute fixed points.
class Monoid r => Cone r where
scale :: Probability -> r -> r
-- Monoid to be understood additively
type Dist a = forall r. Cone r => (a -> r) -> r
pure :: a -> Dist a
pure a = \b -> b a
bind :: (a -> Dist b) -> Dist a -> Dist b
bind k m = \b -> m (flip k b)
Now the fixed point of bind k
at m
is given by
\m -> \b -> m (fix (flip k))
The above is an instance of a more general scheme. Consider this functor:
data DistF x = Mempty | MPlus x x | Score Probability x deriving (Functor)
It embodies the basic operations a cone (as well as distributions, which form a cone) support: the MonadPlus
operations (i.e. choice) and scoring. Now any finite distribution can be embedded into this type:
import Control.Monad.Free.Church (F)
type Dist = F DistF
A value of type F DistF a
is isomorphic to
forall r. (a -> r) -> (DistF r -> r) -> r
which says: Given the mass of every basic element in the event space, and a way to add and score masses, we can compute the mass of a distribution.
Suppose k :: a -> F f a
is the Kleisli map you want to iteratively bind.
Further suppose f :: f r -> r
. Then the term
k' = \b. \a -> k a b f
has type (a -> r) -> (a -> r)
and it can be shown that the fixed point of iteratively binding k
to any m :: F f a
is
\b f -> m (fix k') f
Thus we have reduced the problem of computing fix (k =<<)
to the problem of computing the fixed point of an element-scoring operator k'
.
Thanks for this. It’s not clear to me how you go from
bind k m = \b -> m (flip k b)
to
\m -> \b -> m (fix (flip k))
Maybe if you could just spell that out in more detail, or perhaps point out that I am missing something obvious.
I have to say I am struggling to implement this and I think it’s all too abstract for me at my current level of knowledge.
Although I’ve explored further with this, the minimal problem is reproduced by this code, which is almost verbatim from the parent post:
type Probability = Float
class Monoid r => Cone r where
scale :: Probability -> r -> r
-- Monoid to be understood additively
type Dist a = forall r. Cone r => (a -> r) -> r
pure :: a -> Dist a
pure a = \b -> b a
bind :: (a -> Dist b) -> Dist a -> Dist b
bind k m = \b -> m (flip k b)
This gives the following error:
olf.hs:13:26: error: [GHC-83865]
• Couldn't match type: forall r1. Cone r1 => (b -> r1) -> r1
with: (b -> r) -> r
Expected: a -> (b -> r) -> r
Actual: a -> Dist b
• In the first argument of ‘flip’, namely ‘k’
In the first argument of ‘m’, namely ‘(flip k b)’
In the expression: m (flip k b)
• Relevant bindings include
b :: b -> r (bound at olf.hs:13:13)
k :: a -> Dist b (bound at olf.hs:13:6)
bind :: (a -> Dist b) -> Dist a -> Dist b (bound at olf.hs:13:1)
|
13 | bind k m = \b -> m (flip k b)
| ^
Maybe “maximally free” translates to “not actually typeable”, I don’t know!
It’s unclear to me how the Dist type is supposed to work, frankly, even the pure
function.
As I say, I’ve explored further and keep failing, presumably due my misunderstanding of the model that’s being proposed.
Thanks for any help anyone can offer!
Apologies, you need the RankNTypes
language extension, which I neglected to include in the post. Apart from that, there was a change in type inference between GHC 8 and 9 that stops the code from type checking.
You can paste the code into the Haskell Playground and choose the compiler version. It will compile with GHC 8.
To satisfy GHC 9, we need the more involved:
{-# LANGUAGE RankNTypes, ScopedTypeVariables #-}
type Dist a = forall r. Cone r => (a -> r) -> r
bind :: forall a b. (a -> Dist b) -> Dist a -> Dist b
bind k m = let
flipk :: forall r. Cone r => (b -> r) -> a -> r
flipk b a = k a b
in m . flipk
Okay, observe that bind k m = m . flip k
where
flip k :: forall r. Cone r => (b -> r) -> (a -> r)
Now we specialize b=a
since we want k :: a -> Dist a
. This makes
flip k :: forall r. Cone r => (a -> r) -> (a -> r)
Now it is not hard to see that
bind k (bind k m) = (m . flip k) . flip k
or in terms of function composition
(bind k . bind k) m = m . (flip k . flip k)
By induction, iterating the bind k
operator on any starting distribution m
is the same as pre-composing the m
functional with iterated compositions of flip k
.
We may than ask whether there is a limit to this process. Apparently we need to compute the infinite composition of a function with itself. Suppose f :: x -> x
(where in our case, x = (a -> r)
but that does not matter for now). We need a function f_iterated
with the property that
f_iterated = f . f_iterated
Hence
f_iterated x = f (f_iterated x)
Observe that f_iterated x
obeys the same equation as fix f
, namely
fix f = f (fix f)
whence we conclude that f_iterated x = fix f
and thereby f_iterated = const (fix f)
.
EDIT: There is a problem with the above calculations. The Riesz embedding of distributions into continuations always produces linear continuations. If r
is a cone, then the function space a -> r
has (point-wise) cone structure. Thus we can talk about a functional phi :: (a -> r) -> r
to preserve cone structure, in the sense that
phi mempty = mempty
phi (f <> g) = phi f <> phi g
phi (scale p f) = scale p (phi f)
But even if flip k
is linear by construction, its iterated limit const (fix (flip k))
needs not be linear unless it is a trivial function like constant bottom or constant mempty
.
What we are missing here is a proof that the function bind k
does possess a sensible fixed point at all. Recall that fix
always computes the least fixed point, whereas e.g. bind pure
(by monad laws) holds every distribution fixed.
OK, Thanks for this.
So, if I want to define an instance of Dist
– Dist String
, say – I am perplexed that pure
is so simple. I would expect any instance of Dist
to involve a representation of a distribution, say [(String, Probability)]
. I get that your definition of Dist
uses a kind of CPS interface but I’m wondering where if anywhere code something like this would appear:
\m -> mconcat $ Prelude.map (\(x, p) -> scale p (m x)) d
where d
is the [(String, Probability)]
.
It doesn’t seem to make sense that Dist a
should be isomorphic to a single value of type a
.
We’re using the Riesz-Markov-Kakutani representation theorem here.
My implementation of Dist
is representing the distribution by the way one can integrate functions against it. Consider the types in the mathematical expression ∫ f dμ
. Integration takes a scalar-valued function f :: a -> r
and a distribution μ
and produces a scalar value. If we fix the distribution μ
then this yields a continuation (a -> r) -> r
.
The nice thing is that distributions themselves form a cone (and Dist
an Alternative
functor), whence we can use them as r
in the universally-quantified variable.
-- <|> of the Alternative instance
addDists :: Dist a -> Dist a -> Dist a
addDists dx dy = \b -> dx b <> dy b
-- empty of the Alternative instance
zeroDist :: Dist a
zeroDist = const mempty
-- scale of the Cone instance
scaleDist :: Probability -> Dist a -> Dist a
scaleDist p dx = \b -> scale p (dx b)
This gives you all the tools to define an embedding of finite distributions:
fromFinDist :: (Cone (dist a), Alternative dist) =>
[(Probability,a)] -> dist a
fromFinDist = foldr f empty where
f (p,a) dx = (scale p (pure a)) <|> dx
That is because integration against a Dirac distribution is just function evaluation. If μ
is the Dirac distribution with all mass (probability) concentrated at x
, then ∫ f dμ = f(x)
.