[RFC] New MonoidMap type

Dear Haskellers

I’d like to seek some community feedback on a new library I’ve been developing, tentatively called monoidmap.

The library provides a strict MonoidMap type that:

  • models a total function (with finite support) from keys to monoidal values;
  • provides a canonical default value of mempty for keys outside the support set;
  • never includes mappings to mempty within its internal data structure.

I’d love to hear people’s feedback on:

  • the API design and module structure;
  • names for things, including modules, types, and functions;
  • other things which are difficult to change once a package is published.

For a comprehensive guide to this library, please see:

How is MonoidMap useful?

Suppose you find yourself implementing a map-like data structure where:

  • the value type (or its container) is an instance of Monoid;
  • lookup operations assume or require a canonical default value of mempty;
  • the internal data structure should not include mappings to mempty values;
  • encodings of your data type (e.g., to JSON, CBOR, YAML) should not include mempty values.

For example, let’s suppose you want to build a MultiMap type that maps keys to sets of values, with the following instances and operations:

newtype MultiMap k v = MultiMap (Map k (Set v))
    -- invariant: there are no mappings to empty sets.
    deriving newtype (Eq, Show)

fromList :: [(k, Set v)] -> MultiMap k v
toList   :: MultiMap k v -> [(k, Set v)]

lookup :: k -> MultiMap k v -> Set v

insert :: k -> Set v -> MultiMap k v -> MultiMap k v
remove :: k -> Set v -> MultiMap k v -> MultiMap k v

union        :: MultiMap k v -> MultiMap k v -> MultiMap k v
intersection :: MultiMap k v -> MultiMap k v -> MultiMap k v

Additionally, suppose that the above operations must satisfy the following properties:

-- We can perform round-trip encoding and decoding:
fromList . toList == id

-- Decoding must exclude empty sets:
fromList == fromList . filter ((/= Set.empty) . snd)

-- Encoding must exclude empty sets:
toList == filter ((/= Set.empty) . snd) . toList

-- Insertion and removal are equivalent to union and difference on value sets:
lookup k (insert k vs m) == lookup k m `Set.union`      vs
lookup k (remove k vs m) == lookup k m `Set.difference` vs

-- Union and intersection are distributive over pairs of values for matching keys:
lookup k (union        m1 m2) == Set.union        (lookup k m1) (lookup k m2)
lookup k (intersection m1 m2) == Set.intersection (lookup k m1) (lookup k m2)
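
These properties translate directly into property-based tests. Here’s a minimal QuickCheck sketch of the insertion property (the property name and the Arbitrary and Show instances for MultiMap are assumptions, not part of the library):

import Prelude hiding (lookup)
import Data.Set (Set)
import qualified Data.Set as Set
import Test.QuickCheck (Property, (===))

-- lookup k (insert k vs m) == lookup k m `Set.union` vs
prop_insert_lookup :: Char -> Set Int -> MultiMap Char Int -> Property
prop_insert_lookup k vs m =
    lookup k (insert k vs m) === lookup k m `Set.union` vs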

To implement this data type, a developer (under time pressure) might write definitions similar to:

lookup :: k -> MultiMap k v -> Set v
lookup k (MultiMap m) =
    Map.findWithDefault Set.empty k m

insert :: k -> Set v -> MultiMap k v -> MultiMap k v
insert k vs (MultiMap m) =
    MultiMap $
    Map.insert k (Map.findWithDefault Set.empty k m `Set.union` vs) m

intersection :: MultiMap k v -> MultiMap k v -> MultiMap k v
intersection (MultiMap m1) (MultiMap m2) =
    MultiMap $
    Map.intersectionWith Set.intersection m1 m2

union :: MultiMap k v -> MultiMap k v -> MultiMap k v
union (MultiMap m1) (MultiMap m2) =
    MultiMap $
    Map.unionWith Set.union m1 m2

However, this implementation is problematic. While the definition for union is correct, the implementations of insert and intersection both violate the invariant that the internal Map must not include any mappings to Set.empty.

Consider the following example of intersection:

>>> setX = Set.fromList [1, 2, 3]
>>> setY = Set.fromList [4, 5, 6]

>>> mapX = fromList [('a', setX)]
>>> mapY = fromList [('a', setY)]

>>> intersection mapX mapY
fromList [('a', fromList [])] 💥💥💥

As a consequence, simple equality checks (that you might expect to hold) are broken:

>>> lhs = intersection (fromList [('a', setX)]) (fromList [('a', setY)])
>>> rhs = fromList [('a', Set.intersection setX setY)]
>>> lhs == rhs
False 💥💥💥

To fix this, we can rewrite the insert and intersection operations, this time taking care to prevent the creation of mappings to Set.empty within the internal Map:

insert k vs (MultiMap m)
    | Set.null xs = MultiMap (Map.delete k    m)
    | otherwise   = MultiMap (Map.insert k xs m)
  where
    xs = Map.findWithDefault Set.empty k m `Set.union` vs

intersection (MultiMap m1) (MultiMap m2) = MultiMap $
    Map.merge
        Map.dropMissing
        Map.dropMissing
        (Map.zipWithMaybeMatched mergeValues)
        m1
        m2
  where
    mergeValues :: k -> Set v -> Set v -> Maybe (Set v)
    mergeValues _k s1 s2
        | Set.null s3 = Nothing
        | otherwise   = Just s3
      where
        s3 = Set.intersection s1 s2

These operations now preserve the invariant, but there’s a drawback: the implementation is more complex than before, and there’s always a risk that a future developer may introduce a regression that causes values of Set.empty to appear within the internal Map.

Suppose that we want to make the code more robust against regressions. It turns out there’s a specialised non-empty variant of the Set type, called NESet. Let’s redefine MultiMap in terms of NESet:

- newtype MultiMap k v = MultiMap (Map k (  Set v))
+ newtype MultiMap k v = MultiMap (Map k (NESet v))

Since there is no way to construct an empty value of NESet, we can be more confident that the MultiMap invariant will hold across all operations.
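
The updated definitions below rely on two conversions between Set and NESet from the nonempty-containers package, with roughly the following signatures (quoted from memory, so treat the details as assumptions):

nonEmptySet :: Set a -> Maybe (NESet a)  -- Nothing if and only if the set is empty
toSet       :: NESet a -> Set a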

Let’s take a look at the updated insert and intersection operations:

insert k vs (MultiMap m) =
    case NESet.nonEmptySet xs of
        Nothing -> MultiMap (Map.delete k    m)
        Just ys -> MultiMap (Map.insert k ys m)
  where
    xs = maybe Set.empty NESet.toSet (Map.lookup k m) `Set.union` vs

intersection (MultiMap m1) (MultiMap m2) = MultiMap $
    Map.merge
        Map.dropMissing
        Map.dropMissing
        (Map.zipWithMaybeMatched mergeValues)
        m1
        m2
  where
    mergeValues :: Ord v => k -> NESet v -> NESet v -> Maybe (NESet v)
    mergeValues _k s1 s2 = NESet.nonEmptySet (NESet.intersection s1 s2)

While the code is now protected against regressions that introduce Set.empty into the data structure, the use of NESet has not helped us to reduce the complexity of the code, as we must still provide special handling for the empty case. (For example, the definition of intersection is still significantly more complex than union.)

Finally, let’s rewrite this type in terms of MonoidMap:

- newtype MultiMap k v = MultiMap (      Map k (NESet v))
+ newtype MultiMap k v = MultiMap (MonoidMap k   (Set v))

Notice that we’ve also reverted to using the ordinary Set container type to hold our values.

With this updated type definition, we can provide definitions for MultiMap operations that are both simple and correct:

lookup k (MultiMap m) =
    MonoidMap.get k m

insert k vs (MultiMap m) =
    MultiMap (MonoidMap.adjust (Set.union vs) k m)

intersection (MultiMap m1) (MultiMap m2) =
    MultiMap (MonoidMap.intersectionWith Set.intersection m1 m2)

union (MultiMap m1) (MultiMap m2) =
    MultiMap (MonoidMap.unionWith Set.union m1 m2)

In the above example, the MonoidMap type:

  • provides a canonical default value of Set.empty.
  • takes care of the internal invariant that there are no mappings to Set.empty.
  • removes the need to write specialised code to handle empty cases.
  • removes the need to use a specialised non-empty variant of Set.
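
As a quick sanity check, the equality that failed earlier now holds, since MonoidMap never stores the empty result of the intersection (a sketch, assuming fromList is implemented via MonoidMap.fromList and Eq is derived newtype as before):

>>> lhs = intersection (fromList [('a', setX)]) (fromList [('a', setY)])
>>> rhs = fromList [('a', Set.intersection setX setY)]
>>> lhs == rhs
True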

What else can MonoidMap do?

The MonoidMap type provides support for monoidal algebra, based on subclasses of Semigroup and Monoid provided by the monoid-subclasses and groups libraries.

To illustrate this, let’s continue with the MultiMap example from above. Is there any way to further simplify our implementation?

Firstly, we can take advantage of the fact that Set is an instance of two subclasses of Monoid called GCDMonoid and LCMMonoid. These classes provide the gcd and lcm operations, where:

  • gcd a b computes the greatest value that can be wholly “subtracted” from both a and b.
  • lcm a b computes the smallest value from which both a and b can be wholly “subtracted”.

Note that in this context, “subtraction” refers to </> from the Reductive subclass of Semigroup. The </> operation is a partial inverse of the Semigroup <> operation, where:

  • a </> b “subtracts” b from a;
  • a </> b produces a result Just r if (and only if) we can reverse the “subtraction” of b so that r <> b == a.
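
For example, for Set (where <> is union), we’d expect a </> b to succeed exactly when b is a subset of a, yielding the set difference (a sketch of the expected behaviour):

>>> Set.fromList [1, 2, 3] </> Set.fromList [2]
Just (fromList [1,3])

>>> Set.fromList [1, 2, 3] </> Set.fromList [4]
Nothing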

Coming back to the GCDMonoid.gcd and LCMMonoid.lcm operations: for the Set type, these operations correspond to Set.intersection and Set.union respectively.
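
For example (a sketch, with gcd and lcm here denoting the GCDMonoid and LCMMonoid class methods rather than the Prelude functions):

>>> gcd (Set.fromList [1, 2, 3]) (Set.fromList [2, 3, 4])
fromList [2,3]

>>> lcm (Set.fromList [1, 2, 3]) (Set.fromList [2, 3, 4])
fromList [1,2,3,4]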

The MonoidMap type also provides instances of GCDMonoid and LCMMonoid, where gcd and lcm are defined in terms of pointwise application of gcd and lcm to pairs of values for matching keys, satisfying the following properties:

get k (gcd m1 m2) == gcd (get k m1) (get k m2)
get k (lcm m1 m2) == lcm (get k m1) (get k m2)

Armed with this knowledge, we can now further simplify our definitions of intersection and union for MultiMap:

  newtype MultiMap k v = MultiMap (MonoidMap k (Set v))
+     deriving newtype (GCDMonoid, LCMMonoid)

- intersection (MultiMap m1) (MultiMap m2) =
-     MultiMap (MonoidMap.intersectionWith Set.intersection m1 m2)
+ intersection = GCDMonoid.gcd

- union (MultiMap m1) (MultiMap m2) =
-     MultiMap (MonoidMap.unionWith Set.union m1 m2)
+ union = LCMMonoid.lcm

Suppose that later on, we want to extend our MultiMap module with a difference operation that is analogous to Set.difference. For example, we might wish for the following behaviour:

>>> mapX = fromList [('a', Set.fromList [1, 2, 3])]
>>> mapY = fromList [('a', Set.fromList [3, 4, 5])]
>>> mapX `difference` mapY
fromList [('a', fromList [1, 2])]

At this point, we can make use of the fact that Set is an instance of the Monus subclass of Monoid. The Monus class provides the <\> operation, which performs monus subtraction (also known as “truncated” subtraction).

In the case of the Set type, monus subtraction with <\> is defined as equivalent to Set.difference.
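
For example, we’d expect:

>>> Set.fromList [1, 2, 3] <\> Set.fromList [3, 4, 5]
fromList [1,2]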

Since the MonoidMap type also provides a Monus instance (defined in terms of pointwise application of <\> to pairs of values for matching keys), we can write the following very simple definition for difference:

  newtype MultiMap k v = MultiMap (MonoidMap k (Set v))
-     deriving newtype (GCDMonoid, LCMMonoid)
+     deriving newtype (GCDMonoid, LCMMonoid, Monus)

+ difference :: MultiMap k v -> MultiMap k v -> MultiMap k v
+ difference = (Monus.<\>)

The MonoidMap type also supports several other monoidal operations. See monoidal operations within the API documentation for more details.

In general, monoidal operations (and class instances) are defined in terms of the underlying monoidal value type, making it possible to transform, combine, and compare maps in ways that are consistent with the algebraic properties of the underlying monoidal value type.

How does MonoidMap perform?

In most cases, the performance of MonoidMap operations is similar to that of Data.Map.Strict. There’s a small (constant) overhead for operations that need to canonicalise mempty values, but most operations should have the same basic time complexity as their Data.Map.Strict counterparts.

See this benchmark run for a comparison between Data.Map.Strict and a RecoveredMap type implemented in terms of MonoidMap.

At this stage, I’ve opted to focus mainly on safety, correctness, and API design; work on benchmarking is still in the early stages.

Hasn’t this been done before?

Yes and no. There are several libraries that provide monoidal maps, and several libraries that attempt to model finitely-supported total functions (with default values), but as far as I’m aware, no-one has published a library that combines both of these things with support for monoidal algebra. (But I’d love to know if there is something similar out there.)

If you’ve got this far, many thanks for reading. I’m really keen to hear feedback and questions!


Exciting, but I have a hard time picturing boring real-world use cases. The examples still mostly sit at a very high level of abstraction.

I can see there’s a need for normalization wrt equality, but you could just as well do that when actually checking for equality (and pay the overhead only then). Sure, it won’t obey those laws, but I haven’t seen myself reaching for those properties yet.


You could indeed defer normalisation to just the operations that need it, such as the equality check.

However, this would also require certain operations to perform a linear-time traversal instead of being computable in constant time:

  • determining whether a MonoidMap is null (i.e., whether the support set is empty).
  • determining the size of a MonoidMap (i.e., the size of the support set).

This complexity would be compounded in the case of nested structures like:

newtype NestedMonoidMap k1 k2 v =
    NestedMonoidMap (MonoidMap k1 (MonoidMap k2 v))

(See NestedMonoidMap example.)

Another way to think about this is to compare MonoidMap with the standard Map type. We can think of these types in the following way:

  • Map k v:

    • models a total function (with finite support) from keys of type k to values of type Maybe v.
    • has a default value of Nothing.
    • does not encode mappings to Nothing with its internal data structure.
  • MonoidMap k v:

    • models a total function (with finite support) from keys of type k to monoidal values of type v.
    • has a default value of mempty.
    • does not encode mappings to mempty within its internal data structure.
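
For example, compare the values returned for an absent key (a small GHCi sketch, assuming the usual qualified imports):

>>> Map.lookup 'k' (Map.empty :: Map Char (Sum Natural))
Nothing

>>> MonoidMap.get 'k' (mempty :: MonoidMap Char (Sum Natural))
Sum {getSum = 0}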

We can also recover the behaviour of an ordinary Map with:

newtype RecoveredMap k v = RecoveredMap
    --  Use 'First' to mimic the left-biased nature of 'Data.Map':
    (MonoidMap k (First v))
    deriving newtype (Eq, Semigroup, Monoid)

(See RecoveredMap example.)


This is great! I think monoidal maps are usually much more useful than the normal Map from containers. I was annoyed that the monoidal-containers map still didn’t always use the monoidal structure, due to backwards compatibility.

Although, I think I personally would prefer a formulation in terms of (semi-)lattices instead of GCD-/LCMMonoid. Then I would also get nice fixed points for free. Or maybe we can have both?


I’ve found the monoidal structure very useful when doing any kind of analysis. The keys then are the objects you’re analysing and the values are a collection of all the information your analysis has produced. This collection of information is naturally monoidal (or lattice-like).

You can get 90% of the same utility by using unionWith (<>) from containers but that is both verbose and error prone. A true monoidal map type would allow you to simply write <> instead.

I have an example piece of code that collects information about dependencies. It is kind of a bad example because it is in a DSL preprocessor for Haskell, but I hope it is still readable:

If I had the MonoidalMap type, I could simply have written fold instead of the ugly foldr (Map.unionWith (<>)) mempty.
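
Something like this sketch, where the function names and constraints are mine and the MonoidMap module name is an assumption:

import Data.Foldable (fold)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.MonoidMap (MonoidMap)  -- module name is an assumption
import Data.Set (Set)

-- With Data.Map, the monoidal combination must be spelled out:
combineMaps :: (Ord k, Ord v) => [Map k (Set v)] -> Map k (Set v)
combineMaps = foldr (Map.unionWith (<>)) Map.empty

-- With a monoidal map, the Semigroup/Monoid instances already combine
-- values pointwise, so fold is enough:
combineMonoidMaps :: (Ord k, Ord v) => [MonoidMap k (Set v)] -> MonoidMap k (Set v)
combineMonoidMaps = fold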


Very nice! For many applications, this is much more natural than Data.Map, I think.

In the readme you describe this conditional functor law:

(∀ v. (v /= mempty) ==> (f v /= mempty))
    ==>
    map (g . f) == map g . map f

Wouldn’t it be more natural to simply require

f mempty = mempty

for the argument to map? It seems that this also suffices to get not only

map (g . f) == map g . map f

but also (and maybe more importantly)

get k (map f m) == f (get k m)

Or put differently: I’d probably prefer it if the meaning of map f m were not affected by whether the implementation performs minimisation or not.

Or put yet differently: map (const k) m, for k ≠ mempty, looks like it creates a map with infinite support, and that’s simply not supported.

In that view, I could even imagine map to dynamically enforce this property, and throw an exception if null (f mempty) doesn’t hold.


I like it! The next time I need a monoidal map, this will be the first one to try.

The only concern I have is the lack of a lazy map variant. I may be in minority, but sometimes I find lazy maps unavoidable. Usually when I’m recursively building a map with the knot-tying technique.


Thanks for mentioning this! Other people have also mentioned to me that they’d like to see a formulation in terms of lattices. I’d definitely be open to exploring this possibility.

This would indeed be sufficient to satisfy:

get k (map f m) == f (get k m)

But I’m not sure it’s sufficient to satisfy the Functor composition law. (Though perhaps I’ve misunderstood your argument?)

Suppose we take the following definitions of f and g:

f = const mempty
g = const "z"

And suppose we have MonoidMap m:

m :: MonoidMap String String
m = singleton "k" "v"

If we substitute these definitions into the functor composition law:

map (g . f) == map g . map f

Then on the left we obtain:

map (g . f) m = map (const "z" . const mempty) (singleton "k" "v")
              = map (const "z"               ) (singleton "k" "v")
              =                                (singleton "k" "z")

But on the right we obtain:

map g (map f m) = map (const "z") (map (const mempty) (singleton "k" "v"))
                = map (const "z") mempty
                =                 mempty

Which is of course a violation of the composition law.

To satisfy the composition law in general, I think we need:

(∀ v. (v /= mempty) ==> (f v /= mempty))

(Intuition: the current definition of MonoidMap.map only applies f to non-mempty values associated with the support set, and we need map f to preserve the support set.)

But as you point out, if we also want to satisfy:

∀ k. get k (map f m) == f (get k m)

Then we also need:

f mempty == mempty

So if we want to satisfy both, then we need:

∀ v. (f v == mempty) == (v == mempty)

In other words, f would have to be constrained so that it always preserves the mempty vs non-mempty distinction, where:

  • f always maps v1 | v1 == mempty to some v2 | v2 == mempty.
  • f always maps v1 | v1 /= mempty to some v2 | v2 /= mempty.

(This constraint is also sufficient to guarantee that map f preserves the support set.)

Of course, for the ordinary Map type, this constraint is trivially satisfied for all f.

If we view ordinary Map k v as modelling a total function with finite support from keys of type k to values of type Maybe v (where the support set is precisely the set of keys associated with Just _), then we can also view Map.map f as equivalent to applying function f' to the entire range of the function being modelled, where f' is:

f' :: (v1 -> v2) -> Maybe v1 -> Maybe v2
f' f mv1 = case mv1 of
    Nothing -> Nothing
    Just x1 -> Just (f x1)

In other words:

  • f' always maps v1 | v1 == Nothing to some v2 | v2 == Nothing.
  • f' always maps v1 | v1 == Just x1 to some v2 | v2 == Just x2.

If we wanted to provide a function for MonoidMap that is analogous to Data.Map.map, that guarantees to preserve the support set, is total for all possible keys, and satisfies the functor composition law, we could imagine a function like:

mapNonNull :: (NonNull v1 -> NonNull v2) -> MonoidMap k v1 -> MonoidMap k v2 

(Where NonNull is some container that somehow guarantees its internal values are never mempty.)

However, as far as I can tell, even if we had such a mapNonNull function, it wouldn’t help us to build a Functor instance for MonoidMap itself, due to the type of fmap.

Thanks! It was a pleasure to be able to reuse your monoid-subclasses library here. (I hope I’ve done it justice.)

If you do, I’d be very interested to hear how you get on, and if it works for you.

Thanks for mentioning this! I do agree that a Lazy variant would be useful. I hope to be able to provide this in a future version.


But then g violates g mempty = mempty!

I’d argue that with your map, one should never use map with a function that doesn’t send mempty to mempty, because the fundamental get-map law doesn’t hold then. And as long as one sticks to such functions, the functor law works fine.

I find your proposed law a bit too strong: the mapped function should be allowed to return mempty for non-mempty elements. And I don’t think it’s needed if both f and g satisfy my proposed law.


Of course, it’s annoying to have such restrictions on functions passed to map. I think you could lift them if your data structure allowed the total functions it models to take a non-mempty value outside the finite set of keys it stores explicitly.

Of course still with mempty as the default, and minimisation can still kick in when the default value happens to be mempty, so it would be a generalisation.

Have you considered this? It certainly makes the story around map nicer - so what would break?

Thanks for all your comments @nomeata! This is exactly the kind of feedback I was hoping for when making this RFC.

After giving this some thought, I agree with you that if f mempty == mempty, then this is enough to get us both:

f mempty == mempty ==> ∀ k. get k (map f m) == f (get k m)
f mempty == mempty ==> ∀ g. map (f . g) m == map f (map g m)

I’ve updated the documentation to reflect this (see PR 170).

Perhaps then this is an issue with the naming of the map function? (Perhaps we could call it mapNonNull?)

The documentation for MonoidMap.map only claims to apply f to the non-null values of a MonoidMap, i.e., the values associated with the support set. (This is conceptually similar to Map k v, in the sense that Map.map f only applies f to values associated with the support set of the function from k to Maybe v.)

The current implementation of MonoidMap.map does actually make it possible to build new types with lawful Functor instances. For example, consider this definition, which recreates the left-biased behaviour of an ordinary Map:

newtype RecoveredMap k v = RecoveredMap
    (MonoidMap k (First v))
    deriving newtype (Eq, Semigroup, Monoid)

instance Functor (RecoveredMap k) where
    fmap f (RecoveredMap m) =
        RecoveredMap $ MonoidMap.map (fmap f) m

fromList :: Ord k => [(k, v)] -> RecoveredMap k v
fromList = RecoveredMap
    . MonoidMap.fromListWith (const id) 
    . fmap (fmap pure)

This gives us the following behaviour:

>>> m = RecoveredMap.fromList [(1 :: Int, "a"), (2, "b")]
>>> m
fromList [(1,"a"),(2,"b")]

>>> (\v -> v <> v) <$> m
fromList [(1,"aa"),(2,"bb")]

>>> const "" <$> m
fromList [(1,""),(2,"")]

>>> const "hello" . const "" <$> m
fromList [(1,"hello"),(2,"hello")]

>>> const "hello" <$> (const "" <$> m)
fromList [(1,"hello"),(2,"hello")]

(See RecoveredMap example.)

One design principle that I’m trying to adhere to is to completely avoid partial functions in the public API. I’d definitely prefer to avoid throwing exceptions, if possible.

I suspect this would break people’s expectations about equality.

The Eq instance is currently derived from the underlying Map, and currently satisfies the following laws:

(m1 == m2) == (∀ k. MonoidMap.get k m1 == MonoidMap.get k m2)
(m1 == m2) ==> (show m1 == show m2)

If the internal data structure were allowed to hold non-null values, we could of course redefine the Eq and Show instances so that the above laws continue to hold. One way to do this would be to define these instances in terms of functions like:

canonicalEq
    :: (Eq k, Eq v, MonoidNull v)
    => MonoidMap k v
    -> MonoidMap k v
    -> Bool
canonicalEq = (==) `on` canonicalToList

canonicalToList
    :: forall k v. MonoidNull v
    => MonoidMap k v
    -> [(k, v)]
canonicalToList
    = L.filter (not . C.null . snd)
    . Map.toAscList
    . coerce @(MonoidMap k v) @(Map k v)

canonicalShow
    :: (Show k, Show v, MonoidNull v)
    => MonoidMap k v
    -> String
canonicalShow = ("fromList " <>) . show . canonicalToList

But then I think the following law would no longer hold (in general):

(m1 == m2) ==> (MonoidMap.map f m1 == MonoidMap.map f m2)

Consider this example (with a MonoidMap that is modified to allow non-null values internally, and with Eq and Show instances defined in terms of canonicalEq and canonicalShow):

>>> m1 = MonoidMap.fromList [('a', Sum 1), ('b', Sum 2)] :: MonoidMap Char (Sum Natural)
>>> m1
fromList [('a',Sum {getSum = 1}),('b',Sum {getSum = 2})]

>>> m2 = MonoidMap.map (fmap (`mod` 2)) m1
>>> m2
fromList [('a',Sum {getSum = 1})]

>>> m3 = MonoidMap.singleton 'a' (Sum 1)
>>> m3
fromList [('a',Sum {getSum = 1})]

>>> f = (<> Sum 1)

>>> m2 == m3
True

>>> MonoidMap.map f m2 == MonoidMap.map f m3
False 💥💥💥

>>> MonoidMap.map f m2
fromList [('a',Sum {getSum = 2}),('b',Sum {getSum = 1})]

>>> MonoidMap.map f m3
fromList [('a',Sum {getSum = 2})]

Of course, the operation f = (<> Sum 1) does not satisfy f mempty == mempty.

We could solve this by adding a qualification to the equality law:

(f mempty == mempty) ==> ((m1 == m2) == (MonoidMap.map f m1 == MonoidMap.map f m2))

But this brings us back full circle to the idea of restricting map to only accept functions for which f mempty == mempty, which I think is what we were trying to avoid in the first place by allowing the inner data structure to hold non-null values.

Since we probably don’t want map to throw run-time exceptions, there don’t seem to be obviously good options here.

Perhaps one option would be to rename map to mapNonNull, making it less likely that users would assume it has the same behaviour as fmap. A counterargument might be that other types with no Functor instance (such as Set) also have a map operation that is natural for the type, but nevertheless unsuitable for defining a Functor instance.

Perhaps another option is to change the type of map so that it explicitly only operates on non-null values. For example:

- map :: MonoidNull v2 => (        v1 -> v2) -> MonoidMap k v1 -> MonoidMap k v2
+ map :: MonoidNull v2 => (NonNull v1 -> v2) -> MonoidMap k v1 -> MonoidMap k v2

This would make it impossible to provide a function f that maps mempty to a non-mempty value. (This assumes the existence of a NonNull container that cannot be constructed with values of mempty.)

Perhaps yet another option would be to remove map entirely from the public API (for the initial release of MonoidMap), and defer the question of how to define a map to a later release?


One viewpoint for MonoidMap can also be that of a “weighted set”, where all keys with a value different from mempty are viewed as elements of the set, and the value measures the “weight” of this membership. An example of this is discrete probability distributions, where the monoid in question is simply the non-negative reals under addition. You can recover ordinary sets by choosing the monoid Maybe (), where mempty is “no membership” and Just () is “membership”. So I find this quite useful.
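
For example (a sketch; the module name and the use of Double for weights are assumptions on my part):

import Data.Monoid (Sum (..))
import Data.MonoidMap (MonoidMap)  -- module name is an assumption

-- A discrete distribution: keys outside the support implicitly have weight 0.
type Distribution a = MonoidMap a (Sum Double)

-- An ordinary set, recovered via the monoid Maybe ():
-- mempty (i.e. Nothing) means “no membership”; Just () means “membership”.
type PlainSet a = MonoidMap a (Maybe ())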

Having worked a lot with monad-bayes lately, I’d naturally think about how to stack the probability distribution aspect with other functors. I recently made some notes about how to implement a set monad transformer, and I expect that one could easily implement such a transformer on top of your package instead of good ol’ containers:


I think the name map is fine (it’s not fmap after all), and if it’s documented that the function is only applied to non-null values, then that’s all fine.


You may have misunderstood my proposal, though (but I typed it on the phone, so I couldn’t elaborate).

Right now your data structure is (conceptually)

type MonoidMap k v = Data.Map.Map k v

with the invariant that only non-null values are stored in the map. Semantically, this can represent precisely those total functions f for which f x = mempty for almost all x.

My suggestion is to change this data structure to

type MonoidMap k v = (Data.Map.Map k v, v)

i.e. add a slot for the value that the semantic function f takes outside the finite set of explicitly stored keys. Now, this can represent all total functions that are constant almost everywhere, not just those that are mempty almost everywhere.

For example

empty = (Map.empty, mempty)
get x (m, dv) = Map.findWithDefault dv x m 

and in particular

map f (m, dv) = (Map.map f m, f dv)

Of course, minimisation now needs to be careful when dv isn’t mempty, but in those cases where dv = mempty, it can apply just as it does now.

Of course, this is a bigger conceptual change from what you started with originally, so it’s reasonable to say that such functions are simply not supported (heh), and stick with the map behaviour you have right now.


There are two very exciting things here for me:

  1. The lawful instance (Ord k, MonoidNull g, Group g) => Group (MonoidMap k g). (I think we’ve talked about this before.)
  2. No lens dependency. monoidal-containers depends on lens to provide Ixed and At instances. (It also gets FooWithIndex from lens, but could get it from a much lighter dependency on indexed-traversable.) I love lens, but here it makes monoidal-containers hard to justify in libraries that aren’t already lensing.

Does your MonoidMap admit lawful instances for the classes from indexed-traversable? They could be pretty easy to add and it’s not a heavy dependency.

BTW: Your cabal file lists every dependency in a separate common stanza, just to repeat bounds between each target. Cabal solves dependency information at the package level, so this isn’t buying you much. When the Hackage Trustees activate a new user’s upload permissions, the standard advice we send includes:

One important thing to note is that you only need to include version bounds once. For example, if you depend on the same package in your library and your test suite, you only need to put the version bounds for that dependency in one place. This keeps the dependency bounds information DRY.

So you could delete all the common declarations and list dependency bounds at their first use in the .cabal file.


Thanks, this feedback is appreciated!

Good point! I’ll try to make this restriction clearer in the documentation. In particular, the existing overview seems to imply that MonoidMap can represent all finitely-supported functions without restriction, which of course it can’t, as you rightly point out.

Having said that, if we view Data.Map k v as representing a total function with finite support from keys of type k to values of type Maybe v, then that type has a similar restriction: that default values are always Nothing.

Thanks for clarifying!

This was actually an option that I explored early on in the design of MonoidMap. (The Data.TotalMap.TMap type uses a similar data structure, and provides a trim operation to remove default values from the inner Map.)

I eventually ran into the following issues:

  1. Suppose we wish to retain auto-minimisation:

    • Functions performing auto-minimisation would now need to check for equality with the default value.
    • This would presumably require the value type to be an instance of Eq rather than MonoidNull.
    • But requiring Eq would restrict the set of value types for which MonoidMap is usable.
      • Justification: if we have Eq and Monoid instances, we can always trivially define a MonoidNull instance such that null == (== mempty), but the reverse is not true. (See the sketch after this list.)
      • For example: consider the type Maybe (x -> y), which is an instance of MonoidNull (defined in terms of simple pattern matching on Nothing), but not an instance of Eq, as that would require an Eq instance for (x -> y).
    • There might also be other types for which Eq is relatively expensive, but for which MonoidNull is cheap.
  2. Suppose we opt to remove auto-minimisation (and redefine == in terms of a canonicalEq function):

    • Operations such as size and null might no longer be computable in constant time, unless of course we find a way to cache the size. Without a cached size, we’d presumably need to traverse the entire map in order to determine which values are equal to the default.

    • For large, long-lived data structures where it’s important to retain a minimal memory footprint, users would arguably have to do more work to understand how and when default values may be introduced into the internal Map, and how they can be removed. We could provide a trim operation with an Eq constraint on values (similar to Data.TotalMap.TMap), but that comes with its own complications. See this discussion on trim.

  3. We’d have to change the types of functions like {from,to}{List,Map}. Not an insurmountable problem, but it would arguably complicate the API.

    • For the IsList instance, we could define Item to be Either v (k, v), where Left v represents a default value and Right (k, v) represents a key-value mapping. And then in the case of fromList:
      • if the caller supplied a list with multiple default values, we could coalesce them in the order they appear with <>, as we already do for multiple key-value mappings that share the same key;
      • if the caller does not supply a default value, we could assume a default value of mempty.
  4. We’d have to make a decision on whether (or not) to include default (possibly non-empty) values in folds and traversals, and if so, how to include them. For example, what semantics should the fold operation have? Should it only fold over non-default values associated with keys in the support set?
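
To illustrate the justification in point 1 above, here is a minimal sketch of the “trivial direction”: any type with Eq and Monoid instances admits a MonoidNull instance (the newtype and its name are purely illustrative):

{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Data.Monoid.Null (MonoidNull (..))

newtype ViaEq a = ViaEq a
    deriving newtype (Eq, Semigroup, Monoid)

-- Here, null is simply equality with mempty:
instance (Eq a, Monoid a) => MonoidNull (ViaEq a) where
    null (ViaEq a) = a == mempty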

After pondering these questions for a while, I decided to go for the simpler option for the moment: to restrict the default value to mempty, at least for the first release.

Having said that, I do agree with you that it would be really nice to have a data type that can represent all total functions with finite support, without the restriction that the default value is mempty. In particular, if there were a way to build such a type that is strictly more general than the existing MonoidMap type, then perhaps MonoidMap could be redefined in terms of that type. I’d love to explore this option for a future release.


I agree that having Eq “just” for minimisation of a TMap-style data structure would be bad, compared to the nice and efficient MonoidNull interface right now. My thought was to stick to MonoidNull and only minimise when the default value happens to be mempty. But you make good arguments (around size and toList) that show why this would be an odd design as well. Thanks for entertaining this discussion!


Thank you too! This kind of discussion was exactly what I had hoped for when making this RFC. Your questions and suggestions have given me a valuable opportunity to re-examine and question my initial assumptions, for which I am extremely grateful. Thanks!

Thanks for suggesting this.

I think we could definitely provide an instance for FoldableWithIndex, but it’s probably not possible to provide lawful instances of FunctorWithIndex or TraversableWithIndex, at least not for MonoidMap directly, as MonoidMap k is not itself a Functor.

However, we can define various newtype wrappers around MonoidMap that do admit lawful Functor instances. For example:

newtype FirstMap k v = FirstMap (MonoidMap k (First v))

instance Functor (FirstMap k) where
    fmap f (FirstMap m) = FirstMap $ MonoidMap.map (fmap f) m

In cases such as these, I suspect we can also define instances of FunctorWithIndex and TraversableWithIndex.