Dear Haskellers
I’d like to seek some community feedback on a new library I’ve been developing, tentatively called monoidmap
.
The library provides a strict MonoidMap
type that:
- models a total function with finite support from keys to monoidal values, with automatic minimal encoding.
- provides support for monoidal algebra, based on classes defined by the
monoid-subclasses
andgroups
libraries.
I’d love to hear people’s feedback on:
- the API design and module structure;
- names for things, including modules, types, and functions;
- other things which are difficult to change once a package is published.
For a comprehensive guide to this library, please see:
How is MonoidMap
useful?
Suppose you find yourself implementing a map-like data structure where:
- the value type (or its container) is an instance of
Monoid
; - lookup operations assume or require a canonical default value of
mempty
; - the internal data structure should not include mappings to
mempty
values; - encodings of your data type (e.g., to JSON, CBOR, YAML) should not include
mempty
values.
For example, let’s suppose you want to build a MultiMap
type that maps keys to sets of values, with the following instances and operations:
newtype MultiMap k v = MultiMap (Map k (Set v))
-- invariant: there are no mappings to empty sets.
deriving newtype (Eq, Show)
fromList :: [(k, Set v)] -> MultiMap k v
toList :: MultiMap k v -> [(k, Set v)]
lookup :: k -> MultiMap k v -> Set v
insert :: k -> Set v -> MultiMap k v -> MultiMap k v
remove :: k -> Set v -> MultiMap k v -> MultiMap k v
union :: MultiMap k v -> MultiMap k v -> MultiMap k v
intersection :: MultiMap k v -> MultiMap k v -> MultiMap k v
Additionally, suppose that the above operations must satisfy the following properties:
-- We can perform round-trip encoding and decoding:
fromList . toList == id
-- Decoding must exclude empty sets:
fromList == fromList . filter ((/= Set.empty) . snd)
-- Encoding must exclude empty sets:
toList == filter ((/= Set.empty) . snd) . toList
-- Insertion and removal are equivalent to union and difference on value sets:
lookup k (insert k vs m) == lookup k m `Set.union` vs
lookup k (remove k vs m) == lookup k m `Set.difference` vs
-- Union and intersection are distributive over pairs of values for matching keys:
lookup k (union m1 m2) == Set.union (lookup k m1) (lookup k m2)
lookup k (intersection m1 m2) == Set.intersection (lookup k m1) (lookup k m2)
To implement this data type, a developer (under time pressure) might write definitions similar to:
lookup :: k -> MultiMap k v -> Set v
lookup k (MultiMap m) =
Map.findWithDefault Set.empty k m
insert :: k -> Set v -> MultiMap k v -> MultiMap k v
insert k vs (MultiMap m) =
MultiMap $
Map.insert k (Map.findWithDefault Set.empty k m `Set.union` vs) m
intersection :: MultiMap k v -> MultiMap k v -> MultiMap k v
intersection (MultiMap m1) (MultiMap m2) =
MultiMap $
Map.intersectionWith Set.intersection m1 m2
union :: MultiMap k v -> MultiMap k v -> MultiMap k v
union (MultiMap m1) (MultiMap m2) =
MultiMap $
Map.unionWith Set.union m1 m2
However, this implementation is problematic. While the definition for union
is correct, the implementations of insert
and intersection
both violate the invariant that the internal Map
must not include any mappings to Set.empty
.
Consider the following example of intersection
:
>>> setX = Set.fromList [1, 2, 3]
>>> setY = Set.fromList [4, 5, 6]
>>> mapX = fromList [('a', setX)]
>>> mapY = fromList [('a', setY)]
>>> intersection mapX mapY
fromList [('a', fromList [])] 💥💥💥
As a consequence, simple equality checks (that you might expect to hold) are broken:
>>> lhs = intersection (fromList [('a', setX)]) (fromList [('a', setY)])
>>> rhs = fromList [('a', Set.intersection setX setY)]
>>> lhs == rhs
False 💥💥💥
To fix this, we can rewrite the insert
and intersection
operations, this time taking care to prevent the creation of mappings to Set.empty
within the internal Map
:
insert k vs (MultiMap m)
| Set.null xs = MultiMap (Map.delete k m)
| otherwise = MultiMap (Map.insert k xs m)
where
xs = Map.findWithDefault Set.empty k m `Set.union` vs
intersection (MultiMap m1) (MultiMap m2) = MultiMap $
Map.merge
Map.dropMissing
Map.dropMissing
(Map.zipWithMaybeMatched mergeValues)
m1
m2
where
mergeValues :: k -> Set v -> Set v -> Maybe (Set v)
mergeValues _k s1 s2
| Set.null s3 = Nothing
| otherwise = Just s3
where
s3 = Set.intersection s1 s2
These operations now preserve the invariant, but there’s a drawback: the implementation is more complex than before, and there’s always a risk that a future developer may introduce a regression that causes values of Set.empty
to appear within the internal Map
.
Suppose that we want to make the code more robust against regressions. It turns out there’s a specialised non-empty variant of the Set
type, called NESet
. Let’s redefine MultiMap
in terms of NESet
:
- newtype MultiMap k v = MultiMap (Map k ( Set v))
+ newtype MultiMap k v = MultiMap (Map k (NESet v))
Since there is no way to construct an empty value of NESet
, we can be more confident that the MultiMap
invariant will hold across all operations.
Let’s take a look at the updated insert
and intersection
operations:
insert k vs (MultiMap m) =
case NESet.nonEmptySet xs of
Nothing -> MultiMap (Map.delete k m)
Just ys -> MultiMap (Map.insert k ys m)
where
xs = maybe Set.empty NESet.toSet (Map.lookup k m) `Set.union` vs
intersection (MultiMap m1) (MultiMap m2) = MultiMap $
Map.merge
Map.dropMissing
Map.dropMissing
(Map.zipWithMaybeMatched mergeValues)
m1
m2
where
mergeValues :: Ord v => k -> NESet v -> NESet v -> Maybe (NESet v)
mergeValues _k s1 s2 = NESet.nonEmptySet (NESet.intersection s1 s2)
While the code is now protected against regressions that introduce Set.empty
into the data structure, the use of NESet
has not helped us to reduce the complexity of the code, as we must still provide special handling for the empty case. (For example, the definition of intersection
is still significantly more complex than union
.)
Finally, let’s rewrite this type in terms of MonoidMap
:
- newtype MultiMap k v = MultiMap ( Map k (NESet v))
+ newtype MultiMap k v = MultiMap (MonoidMap k (Set v))
Notice that we’ve also reverted to using the ordinary Set
container type to hold our values.
With this updated type definition, we can provide definitions for MultiMap
operations that are both simple and correct:
lookup k (MultiMap m) =
MonoidMap.get k m
insert k vs (MultiMap m) =
MultiMap (MonoidMap.adjust (Set.union vs) k m)
intersection (MultiMap m1) (MultiMap m2) =
MultiMap (MonoidMap.intersectionWith Set.intersection m1 m2)
union (MultiMap m1) (MultiMap m2) =
MultiMap (MonoidMap.unionWith Set.union m1 m2)
In the above example, the MonoidMap
type:
- provides a canonical default value of
Set.empty
. - takes care of the internal invariant that there are no mappings to
Set.empty
. - removes the need to write specialised code to handle empty cases.
- removes the need to use a specialised non-empty variant of
Set
.
What else can MonoidMap
do?
The MonoidMap
type provides support for monoidal algebra, based on subclasses of Semigroup
and Monoid
provided by the monoid-subclasses
and groups
libraries.
To illustrate this, let’s continue with the MultiMap
example from above. Is there any way to further simplify our implementation?
Firstly, we can take advantage of the fact that Set
is an instance of two subclasses of Monoid
called GCDMonoid
and LCMMonoid
. These classes provide the gcd
and lcm
operations, where:
-
gcd a b
computes the greatest unique value that can be wholly “subtracted” from eithera
orb
. -
lcm a b
computes the smallest unique value from which eithera
orb
can be wholly “subtracted”.
Note that in this context, “subtraction” refers to </>
from the Reductive
subclass of Semigroup
. The </>
operation is a partial inverse of the Semigroup
<>
operation, where:
-
a </> b
“subtracts”b
froma
; -
a </> b
produces a resultJust r
if (and only if) we can reverse the “subtraction” ofb
so thatr <> b == a
.
Coming back to the GCDMonoid.gcd
and LCMMonoid.lcm
operations: for the Set
type, these operations correspond to Set.intersection
and Set.union
respectively.
The MonoidMap
type also provides instances of GCDMonoid
and LCMMonoid
, where gcd
and lcm
are defined in terms of pointwise application of gcd
and lcm
to pairs of values for matching keys, satisfying the following properties:
get k (gcd m1 m2) == gcd (get k m1) (get k m2)
get k (lcm m1 m2) == lcm (get k m1) (get k m2)
Armed with this knowledge, we can now further simplify our definitions of intersection
and union
for MultiMap
:
newtype MultiMap k v = MultiMap (MonoidMap k (Set v))
+ deriving newtype (GCDMonoid, LCMMonoid)
- intersection (MultiMap m1) (MultiMap m2) =
- MultiMap (MonoidMap.intersectionWith Set.intersection m1 m2)
+ intersection = GCDMonoid.gcd
- union (MultiMap m1) (MultiMap m2) =
- MultiMap (MonoidMap.unionWith Set.intersection m1 m2)
+ union = LCMMonoid.lcm
Suppose that later on, we want to extend our MultiMap
module with a difference
operation that is analogous to Set.difference
. For example, we might wish for the following behaviour:
>>> mapX = fromList [('a', Set.fromList [1, 2, 3])]
>>> mapY = fromList [('a', Set.fromList [3, 4, 5])]
>>> mapX `difference` mapY
fromList [('a', fromList [1, 2])]
At this point, we can make use of the fact that Set
is an instance of the Monus
subclass of Monoid
. The Monus
class provides the <\>
operation, which performs monus subtraction (also known as “truncated” subtraction).
In the case of the Set
type, monus subtraction with <\>
is defined as equivalent to Set.difference
.
Since the MonoidMap
type also provides a Monus
instance (defined in terms of pointwise application of <\>
to pairs of values for matching keys), we can write the following very simple definition for difference
:
newtype MultiMap k v = MultiMap (MonoidMap k (Set v))
- deriving newtype (GCDMonoid, LCMMonoid)
+ deriving newtype (GCDMonoid, LCMMonoid, Monus)
+ difference :: MultiMap k v -> MultiMap k v -> MultiMap k v
+ difference = (Monus.<\>)
The MonoidMap
type also supports several other monoidal operations. See monoidal operations within the API documentation for more details.
In general, monoidal operations (and class instances) are defined in terms of the underlying monoidal value type, making it possible to transform, combine, and compare maps in ways that are consistent with the algebraic properties of the underlying monoidal value type.
How does MonoidMap
perform?
In most cases, performance of MonoidMap
operations is similar to those of Data.Map.Strict
. There’s a small (constant) performance overhead for operations that need to perform canonicalisation of mempty
values. However, most operations should have the same basic time complexity as Data.Map.Strict
.
See this benchmark run for a comparison between Data.Map.Strict
and a RecoveredMap
type implemented in terms of MonoidMap
.
At this stage, I’ve opted to focus mainly on safety, correctness, and API design; work on benchmarking is still in the early stages.
Hasn’t this been done before?
Yes and no. There are several libraries that provide monoidal maps, and several libraries that attempt to model finitely-supported total functions (with default values), but as far as I’m aware, no-one has published a library that combines both of these things with support for monoidal algebra. (But I’d love to know if there is something similar out there.)
If you’re got this far, many thanks for reading. I’m really keen to hear feedback and questions!
References
- README
- API documentation
- Source code of
MonoidMap
-
Usage examples (including several versions of the
MultiMap
example above)