Dear Haskellers
I’d like to seek some community feedback on a new library I’ve been developing, tentatively called monoidmap.
The library provides a strict MonoidMap type that:
- models a total function with finite support from keys to monoidal values, with automatic minimal encoding.
- provides support for monoidal algebra, based on classes defined by the monoid-subclassesandgroupslibraries.
I’d love to hear people’s feedback on:
- the API design and module structure;
- names for things, including modules, types, and functions;
- other things which are difficult to change once a package is published.
For a comprehensive guide to this library, please see:
How is MonoidMap useful?
Suppose you find yourself implementing a map-like data structure where:
- the value type (or its container) is an instance of Monoid;
- lookup operations assume or require a canonical default value of mempty;
- the internal data structure should not include mappings to memptyvalues;
- encodings of your data type (e.g., to JSON, CBOR, YAML) should not include memptyvalues.
For example, let’s suppose you want to build a MultiMap type that maps keys to sets of values, with the following instances and operations:
newtype MultiMap k v = MultiMap (Map k (Set v))
    -- invariant: there are no mappings to empty sets.
    deriving newtype (Eq, Show)
fromList :: [(k, Set v)] -> MultiMap k v
toList   :: MultiMap k v -> [(k, Set v)]
lookup :: k -> MultiMap k v -> Set v
insert :: k -> Set v -> MultiMap k v -> MultiMap k v
remove :: k -> Set v -> MultiMap k v -> MultiMap k v
union        :: MultiMap k v -> MultiMap k v -> MultiMap k v
intersection :: MultiMap k v -> MultiMap k v -> MultiMap k v
Additionally, suppose that the above operations must satisfy the following properties:
-- We can perform round-trip encoding and decoding:
fromList . toList == id
-- Decoding must exclude empty sets:
fromList == fromList . filter ((/= Set.empty) . snd)
-- Encoding must exclude empty sets:
toList == filter ((/= Set.empty) . snd) . toList
-- Insertion and removal are equivalent to union and difference on value sets:
lookup k (insert k vs m) == lookup k m `Set.union`      vs
lookup k (remove k vs m) == lookup k m `Set.difference` vs
-- Union and intersection are distributive over pairs of values for matching keys:
lookup k (union        m1 m2) == Set.union        (lookup k m1) (lookup k m2)
lookup k (intersection m1 m2) == Set.intersection (lookup k m1) (lookup k m2)
To implement this data type, a developer (under time pressure) might write definitions similar to:
lookup :: k -> MultiMap k v -> Set v
lookup k (MultiMap m) =
    Map.findWithDefault Set.empty k m
insert :: k -> Set v -> MultiMap k v -> MultiMap k v
insert k vs (MultiMap m) =
    MultiMap $
    Map.insert k (Map.findWithDefault Set.empty k m `Set.union` vs) m
intersection :: MultiMap k v -> MultiMap k v -> MultiMap k v
intersection (MultiMap m1) (MultiMap m2) =
    MultiMap $
    Map.intersectionWith Set.intersection m1 m2
union :: MultiMap k v -> MultiMap k v -> MultiMap k v
union (MultiMap m1) (MultiMap m2) =
    MultiMap $
    Map.unionWith Set.union m1 m2
However, this implementation is problematic. While the definition for union is correct, the implementations of insert and intersection both violate the invariant that the internal Map must not include any mappings to Set.empty.
Consider the following example of intersection:
>>> setX = Set.fromList [1, 2, 3]
>>> setY = Set.fromList [4, 5, 6]
>>> mapX = fromList [('a', setX)]
>>> mapY = fromList [('a', setY)]
>>> intersection mapX mapY
fromList [('a', fromList [])] 💥💥💥
As a consequence, simple equality checks (that you might expect to hold) are broken:
>>> lhs = intersection (fromList [('a', setX)]) (fromList [('a', setY)])
>>> rhs = fromList [('a', Set.intersection setX setY)]
>>> lhs == rhs
False 💥💥💥
To fix this, we can rewrite the insert and intersection operations, this time taking care to prevent the creation of mappings to Set.empty within the internal Map:
insert k vs (MultiMap m)
    | Set.null xs = MultiMap (Map.delete k    m)
    | otherwise   = MultiMap (Map.insert k xs m)
  where
    xs = Map.findWithDefault Set.empty k m `Set.union` vs
intersection (MultiMap m1) (MultiMap m2) = MultiMap $
    Map.merge
        Map.dropMissing
        Map.dropMissing
        (Map.zipWithMaybeMatched mergeValues)
        m1
        m2
  where
    mergeValues :: k -> Set v -> Set v -> Maybe (Set v)
    mergeValues _k s1 s2
        | Set.null s3 = Nothing
        | otherwise   = Just s3
      where
        s3 = Set.intersection s1 s2
These operations now preserve the invariant, but there’s a drawback: the implementation is more complex than before, and there’s always a risk that a future developer may introduce a regression that causes values of Set.empty to appear within the internal Map.
Suppose that we want to make the code more robust against regressions. It turns out there’s a specialised non-empty variant of the Set type, called NESet. Let’s redefine MultiMap in terms of NESet:
- newtype MultiMap k v = MultiMap (Map k (  Set v))
+ newtype MultiMap k v = MultiMap (Map k (NESet v))
Since there is no way to construct an empty value of NESet, we can be more confident that the MultiMap invariant will hold across all operations.
Let’s take a look at the updated insert and intersection operations:
insert k vs (MultiMap m) =
    case NESet.nonEmptySet xs of
        Nothing -> MultiMap (Map.delete k    m)
        Just ys -> MultiMap (Map.insert k ys m)
  where
    xs = maybe Set.empty NESet.toSet (Map.lookup k m) `Set.union` vs
intersection (MultiMap m1) (MultiMap m2) = MultiMap $
    Map.merge
        Map.dropMissing
        Map.dropMissing
        (Map.zipWithMaybeMatched mergeValues)
        m1
        m2
  where
    mergeValues :: Ord v => k -> NESet v -> NESet v -> Maybe (NESet v)
    mergeValues _k s1 s2 = NESet.nonEmptySet (NESet.intersection s1 s2)
While the code is now protected against regressions that introduce Set.empty into the data structure, the use of NESet has not helped us to reduce the complexity of the code, as we must still provide special handling for the empty case. (For example, the definition of intersection is still significantly more complex than union.)
Finally, let’s rewrite this type in terms of MonoidMap:
- newtype MultiMap k v = MultiMap (      Map k (NESet v))
+ newtype MultiMap k v = MultiMap (MonoidMap k   (Set v))
Notice that we’ve also reverted to using the ordinary Set container type to hold our values.
With this updated type definition, we can provide definitions for MultiMap operations that are both simple and correct:
lookup k (MultiMap m) =
    MonoidMap.get k m
insert k vs (MultiMap m) =
    MultiMap (MonoidMap.adjust (Set.union vs) k m)
intersection (MultiMap m1) (MultiMap m2) =
    MultiMap (MonoidMap.intersectionWith Set.intersection m1 m2)
union (MultiMap m1) (MultiMap m2) =
    MultiMap (MonoidMap.unionWith Set.union m1 m2)
In the above example, the MonoidMap type:
- provides a canonical default value of Set.empty.
- takes care of the internal invariant that there are no mappings to Set.empty.
- removes the need to write specialised code to handle empty cases.
- removes the need to use a specialised non-empty variant of Set.
What else can MonoidMap do?
The MonoidMap type provides support for monoidal algebra, based on subclasses of Semigroup and Monoid provided by the monoid-subclasses and groups libraries.
To illustrate this, let’s continue with the MultiMap example from above. Is there any way to further simplify our implementation?
Firstly, we can take advantage of the fact that Set is an instance of two subclasses of Monoid called GCDMonoid and LCMMonoid. These classes provide the gcd and lcm operations, where:
- 
gcd a bcomputes the greatest unique value that can be wholly “subtracted” from eitheraorb.
- 
lcm a bcomputes the smallest unique value from which eitheraorbcan be wholly “subtracted”.
Note that in this context, “subtraction” refers to </> from the Reductive subclass of Semigroup. The </> operation is a partial inverse of the Semigroup <> operation, where:
- 
a </> b“subtracts”bfroma;
- 
a </> bproduces a resultJust rif (and only if) we can reverse the “subtraction” ofbso thatr <> b == a.
Coming back to the GCDMonoid.gcd and LCMMonoid.lcm operations: for the Set type, these operations correspond to Set.intersection and Set.union respectively.
The MonoidMap type also provides instances of GCDMonoid and LCMMonoid, where gcd and lcm are defined in terms of pointwise application of gcd and lcm to pairs of values for matching keys, satisfying the following properties:
get k (gcd m1 m2) == gcd (get k m1) (get k m2)
get k (lcm m1 m2) == lcm (get k m1) (get k m2)
Armed with this knowledge, we can now further simplify our definitions of intersection and union for MultiMap:
  newtype MultiMap k v = MultiMap (MonoidMap k (Set v))
+     deriving newtype (GCDMonoid, LCMMonoid)
- intersection (MultiMap m1) (MultiMap m2) =
-     MultiMap (MonoidMap.intersectionWith Set.intersection m1 m2)
+ intersection = GCDMonoid.gcd
- union (MultiMap m1) (MultiMap m2) =
-     MultiMap (MonoidMap.unionWith Set.intersection m1 m2)
+ union = LCMMonoid.lcm
Suppose that later on, we want to extend our MultiMap module with a difference operation that is analogous to Set.difference. For example, we might wish for the following behaviour:
>>> mapX = fromList [('a', Set.fromList [1, 2, 3])]
>>> mapY = fromList [('a', Set.fromList [3, 4, 5])]
>>> mapX `difference` mapY
fromList [('a', fromList [1, 2])]
At this point, we can make use of the fact that Set is an instance of the Monus subclass of Monoid. The Monus class provides the <\> operation, which performs monus subtraction (also known as “truncated” subtraction).
In the case of the Set type, monus subtraction with <\> is defined as equivalent to Set.difference.
Since the MonoidMap type also provides a Monus instance (defined in terms of pointwise application of <\> to pairs of values for matching keys), we can write the following very simple definition for difference:
  newtype MultiMap k v = MultiMap (MonoidMap k (Set v))
-     deriving newtype (GCDMonoid, LCMMonoid)
+     deriving newtype (GCDMonoid, LCMMonoid, Monus)
+ difference :: MultiMap k v -> MultiMap k v -> MultiMap k v
+ difference = (Monus.<\>)
The MonoidMap type also supports several other monoidal operations. See monoidal operations within the API documentation for more details.
In general, monoidal operations (and class instances) are defined in terms of the underlying monoidal value type, making it possible to transform, combine, and compare maps in ways that are consistent with the algebraic properties of the underlying monoidal value type.
How does MonoidMap perform?
In most cases, performance of MonoidMap operations is similar to those of Data.Map.Strict. There’s a small (constant) performance overhead for operations that need to perform canonicalisation of mempty values. However, most operations should have the same basic time complexity as Data.Map.Strict.
See this benchmark run for a comparison between Data.Map.Strict and a RecoveredMap type implemented in terms of MonoidMap.
At this stage, I’ve opted to focus mainly on safety, correctness, and API design; work on benchmarking is still in the early stages.
Hasn’t this been done before?
Yes and no. There are several libraries that provide monoidal maps, and several libraries that attempt to model finitely-supported total functions (with default values), but as far as I’m aware, no-one has published a library that combines both of these things with support for monoidal algebra. (But I’d love to know if there is something similar out there.)
If you’re got this far, many thanks for reading. I’m really keen to hear feedback and questions!
References
- README
- API documentation
- Source code of MonoidMap
- 
Usage examples (including several versions of the MultiMapexample above)