Then, suppose I have a bunch of them, atoms :: [Atom].
Now, I want to group them. Perhaps first by X, then Y, or by X, Z, Y, or …
Naively, I need to write a bunch of terrible functions like byXYZ :: Dict X (Dict Y (Dict Z Atom))), etc, per way I want to group; but there must be a better way?
There’s a “better” way, but it involves some type-level machinery:
{-# LANGUAGE GADTs, DataKinds, TypeFamilies, UndecidableInstances #-}
import Data.Map (Map)
import Data.Map qualified as Map
import Data.Kind
-- e.g. 'NestedDict [X,Y,Z] Atom' is equivalent to
-- 'Map X (Map Y (Map Z [Atom]))'
data NestedDict keys a where
DNil :: [a] -> NestedDict '[] a
DCons :: Map key (NestedDict keys a) -> NestedDict (key : keys) a
instance AllC Ord keys => Semigroup (NestedDict keys a) where
DNil xs <> DNil ys = DNil (xs ++ ys)
DCons xs <> DCons ys = DCons (Map.unionWith (<>) xs ys)
-- Map a type level function over a type level list to get
-- a heterogeneous list
data MapF f xs where
MNil :: MapF f '[]
MCons :: f x -> MapF f xs -> MapF f (x : xs)
-- Require a constraint of all elements of a type level list
type AllC :: (k -> Constraint) -> [k] -> Constraint
type family AllC c xs where
AllC _ '[] = ()
AllC c (x : xs) = (c x, AllC c xs)
-- Given a list of functions that project a key out of the input type 'a',
-- this produces a nested dict of a grouped on those keys.
toNestedDict ::
AllC Ord keys =>
MapF ((->) a) keys ->
[a] ->
NestedDict keys a
toNestedDict MNil xs = DNil xs
toNestedDict (MCons f ms) xs =
DCons (Map.fromListWith (<>) [ (f x, toNestedDict ms [x]) | x <- xs ])
I’m not sure what the context for this is, so this suggestion might be way off, but maybe you want to use an SQL database or something that approximates it in-memory like ixset
If you want a “simple Haskell” solution, you could use generic functions that allow you to specify the grouping criteria.
#!/usr/bin/env cabal
{- cabal:
build-depends:
base ^>=4.18
, containers ^>= 0.6
-}
{- project:
with-compiler: ghc-9.6.3
-}
module Main (main) where
import Data.Map.Lazy (Map)
import qualified Data.Map.Lazy as Map
import Data.Word (Word16, Word32, Word8)
------------------------------------------------------------------------------
groupToMapBy
:: Ord a
=> (x -> a)
-> [x]
-> Map a [x]
groupToMapBy f = Map.fromListWith (++) . map (liftA2 (,) f pure)
groupToMapBy2
:: (Ord a, Ord b)
=> (x -> a)
-> (x -> b)
-> [x]
-> Map a (Map b [x])
groupToMapBy2 f g = fmap (groupToMapBy g) . groupToMapBy f
groupToMapBy3
:: (Ord a, Ord b, Ord c)
=> (x -> a)
-> (x -> b)
-> (x -> c)
-> [x]
-> Map a (Map b (Map c [x]))
groupToMapBy3 f g h = fmap (groupToMapBy2 g h) . groupToMapBy f
------------------------------------------------------------------------------
data Atom
= Atom
{ x :: Word8
, y :: Word16
, z :: Word32
, m :: Int
}
deriving Show
demoAtoms :: [Atom]
demoAtoms =
[ Atom 1 1 1 1, Atom 1 1 2 2, Atom 1 2 1 3, Atom 1 2 2 4, Atom 2 1 1 5
, Atom 2 2 1 6, Atom 3 4 5 7, Atom 3 4 5 8
]
main :: IO ()
main = print $ groupToMapBy3 x y z demoAtoms
Untested, so maybe not exactly this but this general idea:
Write a groupBy :: (a -> key) -> [Atom] -> Dict key [Atom].
Assuming Dict is a Functor: fmap (fmap (fmap single . groupBy x) . groupBy y) . groupBy z where single [x] = x; single xs = ??. If ?? is a Left, then use traverses instead of fmaps to get it up to top level.
I designed deep-map for this! It’s powered both by single-level manipulation functions as well as indexed-traversable for accessing all the way down in one fell swoop.
No one has mentioned monoidal-containers yet, which is just a newtype-wrapped Map or HashMap but frees you from burdens such as mentioned by @maxigit. So for reasonably efficient grouping you could use a MonoidalHashMap k (Seq Atom) where k could be a type of tuples (X,Z,Y) of the fields you are interested in. As @TeofilC suggested, you might want to think about how databases do it: Sometimes there are composite primary keys consisting of several fields, sometimes there are completely separate index structures. Depends on your use-case. The latter could be emulated in Haskell by several Maps without nesting, where the values in each Map point into the same [Atom] list.
Unfortunately fromList and similar functions in monoidal-containers are footguns (they have exactly the same biased behavior as their counterparts from the containers lib):
Yes, it is a footgun, but the type signature of fromList clearly says there is in general no monoidal container being built behind each key, so you can not expect semigroup operation being used. For what its worth, once I wrote a module with the following class to solve the problem of grouping.
class Alternative f => Grouping map f key where
groupWith :: Foldable t => (a -> key) -> (a -> value) -> t a -> map key (f value)
with instances where map is instantiated by the types of monoidal-containers and f can be [] or Seq or Vector. Since the type signature would be ambiguous at the call site, one must explicitly specify which map and f to use. The class instances make sure to use foldMap instead of fromList.
Indeed the discussion of the PR above contains the argument that fromList in its current form should not even be in containers since fromListWith const is almost certainly not what you want.
Summing up, monoidal-containers has served me well as an abstraction in database applications where one typically either folds over the returned rows or aggregates the rows in a monoidal container to process them in another way.