Grouping by data constructor

Say I have

import qualified Data.List as L

data MyData = Foo Int | Bar Char | Baz String
    deriving (Eq, Show, Ord)

list :: [MyData]
list = [Foo 1, Bar 'c', Baz "palla", Bar 'd', Foo 8]

I would like to group list by constructors, like this

-- [[Foo 1, Foo 8],  [Bar 'c', Bar 'd'], [Baz "palla"]]
-- to feed to something like printGroup :: [MyData] -> IO ()

The naive implementation is:

groupByConstructor :: [MyData] -> [[MyData]]
groupByConstructor ds = L.groupBy isSameConst (L.sort ds)
    where
          isSameConst :: MyData -> MyData -> Bool
          isSameConst (Foo _) (Foo _) = True
          isSameConst (Bar _) (Bar _) = True
          isSameConst (Baz _) (Baz _) = True
          isSameConst _ _ = False

main :: IO ()
main = print $ groupByConstructor list
    -- [[Foo 1,Foo 8],[Bar 'c',Bar 'd'],[Baz "palla"]]

which works. But has a big problem in my case: if I add constructors to MyData -Wall won’t warning about incomplete pattern matches in isSameConst, which I would really like.

Is there a better way to solve this problem?

1 Like

As a first step, you can rewrite groupByConstructor like this:

import Data.Function (on)

groupByConstructor :: [MyData] -> [[MyData]]
groupByConstructor ds = L.groupBy ((==) `on` constTag) (L.sort ds)
    where
          constTag :: MyData -> Int
          constTag (Foo _) = 0
          constTag (Bar _) = 1
          constTag (Baz _) = 2

where GHC will now warn you when you add new constructors and forget to add a case here. The only remaining caveat is that you could accidentally use the same number twice on the right-hand side.


If you don’t want any boilerplate, you can use Generics, e.g. this is how a minimal approach would look like:

{-# LANGUAGE GHC2021 #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE StandaloneDeriving #-}

import Data.Function (on)
import Data.List qualified as L
import GHC.Generics (Generic)
import GHC.Generics qualified as G

groupByConstructor :: (Ord a, ConstructorTag a) => [a] -> [[a]]
groupByConstructor = L.groupBy ((==) `on` constructorTag) . L.sort

type ConstructorTag a = (Generic a, GConstructorTag (G.Rep a))

constructorTag :: (ConstructorTag a) => a -> Int
constructorTag = gconstructorTag . G.from

class GConstructorTag a where
  gconstructorTag :: a x -> Int

deriving newtype instance (GConstructorTag f) => GConstructorTag (G.D1 c f)

instance GConstructorTag (G.C1 c f) where
  gconstructorTag _ = 0

instance (GConstructorTag cs) => GConstructorTag (G.C1 c f G.:+: cs) where
  gconstructorTag (G.L1 _) = 0
  gconstructorTag (G.R1 cs) = 1 + gconstructorTag cs

Then, if you add Generic to the deriving clause of MyData, you can use this generic groupByConstructor as a drop-in replacement for your manual groupByConstructor, which will always stay up-to-date when you add new constructors to MyData.

4 Likes