Getting fields from the "union" of two records

I mistakenly believed that it was impossible to do in Haskell (because having a “disjunction” of constraints is tricky) but after reading section 3.1 of Hiromi Ishii’s functional pearl “Witness Me — Constructive Arguments Must Be Guided with Concrete Witness” turns out it’s doable!

Here’s a gist with an example. The gist defines a helper datatype data RU r1 r2 = RU r1 r2 for the “union” of two records, along with a suitable HasField instance.

First, we must define a type-level binary search tree:

data Map symbol q
  = E
  | N (Map symbol q) symbol q (Map symbol q)
  deriving (Show, Eq)

Along with some helper type families. Here are their kind signatures:

type Insert :: Symbol -> q -> Map Symbol q -> Map Symbol q
-- entries in the first map win in collisions
type Combine :: Map Symbol q -> Map Symbol q -> Map Symbol q
type Find :: Symbol -> Map Symbol q -> q
 -- makes all entries in a map equal to some "value" of kind q
type Overwrite :: q -> Map symbol z -> Map symbol q
-- gets the field map of a conventional record, using Generics
type RecordCode :: Type -> Map Symbol Type

Having done that, the next step is to define a version of HasField which takes an extra parameter of kind Location, which tells us where to find the field:

data Location = 
    InFirst
  | InSecond

type UnionHasField :: Location -> Symbol -> Type -> Type -> Constraint
class UnionHasField location x r a | location x r -> a where
  getFieldU :: r -> a

Here are the instances for RU:

instance HasField k r1 a => UnionHasField InFirst k (RU r1 r2) a where
  getFieldU (RU r1 _) = getField @k r1

instance HasField k r2 a => UnionHasField InSecond k (RU r1 r2) a where
  getFieldU (RU _ r2) = getField @k r2

The final step is to define a HasField instance for RU which delegates to UnionHasField. And, while delegating, we must also calculate the Location of each field of the union:

instance
  UnionHasField
    (Find k (Overwrite InFirst (RecordCode r1) `Combine` (Overwrite InSecond (RecordCode r2))))
    k
    (RU r1 r2)
    a =>
  HasField k (RU r1 r2) a
  where
  getField r = getFieldU @(Find k (Overwrite InFirst (RecordCode r1) `Combine` (Overwrite InSecond (RecordCode r2)))) @k r

We obtain the map of fields of r1, and we make all the “values” equal to InFirst. The same for r2 but with InSecond. Then we combine the maps and find the record in which we should find the field k.

(It’s a bit annoying that we must write that complicated type family expression two times, one in the constraints and one in the type application. It might even be slower to compile. How to avoid that?)

Now we can do something like:

data Person = Person {name :: String, age :: Int} 
                       deriving (Show, G.Generic)
data Address = Address {street :: String, number :: Int} 
                        deriving (Show, G.Generic)

personAddress = RU (Person "John" 50) (Address "Rue del Percebe" 2)

main :: IO ()
main = do
  print $ getField @"name" personAddress
  print $ getField @"age" personAddress
  print $ getField @"street" personAddress
  print $ getField @"number" personAddress
3 Likes

You can bind it to a type variable:

instance
  ( loc ~ Find k (Overwrite InFirst  (RecordCode r1)
        `Combine` Overwrite InSecond (RecordCode r2)) 
  , UnionHasField loc k (RU r1 r2) a
  ) => HasField k (RU r1 r2) a
  where
  getField r = getFieldU @loc @k r

Here is my full code with a naive unbalanced map implementation (I totally missed your gist link…):

{-# LANGUAGE StandaloneKindSignatures #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE AllowAmbiguousTypes #-}

import GHC.TypeLits
import GHC.Generics
import Data.Kind
import GHC.Records

data Map symbol q
  = E
  | N (Map symbol q) symbol q (Map symbol q)
  deriving (Show, Eq)

type Insert :: Symbol -> q -> Map Symbol q -> Map Symbol q
type family Insert a b c where
  Insert k v E = N E k v E
  Insert k v (N l' k' v' r') = Insert' (CmpSymbol k k') k v (N l' k' v' r')

type Insert' :: Ordering -> Symbol -> q -> Map Symbol q -> Map Symbol q
type family Insert' a b c d where
  Insert' GT k v (N l' k' v' r') = N l' k' v' (Insert k v r')
  Insert' _  k v (N l' k' v' r') = N (Insert k v l') k' v' r'

-- entries in the first map win in collisions
type Combine :: Map Symbol q -> Map Symbol q -> Map Symbol q
type family Combine a b where
  Combine E y = y
  Combine x E = x
  Combine (N l1 k1 v1 r1) (N l2 k2 v2 r2) = Combine' (CmpSymbol k1 k2) (N l1 k1 v1 r1) (N l2 k2 v2 r2)

type Combine' :: Ordering -> Map Symbol q -> Map Symbol q -> Map Symbol q
type family Combine' a b c where
  Combine' LT (N l1 k1 v1 r1) (N l2 k2 v2 r2) = Combine r1 (N (Combine (N l1 k1 v1 E) l2) k2 v2 r2)
  Combine' EQ (N l1 k1 v1 r1) (N l2 k2 v2 r2) = N (Combine l1 l2) k1 v1 (Combine r1 r2)
  Combine' GT (N l1 k1 v1 r1) (N l2 k2 v2 r2) = Combine l1 (N l2 k2 v2 (Combine (N E k1 v1 r1) r2))

type Find :: Symbol -> Map Symbol q -> q
type family Find a b where
  Find k (N l' k' v' r') = Find' (CmpSymbol k k') k (N l' k' v' r')

type Find' :: Ordering -> Symbol -> Map Symbol q -> q
type family Find' a b c where
  Find' LT k (N l _ _ _) = Find k l
  Find' EQ _ (N _ _ v _) = v
  Find' GT k (N _ _ _ r) = Find k r

 -- makes all entries in a map equal to some "value" of kind q
type Overwrite :: q -> Map symbol z -> Map symbol q
type family Overwrite a b where
  Overwrite _ E = E 
  Overwrite v (N l k _ r) = N (Overwrite v l) k v (Overwrite v r)

-- gets the field map of a conventional record, using Generics
type RecordCode :: Type -> Map Symbol Type
type family RecordCode a where
  RecordCode t = RecordCode' (Rep t)

type RecordCode' :: (k -> Type) -> Map Symbol Type
type family RecordCode' a where
  RecordCode' (D1 _ (C1 _ x)) = RecordCode'' x

type RecordCode'' :: (k -> Type) -> Map Symbol Type
type family RecordCode'' a where
  RecordCode'' (S1 ('MetaSel ('Just fieldName) _ _ _) (Rec0 fieldType)) = Insert fieldName fieldType E
  RecordCode'' (S1 ('MetaSel ('Just fieldName) _ _ _) (Rec0 fieldType) :*: xs) = Insert fieldName fieldType (RecordCode'' xs)

data Location = 
    InFirst
  | InSecond

type UnionHasField :: Location -> Symbol -> Type -> Type -> Constraint
class UnionHasField location x r a | location x r -> a where
  getFieldU :: r -> a

data RU r1 r2 = RU r1 r2

instance HasField k r1 a => UnionHasField InFirst k (RU r1 r2) a where
  getFieldU (RU r1 _) = getField @k r1

instance HasField k r2 a => UnionHasField InSecond k (RU r1 r2) a where
  getFieldU (RU _ r2) = getField @k r2

instance
  ( loc ~ Find k (Overwrite InFirst  (RecordCode r1)
        `Combine` Overwrite InSecond (RecordCode r2)) 
  , UnionHasField loc k (RU r1 r2) a
  ) => HasField k (RU r1 r2) a
  where
  getField r = getFieldU @loc @k r

data Person = Person {name :: String, age :: Int} 
                       deriving (Show, Generic)
data Address = Address {street :: String, number :: Int} 
                        deriving (Show, Generic)

personAddress = RU (Person "John" 50) (Address "Rue del Percebe" 2)

main :: IO ()
main = do
  print $ getField @"name" personAddress
  print $ getField @"age" personAddress
  print $ getField @"street" personAddress
  print $ getField @"number" personAddress
1 Like

You can bind it to a type variable

Ah, you’re right! I forgot one could do that.

I was thinking of a related limitation which does seem to exist. Suppose we want to define an extensible records library based on the type-level tree:

type Record :: Map Symbol Type -> Type
data Record t where
  Empty :: Record E
  Node :: Record left -> v -> Record right -> Record (N left k v right)

type Insertable :: Symbol -> Type -> Map Symbol Type -> Constraint
class Insertable k v t where
    type Insert k v t :: Map Symbol Type
    insert :: v -> Record t -> Record (Insert k v t)

Now Insert is an associated type family of an Insertable typeclass. The class is now necessary because we want to manipulate Record values.

We do the “delegate into an auxiliary typeclass with extra parameters” trick:

type Insertable' :: Ordering -> Symbol -> Type -> Map Symbol Type -> Constraint
class Insertable' ordering k v t where
    type Insert' ordering k v t :: Map Symbol Type
    insert' :: v -> Record t -> Record (Insert' ordering k v t)

instance Insertable' (CmpSymbol k k') k v (N left k' v' right) => Insertable k v (N left k' v' right) where
    type Insert k v (N left k' v' right) =  Insert' (CmpSymbol k k') k v (N left k' v' right)
    insert = undefined

Alas, now it seems that (CmpSymbol k k') must be repeated in both the constraints and the body of Insert. If we try the equality constraint trick, we get:

Main.hs:61:1: error:
    * Type variable `foo' is mentioned in the RHS,
        but not bound on the LHS of the family instance

I’m not sure if this limitation merely one of verbosity, or if it actually slows down compilation with repeated evaluations of CmpSymbol k k'.

I think it also slows down compilation, because I don’t think there is type level common subexpression elimination. If you’re only worried about verbosity then you can also write type synonyms:

type FindU k r1 r2 = Find k (Overwrite InFirst  (RecordCode r1)
                   `Combine` Overwrite InSecond (RecordCode r2))

But this doesn’t really help for CmpSymbol.

1 Like

Incidentally, the manually defined HasField instance for RU seems to work fine with OverloadedRecordDot in GHC 9.2.1-alpha1:

{-# LANGUAGE OverloadedRecordDot #-}
main :: IO ()
main = do
  print $ personAddress.name
  print $ personAddress.age
  print $ personAddress.street
  print $ personAddress.number 
2 Likes