Combining generics with higher-kinded types

Hello,

I am working on a proof-of-concept of a type-safe implementation of dataframes. In this proof-of-concept, I want to essentially take a record type, and allow it to either:

  • contain scalars, e.g. Int and String;
  • contain columns of scalars.

My starting point is type family, Column:

-- | Type family which allows for higher-kinded record types
-- in two forms:
--
-- * Single record type using `Identity`;
-- * Record type whose elements are arrays, using `Vector`.
--
-- Types are created like regular record types, but each element
-- must have the type @`Column` f a@ instead of @a@.
type family Column (f :: Type -> Type) a where
    Column Identity x = x

    Column Vector x = Vector x

-- Example record type that could be turned into a dataframe
data User f
    = MkUser { userName :: Column f String
             , userAge  :: Column f Int
             }
    deriving Generic

In this framework, User Identity is a regular record type, while User Vector is a dataframe – every record of the type is a column of values.

Based on all of this work, let’s consider a simple class FromRows with the following method:

class FromRows t where
    fromRows :: Vector (t Identity) -> t Vector

which is equivalent to turning rows of structure into a structure of columns.
To make dataframes ergonomic, it would be great to create a default implementation of fromRows using generics

class FromRows t where
    fromRows :: Vector (t Identity) -> t Vector

    default fromRows :: ( ??? )
                     => Vector (t Identity) 
                     -> t Vector
    fromRows = ???

-- Ultimate goal -- users just derive the instance
instance FromRows User

I followed the tutorial on Generics by Mark Karpov, but I don’t understand how to do this in practice. I have

class GFromRows t where
    gfromRows :: Vector (t Identity) -> t Vector

instance                               GFromRows U1 where (...)
instance (GFromRows a, GFromRows b) => GFromRows (a :*: b) where (...)
instance GFromRows t =>                GFromRows (M1 i c t) where (...)

class FromRows t where
    fromRows :: Vector (Row t) -> Frame t
    
    default fromRows :: ( ??? )
                        => Vector (Row t) 
                        -> Frame t
    fromRows = (to?) . gfromRows . (from?)

In particular, I’m confused on the constraints on gfromRows. Looks like there is extra ceremony required because t in FromRows t is a higher-kinded type

Edit: I previously used the type synonyms Row and Frame, but did not define them:

-- | Type synonym for a record type with scalar elements
type Row (dt :: (Type -> Type) -> Type) 
    = (dt Identity)

-- | Type synonym for a record type whose elements are arrays (columns)
type Frame (dt :: (Type -> Type) -> Type)   
    = (dt Vector)
1 Like

You’ve given two different type signatures of fromRows, the second of which mentions the undefined Row and Frame types, so I assume the first is the correct one.

I believe for a higher-kinded type you want the Generic1 class. The implementation would look something like this:

class GFromRows t where
  gfromRows :: Vector (t Identity) -> t Vector

instance GFromRows U1 where
  gfromRows _ = U1

instance (GFromRows a, GFromRows b) => GFromRows (a :*: b) where
  gfromRows = fromTuple . bimap gfromRows gfromRows . V.unzip . fmap toTuple
    where
      toTuple (a :*: b) = (a, b)
      fromTuple (a, b) = a :*: b

instance (GFromRows t) => GFromRows (M1 i c t) where
  gfromRows = M1 . gfromRows . fmap unM1

class FromRows t where
  fromRows :: Vector (t Identity) -> t Vector
  default fromRows ::
    (Generic1 t, GFromRows (Rep1 t)) =>
    Vector (t Identity) ->
    t Vector
  fromRows = to1 . gfromRows . fmap from1

However, the compiler won’t derive an instance of Generic1 for your User type (even if you replace Column f by f in the field types:

        Constructor ‘MkUser’ applies a type to an argument involving the last parameter
                              but the applied type is not of kind * -> *

I’m not sure if there is a formulation of User that the compiler could derive a Generic1 instance for.
I suggest you look into the barbies package.

You can do this with Generic (not Generic1), but you have to forget that you’re working with a parameterized type. The Generic class expects a simple type. You have two types t Identity and t Vector. So you treat them as separate instances of Generic, which will both be indices of the GFromRows class.

class GFromRows rI rV where
  gfromRows :: Vector (rI x) -> rV x

The x parameter is unused and unusable, just ignore it. (It’s there for working with Generic1, but you’re not using it here (it’s unusable for your situation too.))

The types rI and rV, since they come from t Identity and t Vector, are supposed to have pretty much the same structure.

They are both products at the same time.

instance (GFromRows tI1 tV1, GFromRows tI2 tV2) => GFromRows (tI1 :*: tI2) (tV1 :*: tV2) where
   ...

And in the field case, you know that when a field of t Identity has type a, the same field in t Vector will have type Vector a.

instance (v ~ Vector a) => GFromRows (K1 i a) (K1 i v) where
  gfromRows = K1 . fmap unK1

The default implementation will look like this:

default fromRows :: (Generic (t Identity), Generic (t Vector), GFromRow (Rep (t Identity)) (Rep (t Vector))) => Vector (t Identity) -> t Vector
fromRows = to . gfromRows . fmap from
3 Likes

Ahh I apologize, I didn’t define Row and Frame. Post edited

Amazing, this works! Thank you. I’ll try to write a blog post soon about this

2 Likes

Thanks to @Lysxia I am putting together a little prototype here

1 Like