Hello,
I am working on a proof-of-concept of a type-safe implementation of dataframes. In this proof-of-concept, I want to essentially take a record type, and allow it to either:
- contain scalars, e.g.
Int
andString
; - contain columns of scalars.
My starting point is type family, Column
:
-- | Type family which allows for higher-kinded record types
-- in two forms:
--
-- * Single record type using `Identity`;
-- * Record type whose elements are arrays, using `Vector`.
--
-- Types are created like regular record types, but each element
-- must have the type @`Column` f a@ instead of @a@.
type family Column (f :: Type -> Type) a where
Column Identity x = x
Column Vector x = Vector x
-- Example record type that could be turned into a dataframe
data User f
= MkUser { userName :: Column f String
, userAge :: Column f Int
}
deriving Generic
In this framework, User Identity
is a regular record type, while User Vector
is a dataframe – every record of the type is a column of values.
Based on all of this work, let’s consider a simple class FromRows
with the following method:
class FromRows t where
fromRows :: Vector (t Identity) -> t Vector
which is equivalent to turning rows of structure into a structure of columns.
To make dataframes ergonomic, it would be great to create a default implementation of fromRows
using generics
class FromRows t where
fromRows :: Vector (t Identity) -> t Vector
default fromRows :: ( ??? )
=> Vector (t Identity)
-> t Vector
fromRows = ???
-- Ultimate goal -- users just derive the instance
instance FromRows User
I followed the tutorial on Generics by Mark Karpov, but I don’t understand how to do this in practice. I have
class GFromRows t where
gfromRows :: Vector (t Identity) -> t Vector
instance GFromRows U1 where (...)
instance (GFromRows a, GFromRows b) => GFromRows (a :*: b) where (...)
instance GFromRows t => GFromRows (M1 i c t) where (...)
class FromRows t where
fromRows :: Vector (Row t) -> Frame t
default fromRows :: ( ??? )
=> Vector (Row t)
-> Frame t
fromRows = (to?) . gfromRows . (from?)
In particular, I’m confused on the constraints on gfromRows
. Looks like there is extra ceremony required because t
in FromRows t
is a higher-kinded type
Edit: I previously used the type synonyms Row
and Frame
, but did not define them:
-- | Type synonym for a record type with scalar elements
type Row (dt :: (Type -> Type) -> Type)
= (dt Identity)
-- | Type synonym for a record type whose elements are arrays (columns)
type Frame (dt :: (Type -> Type) -> Type)
= (dt Vector)