I’ve considered making a data frame library in the past but I’ve never got round to it. If I were doing it I’d do it as an arbitrary collection of Vectors wrapped in abstract data types that enforce the invariant that all the Vectors are of the same length. So, for example:
newtype Field a = HiddenMkField (Vector a)

newtype DataFrame a = HiddenMkDataFrame a
  deriving Functor

fromVector :: Vector a -> DataFrame (Field a)
fromVector = HiddenMkDataFrame . HiddenMkField

zip ::
  DataFrame (Field a) ->
  DataFrame (Field a) ->
  DataFrame (Field a, Field a)
zip = ... check same length and put in pair...
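To make the length check concrete, here is a minimal sketch of how the elided zip might be filled in. It makes two assumptions that are not in the outline above: a mismatch is reported with Maybe rather than an error, and the second frame is allowed a different element type b. It also assumes the vector package for Data.Vector.

```haskell
{-# LANGUAGE DeriveFunctor #-}

import Prelude hiding (zip)
import Data.Vector (Vector)
import qualified Data.Vector as V

-- In a real library the constructors would be left out of the export list,
-- so that only the functions below can build a Field or a DataFrame.
newtype Field a = HiddenMkField (Vector a)
  deriving (Show, Eq)

newtype DataFrame a = HiddenMkDataFrame a
  deriving (Show, Eq, Functor)

fromVector :: Vector a -> DataFrame (Field a)
fromVector = HiddenMkDataFrame . HiddenMkField

-- Assumption: a length mismatch yields Nothing instead of throwing.
zip ::
  DataFrame (Field a) ->
  DataFrame (Field b) ->
  Maybe (DataFrame (Field a, Field b))
zip (HiddenMkDataFrame fa@(HiddenMkField va)) (HiddenMkDataFrame fb@(HiddenMkField vb))
  | V.length va == V.length vb = Just (HiddenMkDataFrame (fa, fb))
  | otherwise = Nothing
```

Returning Maybe keeps the invariant checkable by the caller; a version that calls error on mismatch would match the original type signature more closely.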
All functions that operate on Fields must preserve the invariant.
This approach has the nice property that you can be very specific about the fields that are in your data frame, for example
data MyRecord = MyRecord
  { count :: Field Int,
    name :: Field String,
    weight :: Field Double
  }
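As a rough illustration of how such a record could be used, the sketch below builds a DataFrame MyRecord and projects one column out of it with fmap. The names myRecordFrame, counts and numRows are hypothetical, and the smart constructor is assumed to live inside the library module, since it needs the hidden constructors.

```haskell
{-# LANGUAGE DeriveFunctor #-}

import Data.Vector (Vector)
import qualified Data.Vector as V

newtype Field a = HiddenMkField (Vector a)

newtype DataFrame a = HiddenMkDataFrame a
  deriving Functor

data MyRecord = MyRecord
  { count :: Field Int,
    name :: Field String,
    weight :: Field Double
  }

-- Hypothetical smart constructor: succeeds only if all three columns
-- have the same number of rows.
myRecordFrame ::
  Vector Int -> Vector String -> Vector Double -> Maybe (DataFrame MyRecord)
myRecordFrame cs ns ws
  | V.length cs == V.length ns && V.length ns == V.length ws =
      Just (HiddenMkDataFrame (MyRecord (HiddenMkField cs) (HiddenMkField ns) (HiddenMkField ws)))
  | otherwise = Nothing

-- The Functor instance lets a record selector pick out a single column.
counts :: DataFrame MyRecord -> DataFrame (Field Int)
counts = fmap count

-- Hypothetical accessor for the number of rows in a single-column frame.
numRows :: DataFrame (Field a) -> Int
numRows (HiddenMkDataFrame (HiddenMkField v)) = V.length v
```

Because selecting a column never changes its length, counts preserves the invariant without any further check.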
and you can also be completely dynamic about the collection of Fields and the types of the Fields, for example, DataFrame (Field Dynamic, Field Dynamic), or DataFrame (Map String Dynamic).
In order to write polymorphic functions over DataFrames you can use ProductProfunctors in roughly the same way that Opaleye does.
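To give a feel for what "polymorphic over the shape of the frame" might mean, here is a hand-rolled sketch. It does not use the product-profunctors package or anything from Opaleye; Lengths, field, pair and zipFrames are hypothetical names for a much simpler stand-in that only shows the idea of describing a structure field by field and then writing one function that works for any described structure.

```haskell
{-# LANGUAGE DeriveFunctor #-}

import Data.Vector (Vector)
import qualified Data.Vector as V

newtype Field a = HiddenMkField (Vector a)

newtype DataFrame a = HiddenMkDataFrame a
  deriving Functor

fromVector :: Vector a -> DataFrame (Field a)
fromVector = HiddenMkDataFrame . HiddenMkField

-- Hypothetical stand-in for a product profunctor: a description of how to
-- find the length of every Field inside a structure of type s.
newtype Lengths s = Lengths (s -> [Int])

-- A single Field reports its own length.
field :: Lengths (Field a)
field = Lengths (\(HiddenMkField v) -> [V.length v])

-- Descriptions compose over pairs, following the product structure.
pair :: Lengths a -> Lengths b -> Lengths (a, b)
pair (Lengths f) (Lengths g) = Lengths (\(a, b) -> f a ++ g b)

-- A zip that works for frames of any described shape, not just single Fields.
zipFrames ::
  Lengths a -> Lengths b -> DataFrame a -> DataFrame b -> Maybe (DataFrame (a, b))
zipFrames (Lengths f) (Lengths g) (HiddenMkDataFrame a) (HiddenMkDataFrame b)
  | allEqual (f a ++ g b) = Just (HiddenMkDataFrame (a, b))
  | otherwise = Nothing
  where
    -- True when every adjacent pair of lengths agrees.
    allEqual xs = and (zipWith (==) xs (drop 1 xs))
```

With a real ProductProfunctor the descriptions would presumably be assembled by the library's own combinators rather than written out by hand as here.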