I have a working prototype of a dataframe library. Some time ago @LaurentRDC suggested I write a proposal for it to send to the Haskell Foundation, but it seems the only time I get large blocks of time for personal projects is over the holidays.
I’m seeking initial feedback on the approach and some possible future directions.
Where does this library fit into the design space?
I think it’s good to have a library that lets you go from “I have a dataset” to “oh, this is what this data is about” very quickly. To that end, this library prioritizes simplicity where possible. A few design decisions in particular:
- An API that is reminiscent of Pandas, Polars, and SQL
- Dynamic typing (which also incidentally gives more control over the error messaging - GHC’s errors can be a little intimidating)
- Use in GHCi/notebooks/literate programming rather than standalone scripts
- Terminal-based plotting, so users don’t need the right GTK/SDL libraries installed.
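To make the dynamic-typing point concrete, here is a minimal sketch of one common way to get runtime-typed columns in Haskell: hide each column's element type behind an existential with a `Typeable` constraint and check it with `cast` on access. The names `Column`, `DataFrame`, and `getColumn` are hypothetical and are not the prototype's actual API; the sketch only shows how a type mismatch can surface as a plain-language message instead of a GHC type error.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Proxy (Proxy (..))
import Data.Typeable (Typeable, cast, typeRep)

-- Hypothetical column type: the element type is hidden behind an
-- existential, so one frame can hold columns of different types.
data Column = forall a. Typeable a => Column [a]

-- Hypothetical data frame: named columns whose types are only known at runtime.
newtype DataFrame = DataFrame [(String, Column)]

-- Look up a column and try to view it at the requested element type.
-- Missing columns and type mismatches become readable error strings.
getColumn :: forall a. Typeable a => String -> DataFrame -> Either String [a]
getColumn name (DataFrame cols) =
  case lookup name cols of
    Nothing ->
      Left ("No column named " ++ show name ++ ". Available: " ++ show (map fst cols))
    Just (Column xs) ->
      case cast xs of
        Just ys -> Right ys
        Nothing ->
          -- typeRep xs reports the stored element type; the Proxy reports the requested one.
          Left ( "Column " ++ show name ++ " holds " ++ show (typeRep xs)
              ++ " values, but " ++ show (typeRep (Proxy :: Proxy a))
              ++ " was requested." )

-- A small example frame with a String column and an Int column.
example :: DataFrame
example = DataFrame
  [ ("name", Column ["ada", "grace"])
  , ("age",  Column [36 :: Int, 45])
  ]
```

In GHCi, `getColumn "age" example :: Either String [Double]` would return `Left "Column \"age\" holds Int values, but Double was requested."`. The trade-off is that mistakes are caught when a cell is evaluated rather than at compile time, which suits interactive exploration more than long-lived pipelines.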
I’ve included some future work in the README that highlights things I’d like to work on in the near to medium term.
Once the large questions are settled, I’d also like to do more UX studies, e.g., survey data scientists and ask them what they think about the usability and ergonomics of the API, and what feature completeness should look like.
But before all that, I’m welcoming initial feedback, and maybe a look at the code, because I think there is a lot of unidiomatic Haskell in the codebase (lots of repetition and many partial functions).
After getting feedback from this thread, I’ll work on a formal proposal doc to send over. I’ll also cross-post elsewhere for more feedback. Thanks.