[ANN] dataframe 0.1.0.0

I’ve been working on this for some months now and it’s in a mostly usable state.

Currently it only works with CSV, but I'm working on Parquet integration since that's what I mostly use at work. There are small tutorials in the GitHub repo.

Hoping to have it be more feature-rich after ZuriHac.

Thanks,

Michael

15 Likes

Congrats Michael! I know you’ve been thinking about and working on this for a long time. Looking forward to trying it out

1 Like

Nice!! I've been waiting for this for a long time. Congrats Michael, that's nice work!

1 Like

Thanks Laurent! It’s been really useful for ad hoc CSV crunching at work. Have you had time to work on javelin dataframes recently?

Obrigado Vitor! Lemme know what you think and what’s missing.

Unfortunately, no. Ironically, I changed jobs and won’t need to use dataframes for a while :person_shrugging:

Very nice! Looking at your dev notes, do you have opinions already on how the lazy (graph-based) API would work?

I’ve been using pyspark and polars dataframes a lot. The appealing direction for me right now is creating a monadic DSL plus an expression language that builds up a computation, then building a query planner on top of that API. Then I'd fork all the in-memory functions into a new subdirectory where they operate on this monad instead.

Still very vague but something like:

df <- readParquet "./file.parquet"
small <- deferred df >>= apply (col "field" :+: lit 3)
                     >>= filter (col "field" :<: lit 10)
print $ runEval small

It's still all in my head, though, and would probably still require runtime checks.
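To make the idea above concrete, here is a minimal, self-contained sketch of one way a deferred interpreter could look. Everything here is hypothetical: the `Expr` and `Plan` types, the target-column argument to `Apply`, and the list-of-rows stand-in for actual file IO are all illustrative, not dataframe's real API.

```haskell
module Main where

-- A tiny expression language over one numeric value type.
data Expr
  = Col String      -- reference a column by name
  | Lit Double      -- a literal value
  | Expr :+: Expr   -- addition
  | Expr :<: Expr   -- comparison, used as a filter predicate
  deriving Show

infixl 6 :+:
infix  4 :<:

-- A deferred plan: each step records an operation instead of running it,
-- so a query planner could inspect and rewrite it before evaluation.
data Plan
  = Source FilePath       -- e.g. a CSV/Parquet file to read lazily
  | Apply String Expr Plan -- (re)compute the named column from an expression
  | Filter Expr Plan       -- keep rows where the predicate holds
  deriving Show

type Row = [(String, Double)]

-- Evaluate an arithmetic Expr against a single row.
evalExpr :: Row -> Expr -> Double
evalExpr row e = case e of
  Col c   -> maybe (error ("missing column " ++ c)) id (lookup c row)
  Lit x   -> x
  a :+: b -> evalExpr row a + evalExpr row b
  _ :<: _ -> error "comparison is only meaningful as a filter predicate"

-- Evaluate a predicate Expr against a single row.
evalPred :: Row -> Expr -> Bool
evalPred row (a :<: b) = evalExpr row a < evalExpr row b
evalPred _   _         = error "not a predicate"

-- Interpret a Plan over in-memory rows (standing in for file IO).
runEval :: [Row] -> Plan -> [Row]
runEval rows plan = case plan of
  Source _     -> rows
  Apply c e p  -> map (\r -> (c, evalExpr r e) : filter ((/= c) . fst) r)
                      (runEval rows p)
  Filter e p   -> filter (\r -> evalPred r e) (runEval rows p)

main :: IO ()
main = do
  let rows = [[("field", 1)], [("field", 5)], [("field", 9)]]
      -- Same shape as the sketch above: add 3, then keep rows < 10.
      plan = Filter (Col "field" :<: Lit 10)
           $ Apply "field" (Col "field" :+: Lit 3)
           $ Source "./file.parquet"
  print (runEval rows plan)
```

Because the plan is a plain data structure, a later query planner could, for example, push `Filter` steps toward `Source` before any rows are materialized; that is the usual payoff of the graph-based approach.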

I think something like Laurent's javelin (schema on read) might be a little better suited to OLAP-style uses, but implementing this will be a fun learning exercise.

4 Likes