I’ve been using pyspark and polars dataframes a lot. I think the appealing direction from me right now is creating a monadic DSL + expression language which build up a computation then build a query planner in top of that API. Then fork all the in memory functions into a new subdirectory where they will operate instead on this monad.
Still all in my head though. But would probably still require runtime checks.
I think something like what Laurent’s javelin (schema on read) might be a little better suited for the OLAP sort of uses but I think implementing this will be a fun learning exercise.