Thanks for your feedback. Let me try to address the questions:
Hmm, comparison with pandas. That’s maybe a good thing because pandas is established. But I also wonder how it will perform memory-wise. My typical problem with data analysis in Python is that it will OOM because it tries to cram the whole dataframe in memory. But Haskell is lazy, so it will maybe not suffer that issue.
Is this time series actually lazy? Can I add new values efficiently while dropping or aggregating on the other (old) end?
What’s the data structure behind Series? Is it just a vector and a map with keys → indices?
This first implementation stores the keys in a Set and the values in a Vector. Keeping the keys in a Set allows for fast membership testing and searching, while keeping the values in a Vector allows for fast numerical operations.
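To make the layout concrete, here is a minimal sketch of a Set-plus-Vector series. It is not javelin's actual definition: ToySeries, fromPairs, at and total are hypothetical names made up for illustration, and the sketch assumes the containers and vector packages.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import           Data.Set (Set)
import qualified Data.Vector as Vector
import           Data.Vector (Vector)

-- Hypothetical stand-in for a series; NOT javelin's real type.
data ToySeries k a = ToySeries
  { toyIndex  :: Set k     -- sorted, unique keys
  , toyValues :: Vector a  -- values, stored in the same order as the keys
  }

-- Build a series from key/value pairs. Going through a Map sorts the keys
-- and resolves duplicate keys (the last value for a key wins).
fromPairs :: Ord k => [(k, a)] -> ToySeries k a
fromPairs kvs =
  let m = Map.fromList kvs
  in  ToySeries (Map.keysSet m) (Vector.fromList (Map.elems m))

-- Look up by key: find the key's position in the Set, then index the Vector.
at :: Ord k => ToySeries k a -> k -> Maybe a
at (ToySeries ks vs) k = Set.lookupIndex k ks >>= (vs Vector.!?)

-- Numerical operations only need to touch the values.
total :: Num a => ToySeries k a -> a
total = Vector.sum . toyValues
```

With this sketch, at (fromPairs [("b", 2), ("a", 1)]) "a" finds the position of "a" in the Set and then reads that slot of the Vector, while total never looks at the keys at all.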
This structure isn’t efficient for certain things; for one, changing the shape of a Series is very slow, because you need to copy the entire array. If you operate solely on the values of the series, then Vector's fusion mechanism may make things a bit faster, but in general, this is very much a ‘dense’ in-memory representation of a series.
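As a rough illustration of why reshaping is expensive in a dense layout, the sketch below inserts a single key/value pair into a (Set, Vector) pair. The function insertDense is a hypothetical helper, not part of javelin, and the sketch assumes the containers and vector packages.

```haskell
import qualified Data.Set as Set
import           Data.Set (Set)
import qualified Data.Vector as Vector
import           Data.Vector (Vector)

-- Hypothetical helper: insert one key/value pair into a dense
-- (sorted keys, values) representation.
insertDense :: Ord k => k -> a -> (Set k, Vector a) -> (Set k, Vector a)
insertDense k v (ks, vs) =
  case Set.lookupIndex k ks of
    -- Existing key: the shape is unchanged, but (//) still copies the vector.
    Just i  -> (ks, vs Vector.// [(i, v)])
    -- New key: find where it lands, then build a new vector of length n + 1.
    Nothing ->
      let ks'             = Set.insert k ks
          i               = Set.findIndex k ks'
          (before, after) = Vector.splitAt i vs
      in  (ks', Vector.concat [before, Vector.singleton v, after])
```

Either branch allocates a fresh vector, so every insertion costs time proportional to the length of the series. By contrast, a value-only pipeline such as Vector.sum (Vector.map (* 2) vs) is the kind of code where fusion can, in principle, avoid building the intermediate vector.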
One thing that I have seen in Polars (Rust dataframes) is separate lazy and strict APIs. It would be great to do something like this in javelin, but what I need right now for work is the strict API, so that’s what I built first.
Let’s read the tutorial. Ah, what’s >>> supposed to mean? Is this really a real GHCi session or documentation?
If you use HLS, the >>> will indeed open a real GHCi session. The javelin project makes heavy use of doctest to check that the documentation matches exactly what you would get in a ‘real’ GHCi session, and doctest is triggered by >>>.
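For readers who have not seen the convention before, here is a small, generic example of a doctest-style comment. The function doubleAll is made up for illustration and does not come from javelin.

```haskell
-- | Double every element of a list.
--
-- The line starting with >>> is an expression to evaluate, and the line
-- right after it is the output that doctest expects to see.
--
-- >>> doubleAll [1, 2, 3]
-- [2,4,6]
doubleAll :: [Int] -> [Int]
doubleAll = map (* 2)
```

When doctest runs over this module, it evaluates doubleAll [1, 2, 3] and fails if the printed result differs from the documented one, which is what keeps examples and behaviour in sync.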
I really hope the show instance is not the table display. Sigh, it is.
I’ve done this to keep the documentation examples as simple as possible. I’m converting coworkers who have never used Haskell before, so it was important for me that the examples were as easy as possible to reproduce. Note that you can customize the printing of series using the displayWith function.
What else would you have wanted from the Show instance? You could have it be like Map, but I personally find this much harder to read.
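To illustrate the trade-off, the sketch below renders the same data in two ways: the Map-style output, and a toy table-style output. The tableStyle function is a hypothetical renderer and does not reproduce javelin's actual table format; the sketch assumes the containers package.

```haskell
import qualified Data.Map.Strict as Map
import           Data.Map.Strict (Map)

-- Map-style rendering: whatever the Show instance of Map produces,
-- e.g. fromList [("a",1),("b",2)]
mapStyle :: Map String Int -> String
mapStyle = show

-- Hypothetical table-style rendering: one "key | value" row per entry.
tableStyle :: Map String Int -> String
tableStyle m = unlines [ k ++ " | " ++ show v | (k, v) <- Map.toAscList m ]
```

The Map-style output can be pasted back into source code, whereas the table-style output is arguably easier to scan for someone new to Haskell.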
Wait, how do I do numerical integration? Any other time-aware filter like exponential moving average? Linear/Bezier/… interpolation? Where do I ever use that the key of the series
In my typical application, I never just have one time series. I have a lot of them, and they are indexed by e.g. the data source. How do I deal with that here? I guess at the end I do need multi-dimensional keys, but I don’t understand whether I can do this here.
I guess I need a bigger, fully fleshed-out time series analysis example, maybe a Kalman filter or frequency analysis with multiple data sources, before I can judge whether I can use this.
The javelin package doesn’t contain anything specific to time series just yet, although some functionality applies to time series (e.g. you can use the windowing function for rolling aggregations, which I use a lot).
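To show what a rolling aggregation means here, the sketch below implements one over a plain Data.Map rather than a javelin Series. It deliberately does not use javelin's windowing function, whose exact signature is not shown in this post; rolling is a hypothetical helper, and the sketch assumes containers 0.5.8 or later.

```haskell
import qualified Data.Map.Strict as Map
import           Data.Map.Strict (Map)

-- Hypothetical helper: for every key k, aggregate the values whose keys
-- fall in the closed window [k - width, k].
rolling :: (Ord k, Num k) => k -> ([a] -> b) -> Map k a -> Map k b
rolling width agg m = Map.mapWithKey window m
  where
    window k _ =
      agg
        . Map.elems
        . Map.takeWhileAntitone (<= k)           -- keep keys up to and including k
        . Map.dropWhileAntitone (< k - width)    -- drop keys before the window start
        $ m
```

For example, rolling 1 sum applied to the entries (1, 10), (2, 20), (3, 30), (5, 50) yields (1, 10), (2, 30), (3, 50), (5, 50): each key sums its own value with any value at most one key unit earlier.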
My plan was always to have a separate javelin-timeseries package (or similar) containing time-series-specific functionality. I don’t have a need for it right now, so it doesn’t exist, but I would be very interested in your opinion on what should go in there.
Would you be willing to open a GitHub issue containing feature requests for time-series functionality? That would be very helpful. Or you can post them here.