Data analysis/science in Haskell

emiruz · February 6, 2024, 7:57pm

I wondered if anyone has blog posts or personal experiences they’d like to share about using Haskell for data analysis or data science?

I’m a Haskell noob but data science is what I do professionally, and I feel a draw to applying one to the other. I don’t really mind coding fundamental algorithms from scratch, but one big stumbling block I see is data access libraries. E.g. working with GIS data, Parquet, images, and so on.

I also don’t see much support for the operational research side of things, which is mostly algorithms one cannot code alone: constraint solvers (Gecode), MIP/linear programming, and so on.

Do you use Haskell for this sort of work? If so, how?

LaurentRDC · February 6, 2024, 8:42pm

I use Haskell as a quant, which is the silly name for data science in finance.

The ecosystem is definitely lacking in most of the exploratory niceties you would get using R or Python. I had to write a library to handle one-dimensional series because none were ready-to-use for my use case.

I also do linear programming in Haskell, and it’s really nice, especially in combination with type-safe dimensions. If you want to use GLPK, there’s hmatrix-glpk. We use an in-house library, which binds to a proprietary linear programming solver, so we haven’t bothered to release it (I’m also ashamed to say that it only works on Linux).
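For anyone curious what the hmatrix-glpk route looks like, here is a minimal sketch. It assumes the hmatrix-glpk package and the GLPK C library are installed; the toy problem and the names `objective` and `constraints` are made up for illustration and have nothing to do with the in-house library mentioned above.

```haskell
module Main where

import Numeric.LinearProgramming

-- Toy problem (made up for illustration):
--   maximise    3x + 2y
--   subject to   x +  y <= 4
--                x + 3y <= 6
--                x, y >= 0   (the default when no bounds are given)
objective :: Optimization
objective = Maximize [3, 2]

-- Dense form: one coefficient per variable in every row.
constraints :: Constraints
constraints = Dense
  [ [1, 1] :<=: 4
  , [1, 3] :<=: 6
  ]

main :: IO ()
main = print (simplex objective constraints [])
```

The empty list passed to `simplex` is the list of explicit variable bounds. For larger, mostly-zero problems the package also offers a `Sparse` constraint form.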

It would be great to expand the ecosystem – maybe you want to take a crack at it? Best to identify a need you have, and work towards satisfying that need.

emiruz · February 6, 2024, 9:39pm

hmatrix-glpk was last updated 3 years ago; I’m seeing that quite a bit. I wanted an interval arithmetic library (Haskell has one) and a range-valued sets library (Haskell has two), but it’s the same story: dormant. This doesn’t bother me that much. I’ve spent about a year with Prolog, and found it very useful for all sorts of use cases despite a lack of packages.

From the command line, one could create Haskell executables to do specific bits of analysis and orchestrate them with more specialised components:

Use GNUPlot for plots – interface by file io.
ETL in SQL + SQLite – interface by file io.
Discrete optimisation using MiniZinc – interface by file io.
SCIP for linear/MIP programming – interface by file io.
Makefile – for orchestration between bits.
Babel (orgmode), for literate programming.
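As a rough sketch of the "interface by file io" idea for the GNUPlot item above: a Haskell executable writes a data file plus a small gnuplot script, and the Makefile (or shell) runs gnuplot afterwards. The file names, the sampled function and all helper names here are placeholders, and only `base` is used.

```haskell
module Main where

-- Hypothetical analysis step: sample y = x^2 on a small grid.
samples :: [(Double, Double)]
samples = [(x, x * x) | x <- [0, 0.5 .. 3]]

-- Render points as whitespace-separated columns, one point per line,
-- which is the layout gnuplot reads by default.
renderData :: [(Double, Double)] -> String
renderData pts = unlines [show x ++ " " ++ show y | (x, y) <- pts]

-- A small gnuplot script; the file names are placeholders.
plotScript :: FilePath -> FilePath -> String
plotScript dataFile pngFile = unlines
  [ "set terminal png"
  , "set output '" ++ pngFile ++ "'"
  , "plot '" ++ dataFile ++ "' using 1:2 with linespoints title 'y = x^2'"
  ]

main :: IO ()
main = do
  writeFile "points.dat" (renderData samples)
  writeFile "plot.gp" (plotScript "points.dat" "plot.png")
  -- Next step, from a Makefile rule or the shell: gnuplot plot.gp
```

The same pattern would apply to the other tools in the list: the Haskell side only needs to produce and parse text files, so no bindings are required.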

I guess the theme with Haskell applied to data analysis/science will be “custom algorithms”

Haskell has noteworthy auto-diff and linear algebra libraries. That makes optimisers for regression and classification algorithms fairly easy to hand roll.
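To give a feel for how little is needed, here is a toy sketch of a hand-rolled optimiser: forward-mode auto-diff via dual numbers, driving fixed-step gradient descent on a least-squares line fit. It deliberately uses only `base` rather than any of the existing auto-diff libraries, and the `Dual` type, the step size and the iteration count are all illustrative assumptions, not tuned choices.

```haskell
module Main where

-- Dual number: a value together with its derivative w.r.t. one chosen input.
data Dual = Dual { primal :: Double, tangent :: Double }

instance Num Dual where
  Dual a a' + Dual b b' = Dual (a + b) (a' + b')
  Dual a a' - Dual b b' = Dual (a - b) (a' - b')
  Dual a a' * Dual b b' = Dual (a * b) (a * b' + a' * b)
  abs (Dual a a')       = Dual (abs a) (a' * signum a)
  signum (Dual a _)     = Dual (signum a) 0
  fromInteger n         = Dual (fromInteger n) 0

-- Lift a constant (zero derivative).
constant :: Double -> Dual
constant x = Dual x 0

-- Sum of squared residuals for the line y = slope * x + intercept.
loss :: [(Double, Double)] -> Dual -> Dual -> Dual
loss pts slope intercept =
  sum [ r * r
      | (x, y) <- pts
      , let r = slope * constant x + intercept - constant y ]

-- Gradient by forward mode: seed each parameter's tangent with 1 in turn.
gradient :: [(Double, Double)] -> (Double, Double) -> (Double, Double)
gradient pts (m, c) =
  ( tangent (loss pts (Dual m 1) (Dual c 0))
  , tangent (loss pts (Dual m 0) (Dual c 1)) )

-- One fixed-step gradient descent update.
step :: Double -> [(Double, Double)] -> (Double, Double) -> (Double, Double)
step rate pts (m, c) =
  let (gm, gc) = gradient pts (m, c)
  in (m - rate * gm, c - rate * gc)

-- Run a fixed number of updates from the origin (illustrative settings).
fit :: [(Double, Double)] -> (Double, Double)
fit pts = iterate (step 0.01 pts) (0, 0) !! 5000

main :: IO ()
main = print (fit [(0, 1), (1, 3), (2, 5), (3, 7)])
```

The sample points lie exactly on y = 2x + 1, so the fitted slope and intercept should approach 2 and 1. Forward mode needs one pass per parameter, which is fine for two parameters; for many parameters a reverse-mode library is the better fit.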

ad-si · February 6, 2024, 11:01pm

I feel like most of the building blocks are already available.
It just takes a sub-optimal amount of effort to put it all together.

Here is a list of resources:
https://github.com/krispo/awesome-haskell?tab=readme-ov-file#data-science

And here is an example of it being more difficult than expected:
https://adriansieber.com/how-to-create-a-bar-chart-from-a-csv-file-with-haskell/
(OpenAI would have generated the Python + Matplotlib solution in 5 seconds)

olf · February 7, 2024, 7:48am

I have been using Haskell at work for data science-y purposes for ten years. Data science is a vast field, and as you observed, some niches are barely covered at all in the Haskell ecosystem. That said, occasionally someone uses the FFI to provide bindings to well-established tools. Recently Henning Thielemann added some new bindings to linear programming solvers because my company needed them.

When a library has not been updated in years, that is likely either because it is mature or because its user base is too small for changes to become necessary. A better metric would be open issues/merged pull requests. I have released several libraries which haven’t been updated in a while since they are mature enough for me and nobody else has requested changes.

unhammer · February 7, 2024, 10:06am

I remember finding sky blue trades interesting, a (Haskell) blog series on Principal Component Analysis and such for weather patterns.

mihaimaruseac · February 7, 2024, 2:41pm

Around 6-7 years ago, there was a DataHaskell effort. Right now, the Gitter room is almost abandoned.

emiruz · February 7, 2024, 3:40pm

I haven’t got oodles of time at the moment, but soon enough I’ll try and redo this in Haskell. It’s a custom – simple – property pricing model but it would require parsing, optimising, visualising, and so on.

I suspect the Haskell aesthetic can offer a unique perspective on the solution. I did a similar exercise with a data analysis in Prolog here, and I was very happy with the results.

emiruz · February 7, 2024, 4:44pm

I found HaskellR. It’s ostensibly a seamless and highly performant interface to R from Haskell. E.g. here is a clustering example from the linked site:

{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE ScopedTypeVariables #-}
import H.Prelude as H
import Language.R.QQ

import System.Random

main = H.withEmbeddedR defaultConfig $ do
  H.runRegion $ do
    -- Put any complex model here
    std <- io $ newStdGen
    let (xs::[Double]) = take 100 $ randoms std
    d  <- [r| matrix(xs_hs,ncol = 2) |]
    rv <- [r| clusters <- kmeans(d_hs, 2) |]
    [r| par(mar = c(5.1, 4.1, 0, 1));
        plot(d_hs, col = rv_hs$cluster, pch = 20
            , cex = 3, xlab = "x", ylab = "y");
        points(rv_hs$centers, pch = 4, cex = 4, lwd = 4);
      |]
    return ()

Wow…

ParanoidMonoid · February 8, 2024, 4:23pm

Can anyone give any insight into the “state” of HaskellR - I almost started using it a couple of times, but this answer has put me off: https://www.reddit.com/r/haskell/comments/li9muq/comment/gn5u4za/?utm_source=share&utm_medium=web2x&context=3

It seems to get minor updates pretty regularly though… Code frequency · tweag/HaskellR · GitHub

unhammer · February 9, 2024, 8:46am

I see Jax there, you may be interested in Dex (by the same people)

thielema · October 26, 2024, 3:19pm

I can contribute packages lapack, comfort-blas, comfort-glpk, highs-lp, coinor-clp, comfort-fftw.

reuben · October 26, 2024, 11:12pm

I’ve used Haskell for toy versions (but useful ones) of various Bayesian inference algorithms (MCMC and particle filters in particular). Stream-based programming and functional reactive stuff worked out nicely here.
