Do any of you use Haskell for statistical programming, e.g. statistical modelling and data analysis?
For example, I’m building an ML model (regression or a similar statistical model) for rain prediction. I wrote a Python script to collect and clean the data, and I’m doing all the feature engineering in Haskell.
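As an illustration of what feature engineering in plain Haskell can look like, here is a minimal sketch that turns a daily rainfall series into training examples with two lagged features. The feature choice (previous day’s rainfall and the mean of the previous three days), the `Example` type and the function names are all hypothetical, not taken from the project described above, and only `base` is used.

```haskell
import Data.List (tails)

-- | One training example: two features and the value to predict.
data Example = Example
  { lag1   :: Double  -- ^ rainfall on the previous day
  , mean3  :: Double  -- ^ mean rainfall over the previous three days
  , target :: Double  -- ^ rainfall on the day being predicted
  } deriving (Show, Eq)

-- | Mean of every full window of length n; shorter trailing windows are dropped.
rollingMean :: Int -> [Double] -> [Double]
rollingMean n xs
  | n <= 0    = []
  | otherwise = [ sum w / fromIntegral n | w <- map (take n) (tails xs), length w == n ]

-- | Build examples from a daily series: for day t the features use days t-3 .. t-1.
mkExamples :: [Double] -> [Example]
mkExamples xs = zipWith3 Example lag1s means targets
  where
    means   = rollingMean 3 xs  -- mean of days i .. i+2
    lag1s   = drop 2 xs         -- day i+2, the day before the target
    targets = drop 3 xs         -- day i+3, the target

main :: IO ()
main = mapM_ print (mkExamples [0.0, 2.5, 1.0, 0.0, 4.0, 3.5])  -- made-up daily values
```

Because the windows come from `tails`, a shorter series simply yields fewer examples instead of being padded with invented values.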
“Non-diffusive atmospheric flow” by Ian Ross was an interesting blog tutorial series on analysing weather patterns with Haskell.
Benjamin Redelings created a probabilistic language with some feedback from the Haskell community. The LazyPPL library is embedded directly in Haskell. If you want to reach for a more established statistical language without losing the expressiveness of Haskell, look at HaskellR.
I tried doing the older version of the Coursera machine learning course in Haskell.
You could make it work, but you had to roll your own everything, and some things didn’t exist at the time, e.g. a good library for convex optimization.
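To give a flavour of “rolling your own”, here is a minimal sketch of the kind of exercise that course starts with: univariate linear regression fitted by batch gradient descent, using nothing but `base`. The names, the learning rate and the iteration count are assumptions chosen for toy data; a real project would want a proper optimization library instead.

```haskell
import Data.List (foldl')

-- | Parameters of the hypothesis h(x) = t0 + t1 * x (strict fields avoid thunk build-up).
data Theta = Theta !Double !Double deriving Show

-- | One batch gradient-descent step on the cost J = 1/(2m) * sum (h(x) - y)^2.
step :: Double -> [(Double, Double)] -> Theta -> Theta
step alpha xys (Theta t0 t1) = Theta (t0 - alpha * g0) (t1 - alpha * g1)
  where
    m    = fromIntegral (length xys)
    errs = [ (t0 + t1 * x - y, x) | (x, y) <- xys ]  -- residual paired with its input
    g0   = sum [ e | (e, _) <- errs ] / m            -- dJ/dt0
    g1   = sum [ e * x | (e, x) <- errs ] / m        -- dJ/dt1

-- | Run a fixed number of steps from Theta 0 0.
gradientDescent :: Double -> Int -> [(Double, Double)] -> Theta
gradientDescent alpha iters xys = foldl' (\th _ -> step alpha xys th) (Theta 0 0) [1 .. iters]

main :: IO ()
main = print (gradientDescent 0.1 2000 [ (x, 2 * x + 1) | x <- [0 .. 4] ])
```

On this toy data (points on y = 2x + 1) the result ends up very close to `Theta 1.0 2.0`; a learning rate above roughly 0.3 makes this particular problem diverge.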
I haven’t surveyed the landscape in a while, but I’m about to start looking into it again, since I’m now content with the primitives in my data frame library.
I was looking through your dataframe library. I’d like to contribute, but I’m still a newbie in Haskell when it comes to more advanced things like concurrency and parallelism, hahaha.
I’d welcome a contribution. I don’t plan to put it on Hackage until early March, and until then there’s a lot of cleaning up and testing to do, tutorials and documentation to write, design choices to reconsider, etc.
Absolutely. I do most of my work in Agda these days, but I still use Haskell for statistical programming a bit. For example, I wrote a Haskell package for M-estimation. At a previous job, I wrote a DSL (in Haskell) for creating datasets for epidemiologic analysis. Here’s a fork of that project.
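For readers new to the term: an M-estimator is the solution θ of an estimating equation Σ ψ(x_i − θ) = 0. Below is a minimal sketch of the classic Huber location estimator, solved by iteratively reweighted least squares. It is not the API of the package mentioned above; the names are hypothetical, the scale is assumed known and equal to 1, the input is assumed non-empty, and k = 1.345 is just the conventional tuning constant.

```haskell
-- | Huber weight w(r) = psi(r) / r; equal to 1 inside [-k, k], which also covers r = 0.
huberWeight :: Double -> Double -> Double
huberWeight k r
  | abs r <= k = 1
  | otherwise  = k / abs r

-- | Location M-estimate solving sum psi_k(x_i - theta) = 0 by iteratively
-- reweighted least squares, starting from the sample mean (scale fixed at 1).
huberLocation :: Double -> [Double] -> Double
huberLocation k xs = go 100 (sum xs / fromIntegral (length xs))
  where
    go :: Int -> Double -> Double
    go 0 theta = theta
    go n theta
      | abs (theta' - theta) < 1e-12 = theta'
      | otherwise                    = go (n - 1) theta'
      where
        ws     = [ huberWeight k (x - theta) | x <- xs ]  -- current weights
        theta' = sum (zipWith (*) ws xs) / sum ws          -- weighted mean update

main :: IO ()
main = print (huberLocation 1.345 [1, 2, 3, 100])
```

On `[1, 2, 3, 100]` the estimate converges to 2.5 rather than the sample mean of 26.5, because the outlier’s influence is capped at k.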
That is freaking awesome! I’m studying statistics and machine learning and I love Haskell, so, given my background in software engineering and functional programming, I’ve been trying to apply what I’m studying. Thanks for sharing the projects; I’ll definitely read them.
I’ll also add my two cents here. During my time at university, I used Haskell extensively for estimation with MCMC algorithms. If you are interested, have a look at mcmc and, in particular, McmcDate.
However, I have to admit the journey was not always easy. In my opinion, the R and Python ecosystems are better equipped when it comes to sampling from probability distributions or evaluating their density and distribution functions. The Haskell statistics library is a great tool, but it seems unmaintained, or at least under-maintained. It is also missing some functionality: for example, I had to implement the multivariate Dirichlet distribution, which is a pretty standard distribution. (Actually, multivariate distributions in general are a bit of a hassle.)
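To show what “implementing the Dirichlet distribution” involves at its simplest, here is a sketch of its log-density, log Γ(Σ α_i) − Σ log Γ(α_i) + Σ (α_i − 1) log x_i, using only `base`. Since `base` has no log-gamma function, the sketch includes a standard Lanczos approximation. The names are hypothetical and this is not the code from mcmc or the statistics package; sampling is left out.

```haskell
-- | log Gamma(z) via the Lanczos approximation (g = 7, 9 coefficients),
-- using the reflection formula for z < 0.5. Intended for z > 0.
logGamma :: Double -> Double
logGamma z
  | z < 0.5   = log (pi / sin (pi * z)) - logGamma (1 - z)
  | otherwise = 0.5 * log (2 * pi) + (z' + 0.5) * log t - t + log series
  where
    z'     = z - 1
    t      = z' + 7.5  -- z' + g + 0.5 with g = 7
    series = c0 + sum (zipWith (\c i -> c / (z' + i)) cs [1 ..])
    c0     = 0.99999999999980993
    cs     = [ 676.5203681218851, -1259.1392167224028, 771.32342877765313
             , -176.61502916214059, 12.507343278686905, -0.13857109526572012
             , 9.9843695780195716e-6, 1.5056327351493116e-7 ]

-- | Log-density of Dirichlet(alpha) at a point x of the probability simplex.
-- No validation: assumes equal lengths, alpha_i > 0, x_i > 0 and sum x = 1.
dirichletLogPdf :: [Double] -> [Double] -> Double
dirichletLogPdf alpha x =
  logGamma (sum alpha) - sum (map logGamma alpha)
    + sum (zipWith (\a xi -> (a - 1) * log xi) alpha x)

main :: IO ()
main = print (dirichletLogPdf [2, 3, 4] [0.2, 0.3, 0.5])
```

As a sanity check, Dirichlet(1, 1, 1) is uniform on the 2-simplex with density Γ(3) = 2, and Dirichlet(2, 2) reduces to Beta(2, 2), whose density at 0.5 is 1.5.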
I am not sure where you want to go, but the upside of using Haskell, in my opinion, is that you will really understand what you are doing. Further, we need people such as you working on statistical stuff, so we can improve the libraries and the ecosystem. There is a lot to do!
Exactly! I know that R and Python are the kings, but why not build data pipelines in Haskell and use something like HaskellR for the statistics? Haskell can be a nice tool for data pipelines in statistical programming, and we can make it happen! I’ll try to contribute more to this community this year.
I found that the viewpoint of a functional programmer helped tremendously when trying to make sense of the mess that is statistics. “What is the type of this?” was always an entertaining question to ask. On the other hand, the practising statistician appears to be entirely unimpressed by type theory or categories. The fact that distributions form a monad does not make your statistics stronger. Moral of the story: if giving structure to your statistical analyses is not your major concern, Haskell may not buy you any advantage.
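For anyone wondering what “distributions form a monad” means concretely, here is the textbook finite-distribution monad as a minimal sketch. `Dist`, `uniform` and `prob` are hypothetical names, not a particular library’s API, and duplicate outcomes are not merged. Binding sequences random choices, which is exactly the structure being referred to; as the comment says, it does not by itself make an analysis any more sound.

```haskell
-- | A finite distribution: outcomes paired with their probabilities.
newtype Dist a = Dist { runDist :: [(a, Double)] }

instance Functor Dist where
  fmap f (Dist xs) = Dist [ (f x, p) | (x, p) <- xs ]

instance Applicative Dist where
  pure x = Dist [(x, 1)]
  Dist fs <*> Dist xs = Dist [ (f x, p * q) | (f, p) <- fs, (x, q) <- xs ]

-- | Bind: weight each continuation's outcomes by the probability of its input.
instance Monad Dist where
  Dist xs >>= k = Dist [ (y, p * q) | (x, p) <- xs, (y, q) <- runDist (k x) ]

-- | Uniform distribution over a non-empty list of outcomes.
uniform :: [a] -> Dist a
uniform xs = Dist [ (x, 1 / fromIntegral (length xs)) | x <- xs ]

-- | Probability of an event, summing over (possibly repeated) outcomes.
prob :: (a -> Bool) -> Dist a -> Double
prob event (Dist xs) = sum [ p | (x, p) <- xs, event x ]

-- | Sum of two fair dice, written as a sequence of random choices.
twoDice :: Dist Int
twoDice = do
  a <- uniform [1 .. 6]
  b <- uniform [1 .. 6]
  return (a + b)

main :: IO ()
main = print (prob (== 7) twoDice)
```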
Agreed, but when I want to build a data pipeline that is free of bugs, compiles cleanly and just works when it’s finished, types are essential. For me, statistics is the subject I want to apply to real-world cases.
Then what we are looking for in this thread is not implementations of statistical algorithms in Haskell, but rather libraries that cast statistics in types and type classes, alongside bindings to well-established number-crunching libraries written in C or whatever.
I’d be interested in what kinds of bugs in statistical computations can be prevented by strong typing. From my limited practical experience, shooting yourself in the foot with statistics is most easily done by selecting an inappropriate test/algorithm/model. How would types help you with that?
When it comes to combining statistics and computation with software engineering, Haskell is great for me. I already do statistical programming in Python, so why not Haskell? And types will help me build the software. Not everything is about theory.
There are a number of cases where an implicit conversion, a mishandled data import, etc. (that is, the risks that come with what makes programs like R handy) can sway an otherwise sound analysis.
The crux of the question, I feel, is avoiding these errors without losing much convenience.
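As a small example of catching a mishandled import without much ceremony, here is a sketch that parses raw rainfall cells into `Maybe Double`, so that missing entries are visible in the type instead of being silently coerced to some number. The function names and cell values are hypothetical. Note that `readMaybe` follows Haskell’s `Read` syntax, so a cell like ".5" is rejected while "NaN" is accepted; a real parser would need more care.

```haskell
import Data.Maybe (catMaybes)
import Text.Read (readMaybe)

-- | Parse one raw rainfall cell. Anything that does not parse as a Double
-- (e.g. "NA" or an empty cell) becomes Nothing rather than a made-up number.
parseRainfall :: String -> Maybe Double
parseRainfall = readMaybe

-- | Mean of the cells that parsed, or Nothing if none did. Missing cells are
-- dropped here; that policy is now visible in the code rather than implicit.
meanPresent :: [Maybe Double] -> Maybe Double
meanPresent cells = case catMaybes cells of
  [] -> Nothing
  xs -> Just (sum xs / fromIntegral (length xs))

main :: IO ()
main = print (meanPresent (map parseRainfall ["1.5", "NA", "2.5"]))  -- prints Just 2.0
```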
That could actually be taken as an argument against (dynamically typed) data frames and their valuation as a Haskell Foundation project. At the very least, the docs of such a library should carry a disclaimer: use only for exploration, discard when modelling!
This week, next week or the week after?
Discovering that you’ve used the “wrong” type several weeks and several dozen type signatures later… it’s a real demotivator having to change all those signatures.
There is no “standard” system of types; the fact that each language needs its own FFI makes this abundantly clear. Consequently, each language usually has its own system of types, i.e. each of those systems of types is an artefact of the associated language.
So is it any great surprise that Don Knuth, for example, decided to design his own assembly language (twice) to avoid all of that language-based bureaucracy in his extensive literature on algorithms? Or that the practising statistician appears to be entirely unimpressed by type theory or categories (or artefacts of category theory)?
Do the majority of mathematicians want to “entertain” something as utterly language-specific as a system of types? If the ongoing popularity of Python and other dynamically typed languages in mathematics, and in science more generally, is any measure… I think not.
Then it’s fortunate that Simon Marlow’s dynamically-typed hierarchy of exceptions arrived in Haskell before the advent of the HF.
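For context, the “dynamically-typed hierarchy” is the extensible exception design in `Control.Exception`: any exception can be wrapped in `SomeException`, and `fromException` performs a run-time type test to get the concrete type back. A minimal sketch, with a hypothetical `BadRow` exception:

```haskell
import Control.Exception
  (ArithException (DivideByZero), Exception, SomeException, fromException, toException)

-- | A hypothetical exception for malformed input rows.
newtype BadRow = BadRow String deriving Show

instance Exception BadRow

-- | Recover the concrete type from SomeException at run time: fromException
-- is a dynamic type test (via Typeable) that returns Nothing on a mismatch.
classify :: SomeException -> String
classify e = case fromException e of
  Just (BadRow msg) -> "bad row: " ++ msg
  Nothing           -> case fromException e of
    Just DivideByZero -> "division by zero"
    _                 -> "some other exception"

main :: IO ()
main = mapM_ (putStrLn . classify)
  [toException (BadRow "empty cell"), toException DivideByZero]
```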
It’s being maintained. Or rather it’s being maintained and that’s it. I haven’t seen active development in… many years.
Mathematicians do entertain something like a system of types, even if they seldom call it that. A probability distribution is distinct from a density function is distinct from a linear operator is distinct from a matrix. Of course, any mapping into any programming language must be somewhat idiosyncratic. Being as faithful as possible ought to help statisticians design the right algorithm.
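One way to make that distinction concrete is with newtypes, sketched below under the assumption that the constructors would be hidden behind a module’s export list (in this single-file sketch they are not). `Probability`, `Density`, `mkProbability` and `complementP` are hypothetical names. The density of N(0, 0.1²) at its mean is about 3.99, a perfectly valid density but not a probability, and the types keep the two from being mixed up.

```haskell
-- | A probability: a number in [0, 1]. In a real module the constructor would
-- not be exported, so values could only be built through 'mkProbability'.
newtype Probability = Probability Double deriving (Show, Eq)

-- | A value of a probability density: non-negative, but it may exceed 1.
newtype Density = Density Double deriving (Show, Eq)

-- | Smart constructor that rejects values outside [0, 1] (and NaN).
mkProbability :: Double -> Maybe Probability
mkProbability p
  | p >= 0 && p <= 1 = Just (Probability p)
  | otherwise        = Nothing

-- | Density of the normal distribution N(mu, sigma^2) at x.
normalDensity :: Double -> Double -> Double -> Density
normalDensity mu sigma x = Density (exp (negate (z * z) / 2) / (sigma * sqrt (2 * pi)))
  where
    z = (x - mu) / sigma

-- | Probability of the complementary event; it only accepts a 'Probability',
-- so passing a 'Density' here is a compile-time type error.
complementP :: Probability -> Probability
complementP (Probability p) = Probability (1 - p)

main :: IO ()
main = do
  print (normalDensity 0 0.1 0)                 -- a valid density value above 1
  print (fmap complementP (mkProbability 0.3))
```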