Performant haskell seems like a real pain

I’m learning haskell, and there has something that has been bothering me about naive vs. advanced haskell.

I came across this great somewhat old blog post (but I am able to repro with a ghc 8.10.1).

The gist is that optimizing for speed and memory usage is not always about strictness. The author gives an example of calculating the mean of a sequence of numbers. Once the list of numbers grows large (1e8), the naive implementation exhausts memory and crashes. It’s only upon looking at the Core output that the illuminating detail is found: that the garbage collector can’t clean up a list (even a lazily evaluated list) if that list is used in a subsequent computation. So, sum myLazyList / length myLazyList fails because myLazyList stays in memory, because the compiler assumes that it can’t clean up during the sum because length also needs the list.

The (first) solution, which the author says is the bread and butter of functional programmers for 40 years, is to do the sum and the length together:

mean :: [Double] -> Double
mean = go 0 0
        go :: Double -> Int -> [Double] -> Double
        go s l []     = s / fromIntegral l
        go s l (x:xs) = go (s+x) (l+1) xs

My question (finally) is that even for this seemingly trivial example, the details of an efficient solution seem to only be revealed when looking at Core output, and reasoning very carefully about what is happening. This implies (to me) that for much larger programs, finding the performance problems seems like it would be a very difficult undertaking. Is writing performant haskell really this challenging?

Are there tools that would have revealed something like the above quickly, or at a unittest level? Can discrete parts of a large haskell program be thusly performance tested?

Can the naive/advanced patterns (the above is an example) be mapped to the same kind of naive/advanced patterns of other programming languages? In other words, is it more/less/same difficulty in optimizing, say, C++, if one were new to it?



For this small example I would say it is a question of experience. Like the author of that blog post says, laziness is not the issue, computing the sum and the length together is also something you would have to do in a strict language such as C++.

You don’t really need to look at Core, if you still use something later on then it of course cannot be garbage collected, but Core does make it immediately obvious. But I can imagine that these kinds of performance problems can sneak in unnoticed in larger programs.

A pretty good solution to this problem are linear types (coming in GHC 9; here’s a talk from SPJ about it) that allow you to express that you think the list should only be traversed once (and hence it should be possible to garbage collect it during the traversal). But that can only specify linear constraints.

I don’t know what the latest syntax is, but something like this should give errors when compiled:

mean :: [Unrestricted Double] -o Double
mean xs = sum xs / length xs

While the solution should compile without errors:

mean :: [Unrestricted Double] -o Double
mean = go 0 0
    go :: Double -> Int -> [Unrestricted Double] -o Double
    go s l [] = s / fromIntegral l
    go s l (Unrestricted x : xs) = go (s + x) (l + 1) xs

More general type systems that specify these kinds of constraints are called resource types which can also be used to specify quadratic or exponential time (or space) constraints. There was a paper at ICFP this year and a video about it from the Chalmers functional programming Seminar.

Another thing you can do right now is to try and write it using foldr without manual recursion. That also ensures it is done in a single traversal:

mean :: [Double] -> Double
mean xs = foldr (\x go s l -> go (s + x) (l + 1)) (\s l -> s / fromIntegral l) xs 0 0

And if you really want to be able to write that but nicer then you could consider using attribute grammars. Although these again will only work for linear algorithms. There is very similar example in the Why attribute grammars matter post. This is the final program that you would write:

-- This could be defined in a library
type AverageList = [Double]

-- This is the equivalent of the length function
attr AverageList
  syn length :: Int
sem AverageList
  | Nil loc.length = 0
  | Cons loc.length = 1 + @tl.length

-- This is the equivalent of the sum function
attr AverageList
  syn sum :: Double
sem AverageList
  | Nil  loc.sum = 0
  | Cons loc.sum = @hd + @tl.sum

-- This is the important part. It looks very much like the naive Haskell implementation.
attr AverageList
  syn avg :: Double
sem AverageList
  | * loc.avg = @loc.sum / fromIntegral @loc.length

{ -- This is needed to actually create a function in the Haskell world
mean :: [Double] -> Double
mean xs = avg_Syn_AverageList (wrap_AverageList (sem_AverageList xs) Inh_AverageList)

That is quite a bit more code for this simple example, but it is modular and most of it could be defined in a library like the sum and length functions.

1 Like

I feel that the example presented here is not an issue of Haskell at all, but is a general issue all programming languages have.

In any language you would have to think what exactly is the input to the function mean, in Haskell [Double] represent a list that might or might not be fully materialized, in other languages LinkedList<Double> would always be fully materialized. In that case the creating the list will exhaust the memory before entering the function anyway.

Also, thinking about what we do with the data is also encouraged, even in C++ we would traverse the list twice instead of once, and we should be aware of that. The fact that Haskell is able to consume a list without fully materializing it in some cases is a nice bonus optimization over what C++ can do, but the default behaviour should be considered the same for both language.

So - one must think about the data structures they operate on in any language, Haskell included. I feel Haskell can be unnecessarily hard for beginners in this regard because we often neglect to mention that fact! Choosing the right algorithm and data structure for the task is really meaningful for good performance.

However, the nice thing is that Haskell does allow us to create abstractions and retain a high-level algorithm description while still traversing the list only once! check out this implementation of mean using the foldl package that allows us to compose folds:

import qualified Control.Foldl as Fold

mean :: [Double] -> Double
mean = Fold.fold $ (/) <$> Fold.sum <*> fmap fromIntegral Fold.length

main = print (mean [1..1e9])

It looks very similar to the original solution, but traverses the list only once and therefore only evaluates what is needed.

1 Like