New blog post: Is Haskell fast?

I wrote this post to summarise whether Haskell is appropriate for high-performance code. I await your feedback!

https://arifordsham.com/is-haskell-fast/

4 Likes

A high-level program describes the problem, and gives the compiler more freedom to choose the best implementation.

Is there a simple example where GHC does this?

Theoretically speaking, high-level code is easier to optimize than low-level code: it simply provides more information to the compiler about the programmer’s intent.
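
As an illustrative (not authoritative) sketch of this point: GHC’s build/foldr fusion typically eliminates the intermediate lists in a high-level pipeline like the one below, so with `-O` it usually compiles to a tight loop over machine integers rather than allocating `[1 .. n]` and the mapped list.

```haskell
-- A minimal sketch: the high-level pipeline gives GHC the freedom to
-- fuse away the intermediate lists entirely (list fusion with -O).
sumOfSquares :: Int -> Int
sumOfSquares n = sum (map (\x -> x * x) [1 .. n])

main :: IO ()
main = print (sumOfSquares 1000000)
```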

Unfortunately, my point was of a theoretical nature. Intuitively speaking, a perfect Haskell compiler will ultimately outperform a perfect optimizing C compiler.

A C program specifies one way of solving a problem, while a Haskell program can more easily be analysed and rearranged, and of the multiple possibilities, a sufficiently smart compiler can choose the best one.

I did not mean to claim that GHC can currently do this. I’m sure an example can be contrived in which a naive Haskell program performs a particular task faster than a straightforward C program, but in the general case this is certainly not true.

It can be debated whether GHC (or its successor) will ever be sophisticated enough, but I’d like to think it’s at least possible. I certainly don’t think there is inevitably a price to pay for abstraction.

1 Like

The Benchmarks Game is worth mentioning. Judging from this data I’d say that we should pessimistically assume optimized Haskell programs to be slightly slower than optimized OCaml, Java, or Go programs and I’d call Haskell one of the slowest compiled languages.

Haskell also has more dangerous performance traps than strict, imperative, lower level languages, so I’d guess unoptimized Haskell code could fare even worse in comparison.
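
To make the “performance trap” point concrete, a sketch of the best-known one: a lazy left fold builds a deep chain of unevaluated `(+)` thunks before anything is added, while the strict `foldl'` forces the accumulator at each step and runs in constant space.

```haskell
import Data.List (foldl')

-- Classic laziness trap: sumLazy accumulates a thunk per element and
-- can blow the stack on a long list; sumStrict forces the accumulator
-- at every step and runs in constant space.
sumLazy, sumStrict :: [Int] -> Int
sumLazy   = foldl  (+) 0
sumStrict = foldl' (+) 0
```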

On the other hand, dynamic languages like Python or Ruby are very slow, so even with all that I’d safely assume Haskell to be in general 10-20 times faster than those, which is a lot!

2 Likes

@movzx would you agree that Haskell should be classified as a higher-level language than OCaml, Java, or Go, and rather at least comparable in terms of abstraction with Python or Ruby?

I’d say Haskell is higher level than all of those due to laziness and purity. In Haskell you don’t have to think about order of evaluation or write sequences of steps. It’s declarative: you write what you want to compute and don’t have to care how it is computed.
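
A small sketch of that declarative style: the list below is conceptually infinite, but laziness means only as much of it is evaluated as the result demands, with no evaluation order spelled out by the programmer.

```haskell
-- Only the first five even numbers are ever computed, even though the
-- source list [1 ..] is infinite.
firstEvens :: [Int]
firstEvens = take 5 (filter even [1 ..])
-- firstEvens == [2,4,6,8,10]
```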

Strong, advanced typing with inference and type classes also makes it convenient to write highly generic and abstract code, which would be fragile and ugly in those languages (maybe not so much in OCaml).

1 Like

fwiw, I would not say that Haskell is higher level than OCaml, or not by much. I don’t know Haskell as well as I know OCaml; I’m pretty much a newbie with Haskell, but I know OCaml pretty well. I think I know Haskell well enough to make my initial claim above, though. Just my opinion, and I don’t think anything turns on who’s right.

Of course you can easily write imperative code in OCaml, and if you do, it doesn’t have to be sent off to the land of Monad. I don’t know if that makes OCaml lower-level. But even if it does, if you stick to the usual functional core of OCaml, I think Haskell and OCaml are comparable functional languages with respect to anything I’d call a “level”, for the most part.

I think that Haskell is more elegant and has a lot of conveniences that OCaml lacks. That’s what’s beautiful about Haskell, and that’s why I am loving learning it (mostly). I’m not sure that counts as a level, though, and I don’t see laziness as making a level.

Well, I suppose the fact that it’s trivially easy in Haskell to define infinite data structures makes it higher-level for those who like infinite data structures (as I do). You can do the same thing in OCaml, but it requires special libraries and it’s more trouble than it’s worth. So call that a level in one area of coding. Not everyone sees infinite data structures as useful, though, since no code ever consumes an entire infinite data structure.
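
For anyone who hasn’t seen it, here is the standard example of how trivially Haskell expresses an infinite data structure: the list is defined in terms of itself, and laziness makes that well-founded.

```haskell
-- An infinite, self-referential list of Fibonacci numbers; each demand
-- for an element forces just one more zipWith step.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
-- take 8 fibs == [0,1,1,2,3,5,8,13]
```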

Again, I don’t think any of this matters to anything, but I thought it might be useful to provide a variant response.

1 Like

@mars0i as you say, the discussion over whether OCaml or Haskell is ‘higher level’ is mainly academic. Disclaimer: I have no familiarity with OCaml, but if OCaml code compiles to run faster than Haskell, as @movzx claims, at a similar level of abstraction, as @mars0i asserts, that just supports my intended (language-independent) takeaway in the blog post: abstraction need not carry a performance penalty. I’d like to see programmers and languages moving to higher levels of abstraction, regardless of technologies or paradigms. That is indeed what is happening with the popular dynamic languages, and Rust is bringing the revolution to systems programming.

I think most of the efficiency loss in Haskell is due to laziness. Not in small toy programs, where the Simplifier (and strictness analysis in concert with e.g. arity analysis) is able to see where the programmer actually needs laziness and where they don’t, but in larger contexts.

Laziness by default means that you are telling the compiler “Oh, I might want to be lazy in this place later on” in every place, just to gain laziness in the few key spots where it then composes well. Postulating the “optimal compiler” and then trying to approximate it with a computable function that actually terminates in reasonable time is a research topic that I find great joy in. It’s amazing how fast Haskell programs are! But ultimately I’d rather have a strict language with proper support for laziness if I could start anew.
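
One way to approximate a “strict language with proper support for laziness” in today’s GHC is local, explicit strictness; a sketch using bang patterns:

```haskell
{-# LANGUAGE BangPatterns #-}

-- Opt-in strictness: the bang forces the accumulator on every
-- iteration, so no thunk chain builds up during the loop.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)
```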

Beyond (or below) that, there is another choice that IMO makes GHC generate slower code than it could: its STG machine is implemented quite indirectly. Most importantly, it maintains two stacks: the OS stack, which contains spilled registers (I believe!), etc., and the STG stack, which is heap-allocated. Other languages (like Lean, I think) manage to combine the two, which allows e.g. an LLVM backend to be much more effective and also reaps caching benefits from modern CPUs, which treat the OS stack specially. Not so GHC: because the STG stack is heap-allocated, GHC can cheaply implement green threads and all the RTS niceties we are so used to, which have no counterpart in other languages. I’m not sure if it’s possible to put the STG stack on the OS stack and still have those niceties.

TL;DR: I’m amazed at how fast GHC-compiled Haskell is, given all the theoretical limitations on what a compiler like GHC can do.

2 Likes

GHC has {-# LANGUAGE Strict #-} that makes everything strict by default. Is that not enough? I have personally never seen a big difference in performance with that extension, maybe GHC is missing optimizations for strict code? Or maybe I’ve only tried it with small toy examples… but I feel like performance in larger programs is not dominated by laziness but rather some other bottleneck (e.g. I/O or specific algorithms) unless there is a significant memory leak of course.
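
For readers unfamiliar with the extension, a sketch of what module-level `-XStrict` means (the `mean` function and its helper are made-up names for illustration):

```haskell
{-# LANGUAGE Strict #-}

-- Under -XStrict, every binder in this module behaves as if it had a
-- bang pattern: function arguments and let/where bindings are forced
-- when bound, so the accumulators below never build up thunks.
mean :: [Double] -> Double
mean xs = total / fromIntegral count
  where
    (total, count) = go 0 0 xs

    go :: Double -> Int -> [Double] -> (Double, Int)
    go s n []       = (s, n)
    go s n (y : ys) = go (s + y) (n + 1) ys
```

Note that the pragma only affects code written in the module that enables it, which is exactly the limitation discussed in the reply below.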

1 Like

The problem (or rather insufficiency) of -XStrict is that it doesn’t influence imports or any of your clients.

Wrt. imports: Use [a] somewhere and you are doomed.

Wrt. exports: take this foldr clone, for example:

foldr' f z [] = z
foldr' f z (x:xs) = f x (foldr' f z xs)

If you compile this, even with -XStrict, you get something like the following Core after optimisation:

foldr'
  = \ @ t_atW @ t1_au0 ds_duB z_agh ds1_duC ->
      case ds_duB of ds2_XuI { __DEFAULT ->
      case z_agh of z1_Xgp { __DEFAULT ->
      case ds1_duC of {
        [] -> z1_Xgp;
        : x_agk xs_agl -> ds2_XuI x_agk (foldr' ds2_XuI z1_Xgp xs_agl)
      }
      }
      }

Note the complex argument to ds2. This will build a thunk for the recursive invocation, because the compiler can’t know whether ds2 is lazy or not. It certainly could be lazy in a client module without -XStrict. Thus, you see after CorePrep (which makes all the allocation explicit by let-binding non-trivial arguments to get something called A-normal form):

foldr'
  = \ @ t_atW @ t1_au0 ds_sBM z_sBN ds1_sBO ->
      case ds_sBM of ds2_sBP { __DEFAULT ->
      case z_sBN of z1_sBQ { __DEFAULT ->
      case ds1_sBO of {
        [] -> z1_sBQ;
        : x_sBS xs_sBT ->
          let { sat_sBU = foldr' ds2_sBP z1_sBQ xs_sBT } in
          ds2_sBP x_sBS sat_sBU
      }
      }
      }

Granted, -XStrict should probably imply adding seqs to arguments, too, but currently it doesn’t. Maybe that’s worth a GHC issue or proposal.
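
For illustration, a hypothetical manually “seq’d” variant of the function above (`foldrStrict` is a made-up name): forcing the recursive result before passing it to `f` avoids the thunk that CorePrep let-binds in the Core dump, at the cost of making the fold strict in the entire list spine.

```haskell
-- Force the recursive result with seq, so no thunk is built for it.
-- This loses laziness (and any short-circuiting in f) entirely.
foldrStrict :: (a -> b -> b) -> b -> [a] -> b
foldrStrict f z []       = z
foldrStrict f z (x : xs) =
  let r = foldrStrict f z xs in r `seq` f x r
```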

There’s nothing wrong with GHC optimising strict code. You’d be surprised how much code GHC finds to be strict (otherwise performance would be unbearable), but there will always be code which it doesn’t find to be strict.

Edit: There you go https://gitlab.haskell.org/ghc/ghc/-/issues/19022

3 Likes