Strict, StrictData, UNPACK

Forking off a thread from @vshabanov’s comment in One Billion Row challenge in Hs - #223 by vshabanov


I don’t feel too strongly. Less experienced users might be better off with -XStrictData, but then there’s also the risk that they never learn to use !.

Yes, and this is explained further in the User’s Guide. But firstly, I don’t think there’s much risk of UNPACK happening by accident, and secondly unboxing is not that closely related to laziness. There are morally two steps between something unboxed and something lazy: 1. unboxing, 2. lifting.
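
To make the two steps concrete, here is a minimal sketch (the type names are illustrative):

```haskell
data Plain    = Plain Int                     -- lazy field: boxed, lifted, may hold a thunk
data Strict   = Strict !Int                   -- strict field: always evaluated, but still a boxed Int
data Unpacked = Unpacked {-# UNPACK #-} !Int  -- field stored inline as a raw Int#, no box at all
```

Roughly: going from Unpacked to Strict re-adds the box (step 1), and going from Strict to Plain re-adds the possibility of a thunk (step 2).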

I don’t understand this. I expect making the fields of data types strict to be better for performance in the vast majority of cases. There are exceptions where you can avoid a redundant calculation by returning a value in a lazy field, but I expect such cases not to arise much in performance-critical code anyway.
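
As an illustration of that exception (all names here are hypothetical): a field that is rarely demanded can be cheaper to leave lazy.

```haskell
-- Hypothetical example: `context` is expensive to build and rarely inspected.
data Parsed = Parsed
  { value   :: !Int    -- always used, so strict
  , context :: String  -- diagnostic text: a thunk until (and unless) demanded
  }

parse :: String -> Parsed
parse s = Parsed (length s) (render s)
  where
    render = unwords . replicate 1000  -- stand-in for an expensive rendering step
```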

Do you have some examples you can share of strict fields making performance worse (where the worsening cannot be attributed to redundant computation)?

Who are they? For the record, I am not one. I think lazy function arguments are wonderful. My current expectation, though, is that strict fields of data types are the correct default choice in the vast majority of cases.

Despite valuing lazy function arguments, I haven’t been able to understand Ed’s insistence that what he wants can’t be achieved in a strict language with explicit laziness. Okasaki’s book was written in such a language, after all.

  • From page 26 of 33 in How to Declare an Imperative (1997)

  • From More points for lazy evaluation (2011)

I regularly have to dig into a large Haskell/Mu codebase. Seeing a data declaration and not knowing that it’s strict can be quite misleading. In general, having a LANGUAGE pragma somewhere invisibly affecting the code is not good for maintenance (but good for quick experimentation).

It explains that you need -O. But reboxing can happen even with -O2, since not every function can be inlined. I’ve seen increases in allocations due to UNPACK.
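
A minimal sketch of how reboxing can arise (consume is a hypothetical stand-in for any function GHC can’t inline):

```haskell
data P = P {-# UNPACK #-} !Int

{-# NOINLINE consume #-}  -- stands in for any function the optimiser can't see through
consume :: Int -> Int
consume n = n + 1

-- The field is stored as a raw Int#, but `consume` wants a boxed Int,
-- so every call must allocate a fresh box: here UNPACK costs an
-- allocation instead of saving one.
rebox :: P -> Int
rebox (P n) = consume n
```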

I added it because it looks like a similar silver bullet that makes code more performant, but it doesn’t always help.

How much of the code is performance critical? I would say that for an average webapp, or maybe even a compiler, the majority of the code is not performance critical. And I would argue that redundant computation is the norm rather than the exception.

No (except when paired with -funbox-strict-fields); I’m mostly pointing out that redundant computations are pretty frequent.

Another pitfall is using non-strict data in strict fields: field :: !(Maybe foo) won’t help much, because the bang evaluates only to WHNF. One needs to use strict data all the way down, which is not that convenient.
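
A minimal sketch of the pitfall (Foo and mkRec are hypothetical):

```haskell
data Foo = Foo Int  -- note the lazy field inside

data Rec = Rec { field :: !(Maybe Foo) }

-- The bang forces `field` only to WHNF, i.e. to the Just/Nothing
-- constructor; the Foo inside the Just can still hide a large thunk.
mkRec :: Int -> Rec
mkRec n = Rec (Just (Foo (sum [0 .. n])))  -- `sum [0 .. n]` stays unevaluated
```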

I’ve seen several messages about strictness and performance in the original thread, and warned that it’s not as simple as making everything strict.

I think there is a single rule of thumb: “is it an accumulator? make it strict”.
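
For example, a classic strict accumulator, as a minimal sketch:

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Without the bangs, foldl' forces only the pair constructor,
-- and chains of (+) thunks build up inside it.
mean :: [Double] -> Double
mean xs = total / fromIntegral count
  where
    (total, count) = foldl' step (0, 0 :: Int) xs
    step (!t, !c) x = (t + x, c + 1)
```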

Other cases are more nuanced. If it’s an aggregation (the result is smaller than the source data, AND we don’t want to keep the source data, AND we are fine with always doing the aggregation), then either force evaluation or maybe make it strict.
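
A sketch of the “make it strict” option for an aggregation (Stats and aggregate are hypothetical): strict fields make the work happen as the result is built, so no thunk retains the source list.

```haskell
import Data.List (foldl')

data Stats = Stats { total :: !Double, count :: !Int }

-- Strict fields force the arithmetic as each Stats is constructed,
-- so the (possibly huge) source list is not retained by thunks.
aggregate :: [Double] -> Stats
aggregate = foldl' (\(Stats t c) x -> Stats (t + x) (c + 1)) (Stats 0 0)
```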

If the data is created once and not updated recursively, the need for StrictData is weaker, and the gain from not evaluating unused fields is greater.

YMMV: if you are dealing with numeric code, or have a codebase that is prone to space leaks for some reason, then you may need to enforce StrictData (and use StrictMaybe, StrictEither, etc.). But this can make ergonomics worse.
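
For reference, the strict variants amount to little more than a bang on the payload; this sketch is in the spirit of the types mentioned above (similar ones live in the strict package):

```haskell
data StrictMaybe a    = SNothing | SJust !a
data StrictEither a b = SLeft !a | SRight !b
```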

There are alternative approaches: spine-strict data; deepseq (an inefficient “traverse it twice” hammer, but it can work when there’s no time to find a leak); or a seq at the point where a big thunk is created, just before it is put into a lazy field. Lots of them.
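
A sketch of the last option (Cache and mkCache are hypothetical; force comes from Control.DeepSeq): the value is evaluated fully before it goes into the field, so consumers still see a lazy field, but it never hides a big thunk.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.DeepSeq (force)

data Cache = Cache { payload :: [Int] }  -- deliberately lazy field

-- Fully evaluate `xs` before storing it: the bang forces `force xs`,
-- which deep-evaluates the list.
mkCache :: [Int] -> Cache
mkCache xs = let !ys = force xs in Cache ys
```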

Good point. I have been programming in OCaml for several years. It’s strict and has lazy values, and I would say that programming in a lazy-by-default language feels much more pleasant (not sure how a lazy language with strict data would feel, though).

Ed’s take includes the modularity part as well. The performance part can be reproduced in a strict language (OCaml is faster than Haskell in many cases), but modularity (with performance) is more challenging to achieve.
