Deepseq versus "make invalid laziness unrepresentable"

Migrated from One Billion Row challenge in Hs - #184 by rhendric, since it is diverging from the topic.

I agree that deepseq is a tool with uses. Nonetheless, my claim is indeed very strong! There are few cases when I’ll make a claim that strong. There are so many complexities in the world of programming that I am open-minded to the possibility that someone else can see something that I can’t. Regarding space leaks, however, I’ve investigated the area deeply enough for long enough (it’s been an interest of mine for probably ten years) that I am confident my conclusion is correct.

For comparison, another case I am confident about is that it is absolutely wrong to use Go-style product errors (Maybe Result, Maybe Error) and absolutely right to use sum-type errors Either Error Result (another example of making invalid states unrepresentable). The difference between the case of space leaks and the case of errors is that using sum types for error handling has percolated deeply through the Haskell community (and other languages besides), but making invalid laziness unrepresentable has not (I ran a Twitter poll yesterday, and only around 20% of respondents had heard of the notion).

To clarify my point, I was addressing the concern of a new user that Haskell experts cannot agree on the correct way to fix space leaks. It is really important that, as a community, we resolve this concern. One of the things that makes Haskell different from the vast majority of languages is laziness. I regularly see people dismiss Haskell because “laziness makes it impossible to reason about performance and space usage”. This claim is false, and if we as a community fail to address it with a simple response, then many people who could otherwise adopt and love Haskell will be put off from even trying.

The correct way to address all of the space-usage aspect of that concern (i.e. space leaks) and part of the performance aspect is to make invalid laziness unrepresentable, just as, if someone said “Haskell makes it impossible to separate error states from normal return states”, we’d say “no it doesn’t – make invalid states unrepresentable and replace your (Maybe Result, Maybe Error) with Either Error Result”. Problem solved.
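
(For concreteness, a minimal sketch of that comparison; divGo, divSum and the error string are invented for illustration:)

-- Product encoding (Go-style): (Nothing, Nothing) and (Just _, Just _)
-- are representable but meaningless.
divGo :: Int -> Int -> (Maybe Int, Maybe String)
divGo _ 0 = (Nothing, Just "division by zero")
divGo x y = (Just (x `div` y), Nothing)

-- Sum encoding: exactly the two valid states are representable.
divSum :: Int -> Int -> Either String Int
divSum _ 0 = Left "division by zero"
divSum x y = Right (x `div` y)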

I did not say that force has no uses at all, nor did I say that making invalid laziness unrepresentable fixes all space leaks (although I’ve never seen a case where it couldn’t). I said that for the particular program in question the correct approach is to make invalid laziness unrepresentable, and, for that particular program, force is an incorrect approach (as is “sprinkling bang patterns”). I think that is the correct way to explain the situation to new users.

deepseq is an important tool to have at your disposal, even if just to try force-ing different things until you figure out which things to refactor.

Sure, that seems reasonable.

(But sometimes invalid laziness has to be representable; Haskell’s type system is very good at letting you express invariants through types but no type system is perfect at this.)

I don’t think I could ever refute such a general claim, although I’m not sure exactly what you’re referring to. Maybe Okasaki-style data structures that use laziness in an essential way?

Yes, I’m happy to help. What’s the easiest way to reproduce the problem? Is it this comment? Is there anything simpler? I’m unfamiliar with PureScript, let alone with building its compiler.

I don’t understand what it means for other languages to have deepseq built into their spec.

I’ve had a look and I can see a few issues from the point of view of “make invalid laziness unrepresentable”.

The space leak was fixed by force-ing the output of the Logger's listen, so it’s clear the space leak exists in the Logger's state.

Now, warnings are reported by telling errorMessage or errorMessage' (for example). But tell does not evaluate what it is told (partly because it is normally told a singleton list), nor does errorMessage evaluate the message (see the definitions), so the ErrorMessages inside MultipleErrors will typically end up being thunks, holding on to whatever values they reference at their creation site. Even worse, ErrorMessage itself contains a list, plus a lazy SimpleErrorMessage, and the fields of SimpleErrorMessage are all lazy too.

I don’t know which or how many uses of tell/errorMessage(') end up causing the problem, but here’s an example:

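  -- Simplified excerpt from the PureScript compiler (apparently its
  -- exhaustivity checker); Binder, Incomplete, OverlappingPattern, ss,
  -- expr and addPartialConstraint are defined in the surrounding module.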
  makeResult :: ([[Binder]], (Either RedundancyError Bool, [[Binder]])) -> m Expr
  makeResult (bss, (rr, bss')) =
    do unless (null bss') tellRedundant
       case rr of
         Left Incomplete -> tellIncomplete
         _ -> return ()
       return $ if null bss
         then expr
         else addPartialConstraint (second null (splitAt 5 bss)) expr
    where
      tellRedundant = tell . errorMessage' ss . uncurry OverlappingPattern . second null . splitAt 5 $ bss'
      tellIncomplete = tell . errorMessage' ss $ IncompleteExhaustivityCheck

Because of the laziness of tell, errorMessage' and OverlappingPattern, tellRedundant keeps hold of all of bss' even though it only ends up using the first 5 elements. If bss' were very big (perhaps itself holding on to other things, due to its own laziness), that would be a problem! I suspect this particular case isn’t a problem, because bss' isn’t likely to be a long list, but I hope it’s illustrative of the fact that values from a long way away can be held onto for a long time because of laziness. The fact that force fixes the space leak strongly suggests that one or more of the constructors of SimpleErrorMessage is keeping values alive longer than necessary in a lazy field.

The fix from the point of view of “make invalid laziness unrepresentable” is to change Logger to tell things strictly, and to make ErrorMessage and SimpleErrorMessage have strict fields. There’s an additional complication that there are many lazy lists floating around too. It’s harder to know what to do about them. It depends on context.
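
(A rough sketch of what that might look like; the types here are heavily abbreviated relative to the real PureScript definitions, and tellStrict is an invented name:)

{-# LANGUAGE BangPatterns, StrictData #-}

import Control.Monad.Writer.Class (MonadWriter, tell)

-- With StrictData (or explicit ! annotations), building one of these
-- values forces its fields to WHNF, so the constructors cannot harbour
-- top-level thunks. (The lazy lists inside the fields are a separate story.)
data SimpleErrorMessage
  = IncompleteExhaustivityCheck
  | OverlappingPattern [[String]] Bool        -- abbreviated

data ErrorMessage = ErrorMessage [String] SimpleErrorMessage  -- abbreviated

-- An invented strict wrapper: evaluate the message to WHNF before telling,
-- so the Writer log never accumulates unevaluated messages.
tellStrict :: MonadWriter w m => w -> m ()
tellStrict !w = tell w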

Nothing so esoteric! By analogy with making invalid state unrepresentable, while it’s easy to statically exclude (Nothing, Nothing) by using Either instead of a pair of Maybes, and there are many other moderately more challenging cases where some more cleverness is required but sufficient to the task, in my experience it’s a rare Haskell program in the wild that doesn’t have a few cheeky fromJusts (or equivalent) here and there, because of at least one of two things:

  • Some invalid states correspond to things like an Int in one place in a data structure being less than one somewhere else, for which it’s difficult, in a non-esoteric way, to get a static exclusion. (Liquid Haskell and replacing Int with singletons for type-level naturals are examples of things I’m considering ‘esoteric’.)
  • Some invalid states are simply dynamic in nature rather than static—for example, after a user-provided filter is applied to a list of results, it’s invalid to have any element in that list that doesn’t match the filter. (Sometimes a package like justified-containers exists to address a teeny tiny corner of this space, and I love things like that, but if they generalize without a lot of case-by-case effort I haven’t learned how.)

The same bullets apply to making invalid laziness unrepresentable, I argue, and I anticipate that the PureScript example is going to be an illustrative example of at least one of them. And conversely, if you manage to show me a non-esoteric way to solve the PureScript problem without deepseq, I’ll consider that good evidence that taking an absolute stance against invalid laziness doesn’t have the same practical challenges that an absolute stance against invalid state often does.

I’m delighted that you want to take a crack! Sadly we never got a reproducing case from the reporter; having tested it now, compiling the standard package set (as the comment you linked does) doesn’t demonstrate a difference in behavior with the NFData patch. We need code that generates a lot of specific warnings, and we never figured out exactly what those warnings are. So we’re stuck with reasoning about the code, where you’ve already made good progress.

Yup, that’s what I figure. Unfortunately SimpleErrorMessage can hold references to Expr and SourceType, both of which are recursive, lazy types that are extensively transformed with traversals throughout the compiler, and that’s the real problem. Making SimpleErrorMessage's constructors have shallow strict fields doesn’t keep an inner Expr from holding a giant thunk.

Assume we don’t want to make Expr and SourceType strict everywhere in the compiler. I think we in fact don’t want to do this, for valid reasons, but I haven’t recently done the investigation, and it would be a big change for questionable advantage relative to NFData-ing everything and adding a force in the right place.

So here’s a situation where thunks in Expr and SourceType are valid in the bulk of the compilation pipeline but not valid inside SimpleErrorMessage constructors. We could, theoretically, make parallel ExprStrict and SourceTypeStrict types that are only used in SimpleErrorMessage, but the maintenance cost there is much larger and the traversal to convert an Expr to an ExprStrict is just as expensive as the call to force. We could also, I imagine, parameterize those types somehow with a type constructor that is instantiated as either a strict box or a lazy box, which has much the same problems (instead of having two near-identical types to maintain, we have a box that needs to be negotiated everywhere). Or we could store something else in those constructors, like the text to be printed that describes the expressions and types, but generating that text would need more state threaded into the core of the type checker and increase the coupling between parts of the system that should be disconnected.
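
(For reference, a sketch of that box-parameterisation idea, with all names invented; Identity plays the lazy box and Strict the strict one:)

import Data.Functor.Identity (Identity (..))

-- A strict box: the constructor forces its contents to WHNF, so a tree
-- built bottom-up out of these is forced as it is constructed.
data Strict a = Strict !a

-- One definition of the type, parameterised by the box...
data ExprF f
  = Var (f String)
  | App (f (ExprF f)) (f (ExprF f))

-- ...instantiated twice:
type Expr       = ExprF Identity  -- lazy, for the bulk of the compiler
type ExprStrict = ExprF Strict    -- strict, for error messages

...which makes the maintenance cost visible: every construction and pattern match now has to negotiate the box f.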

And elegance aside, placing a force at the end of a worker thread’s task is a very direct way to express that we don’t want thunks surviving past the life of a task. It won’t miss anything, even if our analysis of the code is incorrect or if some aspect of the code changes in a way that would cause a regression on this issue. That’s a big argument in favor, IMO, though maybe th-deepstrict would be a suitable substitute for this in a static approach.
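
(For concreteness, a sketch of that pattern, assuming the task's result has an NFData instance; runTask is an invented name:)

import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)

-- Deep-force a task's result before it escapes the worker, so no thunks
-- survive past the life of the task, whatever the task's code looks like.
runTask :: NFData a => IO a -> IO a
runTask task = task >>= evaluate . force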

What would you do?

I know that the following is not a popular opinion, but I greatly value the terseness of a language and how little I have to write down to achieve my intentions. If there are “strict” data structures which I can just use via import without further thought, that sounds great, but then presumably everyone would agree. However, if I have to write a significant amount of code to declare problem-specific types purely to guarantee strictness, i.e. types that govern aspects of how computation is done, I would see it as bloat.

It is my (fallible) understanding that in the ideal of “total functional programming”, lazy or strict evaluation always leads to the same results. So it does seem to me that efforts to contain laziness would be explicitly about the “how” of execution rather than the “what”. I’d rather the compiler just did all that, or that the paradigm resulted in correct choices without my assistance.

Although I agree with the principle of separating the “how” from the “what”, note that it is quite hard for a compiler to predict whether it is more efficient to suspend a computation or to execute it immediately, and that such a prediction is fundamentally incompatible with separate compilation and function boundaries. Example:

(&&) :: Bool -> Bool -> Bool
a && b = case a of
  True  -> b
  False -> False

Should the compiler evaluate b strictly, at call sites, or lazily, at use sites? (As you say, in a total language, the compiler is free to do either.) Well, that depends on the particular b that you call (&&) with!

  • Compiler A might say “strictly”: it does not need to allocate a closure for parameter b at call sites like True && (c + d + e == 3), and can simply pass a single Bool flag as b.
  • Compiler B might say “lazily”: it needs to allocate a closure for the call site above, capturing the free variables c, d, e, as if the second argument were a nullary function. But in turn, False && takesAMillionYears 42 does not need to evaluate takesAMillionYears 42, as Compiler A would.

Note that neither Compiler A nor Compiler B can anticipate function calls in client modules. What’s worse, client modules might have both call sites. There is no way to find a best pick if we are not willing to give up on separate compilation (which includes having exactly one definition of the function (&&))!

GHC is like Compiler B and works around excessive allocation of closures by aggressively inlining functions such as (&&), even across module boundaries, but mayhem reigns when it has to stop inlining (because of code-size concerns, or recursive functions getting in the way).

If performance is a goal, then the programmer must communicate evaluation order at the use sites of (&&). There are many ways to do so, but IMO the most elegant is through the use of special types. This can be seen as making “invalid laziness unrepresentable”. Although I’d much rather see all those <thunk>s as <closure>s, which IMO are the real problem.

I think optimistic evaluation (Ennals & Peyton Jones, ICFP 2003) provides an interesting alternative: for every place where we need to choose between strict and lazy evaluation, generate code for both options, try strict evaluation first, and fall back on lazy evaluation if evaluating the value takes too long (remembering that decision for future evaluations).

I don’t understand what this means.

I guess it falls under the broad umbrella of “planning”: enter JIT, AOT, PGO, etc. I suppose ideally I’d focus on the “what” exclusively, but viewed on a spectrum, the best possible trade-off of “how” hints versus performance would do :)

Nothing profound; a thunk producing an a is roughly a function () -> a and hence needs a closure, after all. But what is expensive about this representation is not the memoisation often associated with thunking; it is the closure, because the closure is what keeps its free variables alive, thus causing space leaks.
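
(A tiny illustration, with invented names: the second component of the pair below is a thunk, roughly \() -> sum xs, and it is that closure that keeps the whole of xs alive until it is forced:)

summary :: (Int, Int)
summary =
  let xs = [1 .. 1000000 :: Int]
  in  (head xs, sum xs)  -- second component: a closure capturing xs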

I was not aware of that work, thanks. Alas, speculative evaluation is only half of the story; in order to write high performance code, you also want to unbox by-value parameters so that nothing needs to be passed on the heap. That is impossible if you want to retain the ability to use call-by-need.

JIT-based specialisation might be an option, though.
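
(To illustrate the unboxing point with a hypothetical example: given a strict argument, GHC’s worker/wrapper transformation can pass the Int unboxed, in a register; a call-by-need argument must instead arrive as a pointer to a possible thunk:)

{-# LANGUAGE BangPatterns #-}

-- Because n is demanded on every path, GHC can compile a worker that
-- takes a raw Int# rather than a pointer to a potentially-unevaluated Int.
sumTo :: Int -> Int
sumTo !n = go 0 1
  where
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)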

Looking at Python’s itertools and functools libraries, and the language’s general support for generators, iterators, lambdas and partial application, it seems to me that it is easier to emulate “lazy” evaluation in a “strict” language than the other way around.

How would you emulate the short-circuiting behaviour of Haskell’s (&&)/Python’s and in a strict language?

I guess what you wanted to ask was “how do I emulate short-circuiting in user-defined functions/operators/…”?

Yes indeed.


I’m not sure what this means. Do you mean:

A = some_function()       # some value output of a function
B = another_function()    # some value output of another function
if A and B:
    print("ok")

If A is false then B should never be evaluated?

If so, then the simple solution is:

A = some_function()              # some value output of a function
B = lambda: another_function()   # delayed: the body runs only when B() is called
if A and B():
    print("ok")

In that example, if A is falsy, the code in B never gets executed.

I see. So we simulate laziness in a strict language by wrapping things in lambda. And what if we want to do

if f(A) and g(A): ...

where we not only want to avoid calculating g(A) if f(A) is False, but we also want to avoid recalculating A if f(A) is True?

You mean this?

from functools import cache

A = cache(lambda: some_function())   # memoised: computed at most once
B = lambda: another_function()

if A() and B():
    print("ok")

Sure, something like that! So at this point, for me personally, emulating laziness in a strict language doesn’t look any easier. You may have different preferences, of course.

One of the big differences is that in a strict language it is the responsibility of the caller to arrange for things to be evaluated appropriately. In a lazy language it is possible to delegate that responsibility to the callee. It can be very nice to write (&&) and just have it work properly.

So here’s a closure-free form of non-strict semantics:

(…it just needs someone to work out how to implement it ;-)

You’ve inspired me to explore this a bit. I’m going to quickly smash out AoC2021 in functional-style Python here just to see what I end up with.
