What about having StrictData and non-strict semantics on functions as the default programming style?

Hi all

I have seen lexi-lambda’s part 4 video on laziness. In it, there is a safeDiv function that was not optimized as much as it could have been because it returned a Maybe Double: the demand structure was hidden inside the Just constructor.
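
For reference, the function was something like this (my reconstruction, not verbatim from the video):

safeDiv :: Double -> Double -> Maybe Double
safeDiv _ 0 = Nothing
safeDiv x y = Just (x / y)
-- The division is suspended inside the Just, so the demand on x and y
-- is hidden from callers, even ones that always force the result.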

So the author goes on to establish some rules. She does not really want to hide computations in constructors, but with lazy fields in data types, that is what she gets by default. In the previous videos she made the point that non-strict semantics are pretty great for code optimization: they let the compiler move stuff around in a semantics-preserving way. But it needs a clear demand structure to work well; otherwise extra thunks appear because the demand structure was unclear, not because the computation actually had to be suspended.

So she establishes some guidelines:

  1. Functions should be lazy, so the demand structure is just the data dependencies.
  2. But most data types (tuples, Either, Maybe) could be WHNF-strict in their values (as with StrictData).
  3. Except when they are recursive (List, Seq, Map), in which case you need a case-by-case analysis: List and Seq should be lazy; Map needs to be at least spine strict.

I have been playing with this style and it is pretty great. Long-lived data structures become “sync points”, as she calls them: they are either completely suspended or completely forced. You can use strict-wrapper to convert between representations with ease. The produced STG is great.
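
For example, a fold whose accumulator is such a sync point might look like this (a sketch; I am going from memory of the strict-wrapper API, where strict and unstrict convert between a lazy type and its strict counterpart):

import Data.List (foldl')
import Data.Strict.Wrapper (Strict, strict, unstrict)

-- Constructing a Strict pair forces both components, so no chain of
-- (+) thunks can hide inside the accumulator.
sumAndLength :: [Int] -> (Int, Int)
sumAndLength = unstrict . foldl' step (strict (0, 0))
  where
    step acc x =
      let (s, l) = unstrict acc
      in strict (s + x, l + 1)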

But what do we lose? Mainly the ability to suspend computations inside those constructors, which is what this style sets out to avoid in the first place.

But you keep:

  • Lazy let and where bindings in function bodies
  • Lazy lists as streams
  • Higher-order functions for code reuse

So I want to hear more opinions on this programming style. I would go as far as proposing to include strict versions of Maybe, tuples, and Either in base. What has been your experience? Completely against? Let me hear what you think!


As I vaguely recall, the KAOS system used something like that for processes:

  • inside each OS process, functions and constructors could be lazy;

  • but between processes, hyperstrictness was mandated in order to keep unevaluated suspensions (a.k.a. thunks) from leaving their process of origin.

On the “other side of the ledger” is the old functional language Hope - also a vague recollection; I believe it had strict functions and lazy constructors (though one Ross Paterson wrote an all-lazy implementation of it as well). This approach is also supported by Maarten Fokkinga in his thesis: see section 6d for his conclusion (on page 147 of 169).

Yeah, that would be nice.

I personally enable StrictData globally for every non-toy project, made State and Writer in effectful keep values in WHNF by default, and if I need IORef or MVar in nontrivial amounts, I tend to define newtype wrappers over them that force the value to WHNF after each modification.
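
For concreteness, such a wrapper can be as small as this (a sketch; the names are made up):

import Data.IORef

-- An IORef whose contents are forced to WHNF on every write.
newtype StrictRef a = StrictRef (IORef a)

newStrictRef :: a -> IO (StrictRef a)
newStrictRef x = x `seq` (StrictRef <$> newIORef x)

readStrictRef :: StrictRef a -> IO a
readStrictRef (StrictRef ref) = readIORef ref

-- modifyIORef' forces the new value to WHNF, so no thunk chain can
-- build up behind the reference.
modifyStrictRef :: StrictRef a -> (a -> a) -> IO ()
modifyStrictRef (StrictRef ref) = modifyIORef' ref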

Over 10 years of writing Haskell I’ve seen way too many space leaks caused by overly lazy data structures or because someone used modify or modifyIORef somewhere instead of a strict equivalent. I don’t personally think I’m missing anything valuable :slightly_smiling_face:


From what I understand, strictness flags on constructor fields do two separate things:

  • Demand every field whenever the constructor is evaluated (you can achieve the same thing by using seq manually);

  • Remove a case on field access, because it doesn’t have to pattern match against a thunk once it knows the field is already evaluated (you cannot do this manually; the operation is very much unsafe).

Thus in terms of features a fully lazy datatype is way better than a strict one: you can simply opt into strictness when you see fit.
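
For example (a sketch), with the ordinary lazy Maybe you can recover strict construction at any call site:

justStrict :: a -> Maybe a
justStrict x = x `seq` Just x
-- Behaves like Just with a strict field here, without committing the
-- whole datatype to strictness.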

For the vast majority of projects, what was said in that video will not matter much. As long as your program runs correctly and you don’t hang on to hundreds of megabytes of data for no reason, it really doesn’t matter if it allocates a few tens of thousands more thunks and runs 10ms longer. The tradeoff for the power you get when writing code is near-inconsequential.

Map needs to be at least spine strict.

Map is already always spine strict; Data.Map.Lazy is just a spine-strict dictionary that stores its values as thunks. This was done because the maintainers don’t want to duplicate code. However, a fully lazy variation wouldn’t actually be useless: a fully lazy radix tree (which uses lazy dictionaries inside) allows argument parsing without evaluating the entire argument tree.
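
A quick way to see the spine strictness (a sketch):

import qualified Data.Map.Lazy as ML

spineDemo :: IO ()
spineDemo = do
  let m = ML.fromList [(1 :: Int, undefined :: Int)]
  print (ML.size m)      -- fine: forces only the spine
  print (ML.member 1 m)  -- fine: compares keys, never touches the value
  -- print (m ML.! 1)    -- this would finally hit the undefined value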

opinions

There’s nothing wrong with forcing evaluation as you see fit; that’s exactly how you should reason about your code. Any persistent data structure obviously needs to be evaluated properly; if you don’t do this, you’re leaking memory. However, even in persistent data structures there’s still room for little lazy things (like the difference lists sketched below), and going haywire with strictness bars you from exploring that niche.
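
For instance, a difference list (the standard definition, as in the dlist package) is exactly such a little lazy thing:

-- Appends are O(1) function compositions; the suspended (++)
-- applications are useful laziness, not a leak.
newtype DList a = DList ([a] -> [a])

fromList :: [a] -> DList a
fromList xs = DList (xs ++)

append :: DList a -> DList a -> DList a
append (DList f) (DList g) = DList (f . g)

toList :: DList a -> [a]
toList (DList f) = f []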

Could you explain in what way these things are different? They seem the same to me.

The first one is syntactic sugar for when you construct a thing.

Strict a b   ~   a `seq` b `seq` Lazy a b

The second one is a magic tag in STG applied on pattern matching.

foo (Lazy   a _) = a   ->   case a_0001 of
bar (Strict a _) = a   ->   case a_0002<TagProper> of

Ah, perhaps I understand. I think of it as “because the first has happened, we are also allowed to apply the second (which is an optimization)”. Does that match your understanding?


Yes, making invalid laziness unrepresentable[1] is essential for avoiding space leaks whilst keeping your sanity, in my opinion. Anything that you have to remember to do (for example bang patterns on function arguments, strict let bindings, applying functions with $!) for correct program behaviour imposes a large mental cost. I want to minimize the amount of stuff I have to remember. That’s why I use Haskell after all!

[1] I even wrote an article about it: make-invalid-laziness-unrepresentable
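
A minimal instance of the idea with plain strict fields (a sketch, not taken from the article):

-- The invariant lives in the type, so there is nothing to remember
-- at use sites: the fields simply cannot hold thunks.
data Stats = Stats { statCount :: !Int, statTotal :: !Int }

observe :: Int -> Stats -> Stats
observe x (Stats c t) = Stats (c + 1) (t + x)
-- No bang patterns or $! needed: constructing Stats forces both fields.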


Not “the first thing has happened”, but rather “the first thing is guaranteed to have happened based on the datatype definition”. I haven’t checked any further, but I expect GHC to apply this tag every time it can prove prior evaluation; if so, for those cases it would indeed directly be “the first thing has happened”.

Alternatively, we could use an implementation that has no (heap) space to leak, e.g. figure 9 on page 24 of On Inter-deriving Small-step and Big-step Semantics: A Case Study for Storeless Call-by-need Evaluation (2011). Now if that could be implemented in GHC (or some other Haskell implementation), I believe it would greatly reduce the need for strictness annotations…and the need for “semi-duplicate” types with varying levels of strictness.

That we have “semi-duplicate” definitions for varying levels of imperativity (e.g. map and mapM) already provides more than enough annoyance!


Erratum: I got the year wrong. 2011 was when the preprint was submitted - the paper was actually published in 2012 >_<


That article starts strong:

A famous functional programmer once was asked to give an overview talk. He began with “This talk is about lazy functional programming and call by need.” and paused. Then, quizzically looking at the audience, he quipped: “Are there any questions?” There were some, and so he continued: “Now listen very carefully, I shall say this only once.”

Definitely have to check that one out :slight_smile:


So far we have considered types like

data Maybe a
  = Nothing
  | Just a

and

data Maybe2 a
  = Nothing2
  | Just2 !a

But what about types like

{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE UnliftedDatatypes #-}
import Data.Kind
import GHC.Exts

type Brut = TYPE (BoxedRep Unlifted)

type Maybe3 :: Type -> Brut
data Maybe3 a
  = Nothing3
  | Just3 a

and

type Maybe4 :: Type -> Brut
data Maybe4 a
  = Nothing4
  | Just4 !a

?

That’s an interesting reference! Although I doubt that a storeless approach will actually need less space than a … storeful machine with a garbage collector. Furthermore, by storing all heap bindings on the stack, it’s quite complicated to reclaim dead bindings. Well, you could copy collect the whole stack or relink stack frames… But the fundamental problem with space leaks is that some thunk retains a huge working set – nothing about moving the heap into the stack will change that.


…for actual node data, I would tend to agree - it’s (most of the) metadata usually associated with GC that could probably be elided, thereby reducing the overall size of those thunks and their working sets in memory.


But the fundamental problem with space leaks is that some thunk retains a huge working set […]

What would be interesting is to try extending the approach with parallelism (i.e. multiple stacks) to observe what happens (or what can be done) with such thunks:

  • would those thunks and associated working sets be confined to a few stacks?

  • if so, the tasks with those overly-sized stacks could be given a temporary increase in priority in an attempt to complete them, thereby freeing their stack space.

But this is all conjecture, and there’s been no shortage of that for heapless (or GC-less) approaches to lazy evaluation in the past. That we (in Haskell :-) are all still using GC-managed heaps would appear to suggest that suitable approaches (assuming they exist at all) have yet to be found for production implementations (like GHC).

Huh, so this is UnliftedDatatypes (GHC 9.2+): unlike strictness flags, which force the evaluation of a field, this allows direct creation of datatypes with no potential bottom.
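
One way to see that (a sketch, reusing Maybe3 from the snippet above): a binding at an unlifted type is evaluated on the spot, so there is no thunk left around to be bottom later.

-- This crashes at the let, even though x is never used.
boom :: Int
boom = let x = (error "eager!" :: Maybe3 Int) in 42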

As with strictness flags, this is great for library performance, but I don’t think it’s making it into base any time soon (and caring about something this tiny in your day-to-day is overkill).

Unfortunately you can’t even do the most basic things with Bruts, such as put them in a list. By contrast strict-wrapper works with everything in the existing ecosystem with minimal ceremony.

{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE UnliftedDatatypes #-}
import Data.Kind
import GHC.Exts

type Brut = TYPE (BoxedRep Unlifted)

type Maybe3 :: Type -> Brut
data Maybe3 a
  = Nothing3
  | Just3 a

-- test18.hs:13:7: error: [GHC-83865]
--     • Expected a lifted type, but ‘Maybe3 ()’ is a boxed unlifted type
--     • In the type signature: a :: [Maybe3 ()]
--    |
-- 13 | a :: [Maybe3 ()]
--    |       ^^^^^^^^^
a :: [Maybe3 ()]
a = []

In all fairness, you define strict variations of datatypes in your package and use unsafeCoerce, so the “minimal ceremony” argument only fails to apply to unlifted datatypes because no one has made a library for them yet.

In all fairness …

I disagree that’s a fair assessment.

no one has made a library for them yet

@sgraf has made a library for it, data-elevator.

the “minimal ceremony” argument only doesn’t apply to unlifted datatypes …

I didn’t mean “minimal ceremony” in defining them, I meant in using them. You simply can’t use unlifted data types with the rest of the ecosystem as it currently is. You need to wrap and unwrap them at every use site because the ecosystem expects lifted data types everywhere. But that wrapping and unwrapping completely defeats the point of using unliftedness, because it introduces liftedness!

Until there’s “unlifted base” and indeed a whole unlifted ecosystem (or a levity-polymorphic base and levity-polymorphic ecosystem), using unlifted types to avoid space leaks wins you nothing over strict-wrapper.

EDIT: By the way, I’m not wedded to “my package”. I am wedded to the idea of eliminating space leaks from Haskell, because they are one of the top reasons that Haskell might fail as an industrial language (behind difficulty onboarding and ecosystem instability). If there’s a better way to “make invalid laziness unrepresentable” I’ll promote that instead. But it turns out that strict-wrapper is the best way I know of so far.


Btw, I would not recommend using data-elevator in production code bases.

Due to the bidirectional nature of how coercions work today (unsafe or not), it is too easy for the optimiser to push coercions around. Technically, the implementation of unsafeCoerce prohibits floating out these coercions (so there’s no way to accidentally make code stricter), but they might still be pushed inwards.

Arguably, such careless floating will most likely lead to a Core Lint error due to the kind-changing nature of such heterogeneous coercions (we are going from Lifted to Unlifted or vice versa, after all), but I wouldn’t be surprised to see misoptimisation.

That said, I have not been able to come up with unsound examples so far.

The plan is to use a blessed version of unsafeCoerce as soon as one becomes available, for example https://github.com/ghc-proposals/ghc-proposals/pull/530. The main point about data-elevator is its API.


Do you mean “bidirectional” in that Coercible a b is equivalent to Coercible b a, or something else?
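
That is, something like this, which GHC accepts because its solver treats Coercible as symmetric (a sketch):

import Data.Coerce (Coercible, coerce)

-- Given Coercible a b, GHC can also solve Coercible b a, so the
-- coercion works in the other direction too.
backwards :: Coercible a b => b -> a
backwards = coerce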