Unlifted types vs strict-wrapper

Continuing the discussion from Levity-Polymorphic Type Classes by Icelandjack · Pull Request #30 · ghc-proposals/ghc-proposals · GitHub

A recap of my position:

Personally, I’d like to see a future where you can program almost as normally as you do today, but using unlifted data types exclusively. Thus basically carving out a strict subset of Haskell. That’s why I’d like to see the classes that we have today generalized to support unlifted types. A bit further in the future, I’d like to see a way to combine both lifted and unlifted programs in an ergonomic way. That would require easy conversion functions between lifted and unlifted versions of data types like Maybe, tuples, and lists.

For me it is […] important to have the guarantee that my values don’t contain any surprise computations which may make my program take a lot more space and time than I expect. Instead of wondering if I should use a bang pattern at every binding (and then you still sometimes have to use $! for some function applications), with unlifted data types you only have to make that decision once per data type.
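A minimal sketch of what I mean (Counter is a made-up example; UnliftedDatatypes needs GHC 9.2 or later):

{-# LANGUAGE UnliftedDatatypes, StandaloneKindSignatures #-}
import GHC.Exts (UnliftedType)

type Counter :: UnliftedType
data Counter = Counter !Int

-- Any binding of an unlifted type is evaluated eagerly, so no bangs
-- or $! are needed at the use sites; the decision was made once, in
-- the kind of Counter.
bump :: Counter -> Counter
bump (Counter n) = Counter (n + 1)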


@tomjaguarpaw replied:

If there are examples of this issue that are solved by unlifted types but not by Make invalid laziness unrepresentable then I would like to know about them!

My response was that the strict-wrapper approach only works if the values are in some kind of data structure, but if you just use the value in local let bindings and as function arguments then it does not help. As examples I mentioned that it can be difficult to know whether to use a strict or lazy variant of many functions like foldr/foldr' and iterate/iterate'.
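For example, this classic accumulator leak lives entirely in local bindings and function arguments, with no data structure for a strict wrapper to attach to (a minimal sketch):

-- The accumulators s and n silently build up thunks (s + y, n + 1,
-- ...) unless you add bang patterns on every binding (or rely on
-- GHC's strictness analysis kicking in at -O).
mean :: [Double] -> Double
mean = go 0 (0 :: Int)
  where
    go s n []     = s / fromIntegral n
    go s n (y:ys) = go (s + y) (n + 1) ys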

If I’m understanding @tomjaguarpaw’s arguments correctly, then he believes each such function should have one blessed variant and the other should never be used. For the right fold the blessed version is the lazy version and for iterate the blessed version is the strict iterate' function.

I don’t agree with this and tried to show a useful application of a strict right fold:

{-# LANGUAGE UnliftedDatatypes, StandaloneKindSignatures #-}
import GHC.Exts (UnliftedType)

type List :: UnliftedType -> UnliftedType
data List a = Nil | Cons a (List a)

-- Because b has an unlifted kind, the recursive call is evaluated
-- before k is applied: this fold is strict even though it is written
-- exactly like the lazy foldr.
foldrList' :: forall (a :: UnliftedType) (b :: UnliftedType). (a -> b -> b) -> b -> List a -> b
foldrList' _ z Nil = z
foldrList' k z (Cons x xs) = k x (foldrList' k z xs)

mapList :: forall (a :: UnliftedType) (b :: UnliftedType). (a -> b) -> List a -> List b
mapList f xs = foldrList' (\x xs' -> Cons (f x) xs') Nil xs

And here I’ll add an example where the lazy iterate function is more performant than the strict variant:

foo i = head (iterate ('a' :) [] !! i)

The strict version would be exponential space (in the number of bits in the input i; ignoring that Int has finite precision), while this one is constant space. Edit: actually, that’s not true.

Of course this is not a practical example at all, but I think there are practical examples that similarly use laziness.


In @tomjaguarpaw’s latest comment he also wrote this:

Hmm, we may be talking at cross purposes. I’m trying to explain why it never makes sense to write

foldr' _ z []     = z
foldr' f z (x:xs) = f x $! foldr' f z xs

If f is strict in its second argument then the $! is redundant and if it’s lazy then you’ve leaked stack space. Am I missing something here?

The compiler must assume that f is lazy to preserve semantics. If you leave out the $! then it will be forced to allocate a thunk even if f turns out to be strict. So if you only plan to use this function with strict f then you should use $! to avoid allocating those thunks.
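A minimal usage sketch, with the foldr' from the quote above and f = (+), which is strict in both arguments:

-- The $! forces each recursive result before (+) is applied, so no
-- thunk is allocated for it (the fold still uses O(n) stack, like
-- any right fold over a long list).
sumR :: [Int] -> Int
sumR = foldr' (+) 0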


…you mean something like Sixten?


A bit further in the future, I’d like to see a way [in Haskell] to combine both lifted and unlifted programs in an ergonomic way.

…you mean something like Clean?

o_0 …I wasn’t expecting that:

https://clean.cs.ru.nl/Documentation

…still lists 2.2 as being the most recently documented version of Clean even though 3.1 is the current version - I’ll assume (hazardously) there have been only trivial changes.

Glancing through:

http://clean.cs.ru.nl/download/doc/CleanLangRep.2.2.pdf

…using unboxed as the search term, I can see that my recollection was “vague” - Clean seems to have strict and unboxed versions of lists and arrays only. So Clean merely provides a “small starting-point” for what you’re contemplating here: some of what it does could be useful, but more work would be needed to expand it for general-purpose use.


I believe Sixten is more about unboxing everything as much as possible. I don’t want to go quite that far. I would already be satisfied if we can get to an ML-like language (in terms of strictness).

I believe Clean is very similar to Haskell with bang patterns and strict fields on data types. The problem is that those annotations only consider head normal form (or I guess Clean calls it root normal form) and not a full normal form of nested structures. But I don’t know that much about Clean, so I might have gotten the wrong impression from a quick glance at the documentation.

And compared to unlifted types, the strictness annotations of Clean have the same problem: they only apply to the components of a data structure and not to the data structure itself. So then you’d still need to add extra annotations at all the use sites, and that may be impossible for polymorphic functions.

That’s a plausible goal. But it’s hard to make sense of this discussion without a bit more context. You could say “go read the discussion at #30”, but it is long and discursive. It would be great if you felt able to lay it all out in a single post, so that we can see the issues, why Haskell today doesn’t allow what you want (despite supporting unlifted types), and why -XStrict doesn’t do the job.


To be clear, my responses on the thread were motivated almost entirely by this sentence:

For me it is much more important to have the guarantee that my values don’t contain any surprise computations which may make my program take a lot more space and time than I expect.

That is to say, I have not yet seen anything that makes me think that unlifted types bring anything to the problem of avoiding “surprise computations” that is not already achieved by Make invalid laziness unrepresentable. On the other hand, I do acknowledge that unlifted types are an interesting research area and one of their benefits may be improved performance.


Now on to some responses to particulars. I’m not sure how well they fit into the overall flow of the discussion, but since they’re on my mind I may as well write them down.

he believes each such function should have one blessed variant and the other should never be used

To a first-order approximation, yes. I believe the choice between strict and lazy is too taxing to expect everyday Haskell programmers to make it. I’ve been programming Haskell professionally for over 10 years, addressing space leaks is a major interest of mine, and even I don’t want to have to make that choice. In general there is one correct default choice (foldl', modifyIORef', strict State monad, State monad modify', …, Data.Map.Strict) that is optimal for most cases and acceptable for all realistic cases. The specialized variants have a place, because they can be useful, e.g. better-performing, in very rare cases, but they should be tucked away somewhere safely out of sight of all non-expert Haskell users.
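For example, the classic pair where the primed variant is the correct default:

import Data.List (foldl')

-- foldl builds the nested thunk (((0 + 1) + 2) + ...) and forces it
-- only at the very end; foldl' forces the accumulator at each step
-- and runs in constant space.
sumLazy, sumStrict :: [Int] -> Int
sumLazy   = foldl  (+) 0
sumStrict = foldl' (+) 0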

I don’t agree with this and tried to show a useful application of a strict right fold:

But that’s not a strict right fold. That’s just the right fold on that particular strict list type. There is no corresponding lazy fold.

I’ll add an example where the lazy iterate function is more performant than the strict variant

When you find an example you’re happy with please post it separately and I’ll respond to it.

The compiler must assume that f is lazy to preserve semantics. If you leave out the $! then it will be forced to allocate a thunk even if f turns out to be strict. So if you only plan to use this function with strict f then you should use $! to avoid allocating those thunks.

This seems like a reasonable argument. However, I dismissed it not on the grounds of whether it has any performance benefits at all, but on whether it helps avoid “surprise computations” in a way that Make invalid laziness unrepresentable can’t. I don’t think it does.

To address it on its own merits: if the compiler can’t produce two specialized/worker-wrappered foldr's, one each for f lazy/strict in its second argument, then yes, this definition of a strict foldr has a place. But its benefit is marginal and it should be squirrelled away somewhere for experts, out of the eyes of most users (and I count myself amongst the “most users”).


if the compiler can’t produce two specialized/worker-wrappered foldr's, one each for f lazy/strict in its second argument

So you would want foldr to behave like foldr' if the passed accumulator is unlifted? The compiler inferring levity would be disastrous in this case, as without a type signature GHC would pick one of two completely different behaviors.

No, not unlifted. I’m talking about when f is strict in its second argument (and everything is lifted, i.e. it applies to Haskell98).

Then you just have two different functions that do different things; I don’t know what this has to do with unlifted datatypes as a tool.

Reasoning about unlifted values directly is to me evidently better than reasoning about constantly forcing lifted ones: I shouldn’t have to bring up boxes in a code section that is explicitly boxless. For people who write recursive data structures this means both no chagrin of constant unboxing and a nice performance boost as a bonus.

I’m afraid I don’t know either. What I’m looking for is examples of where unlifted data types help resolve space leaks in a way that Make invalid laziness unrepresentable doesn’t. I haven’t seen an example that convinces me yet.

Sorry if this is at odds with what you thought the conversation was about. Perhaps we’re talking about different things.

To reiterate my side of the discussion:

I’ve spent quite some time writing a radix tree using UnliftedDatatypes and it’s a refreshing experience, to say the least; however, there’s room for improvement:

  • Functions from GHC.Exts could really use some kind of type class machinery. Not as free as the regular type classes, since I don’t think carrying dictionaries into runtime makes sense here, but the basic idea remains;

  • Perhaps an if# (_ :: Int#) then# _ else# _ is warranted, because case _ of 0# -> _; _ -> _ is a bit confusing to navigate (I found myself expecting True as the top option); the closest existing workaround, isTrue#, is sketched after this list;

  • Functions from GHC.Exts could also be spread out into meaningful modules as it looks more like a landfill than a properly maintained feature at the moment;

  • The need for proper levity polymorphism seems to mostly center around defining unlifted datatypes in an ergonomic way; everything else is gravy. If implemented, it most probably removes the need for the previous three points; if not, and if for now UnliftedDatatypes is indeed the go-to for writing strict recursive datatypes, then the extra magic sugar could help library authors quite a bit.
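For reference, a sketch of the closest workaround that exists today, using isTrue# from GHC.Exts (the intermediate lifted Bool is usually optimized away by GHC):

{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int#, isTrue#, (>#))

-- isTrue# :: Int# -> Bool recovers ordinary if/then/else syntax.
sign :: Int# -> Int
sign x = if isTrue# (x ># 0#) then 1
         else if isTrue# (0# ># x) then -1
         else 0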


Now that I think about it I might be overvaluing GHC.Exts for intermediate operations and I could instead just use regular Haskell, unboxing results in place. It does still feel like a more direct way of doing things, so perhaps the very fact that I treat the two sides of the language as different places biases me towards wanting to see more ergonomics for the low-level approach.

I wholeheartedly agree with this.

Does that mean you want non-expert users to write Strict ... around all their types? Why not introduce a custom prelude which makes all data types strict by default? And if we do that, why not make all those types unlifted and get extra performance for free? That would be the first step of my proposal: carving out a strict sublanguage.

For contrast, here’s the lazy version (to be explicit, I mean lazy in the result):

-- Here the result type b has kind Type (lifted), so the recursive
-- call is an ordinary thunk that k is free to ignore.
foldrList :: forall (a :: UnliftedType) b. (a -> b -> b) -> b -> List a -> b
foldrList _ z Nil = z
foldrList k z (Cons x xs) = k x (foldrList k z xs)

This can be used for example to define a short-circuiting and operation on a list of booleans. The strict variant can’t short circuit.
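For instance (a sketch; SBool is a made-up unlifted boolean, and List and foldrList are as above):

type SBool :: UnliftedType
data SBool = SFalse | STrue

-- The result type Bool is lifted, so the recursive fold reaches the
-- lambda as a thunk; on SFalse it is discarded and the rest of the
-- list is never traversed.
andList :: List SBool -> Bool
andList = foldrList (\x r -> case x of { SFalse -> False; STrue -> r }) True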

I think there are two advantages to unlifted types compared to strict-wrapper:

  • You get an error if you forget to use the strict variant of functions like foldl' (see the sketch after this list). This can be mitigated by using an alternative prelude which only exposes the “blessed” variant of each function (or by having the discipline to avoid the wrong versions in the current Prelude). But this point gets more important if you do actually want to start mixing lazy and strict data in the same program.
  • You get better performance, especially when using polymorphic functions. With strict-wrapper, thunks still pop up in unexpected places; they are not likely to be very long lived, but they still cost allocation.
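To illustrate the first point (Acc is a made-up unlifted accumulator type):

type Acc :: UnliftedType
data Acc = Acc !Int

-- foldl and foldl' from base both take an accumulator of kind Type,
-- so reaching for either of them with Acc is a kind error:
--
--   foldl (\(Acc n) x -> Acc (n + x)) (Acc 0) xs
--     -- rejected: Acc has kind UnliftedType, not Type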

As I mentioned elsewhere, could a primitive generic hyperstrict function:

primitive compel :: a -> a  -- a partial anagram of "complete"

…be useful in alleviating most (perhaps all) of the need to use Strict or unlifted types for “enhanced performance”?

Such a function would have to traverse the whole value to check if it is evaluated and that can be slow. It is usually better to prevent the creation of thunks in the first place.
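For comparison, the deepseq library already provides this at the library level, with exactly that traversal cost (a sketch; compel is the hypothetical primitive from the quote):

import Control.DeepSeq (NFData, force)

-- force walks the entire structure, evaluating every thunk it finds,
-- so each call costs time proportional to the size of the value.
compel :: NFData a => a -> a
compel = force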

(By the way I can’t find what you’re referencing on that wiki page)

But I should add that I do basically want some form of this to convert from a lifted value to an unlifted value, if that is really necessary.

Well, there’s another solution to the “thunk-building” problem:

No, not all, just the ones from base that are lazy for historical reasons. Non-expert users should define their own types to be strict, and for those types strict-wrapper won’t be needed.

Why not introduce a custom prelude which makes all data types strict by default?

There are libraries that do take a step in the “custom strict prelude” direction, such as strict, but I doubt they will take off because of the friction in using them with the rest of the ecosystem. That’s exactly what strict-wrapper is trying to avoid. It aims at interoperability with the existing ecosystem.

For contrast, here’s the lazy version (to be explicit, I mean lazy in the result)

Oh right … I think I was just wrong before. The strictness of the accumulator is unrelated to the strictness of the list spine or the strictness of its values. I suppose I was just confused. I still don’t think a strict right fold is a particularly compelling example of what I want to see though.

You get an error if you forget to use the strict variant of functions like foldl'.

Hmm, sort of. As long as you have defined your accumulator to be unlifted. But if you accidentally define your accumulator to be lifted and accidentally use foldl, your two mistakes match up perfectly and the type checker doesn’t help you.

You get better performance

Yes, this seems plausible.


I think we’re reaching an understanding. I agree that strict-wrapper still has a big advantage due to easy integration with the existing ecosystem, and that it is already a worked-out approach.

The main weakness that I can see now is that I don’t think it is obvious to everybody which functions to use and which to avoid, and people can still make mistakes, for example this recent case:

I guess Stan could be used to give warnings about this, or does it already do that?


Unlifted types are still very much a work in progress and it is not clear whether they can make good on all the potential that I see in them. And even if they do, there is still the question of whether they cause too much friction with the rest of the ecosystem. But it still seems worth investigating.

Yes, that’s very annoying.

Stan could be used to give warnings about this, or does it already do that?

I don’t know! It’s worth checking.

it still seems worth investigating

Definitely
