Would it be wrong for a (hypothetical) compiler to optimize
f :: ()
f = f
to ()?
Yes, technically the user asked for _|_, but can _|_ ever be ‘better’ than ()?
Why does this matter?
Because, as far as I understand it, the compiler has to store a boxed representation for (), just to account for _|_. If this optimization were legal, () could be stored in zero space, allowing considerable optimization and code reuse (Set e could be implemented as Map e (), for example, with no loss of efficiency).
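To make the code-reuse point concrete, here is a minimal sketch (the names UnitSet, insert', and member' are mine, not from the containers library) of a set built as a Map with () values:

```haskell
import qualified Data.Map.Strict as Map

-- A set represented as a map whose values carry no information.
type UnitSet e = Map.Map e ()

insert' :: Ord e => e -> UnitSet e -> UnitSet e
insert' x = Map.insert x ()

member' :: Ord e => e -> UnitSet e -> Bool
member' = Map.member

main :: IO ()
main = do
  let s = insert' (2 :: Int) (insert' 1 Map.empty)
  print (member' 1 s, member' 3 s)  -- (True,False)
```

Today each () value still occupies a pointer field in the map nodes; the proposed optimization is what would let this representation cost nothing extra.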
Formally this is known as η-conversion for the unit type. It would be troublesome in the presence of seq. For example, deepseq defines
class NFData a where
  rnf :: a -> ()
where evaluating rnf blah should ensure blah is fully evaluated to normal form, then return (). That’s sometimes useful to avoid space leaks. But it would become useless if rnf blah was silently η-converted to ().
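To see exactly what the η-conversion would break, here is a small runnable sketch using the deepseq package: forcing rnf on a list containing a bottom throws, whereas a compiler that silently rewrote rnf xs to () would report success:

```haskell
import Control.DeepSeq (rnf)
import Control.Exception (SomeException, evaluate, try)

main :: IO ()
main = do
  -- rnf returns (), but forcing that () fully evaluates the list,
  -- so the undefined in the middle is hit.
  r <- try (evaluate (rnf [1 :: Int, undefined, 3]))
         :: IO (Either SomeException ())
  putStrLn $ case r of
    Left _  -> "bottom found"      -- what actually happens
    Right _ -> "fully evaluated"   -- what eta-conversion would make happen
```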
With -XUnboxedTuples, GHC does actually have an unboxed unit type, namely (# #). However, its kind is not Type, so you can’t store it in a Map. This is an instance of the general property that unboxed things can’t be stored in polymorphic datatypes (because the polymorphic code has to know the runtime representation of the data, i.e. it has to be boxed). One could imagine a compiler that did things differently, but it would be very different from GHC.
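A short sketch of that restriction (strict and use are made-up names for illustration): (# #) is fine in monomorphic positions, but trying to put it inside an ordinary polymorphic type is a kind error:

```haskell
{-# LANGUAGE UnboxedTuples #-}

-- Monomorphic use of the unboxed unit works fine.
mkUnit :: Int -> (# #)
mkUnit _ = (# #)

use :: Int -> Int
use x = case mkUnit x of (# #) -> x + 1

-- But it can't instantiate an ordinary type variable of kind Type:
-- broken :: Maybe (# #)   -- kind error: (# #) has kind
-- broken = Just (# #)     -- TYPE (TupleRep '[]), not Type

main :: IO ()
main = print (use 41)  -- 42
```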
Perhaps. The bigger problem is that if you give () an unboxed representation, it can no longer be used to instantiate polymorphic functions or data structures, so the Map e () example won’t fly.
There are some data structures whose asymptotic complexity depends on evaluating only to weak head normal form; using deepseq would destroy that. There are many examples in Okasaki’s classic book “Purely Functional Data Structures”.
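As a rough illustration of the principle (a sketch, not an example taken from Okasaki): with a lazy spine you pay only for the prefix you actually inspect, while deepseq forces every suspended computation at once, changing the cost profile:

```haskell
-- Each element is an expensive suspended computation (a thunk).
xs :: [Integer]
xs = map (\n -> sum [1 .. n]) [1 .. 100000]

main :: IO ()
main = do
  print (head xs)  -- forces a single thunk: cheap
  -- xs `deepseq` print (head xs)
  --   would force all 100000 thunks first, turning an O(1)
  --   access into O(n) expensive evaluations
```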
With the optimization proposed in this thread I would expect it to return (). If you want to be sure to evaluate the first argument, then you should deepseq it to something that really has multiple non-undefined inhabitants. I think optimizing for the case where the program is not expected to crash or loop indefinitely is pretty reasonable, even if it means that the semantics of crashing and/or looping programs change.
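A hedged sketch of that suggestion (rnfB is a made-up name, not part of deepseq): have the forcing function return a type with several non-bottom values, so a unit-style η-conversion could not apply without visibly changing semantics:

```haskell
import Control.DeepSeq (NFData, rnf)

-- Bool has two non-bottom inhabitants, so a compiler cannot
-- rewrite (rnfB x) to a constant without changing which of
-- True/False (or bottom) the expression denotes.
rnfB :: NFData a => a -> Bool
rnfB x = rnf x `seq` True

main :: IO ()
main = print (rnfB [1 :: Int, 2, 3])  -- True
```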
But yeah, polymorphism will still be a problem. And maybe the new unlifted data types could also be a better solution to this problem.