Optimizing Unit bottom

Would it be wrong for a (hypothetical) compiler to optimize

f :: ()
f = f

to ()?

Yes, technically the user asked for _|_, but can _|_ ever be ‘better’ than ()?

Why does this matter?

Because, as far as I understand, the compiler has to store a boxed representation for (), just to account for _|_. If this optimization were legal, () could be stored in zero space, allowing considerable optimization and code reuse (Set e could be implemented as Map e (), for example, with no loss of efficiency).
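For illustration, here is a minimal sketch (with hypothetical names) of a set reusing a map by storing () values. Note that today each () value still costs a pointer to the shared () closure; only under the proposed optimization would the value column vanish entirely:

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical sketch: a set represented as a map whose values are ().
-- The real Data.Set is a dedicated structure; this just shows the reuse idea.
newtype Set e = Set (Map.Map e ())

empty :: Set e
empty = Set Map.empty

insert :: Ord e => e -> Set e -> Set e
insert x (Set m) = Set (Map.insert x () m)

member :: Ord e => e -> Set e -> Bool
member x (Set m) = Map.member x m
```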

Is GHC’s representation of () discussed anywhere?

Formally this is known as η-conversion for the unit type. It would be troublesome in the presence of seq. For example, deepseq defines

class NFData a where
  rnf :: a -> ()

where evaluating rnf blah should ensure blah is fully evaluated to normal form, then return (). That’s sometimes useful to avoid space leaks. But it would become useless if rnf blah were silently η-converted to ().
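A small example of the pattern in question, assuming the deepseq package: the seq on rnf xs is what forces the evaluation, and η-converting rnf xs to () would silently remove it:

```haskell
import Control.DeepSeq (NFData, rnf)

-- Force xs to normal form before returning its length.
-- All the work happens in `rnf xs `seq` ...`: if the compiler
-- rewrote `rnf xs` to (), nothing would be forced and any thunks
-- in xs would be retained.
strictLength :: NFData a => [a] -> Int
strictLength xs = rnf xs `seq` length xs
```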

> Because as far as I understand, the compiler has to store a boxed representation for (), just to account for _|_. If this optimization was legal, () could be stored in zero space, allowing considerable optimization and code reuse (Set e could be implemented as Map e (), for example, with no loss of efficiency).

With -XUnboxedTuples, GHC does actually have an unboxed unit type, namely (# #). However, its kind is not Type, so you can’t store it in a Map. This is an instance of the general property that unboxed things can’t be stored in polymorphic datatypes (because the polymorphic code has to know the runtime representation of the data, i.e. it has to be boxed). One could imagine a compiler that did things differently, but it would be very different from GHC.
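A short sketch of what does and does not work with the unboxed unit:

```haskell
{-# LANGUAGE UnboxedTuples #-}

-- A function may return the unboxed unit (# #), which occupies no
-- space at runtime. (A top-level *binding* of an unboxed type would
-- be rejected, so we go through a function.)
toUnboxedUnit :: () -> (# #)
toUnboxedUnit () = (# #)

-- Rejected by the kind checker: Map's value parameter must have kind
-- Type, but (# #) has kind TYPE (TupleRep '[]):
-- bad :: Map.Map Int (# #)
```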

You might find the GHC user’s guide on this topic helpful: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/exts/primitives.html


Couldn’t we all just switch to deepseq :: NFData a => a -> b -> b? Is that really a big problem?

Perhaps. The bigger problem is that if you give () an unboxed representation, it can no longer be used to instantiate polymorphic functions or data structures, so the Map e () example won’t fly.


There are some data types whose asymptotic complexity depends on evaluating only to weak head normal form; using deepseq would destroy that. There are many examples in Okasaki’s classic book “Purely Functional Data Structures”.

I meant to use deepseq instead of rnf, which already evaluates its argument to full (not just weak head) normal form.

What would you expect deepseq (undefined :: Int) () to do?

With the optimization proposed in this thread, I would expect it to return (). If you want to be sure the first argument is evaluated, you should deepseq it to something that really has multiple non-undefined inhabitants. I think optimizing for the case where the program is not expected to crash or loop indefinitely is pretty reasonable, even if it means that the semantics of crashing and/or looping programs change.
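For reference, deepseq itself is defined in terms of rnf, which makes the interaction explicit; the definition in Control.DeepSeq is equivalent to:

```haskell
import Control.DeepSeq (NFData, rnf)

-- Equivalent to the definition of deepseq in Control.DeepSeq.
deepseq' :: NFData a => a -> b -> b
deepseq' a b = rnf a `seq` b
```

Today, deepseq' (undefined :: Int) () throws. Under the proposed optimization, rnf a could be rewritten to (), seq would find its first argument already evaluated, and () would be returned.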

But yeah, polymorphism will still be a problem. And maybe the new unlifted data types could also be a better solution to this problem.
