A question on constructor sharing and evaluation

The Haskell Wiki states that small enumerations reuse memory for all constructors.

Let’s say we have the following:

data S = L | R

otherB :: S -> S
otherB L = R
otherB R = L

otherU :: S -> (# S #)
otherU L = (# R #)
otherU R = (# L #)

From what I gather (and dumping STG confirms this), otherU does not attempt to evaluate the resulting S, while otherB does.

Does GHC skip such evaluation operations at a later stage or are unboxed tuples actually required to get the desired behavior?

Or I guess the reason why that can’t be done is simply the fact that S has a ⊥ as a possible state and GHC won’t ever bother proving the function never returns it.

Which would mean the answer is either an unboxed one-tuple or declaring S explicitly unlifted.

What is the desired behavior? The constructors R and L are already values, so they don’t need to be evaluated.

Dumping STG shows me both functions are pretty much the same except for the extra unboxed tuple (Solo#):

otherB =
    \r [ds_skV]
        case ds_skV of {
          L -> R [];
          R -> L [];
        };

otherU =
    \r [ds_skX]
        case ds_skX of {
          L -> Solo# [R];
          R -> Solo# [L];
        };

The only difference I can see is that otherB uses R [] and otherU uses just R, but I don’t think that is significant.

They’re not the same once you apply them uninlined, since you get roughly

case Main.otherB Main.L of x [Occ=Once1!] {
    __DEFAULT -> ...
  }

and

case Main.otherU Main.L of
  Solo# x [Occ=Once1] -> ...

The desired behavior is doing as few evaluations as needed.

For an example think of a dictionary that maps values to unlifted constructors of S and some lookupR :: Tree -> S that is strictly used as an intermediate step in a function. Invoking let !s = lookupR t not only runs lookupR, it also evaluates the s, which shouldn’t be necessary in this particular case.

The unboxed tuples approach should be as performant as the default one if GHC is smart enough, so the question is really just “do I need to bother with the tuples or am I wasting my time?”.

Can you give a full example then? I still cannot reproduce it with this:

{-# LANGUAGE UnboxedTuples #-}
module SLR where

data S = L | R

{-# NOINLINE otherB #-}
otherB :: S -> S
otherB L = R
otherB R = L

{-# NOINLINE otherU #-}
otherU :: S -> (# S #)
otherU L = (# R #)
otherU R = (# L #)

mainB :: S
mainB = case otherB L of
  L -> R
  R -> L

mainU :: S
mainU = case otherU L of
  (# L #) -> R
  (# R #) -> L

That seems to give “good” STG:

otherB =
    \r [ds_slE]
        case ds_slE of {
          L -> R [];
          R -> L [];
        };

L = L! [];

mainB =
    \u []
        case otherB L of {
          L -> R [];
          R -> L [];
        };

R = R! [];

otherU =
    \r [ds_slH]
        case ds_slH of {
          L -> Solo# [R];
          R -> Solo# [L];
        };

mainU =
    \u []
        case otherU L of {
        Solo# ipv_slK ->
        case ipv_slK<TagProper> of {
          L -> R [];
          R -> L [];
        };
        };

(The <TagProper> means that the second case is not actually doing an evaluation.)

I tend to pack stuff into IORefs to kill GHC optimizations on that front.

main :: IO ()
main = do
  a <- newIORef L
  x <- readIORef a
  let !x' = otherB x
  writeIORef a x'

  b <- newIORef L
  y <- readIORef b
  let !(# y' #) = otherU y
  writeIORef b y'

That still gives me good STG with one eval for each:

        case otherB ipv3_sLJ of x'_sLK {
        __DEFAULT ->
        case writeMutVar# [ipv1_sLG x'_sLK void#] of s2#_sLL {

And

        case otherU ipv7_sLR of {
        Solo# ipv8_sLT ->
        case writeMutVar# [ipv5_sLO ipv8_sLT void#] of s2#1_sLU {

In fact, your code doesn’t strictly demand the y' that’s inside the unboxed tuple, you should probably use:

    let !(# !y' #) = otherU y

Then you do get two cases, but one is properly tagged so it doesn’t eval:

        case otherU ipv7_sLP of {
        Solo# ipv8_sLR ->
        case ipv8_sLR<TagProper> of y'_sLS {
        __DEFAULT ->
        case writeMutVar# [ipv5_sLM y'_sLS void#] of s2#1_sLT {

That is precisely the point of the question, the y' at this point in the application should never be a box because these functions only return an L or R, i.e. a pointer to a shared constructor value.

Whether a variable is demanded is a static property and whether a value is a thunk (all pointers are boxes, some are thunks) is a dynamic property.

Statically, GHC will not try to evaluate the result of otherU unless you explicitly demand it, but dynamically it will not be a thunk because the otherU function does not produce thunks. (And GHC does not introduce a new thunk to store the result of otherU because you do demand the unboxed tuple.)

Lifted values only cause redundant work when (1) thunks are created to store them in, or (2) they are demanded and thus evaluated (which means that it checks whether the value is a thunk and if so runs it). In your program the otherU function does not create thunks, so (1) is not an issue. It seems you are mostly worried about (2), but that cannot cause an issue if you never demand the value.

If you do demand the y' then (2) might be an issue, but I’m showing that it produces a case match with a <TagProper> annotation, which is GHC’s way of telling you that it knows that the value is not a thunk and thus no evaluation needs to happen.