Can I coax GHC to unbox `Maybe (Int, Int)` to `(# (# #) | (# Int#, Int# #) #)` in not so many words?

Consider the naïve “list compression” function implemented as

compress_naive :: forall a. Eq a => [a] -> [(a, Int)]
compress_naive = \case
    []      -> []
    a : sa' ->
        let (sa'l, sa'r) = span (a ==) sa'
        in  (a, 1 + length sa'l) : compress_naive sa'r

The above gets the point across mathematically, but if we care about performance then we should leverage the fact that compress streams over its input. In particular, we might rewrite it fold/build-fusibly as

{-# INLINE compress #-}
compress :: forall a. Eq a => [a] -> [(a, Int)]
compress = \ sa -> GHC.List.build $ \ g b ->
    foldr
      ( \ a' k -> \case
            Just (a, !n) -> case a == a' of
                True  -> k $ Just (a, n + 1)
                False -> g (a, n + 1) . k $ Just (a', 0)
            Nothing      -> k $ Just (a', 0) )
      ( \case
            Just (a, !n) -> g (a, n + 1) b
            Nothing      -> b )
      ( sa )
      ( Nothing )

with the “state of the loop” (i.e., of the left part of the folding of sa, achieved via the continuation trick) represented as a value of type Maybe (Int, Int) in which each iteration of said loop is fully (i.e., recursively) strict and wherein the Nothing constructor represents the absence of a value already consumed from the input against which to compare for equality.

(If this sucks/is otherwise wrong in some obvious way other than legibility, please let me know!)

Okay, let’s see how well GHC can optimize this in a reasonable context. Consider the “Fibonacci mod 3” sequence on a given pair of initial values defined as

{-# INLINE fibMod3 #-}
fibMod3 :: Int -> Int -> [Int]
fibMod3 = \ a0 a1 ->
    fmap fst $ iterate' (\ (!a0', !a1') -> (a1', flip rem 3 $ a0' + a1')) (a0 `mod` 3, a1 `mod` 3)

and in turn let the function we wish to optimize be

compressFib :: Int -> Int -> Int -> Int
compressFib = \ a0 a1 n ->
    sum . fmap (\ (a, i) -> a `div` i) . compress . filter (0 <) . take n $ fibMod3 a0 a1

The details are mostly arbitrary; the point is that on the one hand GHC (which is apparently otherwise quite clever at finding ways to do so) cannot easily prepare in advance for whether compress will be passed an empty input (when it can—for instance in the variation on the above with take n pushed left of compress—it tends to optimize the above exactly as I would have hoped, being able to fully eliminate the Maybe) and yet on the other it is intuitively clear how to rewrite the above to a tight, unboxed loop.

Unfortunately, when I compile the above with -fforce-recomp -O2 -fexpose-all-unfoldings -fspecialize-aggressively -funbox-strict-fields (GHC version 9.12.2), the “inner loop” is a mess spread across several not-fully-unboxed functions (ignoring wrappers):

$wn :: Maybe (Int, Int) -> Int# -> Int#

compressFib1 :: Int -> (Maybe (Int, Int) -> Int -> Int) -> Maybe (Int, Int) -> Int -> Int

compressFib_$s$wgo :: Int# -> Int# -> Int# -> Maybe (Int, Int) -> Int -> Int

But a (fully, i.e., recursively) strictly-consumed argument of type Maybe (Int, Int) can be unboxed to one of type (# (# #) | (# Int#, Int# #) #) and then represented in turn as three Int#s.

So what gives?

3 Likes

I haven’t looked through all the details in your post, but GHC doesn’t unbox product types by default.

If a function is applied to a fixed constructor at a call site SpecConstr can often achieve the same effect, but it’s not exactly the same as unboxing.

If a product type is used inside another constructor you can force unpacking using the pragma. Sadly this approach is not available for function arguments.

So I assume the maybes you are seeing get unboxed are those specialised by SpecConstr. I’ve used pattern synonyms over unboxed sums in the past with good success to ensure unboxing like this, while ensuring it’s not too painful to use.

Maybe you can use this as inspiration where I defined a unboxed Maybe as a unboxed sum. There are probably libraries doing the same thing but better outside.

3 Likes

Thanks for taking a look! I did try manually unboxing the “state” of the loop (really, I should have tried this earlier). Specifically, the initial module I compiled was

{-# LANGUAGE Haskell2010
  , BangPatterns
  , LambdaCase
  , ScopedTypeVariables
#-}

{-# OPTIONS_GHC
    -Wall
    -ddump-to-file
    -ddump-simpl
    -dno-typeable-binds
    -dsuppress-coercions
    -dsuppress-type-applications
    -dsuppress-module-prefixes
#-}

module CompressFib
  ( compressFib
  ) where

import Data.List
  ( iterate' )

import qualified GHC.List
  ( build )

{-# INLINE compress #-}
compress :: forall a. Eq a => [a] -> [(a, Int)]
compress = \ sa -> GHC.List.build $ \ g b ->
    foldr
      ( \ a' k -> \case
            Just (a, !n) -> case a == a' of
                True  -> k $ Just (a, n + 1)
                False -> g (a, n + 1) . k $ Just (a', 0)
            Nothing      -> k $ Just (a', 0) )
      ( \case
            Just (a, !n) -> g (a, n + 1) b
            Nothing      -> b )
      ( sa )
      ( Nothing )
 
{-# INLINE fibMod3 #-}
fibMod3 :: Int -> Int -> [Int]
fibMod3 = \ a0 a1 ->
    fmap fst $ iterate' (\ (!a0', !a1') -> (a1', flip rem 3 $ a0' + a1')) (a0 `mod` 3, a1 `mod` 3)

compressFib :: Int -> Int -> Int -> Int
compressFib = \ !a0 !a1 n ->
    sum . fmap (\ (a, i) -> a `div` i) . compress . filter (0 <) . take n $ fibMod3 a0 a1

(compiling with ghc -fforce-recomp CompressFib -O2 -fexpose-all-unfoldings -fspecialize-aggressively -funbox-strict-fields). This had the problems in the original post.

I then tried rewriting compress as suggested (turning on -XMagicHash and -XUnboxedTuples and importing pattern I# and (+#) from GHC.Exts) as

{-# INLINE compress #-}
compress :: forall a.
    Eq a =>
    [a] -> [(a, Int)]
compress = \ sa -> GHC.List.build $ \ g b ->
    foldr
      ( \ !a' k -> \case
            (# | (# !a, n #) #) -> case a == a' of
                True  -> k $ (# | (# a, n +# 1# #) #)
                False -> g (a, I# $ n +# 1#) $ k $ (# | (# a', 0# #) #)
            (# (# #) | #)       -> k $ (# | (# a', 0# #) #) )
      ( \case
            (# | (# !a, n #) #) -> g (a, I# $ n +# 1#) b
            (# (# #) | #)      -> b )
      ( sa )
      ( (# (# #) | #) )

but this still did not eliminate the boxing of the Int result type of the loop left after fold/build-fusion.

Then I tried specializing compress to Int so that I could replace the Maybe type with a tuple (specializing so that the Just half of the tuple has a default value that can be strictly evaluated) and manually unpacking the tuple as arguments of the loop, resulting in

{-# INLINE compress #-}
compress :: [Int] -> [(Int, Int)]
compress = \ sa -> GHC.List.build $ \ g b ->
    foldr
      ( \ !a' k -> \case
            0 -> \ !_ !_ -> k 1 a' 0
            _ -> \ a  !n -> case a == a' of
                True  -> k 1 a (n + 1)
                False -> g (a, n + 1) $ k 1 a' 0 )
      ( \case
            0 -> \ !_ !_ -> b
            _ -> \ !a !n -> g (a, n + 1) b )
      ( sa )
      ( 0 :: Int ) ( 0 ) ( 0 )

but the resulting core is still not fully unboxed! Instead I have some mess of functions like

$wn :: Int# -> Int# -> Int# -> Int# -> Int# -- This one is fine...

compressFib_n :: Int -> Int -> Int -> Int -> Int

$wlvl :: Int -> (Int -> Int -> Int -> Int -> Int) -> Int# -> Int -> Int# -> Int -> Int

compressFib_$s$wgo :: Int# -> Int# -> Int# -> Int -> Int -> Int -> Int -> Int

$wcompressFib :: Int# -> Int# -> Int# -> Int

And I admit that I have not inspected this final core output especially carefully, because this whole thing was just meant to be a toy example to figure out what was going wrong. (On that note, it surely can’t be a lack of strictness annotation?)

I guess the broader question here is that it’s often convenient to write general functions that (ideally reliably) fold/build-fuse to loops whose arguments and result GHC unboxes (as long as strictness analysis says that this is okay). Is there any way to ensure that this happens? I’m not even quite sure how I would do so in this case in light of the above…