On the unrolling of recursive functions applied to static arguments ('Inlining of loop breakers')

On the Haskell Wiki, the following can be read

Will GHC ever inline recursive definitions with static arguments?

Sometimes people ask if GHC is smart enough to unroll a recursive definition when given a static argument. For example, if we could define sum using direct recursion:
[…]
A user might expect sum [1,2,3] to be optimised to 6. However, GHC will not inline sum because it is a self-recursive definition and hence a loop-breaker. The compiler is not smart enough to realise that repeatedly inlining sum will terminate.

And then it goes on to describe a ‘trick’ whereby the whole list-argument is lifted to the type-level, which ‘works’ iff all the elements in the list are also static, but also completely changes the way the function is called.


A few days ago another idea came to mind: Just simply applying rewrite rules.
Turns out that this works perfectly! While GHC will never inline loop-breakers on its own, it will happily apply your rewrite rules:

sumExample :: Int
sumExample = simpleSum [1,2,3,4,5]

simpleSum :: [Int] -> Int
{-# NOINLINE simpleSum #-}
simpleSum [] = 0
simpleSum (x : xs) = x + simpleSum xs

{-# RULES 
"simpleSum/base" simpleSum [] = 0
"simpleSum/recurse" forall x xs. simpleSum (x : xs) = x + simpleSum xs
#-}

Or, another example:

findFirstExample :: Int -> Maybe String
findFirstExample x = findFirst [(10, "a"), (20, "b"), (30, "c")] x

findFirst :: Eq k => [(k, v)] -> k -> Maybe v
{-# NOINLINE findFirst #-}
findFirst [] _ = Nothing
findFirst ((k, v) : xs) needle = if k == needle then Just v else findFirst xs needle

{-# RULES 
"findFirst/base" forall x. findFirst [] x = Nothing
"findFirst/recurse" forall k v xs needle. findFirst ((k, v) : xs) needle = if k == needle then Just v else findFirst xs needle
#-}

This results in (-ddump-simpl -dsuppress-all):

sumExample = I# 15#

for sum and

findFirstExample
  = \ x_a24A ->
      case x_a24A of { I# y_i2yB ->
      case y_i2yB of {
        __DEFAULT -> Nothing;
        10# -> lvl_r2zM;
        20# -> lvl1_r2zN;
        30# -> lvl2_r2zO
      }
      }

for findFirst.

After sharing this with a couple of people, @MorrowM pointed out that recently GHC has obtained rewrite rules that turn list literals into calls to build, therefore allowing list literals to take part in list fusion, meaning that as long as you implement something in terms of e.g. foldr or another list fusion participant, there already are rewrite rules like the above in place that do the work for you and you don’t have to add them yourself:

EDIT: GHC has had these rewrite rules for a long while (at least 16 years). Rather, it’s specifically lookup, which findFirst is based on, which only recently started participating in list fusion. Thanks for the errata @MorrowM!

sumExampleFold :: Int
sumExampleFold = foldBasedSum [1,2,3,4,5]

foldBasedSum :: [Int] -> Int
{-# INLINE foldBasedSum #-}
foldBasedSum xs = foldr (+) 0 xs

resp.

findFirstFoldExample :: Int -> Maybe String
findFirstFoldExample x = findFirst [(10, "a"), (20, "b"), (30, "c")] x

findFirstFold :: Eq k => [(k, v)] -> k -> Maybe v
{-# INLINE findFirstFold #-}
findFirstFold xs key = foldr (\(k,v) acc -> if k == key then Just v else acc) Nothing xs 

(The core/STG output is identical.)


So there’s a few lessons:

  • Specifically if you’re dealing with lists, try to implement your recursive functions using foldr/map/filter etc. and they might very well already fully inline on static inputs.
  • For any other recursive function that you expect will regularly be called with static inputs: Just copy the definition of the function as a set of rewrite rules, and GHC will happily inline it for you.
  • (Side note: Could GHC theoretically be extended to automatically create rewrite rules like these for loop breakers? Or would that be unsound?)

I imagine that this technique can be very useful in the definition of combinators of eDSLs, as they are often recursive but usually called with static arguments.

8 Likes

That’s a cool idea! (Who’s going to add it to the wiki?)

A danger is that the loop breaker might not terminate and such rewrite rules would then cause an infinite loop at compile time.

5 Likes

A danger is that the loop breaker might not terminate and such rewrite rules would then cause an infinite loop at compile time.

You’re right. We would need some kind of termination checker to see whether it is safe to apply this to a particular function. Which we can do, as long as we accept some number of false-negatives (we’re not gonna solve the halting problem today :blush:!). Maybe a simple rule like the following already suffices:

  • A function “truly destructs” its input:
    • It pattern matches (not taking guards into account) on the input and destructures it into its components
    • And inside the function definition, these components are not again combined to re-construct the original input.

I think that that would always terminate. There is one catch remaining however: other manually-written rewrite rules might still mess with the “these components are not again combined” invariant, and then an infinite loop can still happen :frowning_face:. Maybe there is a way around that, but I don’t currently see it.

Alternatively, maybe we can add a new INLINE-like pragma (UNROLL?) to do this on a function-by-function basis, and put the burden of ‘is this correct’ on the programmer?

1 Like

I would just like to point out that desugaring lists with build has been in GHC for quite a while (at least 16 years, as far as I can tell, from digging around). What I meant was that lookup participating in fusion is recent (which is relevant to your findFirstFold example, as it reimplements lookup and could be simplified).

That said, this note from the build desugaring code seems to have some lessons learned about how to avoid code bloat from doing this sort of thing.

3 Likes

Someone had already added a brief mention to this post to the Wiki, but I’ve now expanded the Wiki article to include the examples from the first post in full :blush:

2 Likes

While reading this post a couple of other approaches occurred to me.

Least Fix Points

It makes sense that foldr/build would get constant folding, even without rewrite rules, and we also have a path of generalisation here: any functor-fixpoint finite “recursive” data structure can be written and folded down without using actual recursion:

newtype LFP f = LFP{ fold :: forall x. (f x -> x) -> x }

There are no loop breakers to get in the way. You can either use this representation directly or write the equivalent of foldr/build RULES to translate the standard haskell datatype and operations to and from this form, obtaining fusion as a bonus.

For more operations on this and a related GFP data type, see: Almost obnoxious levels of duality in fixed-points. · GitHub

It’s Not a Bug, It’s a Feature!

GHC has had a certain WONTFIX bug for a long time: the simplifier only understands Haskell’s special direct recursion, so if you instead use lambda-calculus-style Y-combinator recursion, the simplifier runs out of ticks trying to “simplify” the recursion indefinitely. This suggests a more general approach, but there are difficulties. I’d like to write a library module:

module InlineFix (inlineFix, Rec) where

newtype Y a = Y { ($$) :: Y a -> a }

inlineFix :: (a -> a) -> a
inlineFix f = y f $$ y f
 where
  y :: (a -> a) -> Y a
  y f = Y \x -> f (x $$ x)

type Rec a = ((a -> a) -> a) -> a

Then define recursion-style-agnostic functions like:

foo :: Rec (...)
foo fix_ = fix_ \self x y -> ...

So at usage sites I can decide to invoke foo fix, foo inlineFix or even foo memoFix. However, GHC will start simplifying too soon: right inside the body of inlineFix itself! There should be a way to delay this simplification to the usage site of the recursive function, but I suspect it will harm my ~clean interface and involve a certain amount of GHC magic. I don’t have time to experiment with it right now, so I hope someone else figures it out!

6 Likes

Okay, I had a bit of a go at it, but it looks like that feature is just a bug after all. It’s far too happy to explode, so I’ve given up on it. Never mind, though—by combining my two approaches, I found a better way!

When it comes down to it, what is fix? It’s f (f (f ... )) all the way down, but you could also think of it as f . f . f ... with an argument lost to infinity—it’s essentially ω as a church numeral! We don’t need all of it though; a proper, finite church numeral will do.

In short, we can write:

newtype Church = Church{ times :: forall x. (x -> x) -> x -> x }

type Rec a = a -> a

inlineFix :: Church -> Rec a -> a
inlineFix m f = (m `times` inline f) (fix f)

Then use it like so:

sumExample :: Int
sumExample = inlineFix _6 simpleSum [1,2,3,4,5]

simpleSum :: Rec ([Int] -> Int)
simpleSum rec [    ] = 0
simpleSum rec (x:xs) = x + rec xs

And all is jolly!

sumExample = I# 15#

Full implementation: Unrolling recursive functions at static arguments using church numerals · GitHub

Edit: As a final note, I’m not quite satisfied with using fix f in the definition of inlineFix; I’d actually like to put some kind of compile-time bottom there, that results in a compile error if it isn’t eliminated as dead code by simplification. Is there a way to do this?

1 Like

Exploits in this direction are nice, but as you noticed GHC’s Inliner is not equipped to handle encodings of general recursion through general recursive types gracefully. (A conscious and IMO sensible decision.)

I’m pretty sure that your inlineFix is lacking an INLINE pragma without which it is unlikely to work with bigger functions simpleSum. Note that even if inlineFix gets inlined, GHC will inline the simplified RHS in which inline has already been inlined, to no effect at all. Furthermore, it is crucial that _6 inlines first so that the 6 occurrences of inline f will be visible before inline is inlined (which is pretty early). Overall I’m not that optimistic of this approach…

On the other hand, you can achieve a statically known number of unrollings pretty easily by leaning on type class specialisation. Inlining and Specialisation - HaskellWiki lists 3 approaches to achieve unrolling, among which (1) is the technique described in the OP. Edit: Ah, it appears that the wiki page was edited in response to this thread. Well done!

Also have a look at the user’s guide entry for SPECIALISE INLINE.

3 Likes

The snippets in the post are cut down from the implementation in the linked gist; inlineFix does have an INLINE pragma. It’s my understanding that bindings marked INLINE or INLINEABLE will be inlined in their unsimplified form and thus retain inline.

Looking at the source, we have {-# NOINLINE[0] inline #-}. Again, if I’ve understood correctly, that means “don’t inline inline until the last phase of simplification”. If all the combinators for constructing church numerals are marked INLINE, they should inline first. That said, I don’t quite follow you on why this would be necessary.

1 Like

That’s probably alright then.

At any rate, you would know why it’s necessary once you start instantiating inlineFix with huge arguments f, where GHC is hesistant to simply inline it. simpleSum is too small in that regard.

1 Like

@Leary Thanks for sharing the Church-numeral encoding idea! Very clever :smiley:!
Nice that it works without rewrite rules.
Of course, it’s a little bit sad that you’ll have to manually indicate how many iterations of inlining you want to occur (too little, and it won’t fully inline. (Far) too much and the inliner will give up inlining the f . f . f . f ...). I guess we probably can’t have our cake and eat it too in that regard :frowning:

1 Like