When call-by-name outperforms call-by-need: space leaks with `cycle` etc

I have been teaching evaluation strategies for many years, advertising call-by-need as the synthesis of call-by-value and call-by-name: call-by-value (the thesis) might evaluate unused function arguments; call-by-name (the antithesis) never evaluates an unused argument, but might evaluate an argument several times; call-by-need evaluates each argument at most once, and only when needed. So it takes the best of both worlds. Or does it?

I knew from programming practice that this is a gross oversimplification: call-by-need complicates the semantics by making a heap indispensable, and there are all sorts of difficulties with space leaks through thunk retention and the like.

However, when studying space leaks in earnest for the first time, I was still surprised to come across a simple example where call-by-need is outperformed not by call-by-value but by the unlikely call-by-name. In this example, the “at most once” property of argument evaluation comes not as a blessing, but as a bane. The example is a Haskell one-liner:

main = print $ head $ drop (10^8) $ cycle [1..10^7]

Here, [1..10^7] is syntax for the generator enumFromTo 1 (10^7) that lazily produces the list [1,2,...,10^7].

This program swallows a huge amount of space: it constructs the whole list in memory.
Why is that?
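A first hint: Prelude’s cycle ties a recursive knot, so the list being cycled is shared and stays reachable while a consumer walks along the result. A simplified sketch of that definition (close to, but not necessarily identical to, the code in base, which additionally errors on the empty list):

```haskell
-- Simplified sketch of how Prelude's cycle is defined:
cycleSketch :: [a] -> [a]
cycleSketch xs = xs'
  where
    xs' = xs ++ xs'  -- xs' refers to itself, so the whole spine is
                     -- shared and retained while a consumer walks it

-- take 8 (cycleSketch "abc") == "abcabcab"
```

Under call-by-need this sharing is exactly what keeps the list alive; under call-by-name the list would be recomputed instead of retained.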

Read on at When call-by-name outperforms call-by-need

9 Likes

I am kind of surprised your solution works. GHC tends to float out static things like that enumeration to the top level, and then it would still allocate the whole thing. You’d have to disable full laziness to avoid that. Specifically, this program still allocates the whole list:

cycle' :: (() -> [a]) -> [a]  -- renamed so it doesn't clash with the Prelude's cycle
cycle' f = f () ++ cycle' f

xs = [1..10^7]
main = print $ head $ drop (10^8) $ cycle' $ \ () -> xs

If you want to prevent this from happening, you can define a type of by-name streams like this:
{-# LANGUAGE ExistentialQuantification #-}  -- for the 'forall' in Stream

data Stream a = forall s. MkStream !s (s -> Step s a)
data Step s a = Done | Yield a !s | Skip !s

enumFromToS :: Int -> Int -> Stream Int
enumFromToS a b = MkStream a step where
  step x | x > b = Done
         | otherwise = Yield x (x + 1)

cycleS :: Stream a -> Stream a
cycleS (MkStream s0 step) = MkStream s0 step' where
  step' s = case step s of
    Done -> Skip s0
    Yield x s' -> Yield x s'
    Skip s' -> Skip s'

data DropState a = DS !Int !a

dropS :: Int -> Stream a -> Stream a
dropS n (MkStream s0 step) = MkStream (DS n s0) step' where
  step' (DS 0 s) = case step s of
    Done -> Done
    Yield x s' -> Yield x (DS 0 s')
    Skip s' -> Skip (DS 0 s')
  step' (DS n s) = case step s of
    Done -> Done
    Yield x s' -> Skip (DS (n - 1) s')
    Skip s' -> Skip (DS n s')

headS :: Stream a -> Maybe a
headS (MkStream s0 step) = go s0 where
  go s = case step s of
    Done -> Nothing
    Yield x _ -> Just x
    Skip s' -> go s'

main = print $ headS $ dropS (10^8) $ cycleS $ enumFromToS 1 (10^7) 

If you compile this with -O2 you get constant space and a minimal amount of total allocations. With -O1 you still get a lot of short-lived allocations.

Also, it should be noted that the by-need version would be faster if computing the list elements were slower than allocating memory. That’s why GHC generally prefers memory allocation over arbitrary recomputation. For example, adding a little Fibonacci-inspired computation in there makes by-need faster than by-name:

{-# LANGUAGE BangPatterns #-}  -- for the strict pattern in drop'

drop' :: Int -> [a] -> [a]
drop' 0 x = x
drop' _ [] = []
drop' n (!x:xs) = drop' (n - 1) xs

fib :: Int -> Int -> Int
fib c 0 = c
fib c 1 = c + 1
fib c n = fib c (n - 1) + fib c (n - 2)

main = print $ head $ drop' (10^8) $ cycle $ map (\c -> fib c 5) [1..10^7]
2 Likes

Call-by-value languages do not suffer from such problems, the mental model for evaluation is comparatively simple and it is easier to intuitively understand space consumption behavior.

I’m not sure I’d agree. It could also be that people in by-value languages just tend to write much simpler programs (with regards to resource consumption). The example in this blog post could also just be written like this (and, e.g., C programmers would write it like this, but with loops instead of recursion):

main = go 0 1 where
  go :: Int -> Int -> IO ()
  go n i
    | n == 10^8 = print i
    | i == 10^7 = go (n + 1) 1
    | otherwise = go (n + 1) (i + 1)

Even in by-need Haskell, it’s not hard to understand the space consumption behavior of this program. Haskell makes it easier to write more complicated programs, which are naturally harder to reason about.

3 Likes

This is not my experience.

I’ve seen memory issues at work that no one could figure out and our program was a boring backend. It often takes someone with intimate knowledge of GHC (which is ever-evolving and it’s hard to get a mental model of all the shenanigans regarding inlining etc.) or you end up hiring a consultant.

GHC does not have predictable performance. A GHC upgrade can easily break parts of your program in terms of performance if you do clever things.

2 Likes

@jaror wrote:

GHC tends to float out static things like that enumeration to the top level

I originally believed this, but I don’t believe it anymore. Do you have a reference for it?
Since floating stuff around changes the resource semantics, it should not be done automatically…


It’s documented in the GHC user’s guide:

  • 5.3. Optimisation (code improvement) — Glasgow Haskell Compiler 9.14.1 User's Guide

    -ffull-laziness
    Default: off but enabled by -O.
    Run the full laziness optimisation (also known as let-floating), which floats let-bindings outside enclosing lambdas, in the hope they will thereby be computed less often. See Let-floating: moving bindings to give faster programs (ICFP’96). Full laziness increases sharing, which can lead to increased memory residency.
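Concretely, the pass turns a binding under a lambda into an outer (often top-level) binding. An illustrative before/after sketch of the effect (hand-written, not actual GHC output; both functions compute the same result):

```haskell
-- Before floating: each call of f1 rebuilds the list,
-- so residency stays small but the work is repeated.
f1 :: Int -> Int
f1 x = let ys = [1 .. 100000 :: Int] in x + length ys

-- After floating: ys is a top-level binding (a CAF), computed once
-- and then retained, shared by every call of f2 -- less
-- recomputation, potentially much higher memory residency.
ys :: [Int]
ys = [1 .. 100000]

f2 :: Int -> Int
f2 x = x + length ys
```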

And a long-standing issue:

It seems it may only work when there is an actual lambda, so, ironically, I can only reproduce it with your by-name version (it could also be that the cycle from base fuses away):

cycle' :: (() -> [a]) -> [a]
cycle' f = f () ++ cycle' f

main = print $ head $ drop 100000000 $ cycle' $ \ () -> [1..10000000]

Compiling that with -O introduces a top level binding like this in the generated core:

main16 :: [Integer]
main16 = cycle' main17

And the allocation statistics also reflect that (running with +RTS -s):

   6,320,050,792 bytes allocated in the heap
     931,703,296 bytes copied during GC
     234,643,312 bytes maximum residency (9 sample(s))
      40,194,192 bytes maximum slop
             537 MiB total memory in use (0 MiB lost due to fragmentation)

Oooops. Thanks for enlightening me!

Since cabal supplies -O1 by default, I assumed in error that this was also GHC’s default, so I did not observe the harmful effects of the automatic floating. I only tried ghc and ghc -O0, which are identical, as I now see.

I’ll update my blog post with the information that -fno-full-laziness is required.
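For anyone following along, the flag can be passed on the command line or, when using cabal, via ghc-options (a sketch; the executable name is hypothetical):

```shell
# Directly with GHC: optimise, but skip the let-floating pass,
# then run with RTS statistics to check residency.
ghc -O2 -fno-full-laziness Main.hs
./Main +RTS -s

# Or in the .cabal file of a (hypothetical) executable:
#   executable cycle-demo
#     ghc-options: -O2 -fno-full-laziness
```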

I agree it’s highly dubious to apply it to data that is not known to be small. When it applies, the only workable way of interpreting the size semantics of lazy data structures is “they are as big as they could possibly be”, which means it’s risky to ever use infinite lazy lists.

See also a question I asked about this almost 10 years ago: https://stackoverflow.com/questions/35115172/why-is-full-laziness-a-default-optimization

1 Like