I have been teaching evaluation strategies for many years, advertising call-by-need as the synthesis of call-by-value and call-by-name: call-by-value (the thesis) might evaluate unused function arguments; call-by-name (the antithesis) does not, but might evaluate arguments several times; call-by-need evaluates each argument at most once, and only when needed. So it takes the best of both worlds. Or does it?
I knew from programming practice that this is a gross oversimplification: call-by-need complicates the semantics by making a heap indispensable, and there are all sorts of difficulties with space leaks through thunk retention and the like.
However, when studying space leaks in earnest for the first time, I was still surprised to come across a simple example where call-by-need is outperformed not by call-by-value but by the unlikely call-by-name. In this example, the “at most once” property of argument evaluation comes not as a blessing, but as a bane. The example is a Haskell one-liner:
```haskell
main = print $ head $ drop (10^8) $ cycle [1..10^7]
```
Here, [1..10^7] is syntax for the generator enumFromTo 1 (10^7) that lazily produces the list [1,2,...,10^7].
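For intuition, here is a simplified sketch of such a generator (the name enumFromTo' is mine; the real enumFromTo in base is more general and comes from the Enum class):

```haskell
-- Simplified, Int-specific sketch; the real enumFromTo is a method of
-- the Enum class and is more general.
enumFromTo' :: Int -> Int -> [Int]
enumFromTo' a b
  | a > b     = []                         -- past the end: stop
  | otherwise = a : enumFromTo' (a + 1) b  -- lazily cons the next element

main :: IO ()
main = print (take 5 (enumFromTo' 1 10))   -- prints [1,2,3,4,5]
```

Laziness means the tail `enumFromTo' (a + 1) b` is only demanded as the consumer walks down the list.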
This program swallows huge amounts of space: it constructs the whole list in memory.
Why is that?
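A sketch of the culprit: cycle in base ties a knot so that its argument stays reachable for every later round. Roughly (the error case for the empty list is omitted, and it is renamed cycle' here so it does not clash with the implicitly imported Prelude):

```haskell
-- Roughly the definition from GHC.List. The reference to xs on the
-- right-hand side keeps the (by-need, evaluated-at-most-once) list
-- alive after the first pass, so all 10^7 evaluated cells are
-- retained. A by-name generator would instead be re-run for each
-- round, in constant space.
cycle' :: [a] -> [a]
cycle' xs = xs'
  where xs' = xs ++ xs'

main :: IO ()
main = print (take 7 (cycle' [1, 2, 3 :: Int]))  -- prints [1,2,3,1,2,3,1]
```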
I am kind of surprised your solution works. GHC tends to float out static things like that enumeration to the top level, and then it would still allocate the whole thing. You’d have to disable full laziness to avoid that. Specifically, this program still allocates the whole list:
```haskell
-- Renamed to cycle' so it does not clash with the implicitly
-- imported Prelude.cycle:
cycle' :: (() -> [a]) -> [a]
cycle' f = f () ++ cycle' f

xs = [1..10^7]

main = print $ head $ drop (10^8) $ cycle' $ \ () -> xs
```
If you want to prevent this from happening you can define a type of by-name streams like this:
```haskell
{-# LANGUAGE ExistentialQuantification #-}

data Stream a = forall s. MkStream !s (s -> Step s a)
data Step s a = Done | Yield a !s | Skip !s

enumFromToS :: Int -> Int -> Stream Int
enumFromToS a b = MkStream a step
  where
    step x | x > b     = Done
           | otherwise = Yield x (x + 1)

cycleS :: Stream a -> Stream a
cycleS (MkStream s0 step) = MkStream s0 step'
  where
    step' s = case step s of
      Done       -> Skip s0
      Yield x s' -> Yield x s'
      Skip s'    -> Skip s'

data DropState a = DS !Int !a

dropS :: Int -> Stream a -> Stream a
dropS n (MkStream s0 step) = MkStream (DS n s0) step'
  where
    step' (DS 0 s) = case step s of
      Done       -> Done
      Yield x s' -> Yield x (DS 0 s')
      Skip s'    -> Skip (DS 0 s')
    step' (DS m s) = case step s of
      Done       -> Done
      Yield _ s' -> Skip (DS (m - 1) s')
      Skip s'    -> Skip (DS m s')

headS :: Stream a -> Maybe a
headS (MkStream s0 step) = go s0
  where
    go s = case step s of
      Done       -> Nothing
      Yield x _  -> Just x
      Skip s'    -> go s'

main :: IO ()
main = print $ headS $ dropS (10^8) $ cycleS $ enumFromToS 1 (10^7)
```
If you compile this with -O2 you get constant space and a minimal amount of total allocations. With -O1 you still get a lot of short-lived allocations.
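As a small-scale sanity check (using the list functions from base rather than the stream versions, with illustrative parameters 10 and 25 standing in for 10^7 and 10^8), the answer the pipeline must produce can be predicted with modular arithmetic:

```haskell
main :: IO ()
main = do
  -- head (drop n (cycle [1..m])) is the element at index n mod m,
  -- i.e. the value (n `mod` m) + 1.
  let m = 10 :: Int
      n = 25 :: Int
  print (head (drop n (cycle [1 .. m])))  -- 25 mod 10 = 5, so element 6
  print ((n `mod` m) + 1)                 -- 6 again
  -- For the original program: 10^8 mod 10^7 = 0, so it prints 1.
```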
Also, it should be noted that the by-need version would be faster if computing the list elements were slower than allocating memory. That’s why GHC generally prefers memory allocation over arbitrary recomputation. For example, adding a little Fibonacci-inspired computation in there makes by-need faster than by-name:
```haskell
{-# LANGUAGE BangPatterns #-}

drop' :: Int -> [a] -> [a]
drop' 0 x = x
drop' _ [] = []
drop' n (!x:xs) = drop' (n - 1) xs  -- force each element, as by-value would

fib :: Int -> Int -> Int
fib c 0 = c
fib c 1 = c + 1
fib c n = fib c (n - 1) + fib c (n - 2)

main :: IO ()
main = print $ head $ drop' (10^8) $ cycle $ map (\c -> fib c 5) [1..10^7]
```
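The underlying point is sharing: under call-by-need a bound expression is evaluated at most once, however often its value is used. A small illustration (using Debug.Trace, which reports evaluation on stderr):

```haskell
import Debug.Trace (trace)

main :: IO ()
main = do
  let v = trace "evaluated" (sum [1 .. 1000 :: Int])
  -- Call-by-need: "evaluated" appears once on stderr even though v is
  -- used twice. Under call-by-name, the sum would be computed twice.
  print (v + v)  -- prints 1001000
```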
Call-by-value languages do not suffer from such problems: the mental model for evaluation is comparatively simple, and it is easier to intuitively understand space consumption behavior.
I’m not sure I’d agree. It could also be that people using by-value languages just tend to write much simpler programs (with regard to resource consumption). The example in this blog post could also just be written like this (and, e.g., C programmers would write it like this, but with loops instead of recursion):
```haskell
main :: IO ()
main = go 0 1
  where
    go :: Int -> Int -> IO ()
    go n i
      | n == 10^8 = print i
      | i == 10^7 = go (n + 1) 1
      | otherwise = go (n + 1) (i + 1)
```
Even in by-need Haskell, it’s not hard to understand the space consumption behavior of this program. Haskell makes it easier to write more complicated programs, which are naturally harder to reason about.
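At a small scale (with hypothetical parameters m and n standing in for 10^7 and 10^8), the counting loop agrees with the list pipeline:

```haskell
-- Small-scale version of the counting loop above; m and n are
-- illustrative parameters.
loopHead :: Int -> Int -> Int
loopHead m n = go 0 1
  where
    go k i
      | k == n    = i
      | i == m    = go (k + 1) 1
      | otherwise = go (k + 1) (i + 1)

main :: IO ()
main = do
  print (loopHead 10 25)                           -- prints 6
  print (head (drop 25 (cycle [1 .. 10 :: Int])))  -- also prints 6
```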
I’ve seen memory issues at work that no one could figure out, and our program was a boring backend. It often takes someone with intimate knowledge of GHC (which is ever-evolving, and it is hard to form a mental model of all the shenanigans regarding inlining etc.), or you end up hiring a consultant.
GHC does not have predictable performance. A GHC upgrade can easily break parts of your program in terms of performance if you do clever things.
GHC tends to float out static things like that enumeration to the top level
I originally believed this, but I don’t believe it anymore. Do you have a reference for it?
Since floating stuff around changes the resource semantics, it should not be done automatically…
-ffull-laziness
Default: off but enabled by -O.
Run the full laziness optimisation (also known as let-floating), which floats let-bindings outside enclosing lambdas, in the hope that they will thereby be computed less often. See Let-floating: moving bindings to give faster programs (ICFP’96). Full laziness increases sharing, which can lead to increased memory residency.
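To illustrate what that means here (a hand-written sketch of the transformation, not actual GHC output; the names f, f' and ysTop are mine):

```haskell
-- Before full laziness: ys is rebuilt (and can be garbage collected)
-- on every call of f, because it lives under the lambda.
f :: Int -> Int
f x = let ys = [1 .. 1000 :: Int] in x + sum ys

-- After let-floating, GHC effectively rewrites it to this: ysTop is a
-- top-level binding, computed at most once but retained for the whole
-- run of the program. Sharing goes up; so can residency.
ysTop :: [Int]
ysTop = [1 .. 1000]

f' :: Int -> Int
f' x = x + sum ysTop

main :: IO ()
main = print (f 1 == f' 1)  -- prints True
```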
It seems it may only work when there is an actual lambda, so ironically I can only reproduce it with your by-name version (it could also be that the cycle from base fuses away):
```haskell
cycle' :: (() -> [a]) -> [a]
cycle' f = f () ++ cycle' f

main = print $ head $ drop 100000000 $ cycle' $ \ () -> [1..10000000]
```
Compiling that with -O introduces a top level binding like this in the generated core:
```haskell
main16 :: [Integer]
main16 = cycle' main17
```
And the allocation statistics also reflect that (running with +RTS -s):
```
   6,320,050,792 bytes allocated in the heap
     931,703,296 bytes copied during GC
     234,643,312 bytes maximum residency (9 sample(s))
      40,194,192 bytes maximum slop
             537 MiB total memory in use (0 MiB lost due to fragmentation)
```
Since cabal supplies -O1 by default, I assumed in error that this is also GHC’s default, so I did not observe the harmful effects of the automatic lifting. I only tried ghc and ghc -O0, which are identical, as I now see.
I’ll update my blog post with the information that -fno-full-laziness is required.
I agree it’s highly dubious to apply it to data that is not known to be small. When it applies, the only workable way of interpreting the size semantics of lazy data structures is “they are as big as they could possibly be”, which means it’s risky to ever use infinite lazy lists.