Understanding lazy list traversal performance

rnorth · December 8, 2024, 6:10pm

Hello! I’m learning Haskell and trying to understand the performance of my programs.

I’m interested in a subset of unfoldr like functions where each call to the “unfolding function” generates a list and a new seed which depends on the final value of the list. The function f in the code below is like this. The function g computes the same thing, but directly generates the next seed instead using arithmetic.

When I compile (using ghc -O2) and run the code below it takes around 1.5s to run. With f replaced by g in the main function, it takes only 0.02s! I’m not sure why the performance difference is so large? In both cases I need to traverse the entire result list. With f I only need to generate one more element of the list, which should not be very expensive.

I hope this question is interesting and I would love to understand this performance difference.

import Data.List (iterate, break, unfoldr)

target :: Int
target = 10000

f :: Int -> Int -> ([Int], Int)
f x s = (l, head t - target)
    where (l, t) = break (> target) $ iterate (+ s) x

g :: Int -> Int -> ([Int], Int)
g x s = (takeWhile (<= target) $ iterate (+ s) x, s - ((target - x) `rem` s))

unfoldWith :: (a -> b -> (r, b)) -> [a] -> b -> [r]
unfoldWith f xs s = unfoldr f' (xs, s)
    where f' (x:xs, s) = (\(r, s') -> Just (r, (xs, s'))) $ f x s
          f' ([], _) = Nothing

main :: IO ()
main = print $ length . concat $ unfoldWith f [1..10000] 1

mchav · December 8, 2024, 6:31pm

I haven’t profiled this and could be wrong but I think the difference is:

Looking at the break function it’s probably harder for the compiler to turn that into a loop since it returns a sequence of tuples (which I think are lazy in their elements) all of which need to have their lists extracted and appended/evaluated. This probably makes a large number of partially evaluated functions that are resolved when you call length.

For g you’re constructing the tuple yourself and there aren’t really any partial evaluations building up. The compiler can generate the list in one pass (this might also be much easier to optimize) and the second element is a single number which isn’t too expensive to generate as well.

I’d run the two versions with memory profiling to confirm that this is a case of laziness leading to a larger memory footprint and thus a longer runtime.

jaror · December 8, 2024, 7:30pm

Part of the answer is that break does not fuse, so it will always allocate intermediate lists. I think your g function is able to fuse completely, which means that it is compiled to a tight non-allocating loop, which explains the speed difference.

I would have to take a bit more time analyzing this code to explain how the fusion does work for g and in particular through unfoldr as it seems to be happening.

rnorth · December 9, 2024, 10:31pm

Thanks for your comment. Yes, the issue is that the original function does not fuse properly. I did some reading and found out about how GHC is normally able to do “build-foldr” fusion, but seemingly not for this case.

Here is a slightly simpler case with the same pattern which I managed to get working. It is a computation that repeatedly generates lists based on the final value of a previous list. The “naive” version doesn’t fuse and it takes 15 seconds to run this example on my computer.

target :: Int
target = 100000

iter :: Int
iter = 500000000

follow :: (b -> ([a], b)) -> b -> [a]
follow gen seed = let
    go s = case gen s of ([], _) -> []
                         (xs, s') -> xs ++ go s'
    in go seed

gen :: Int -> ([Int], Int)
gen x = let
    values = takeWhile (<= target) (iterate (+ x) 0)
    in (values, last values + x - target)

-- 15 seconds!
main = print . last . take iter $ follow gen 1

Then I directly ensure that only one list is constructed. It’s much faster.

builder = let
    go seed x = if x <= target then x : go seed (x + seed)
                else go (x - target) 0
    in go 1 0

-- only 1.9s
main = print . last . take iter $ builder

Finally I directly apply the index function. It’s even faster but a little awkward to use.

builder' cons = let
    go seed x = if x <= target then x `cons` go seed (x + seed)
                else go (x - target) 0
    in go 1 0

index :: Int -> Int
index k = let
    f x acc n = if n == 0 then x else acc (n - 1)
    in builder' f (k - 1)

-- 0.3 seconds
main = print $ index iter