Thanks. Would you mind adding the following to your benchmarks?
findR'' f =
foldr (flip (\xs x -> if f x then Just x else xs)) Nothing . reverse
Thanks. Would you mind adding the following to your benchmarks?
findR'' f =
foldr (flip (\xs x -> if f x then Just x else xs)) Nothing . reverse
Oddly enough slightly better than findR
:
findR'' expensive: OK
79.5 ns Ā± 7.8 ns, 519 B allocated, 0 B copied, 6.0 MB peak memory
findR'' cheap: OK
46.3 Ī¼s Ā± 4.1 Ī¼s, 234 KB allocated, 6.7 KB copied, 7.0 MB peak memory
But whatās the difference between foldl
and \k z -> foldr (flip k) z . reverse
?
Using reverse
breaks fusion:
findR expensive fused: OK
55.6 ns Ā± 3.5 ns, 671 B allocated, 0 B copied, 7.0 MB peak memory
findR cheap fused: OK
24.0 Ī¼s Ā± 1.8 Ī¼s, 312 KB allocated, 12 KB copied, 7.0 MB peak memory
findR' expensive fused: OK
691 ns Ā± 55 ns, 645 B allocated, 0 B copied, 7.0 MB peak memory
findR' cheap fused: OK
19.1 Ī¼s Ā± 1.6 Ī¼s, 156 KB allocated, 9 B copied, 7.0 MB peak memory
findR'' expensive fused: OK
238 ns Ā± 21 ns, 2.0 KB allocated, 0 B copied, 7.0 MB peak memory
findR'' cheap fused: OK
130 Ī¼s Ā± 6.8 Ī¼s, 937 KB allocated, 44 KB copied, 7.0 MB peak memory
Thatās when I add these benchmarks:
, bench "findR expensive fused" $ nf (\(x,y,z) -> findR (even . fib) (enumFromThenTo x y z)) (20,19,1 :: Int)
, bench "findR cheap fused" $ nf (findR even . uncurry enumFromTo) (1,10000 :: Int)
whatās the difference between
foldl
and\k z -> foldr (flip k) z . reverse
?
foldl
builds a closure of size O(n) before proceeding to evaluate it. foldr ... . reverse
works in constant space (assuming the list can be garbage collected as it is reversed).
Taken literally my statement is, sadly, obviously false. I would like to find a criterion that distinguishes āgenuineā uses of foldl
from uses that āshould beā a foldr
on the reversed list, but Iām not sure what that criterion should be. My belief is still that āgenuineā uses of foldl
should always be foldl'
instead.
This benchmark seems to contradict you:
{-# NOINLINE xs #-}
xs :: [Int]
xs = [1..10000]
triviall, trivialr :: [a] -> ()
triviall = foldl const ()
trivialr = foldr (flip const) () . reverse
{-# NOINLINE triviall #-}
{-# NOINLINE trivialr #-}
main = defaultMain
[ bench "trivial foldl" $ whnf triviall xs
, bench "trivial foldr.reverse" $ whnf trivialr xs
]
trivial foldl: OK
226 Ī¼s Ā± 17 Ī¼s, 390 KB allocated, 19 KB copied, 7.0 MB peak memory
trivial foldr.reverse: OK
67.3 Ī¼s Ā± 6.2 Ī¼s, 234 KB allocated, 6.7 KB copied, 7.0 MB peak memory
It does allocate slightly less, but it still allocates.
Oh foldr ... . reverse
definitely allocates. It allocates an entire new list! The question is the maximum residency. In principle foldr ... . reverse
can work in constant extra space, foldl
always uses O(n) extra space.
Ah, how about this criterion: I donāt know of a use of foldl
that shouldnāt be either foldl',
or foldr ... . reverse
. It be interested know if there are any counterexamples. I can just about believe that there are counterexamples for delicate performance reasons, but beyond that I feel confident that there arenāt.
Iām still not convinced so I did some heap profiling and got these pictures: (nevermind this, I was doing something wrong)
Iām still not convinced
Convinced about what? If you do an evaluation by hand with pen and paper I think youāll be convinced about the overall principle. What GHC actually does in practice is another matter, and something I know less about.
I found the general rule about when to use foldr vs foldlā a bit vague
According to the Principle of Least Power you should use
foldl'
in preference tofoldr
when you can.
@hasufell, yes, perhaps I should have connected the dots a bit more explicitly here. What I mean by āwhen you canā is āwhen you want to do what foldl'
doesā according to the article, that is, traverse a list with (only) a state.
Thanks for linking Alexisās post. Iāll link to it from my article. My problem with her rules of thumb is that they presuppose that you already have an (āaccumulationā) function (which is also a misnomer: in the case of foldr
it doesnāt accumulate). That it, itās a rule that you can use if you start with a function, to determine which fold to use with that function. By contrast, Iām trying to present a rule that you can use if you start with a problem, to determine which fold can solve your problem. If your problem is ātraverse with (only) a stateā-shaped then the answer is foldl'
. If itās ātraverse with anything elseā-shaped then the answer is foldr
(or in either case you could just use for_
). Does that explanation help? If so I wonder if I can incorporate it into the article somehow, or perhaps write another article with more examples.
Hi @tobz619, yes, your description of how forStateFromFoldl
works is correct.
Use
foldl'
when you must absolutely traverse the whole list, reason being you are very much interested in updating a state with each value and you want to do it in constant space.
Yes, or you could just say āUse foldl'
when youāre traversing the list with (only) a stateā.
Use
foldr
when your function is lazy in the second argument ā¦
Iām trying to get away from talking about (accumulator) functions because that presupposes one already has one, and if one does then one is likely to already know if one needs foldl
or foldr
. Instead, Iād say āUse foldr
when youāre traversing a list in any way that requires having keeping track of (only) a stateā.
Thatās a good idea, working through the examples manually sheds more light on it:
foldr k z = go where
go [] = z
go (x:xs) = k x (foldr k z xs)
reverse l = go l [] where
rev [] a = a
rev (x:xs) a = rev xs (x:a)
const' _ y = y
foldr const' () (reverse (1:2:3:[]))
go (reverse (1:2:3:[]))
go (rev (1:2:3:[]) [])
go (rev (2:3:[]) (1:[]))
go (rev (3:[]) (2:1:[]))
go (rev [] (3:2:1:[]))
go (3:2:1:[])
const' 3 (go (2:1:[]))
go (2:1:[])
const' 2 (go (1:[]))
go (1:[])
const' 1 (go [])
go []
()
foldl k z = go z where
go a [] = a
go a (x:xs) = go (k a x) xs
foldl const () (1:2:3:[])
go () (1:2:3:[])
go (const () 1) (2:3:[])
go (const (const () 1) 2) (3:[])
const (const (const () 1) 2) 3
const (const () 1) 2
const () 1
()
Conclusion: both approaches work in constant (additional) space! And both approaches do O(n) allocations (closures or cons cells).
So your statement:
Is technically true, but it seems to imply that foldl
uses more space than foldr ... . reverse
, which is not true (assuming cons cells and closures are the same size, which is not quite true in practice).
Ah yes, I see this is rather subtle. foldl
builds a closure of size O(n) before proceeding to evaluate it, reverse
builds a list of size O(n) before proceeding with foldr
. I considered the former āextraā but not the latter, but youāve convinced me thatās not a valid thing to do.
So yes, their operation ought to be roughly the same in terms of allocations and when memory is freed. Perhaps in foldl
's favour, it āfusesā the reverse
with the operation k
. Iām not sure if GHCās foldr
fusion does this automatically for foldr
-and-reverse
(because reverse
is written in terms of foldl
, which is written in terms of foldr
). So perhaps I ought to say "foldl
might be useful in some cases where you want to fuse foldr
and reverse
, even though writing the latter is clearer. I donāt know yet whether the latter optimizes to the former anyway.".
Thatās only in the report version (toggled with the USE_REPORT_PRELUDE
flag). The definition I show above is the definition that GHC usually uses.
And foldr _ _ (foldl _ _ _)
doesnāt fuse. It would have to use a build
in between, which the report version doesnāt do.
And we can see that no fusion is happening in this benchmark I showed: Foldl traverses with State, foldr traverses with anything - #23 by jaror
Oh, so it is. I wonder why that is. I donāt see any benefit to writing it out explicitly, given that you could just inline foldl
. Are there cases where GHC.List.foldl
is worse than writing it out by hand?
In any case, Iām happy to say I understand now that foldl
might in principle be a useful optimization of foldr ... . reverse
, although I donāt understand the circumstances in which it would be better, and I would prefer it was called fusedFoldrReverse
! Thanks for working it through with me, @jaror.
I am (or was) very fascinated with the foldl-from-foldr old riddle.
I disagree with the pair of sentiments (the solution is difficult to understand, the composition foldlFromForState. forFromFoldr
is mechanical and needs no brain power)!
In one direction, the role of Const (Endo _)
and why/how it gets foldrFromFor
to work is equally difficult/easy to understand. Anything you say to explain that, I can translate it to explain foldl-to-foldr directly.
In the other direction, the only non-mechanical ācreativeā brain power needed to solve foldl-from-foldr is a clever change in the argument order. Then the rest is mechanically recognizable as following the foldr idiom.
altFoldl :: (b -> a -> b) -> [a] -> (b -> b)
altFoldl op = go
where
go [] = \a -> a
go (x:xs) = \a -> (go xs) (op a x)
= (\x r -> \a -> r (op a x)) x (go xs)
ā“
altFoldl op = foldr funny id
where
funny = \x r a -> r (op a x)
-- The following massage will become useful below.
= \x r -> r . (flip op x)
In fact, learning from @Bodigrimās point, the central idea is Dual Endo
:
absFoldl :: (b -> a -> b) -> [a] -> Dual (Endo b)
absFoldl op = foldr endure mempty
where
endure = \x r -> Dual (Endo (flip op x)) <> r
Next project for you! Instead of for_
involving both Foldable and Applicative, can you stick to Foldable alone and make foldlFromFoldMap
, foldrFromFoldMap
, and foldMapFromFoldr
? They will show much more clearly the central role of Endo
(and Dual Endo
āyou can basically look at absFoldl
above and extract the foldMap
version) without the Const
dead weight.