I wrote an article, "foldl
traverses with State
, foldr
traverses with anything", that explains the difference between foldl
, foldl'
, foldr
and foldM
and explains when you should use each.
Nice read! Feels like I learned why I use which fold
when
Side note: I think the foldl
argument of forStateTFromFoldlM
should be foldM
? In the foldM
section?
(And maybe it should be called forStateTFromFoldM
? Might be a typo?)
Nice read! Feels like I learned why I use which
fold
when
Thanks, Iām glad! That was the aim.
Side note
Thanks, fixed: Fix foldM example, thanks to Vlix Ā· tomjaguarpaw/H2@d6b1b45 Ā· GitHub
Quite illuminating, thanks. I have one gripe though:
Actually, no: I also think that Haskell is the worldās finest imperative programming language!
I think you should credit Lean and Koka here, both of which implement many more extensions to do-notation in order to implement things like early return
, break
and continue
as described in https://lean-lang.org/papers/do.pdf.
Lean version of (!?)
(Edit: As Tom points out later, this is not actually (!?)
, but Lean doesnāt natively feature linked lists AFAICT, only arrays, and (!?)
doesnāt need to loop there):
@[inline]
def findM? {Ī± : Type} {m : Type ā Type} [Monad m] (as : Array Ī±) (p : Ī± ā m Bool) : m (Option Ī±) := do
for a in as do
if (ā p a) then
return a
return none
@[inline]
def find? {Ī± : Type} (as : Array Ī±) (p : Ī± ā Bool) : Option Ī± :=
Id.run <| as.findM? p
As the paper details, under the hood this will compile to much the same Either
transformer as the code you gave, only this is much cleaner. Thus, Lean seems an finer language for imperative programming.
Interesting, thanks! Whatās the scope of return
in the Lean example? As far as my preferences go, to be āfinestā the return
has to be scoped in a way that survives refactoring across function boundaries. C and Python style returns donāt survive. Does the Lean one?
All control flow keywords are local to surrounding blocks. return to the enclosing do, break and continue to forā¦do (perhaps confusingly at first, this use of do is not a do block). So if you were to inline including do, itās semantics preserving. See bottom of page 11 of the paper.
Ah, OK, I see from page 10 that return e
translates to throw e
and so indeed it has the scoping behavior that I want. Great! On the other hand, one of the other properties that I desire is to be able to return
to any enclosing handler. You can in Haskell (in ExceptT
by applying the correct number of lift
s, in effect libraries perhaps more ergonomically). It seems that perhaps do
blocks in Lean are implicitly in ExceptT
, so maybe you can do the same there too. Iāll have to either download a Lean compiler sometime, or try it in an online REPL, if they have one.
Itās also possible to make the Haskell version a lot cleaner without adding more sugar:
findM :: Monad m => (a -> m Bool) -> [a] -> m (Maybe a)
findM p xs = runEarlyT do
for_ xs \x -> do
b <- lift (p x)
when b (earlyReturn (Just x))
pure Nothing
See the previous discussion: The ICFP'22 playlist has been published to YouTube - #19 by jaror
Edit: Hereās how one could write !?
:
(!?) :: [a] -> Int -> Maybe a
xs !? n = runEarly do
when (n < 0) $ earlyReturn Nothing
flip evalStateT n do
for_ xs \x -> do
k <- get
when (k == 0) $ lift $ earlyReturn $ Just x
put (k - 1)
pure Nothing
And maybe itās possible to clean it up more.
Before we start carefully scrutinising different implementations I should point out that the example @sgraf gave, which is āfind element satisfying a propertyā does not correspond to (!?)
, which is ālook up element at an indexā.
Does your site/wiki have an RSS/Atom feed?
Sadly not. I generate the site using hakyll
. Maybe it has functionality to generate RSS/Atom, but I donāt know.
It does! I followed the following tutorial to implement it:
foldrFromFor ::
(forall b f. Applicative f => [b] -> (b -> f ()) -> f ()) ->
forall a b.
(b -> a -> a) ->
a ->
[b] ->
a
foldrFromFor for_ f z bs =
runEndoApplicative z $ for_ bs $ \b -> mkEndoApplicative (f b)
Note that this definition does not use any f
other than Const (Endo a)
. We can relax it to
foldrFromForEndo ::
(forall a b. [b] -> (b -> Const (Endo a) ()) -> Const (Endo a) ()) ->
forall a b.
(b -> a -> a) ->
a ->
[b] ->
a
foldrFromForEndo for_ f z bs =
($ z) $ appEndo $ getConst $ for_ bs $ \b -> Const (Endo (f b))
Now compare it to foldl
definition:
foldlFromForState ::
(forall a b. [b] -> (b -> State a ()) -> State a ()) ->
forall a b.
(a -> b -> a) ->
a ->
[b] ->
a
Whatās exactly the difference between Const (Endo a) ()
and State a ()
? Both are essentially a -> a
, the only mismatch is the order of composition. Const (Endo a) ()
is a reverse State
monad. We can define foldl
in the same style as foldr
:
foldlFromForEndo ::
(forall a b. [b] -> (b -> Const (Dual (Endo a)) ()) -> Const (Dual (Endo a)) ()) ->
forall a b.
(a -> b -> a) ->
a ->
[b] ->
a
foldlFromForEndo for_ f z bs =
($ z) $ appEndo $ getDual $ getConst $ for_ bs $ \b -> Const (Dual (Endo (flip f b)))
Alternatively one can require for_ :: (forall b f. Applicative f => [b] -> (b -> f ()) -> f ())
here.
Yes, thatās right. This is explored a little in the āReferences and commentary: Monoidsā section. Note that foldlFromForState
can be adjusted to take an Applicative
-general argument, but forStateFromFoldl
canāt be adjusted to return an Applicative
-general result, so foldl
is not equivalent to traversing with anything.
EDIT: I endued up tweaking the section āReferences and commentary: Monoids" because there were a couple of bits around Const (Endo _)
which were a bit dubious.
I found the general rule about when to use foldr vs foldlā a bit vague:
According to the Principle of Least Power you should use
foldl'
in preference tofoldr
when you can.
In contrast, hereās a slightly different proposition by @lexi.lambda update custom column names on renaming/dropping columns by rakeshkky Ā· Pull Request #2933 Ā· hasura/graphql-engine Ā· GitHub (itās a good read!)
When the accumulation function is strict, use foldlā to consume the list in constant space, since the whole list is going to have to be traversed, anyway.
When the accumulation function is lazy in its second argument, use foldr to do work incrementally to improve streaming and work-saving.
Never use foldl or foldrā; theyāre always worse on lists.
Note that the above only applies for lists. So maybe your rule is more general.
Also interesting wrt foldr optimization: Neil Mitchell's Blog (Haskell etc): foldr under the hood
Iām going to admit this is going to take me a few reads to understand haha.
I had a no idea lookup in base was written that way. Why is it this way with foldr CPS and not the ez-pz:
0 !? (x:_) = Just x
_ !? [] = Nothing
n !? (_:xs) = (n-1) !? xs
Feel free to ask any questions in the meantime. Maybe I can change some things I wrote to make it more clear.
I had a no idea lookup in base was written that way. Why is it this way
Thatās a good question! Itās written that way so that the compiler can apply short cut fusion, a rewrite rule that leads to an optimization.
Thank you very much. Shortcut fusion is completely new to me so this is good reading.
The main takeaway Iām getting from the blog and wider reading is:
- Use
foldl'
when you must absolutely traverse the whole list, reason being you are very much interested in updating a state with each value and you want to do it in constant space. - Use
foldr
when your function is lazy in the second argument so you can return early.
The forStateFromFoldl
function was what was currently wrecking me.
What Iām reading is:
forStateFromFoldl ::
(forall a b. (a -> b -> a) -> a -> [b] -> a) ->
-- A function that will take a function, our accumulator (state), `a` and a list of `b` and return a new accumulator.
-- The inner function takes our accumlator, a value from our list of `b` and returns us a new accumulator
-- aka. foldl
forall a b.
[b] -> -- list of things to iterate through
(b -> State a ()) ->
-- A function that produces a new state from b
-- State a () is an alias for a function that goes ( a -> ( () , a ))
-- So really this: (b -> (a -> ( (), a))) where once we have made the State, we can call put to send what ever we have computed to the state
State a ()
forStateFromFoldl foldl bs f = do
z <- get -- Since the return value is a State we can use get to find the initial state supplied to function.
put (foldl g z bs) -- Fold through the bs and change our initial state to our result via function g, put the result into the state.
where
g a b = execState (f b) a
-- a is our accumulator; b is our value from the b-list we're working on
-- since f is a (b -> State a ()) function, we must use execState to return the new state for the foldl function to use with the next bs.
Now Iāve actually written it out I think it makes way more sense and is really cool .
If Iāve missed anything, please let me know!
If youāve miraculously discovered a case where
foldl
is better then please tell me because I would be astonished to learn about it.
Hereās my attempt: a function like find
but that searches from right to left through the list:
findR :: (a -> Bool) -> [a] -> Maybe a
findR f = foldl (\xs x -> if f x then Just x else xs) Nothing
If you use foldl'
then you might be running the possibly expensive function f
more than necessary. Although admittedly the strict variant can be faster if f
is cheap and the list is long:
fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 1)
main = defaultMain
[ bench "findR expensive" $ nf (findR (even . fib)) [20,19..1 :: Int]
, bench "findR' expensive" $ nf (findR' (even . fib)) [20,19..1 :: Int]
, bench "findR cheap" $ nf (findR even) [1..10000 :: Int]
, bench "findR' cheap" $ nf (findR' even) [1..10000 :: Int]
]
findR expensive: OK
87.6 ns Ā± 7.7 ns, 679 B allocated, 0 B copied, 6.0 MB peak memory
findR' expensive: OK
784 ns Ā± 52 ns, 327 B allocated, 0 B copied, 6.0 MB peak memory
findR cheap: OK
50.3 Ī¼s Ā± 1.1 Ī¼s, 313 KB allocated, 12 KB copied, 7.0 MB peak memory
findR' cheap: OK
41.3 Ī¼s Ā± 3.4 Ī¼s, 78 KB allocated, 2 B copied, 7.0 MB peak memory