Foldl traverses with State, foldr traverses with anything

Nice read! Feels like I learned why I use which fold when :sweat_smile:

Thanks, I’m glad! That was the aim.

Side note

Thanks, fixed: Fix foldM example, thanks to Vlix · tomjaguarpaw/H2@d6b1b45 · GitHub

1 Like

Quite illuminating, thanks. I have one gripe though:

Actually, no: I also think that Haskell is the world’s finest imperative programming language!

I think you should credit Lean and Koka here, both of which implement many more extensions to do-notation in order to implement things like early return, break and continue as described in https://lean-lang.org/papers/do.pdf.
Lean version of (!?) (Edit: As Tom points out later, this is not actually (!?), but Lean doesn’t natively feature linked lists AFAICT, only arrays, and (!?) doesn’t need to loop there):

@[inline]
def findM? {α : Type} {m : Type → Type} [Monad m] (as : Array α) (p : α → m Bool) : m (Option α) := do
  for a in as do
    if (← p a) then
      return a
  return none

@[inline]
def find? {α : Type} (as : Array α) (p : α → Bool) : Option α :=
  Id.run <| as.findM? p

As the paper details, under the hood this will compile to much the same Either transformer as the code you gave, only this is much cleaner. Thus, Lean seems an finer language for imperative programming.

3 Likes

Interesting, thanks! What’s the scope of return in the Lean example? As far as my preferences go, to be “finest” the return has to be scoped in a way that survives refactoring across function boundaries. C and Python style returns don’t survive. Does the Lean one?

All control flow keywords are local to surrounding blocks. return to the enclosing do, break and continue to for…do (perhaps confusingly at first, this use of do is not a do block). So if you were to inline including do, it’s semantics preserving. See bottom of page 11 of the paper.

Ah, OK, I see from page 10 that return e translates to throw e and so indeed it has the scoping behavior that I want. Great! On the other hand, one of the other properties that I desire is to be able to return to any enclosing handler. You can in Haskell (in ExceptT by applying the correct number of lifts, in effect libraries perhaps more ergonomically). It seems that perhaps do blocks in Lean are implicitly in ExceptT, so maybe you can do the same there too. I’ll have to either download a Lean compiler sometime, or try it in an online REPL, if they have one.

It’s also possible to make the Haskell version a lot cleaner without adding more sugar:

findM :: Monad m => (a -> m Bool) -> [a] -> m (Maybe a)
findM p xs = runEarlyT do
  for_ xs \x -> do
    b <- lift (p x)
    when b (earlyReturn (Just x))
  pure Nothing

See the previous discussion: The ICFP'22 playlist has been published to YouTube - #19 by jaror

Edit: Here’s how one could write !?:

(!?) :: [a] -> Int -> Maybe a
xs !? n = runEarly do
  when (n < 0) $ earlyReturn Nothing
  flip evalStateT n do
    for_ xs \x -> do  
      k <- get
      when (k == 0) $ lift $ earlyReturn $ Just x
      put (k - 1)
  pure Nothing

And maybe it’s possible to clean it up more.

2 Likes

Before we start carefully scrutinising different implementations I should point out that the example @sgraf gave, which is “find element satisfying a property” does not correspond to (!?), which is “look up element at an index”.

2 Likes

Does your site/wiki have an RSS/Atom feed?

Sadly not. I generate the site using hakyll. Maybe it has functionality to generate RSS/Atom, but I don’t know.

1 Like

It does! I followed the following tutorial to implement it:

4 Likes

Thanks, noted that here: Add an RSS feed · Issue #4 · tomjaguarpaw/H2 · GitHub

2 Likes
foldrFromFor ::
  (forall b f. Applicative f => [b] -> (b -> f ()) -> f ()) ->
  forall a b.
  (b -> a -> a) ->
  a ->
  [b] ->
  a
foldrFromFor for_ f z bs =
  runEndoApplicative z $ for_ bs $ \b -> mkEndoApplicative (f b)

Note that this definition does not use any f other than Const (Endo a). We can relax it to

foldrFromForEndo ::
  (forall a b. [b] -> (b -> Const (Endo a) ()) -> Const (Endo a) ()) ->
  forall a b.
  (b -> a -> a) ->
  a ->
  [b] ->
  a
foldrFromForEndo for_ f z bs =
  ($ z) $ appEndo $ getConst $ for_ bs $ \b -> Const (Endo (f b))

Now compare it to foldl definition:

foldlFromForState ::
  (forall a b. [b] -> (b -> State a ()) -> State a ()) ->
  forall a b.
  (a -> b -> a) ->
  a ->
  [b] ->
  a

What’s exactly the difference between Const (Endo a) () and State a ()? Both are essentially a -> a, the only mismatch is the order of composition. Const (Endo a) () is a reverse State monad. We can define foldl in the same style as foldr:

foldlFromForEndo ::
  (forall a b. [b] -> (b -> Const (Dual (Endo a)) ()) -> Const (Dual (Endo a)) ()) ->
  forall a b.
  (a -> b -> a) ->
  a ->
  [b] ->
  a
foldlFromForEndo for_ f z bs =
  ($ z) $ appEndo $ getDual $ getConst $ for_ bs $ \b -> Const (Dual (Endo (flip f b)))

Alternatively one can require for_ :: (forall b f. Applicative f => [b] -> (b -> f ()) -> f ()) here.

5 Likes

Yes, that’s right. This is explored a little in the “References and commentary: Monoids” section. Note that foldlFromForState can be adjusted to take an Applicative-general argument, but forStateFromFoldl can’t be adjusted to return an Applicative-general result, so foldl is not equivalent to traversing with anything.

EDIT: I endued up tweaking the section “References and commentary: Monoids" because there were a couple of bits around Const (Endo _) which were a bit dubious.

1 Like

I found the general rule about when to use foldr vs foldl’ a bit vague:

According to the Principle of Least Power you should use foldl' in preference to foldr when you can.

In contrast, here’s a slightly different proposition by @lexi.lambda update custom column names on renaming/dropping columns by rakeshkky · Pull Request #2933 · hasura/graphql-engine · GitHub (it’s a good read!)

  1. When the accumulation function is strict, use foldl’ to consume the list in constant space, since the whole list is going to have to be traversed, anyway.

  2. When the accumulation function is lazy in its second argument, use foldr to do work incrementally to improve streaming and work-saving.

  3. Never use foldl or foldr’; they’re always worse on lists.

Note that the above only applies for lists. So maybe your rule is more general.


Also interesting wrt foldr optimization: Neil Mitchell's Blog (Haskell etc): foldr under the hood

5 Likes

I’m going to admit this is going to take me a few reads to understand haha.

I had a no idea lookup in base was written that way. Why is it this way with foldr CPS and not the ez-pz:

0 !? (x:_) = Just x
_ !? [] = Nothing
n !? (_:xs) = (n-1) !? xs
1 Like

Feel free to ask any questions in the meantime. Maybe I can change some things I wrote to make it more clear.

I had a no idea lookup in base was written that way. Why is it this way

That’s a good question! It’s written that way so that the compiler can apply short cut fusion, a rewrite rule that leads to an optimization.

1 Like

Thank you very much. Shortcut fusion is completely new to me so this is good reading.

The main takeaway I’m getting from the blog and wider reading is:

  1. Use foldl' when you must absolutely traverse the whole list, reason being you are very much interested in updating a state with each value and you want to do it in constant space.
  2. Use foldr when your function is lazy in the second argument so you can return early.

The forStateFromFoldl function was what was currently wrecking me.

What I’m reading is:

forStateFromFoldl ::
  (forall a b. (a -> b -> a) -> a -> [b] -> a) -> 
-- A function that will take a function, our accumulator (state), `a` and a list of `b`  and return a new accumulator.
-- The inner function takes our accumlator, a value from our list of `b` and returns us a new accumulator
-- aka. foldl
  forall a b.
  [b] -> -- list of things to iterate through
  (b -> State a ()) ->
-- A function that produces a new state from b
-- State a () is an alias for a function that goes ( a -> ( () , a ))
-- So really this: (b -> (a -> ( (),  a))) where once we have made the State, we can call put to send what ever we have computed to the state
  State a ()
forStateFromFoldl foldl bs f = do
  z <- get -- Since the return value is a State we can use get to find the initial state supplied to function.
  put (foldl g z bs) -- Fold through the bs and change our initial state to our result via function g, put the result into the state.
  where
    g a b = execState (f b) a
-- a is our accumulator; b is our value from the b-list we're working on
-- since f is a (b -> State a ()) function, we must use execState to return the new state for the foldl function to use with the next bs.

Now I’ve actually written it out I think it makes way more sense and is really cool :smiley: .

If I’ve missed anything, please let me know!

1 Like

If you’ve miraculously discovered a case where foldl is better then please tell me because I would be astonished to learn about it.

Here’s my attempt: a function like find but that searches from right to left through the list:

findR :: (a -> Bool) -> [a] -> Maybe a
findR f = foldl (\xs x -> if f x then Just x else xs) Nothing

If you use foldl' then you might be running the possibly expensive function f more than necessary. Although admittedly the strict variant can be faster if f is cheap and the list is long:

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 1)

main = defaultMain
  [ bench "findR  expensive" $ nf (findR  (even . fib)) [20,19..1 :: Int]
  , bench "findR' expensive" $ nf (findR' (even . fib)) [20,19..1 :: Int]
  , bench "findR  cheap" $ nf (findR  even) [1..10000 :: Int]
  , bench "findR' cheap" $ nf (findR' even) [1..10000 :: Int]
  ]
  findR  expensive: OK
    87.6 ns ± 7.7 ns, 679 B  allocated,   0 B  copied, 6.0 MB peak memory
  findR' expensive: OK
    784  ns ±  52 ns, 327 B  allocated,   0 B  copied, 6.0 MB peak memory
  findR  cheap:     OK
    50.3 μs ± 1.1 μs, 313 KB allocated,  12 KB copied, 7.0 MB peak memory
  findR' cheap:     OK
    41.3 μs ± 3.4 μs,  78 KB allocated,   2 B  copied, 7.0 MB peak memory

Thanks. Would you mind adding the following to your benchmarks?

findR'' f =
  foldr (flip (\xs x -> if f x then Just x else xs)) Nothing . reverse

Oddly enough slightly better than findR:

  findR'' expensive: OK
    79.5 ns ± 7.8 ns, 519 B  allocated,   0 B  copied, 6.0 MB peak memory
  findR'' cheap:     OK
    46.3 μs ± 4.1 μs, 234 KB allocated, 6.7 KB copied, 7.0 MB peak memory

But what’s the difference between foldl and \k z -> foldr (flip k) z . reverse?