Beginner:: I don't fully understand this implementation of foldr1

In the book

The author presents their own custom version of the head function called head’ .
He writes it like

head' :: [a] -> a
head' foldr (\x _ -> x)

My first question is why not use heald’ foldl (\x _ -> x) ?
I was surprised to find out that both foldr1 and foldl1 produce that same result.

Here is how I understand this function.
The head function implements foldr1. Foldr1 starts from the right. So assuming we have a list like :
It will start at the number 4 ?

I am not entirely sure how I would represent it using a diagram like this that I found online

But essentially I was thinking about it like

Obviously, the above is not the correct way to represent it. It also does not get you anywhere near the actual answer.

How can I understand this better ?

So let’s contrast the two examples given back-to-back in the book:

head' :: [a] -> a  
head' = foldr1 (\x _ -> x)  
last' :: [a] -> a  
last' = foldl1 (\_ x -> x)  

these two are really the exact same implementation with one using foldl1 and one using foldr1 (the lambdas look different, but that’s just because foldl1 and foldr1 take their parameters in different orders).
The x parameter is the “element” taken from the list, while the _ is the stateful accumulator that gets iteratively updated. Now this is a very unusual usage of folds, since you’re throwing away the accumulated value, and typically folds are used specifically for the accumulated value.
So if I take the head' function, and apply it to [1,2,3,4], it’s going to produce
f 1 (f 2 (f 3 4)), with f = \x _ -> x. Now if Haskell were a strict language, it would start with f 3 4 = 3, then evaluate f 2 3 = 2, and then finally evaluate f 1 2 = 1, your final answer. So it does “start from the right,” but at each step it just throws away everything it calculated so far. The final invocation after it reached the left of the list keeps the head and throws out the rest.

But Haskell’s not strict, it’s lazy. That means that f 1 (f 2 (f 3 4)) looks at the definition of f = \x _ -> x, and it knows that it doesn’t need to bother evaluating f 2 (f 3 4) in order to give you your answer, you can just return the 1 straight away.

If we look at the last' function, and apply it to [1,2,3,4], you’re going to end up with f (f (f 1 2) 3) 4. Again, if Haskell were strict, it would need to start with that f 1 2, which would be the left side of the list, which is why it kind of works to think of it as left-to-right. I’ll leave it as an exercise to the reader why the last' implementation works, and what the laziness of Haskell means in this case.

Does that help? Let us know if anything is still unclear!

Okay, I think I am starting to understand. I didn’t realize the implementation of lazy. Please correct me if I am wrong but this is how I understand it.

Because Haskell is lazy, the following situation;
f 1 (f 2 (f 3 4))
works because even though f 3 4 would be implemented first since it takes precedence in a strict language. However Haskell is a lazy language meaning if looks at our Lambda function and evaluates as much as it needs to. Haskell looks at our definition f = \x _ -> x and basically states that we don’t need to work that hard since were going to throw away almost all of it anyway let’s just return the last 1.

However this changes when we do need to use the accumulator. For example. if we take a closer look at an example from the book .

flip (:) (flip (:) (flip (:) (flip (:) [] 3) 4) 5) 6

Here Haskell asks itself “how far do we actually need to go to get the result ?”. In this case we need to go all the way in so we start with. This is because we actually need the accumulation of the array.
(flip (:)[] 3)

I’ll leave it as an exercise to the reader why the last' works ?

It works because Haskell checks how far we need to go to get the answer. It looks like we are just throwing the accumulator away. we can return the outermost value.

Am I close ?

Yeah, that all sounds right to me! Haskell’s laziness means it just delays any computation until the last possible moment. So if I call f (g x) (h x) where f = \x _ -> x, in a strict language, it evaluates the arguments before entering function f, and so would evaluate (g x) and (h x) and pass the resulting values to function f. In Haskell, it just keeps the arguments as unevaluated (g x) and (h x), and the (h x) would never get evaluated. Actually, just from the code here, the (g x) wouldn’t even be evaluated yet either! We don’t need to know what (g x) equals to know that f (g x) (h x) = (g x). If the code that called f in the first place never uses the value of it, we might never evaluate (g x) or (h x)

So it’s less that Haskell “makes a plan” for what needs to be evaluated and what doesn’t - it just doesn’t evaluate anything at all until the last possible moment.

In most circumstances, Haskell’s laziness doesn’t at all affect the return value of a function, it just affects the time and memory performance. Unless the laziness keeps you from evaluating something infinite or something that throws an error, you’d get the same answer if Haskell were strict. For example, in Haskell, I can say foldr1 (\x _ -> x) [0..5] to get 0. In a strict language like F#, I could say Seq.reduceBack (fun x _ -> x) [0..5], and also get back 0. But in Haskell I can say foldr1 (\x _ -> x) [0..] and still get back 0, but in F# if I say Seq.reduceBack (fun x _ -> x) (Seq.initInfinite id), it will hang there forever, since it’s trying to start at the end of an infinite list and work its way to the beginning. So in F#, since it’s strict, that would be a terrible implementation for head, since it needs to start at the end of the list and walk it’s way backwards to get the answer, which is a ton of extra work, but Haskell’s laziness doesn’t alter the output.

With laziness,

   foldr1 (\x _ -> x) (1 : 2 : etc doesn't matter now does it?)
-> (\x _ -> x) 1 (foldr (\x _ -> x) (2 : etc))
-> 1

Therefore, so-called f 1 (f 2 (f 3 (f 4 (f 5)))) does not even exist, ever.

As another example, foldr1 (&&) (False : repeat True) takes constant time (therefore constant space too).