Scrap your iteration combinators

My latest article, “Scrap your iteration combinators”. I would be interested to know what people think!

7 Likes

At least for foldl', make sure to use the strict state monad, otherwise the put $! s' does nothing! If I compile your definition with the lazy state monad, ghc -O -ddump-simpl -dsuppress-all shows a definition equivalent to foldl.

The mapAccumR function … applies a function to each element of a structure, passing an accumulating parameter from right to left

(“Right” and “left” here really mean “end” and “start”, but because we write English from left to right the we call the last element of a list the “rightmost” one and the first the “leftmost” one.)

Implicit here is that you’re only thinking of cons lists.

For snoc lists, there are arguments both ways for which is the “start” and “end”: some may think about reading from left to right, or others about starting from the head. And snoc lists swap the roles of foldl and foldr. “Never use foldr!” is now a thing you can say.

It’s also interesting to compare the various left and right folds on vectors. Unlike the situation with lists, we can find uses for both foldl and foldr on vectors.

Right. Control.Monad.Trans.State.Strict is the one linked from “becomes the “state parameter” of a State monad operation”. Perhaps I should have made it more clear, but since I’ve never found any practical reason to use any of the ....Lazy monads I think it’s better to simply always assume ....Strict.

That’s right, I’m thinking of the lists that Haskell has by default, whose syntax is built in, and which all Haskellers assume you’re thinking of when you say “list”!

And that fact that we even have a notion of snoc list is yet another symptom of this dubious “left/right” nomenclature.

Perhaps I should have made it more clear, but since I’ve never found any practical reason to use any of the ....Lazy monads I think it’s better to simply always assume ....Strict.

I agree with that stance, but it’s worth being extra explicit about it because if a reader is not already aware of it, it’s easy to naively import Control.Monad.Trans.State which reexports Lazy.

3 Likes

OK, thanks for the suggestion. I’ve added a clarification.

1 Like

These are interesting examples to think about, but I’m less convinced that this is a good idea to apply everywhere. It’s not surprising that one can use overpowered constructs (Applicative/Monad) to solve simple tasks. But should they? I’m certainly not replacing mapMaybe with a streaming library and nested for-loops.

6 Likes

Can you elaborate on why you consider Applicative/Monad overpowered, and why you wouldn’t use a streaming library and nested for-loops to replace mapMaybe?

1 Like

I agree with this.
I think the issue with using for_ is that you are making code more polymorphic, which means that whoever is reading your code needs to run a typechecker in their head to figure out which traversal is actually being called.
I find that this section of the GHC style guide expresses it well: coding style · Wiki · Glasgow Haskell Compiler / GHC · GitLab

4 Likes

The difference is in the accumulating parameter.

Basic folds are nice when there is no extra state to talk about, e.g. collapsing a data structure to a single simple value (or a list), or purging Nothings. The definitions are obvious and the implementation is efficient.

Threading product type updates over a list fold on the other hand might as well just be performed imperatively, its functional counterpart has no redeeming qualities. Folding here introduces a state blob, which then makes the code harder to read because this blob has to be reassembled on every step. Not to mention the reassembly is inefficient garbage-wise and trivial to screw up strictness-wise.

2 Likes

Aha! foldl' already has this problem, right? Because you don’t know what Foldable you’re iterating over nor the element type nor container type, without running the type checker in your head.

Or maybe those ones are OK, but there’s a particular issue with Monad type parameters? In which case foldM has this problem too. Perhaps we should have foldEither, foldIO, etc, so make that clearer? In the extend example in the article, the foldM was over Either. Perhaps using this would have been clearer:

foldEither :: (b -> a -> Either e b) -> b -> [a] -> Either e b

There’s another possibility: type applications. In the original extend we could have written

foldM @[] @(Either Conflict)

Then there’s no ambiguity. But then we could equally have written this in the final extend:

for_ @[] @(StateT PPreAssignment (Either Conflict)) newactives

Then there’s no need to run a type checker in your head.

Does this resolve your concern?

1 Like

This solves the problem of having to run the type checker in your mind, but it reveals another one: the for_ version is much more verbose.

I’m still on the fence which to prefer.

1 Like

Yeah, but only because foldM is already a “shorthand”. If reducing verbosity is my goal then I can flip type parameters and write

myFor_ @(StateT _ _) 

and this already specifies the type as precisely as foldM. It only gets more precise as I fill in the holes.

But it is still 4x as long.

How about

f @(S _ _)

That’s about as good as I can do!

That’s nice and compact, but now you’ve given up readability (compared to foldM).

Sure, for people who already know what foldM means (or at least “fold”). For anyone who doesn’t I think they’re both equally unreadable :slight_smile:

It’s a fun exercise. I prefer the rule of least power (prefer the least generic thing until needed).

I wanted to compare the current mapMaybe with

mapMaybe :: (a -> Maybe b) -> [a] -> [b]
mapMaybe f as =
  toList $
    for_ as $ \a -> do
      for_ (f a) $ \b ->
        yield b

but it doesn’t compile?

      Expected: Streaming.Internal.Stream (Data.Functor.Of.Of b) m0 b
        Actual: Streaming.Internal.Stream (Data.Functor.Of.Of b) m0 ()

Can I get a minimal example? Maybe I’m using the wrong for_?

1 Like

I don’t understand the error, but you can find all the code in the post here:

and you can load it in GHCi with something like

cabal repl -z -b streaming 
...
ghci> :l code/fold.hs 

Sure, but there are monomorphic versions of foldl', and that’s what I often use in practice.
FWIW using the polymorphic version of foldl' isn’t too bad because they are relatively easy to reason about. They should all be the same as doing toList first and then doing foldl'. It’s the sort of thing that I’ve committed to memory, so I don’t struggle to reason about it. But my memory is limited, and my ability remember potentially surprising instance definitions is especially limited.

There are two steps that I have to do to understand this code.

  1. I need to figure out the types involved, which TypeApplications resolve
  2. I need to figure out the meanings of the typeclass instances.

For monad transformers that can be tricky! In this example I need to see how StateT and Either interact. I would need to potentially open up two or more definitions and see how all of that code interacts. This is all doable and maybe in the grand scheme of things not that difficult, but this is taking up brain-space and energy that could be used for something else.

Of course the converse – lots and lots of monomorphic functions – is also a problem. You can sometimes use polymorphism to solve an N*M issue very effectively.

It’s all a matter of balancing trade-offs, but in general going for something more monomorphic is often the right call as a rule of thumb.

Monomorphic in the container type presumably, not monomorphic in the element or state types. Would it be useful to you to have a fully monomorphic foldlListIntElementsBoolState'? If not, why not? Do you have some rule you can use to tell you where to draw the line?

Interesting, because I have the same reasoning but reach the opposite conclusion. It’s exactly because my memory is limited that I’d rather remember what a small number of composable abstractions like StateT, EitherT and Stream do than a somewhat larger number of non-composable abstractions like foldl, foldl', mapAccumL, loopM, etc. do. With the former I can reason my way to building programs by combining them. With the latter I can’t.

(In a sense it’s even easier for me since I use bluefin, not streaming and transformers. All the effects already play nicely together by default, without causing any confusion.)

In this example I need to see how StateT and Either interact. I would need to potentially open up two or more definitions and see how all of that code interacts.

I suppose so, but have you never wondered how the m parameter interacts with the threaded state in foldM? If not, why would you wonder about the former but not the latter?

Yes, I agree.

1 Like