The pattern works well but it doesn’t really express what we want. We don’t care to actually make a list; we just want to iterate over the increasing sequence of numbers. In practice GHC probably optimizes all this away to something much nicer, but it’s risky to rely on it: if [0 .. 1000 * 1000] occurs twice in our program then GHC may float it to the top level and materialise it. That’s a space leak!
Maybe we should have a Range type instead? For example, see below. I’m actually surprised I’ve never seen anything like this before. Of course, it can be adjusted to incorporate different step sizes too (including negative step sizes). What do folks think of it?
import Data.Foldable
data Range a where
MkRange :: (Enum a, Ord a) => !a -> !a -> Range a
instance Foldable Range where
foldMap f (MkRange i j) =
if i > j
then mempty
else f i <> foldMap f (MkRange (succ i) j)
example =
for_ (MkRange @Int 0 4) $ \i ->
print i
-- ghci> example
-- 0
-- 1
-- 2
-- 3
-- 4
The obvious advantage of lists is that its one less data type to keep in mind and learn about.
Yes, I suppose so.
I think the best way to avoid the leak is just to teach GHC not to float out in these cases. And I think GHC already does not do that.
Could be. But sometimes maybe you do want to float lists out, when you expect them to be fully materialized. How would GHC distinguish those cases? In any case, it’s quite hard to default -ffull-laziness.
The streaming package (and conduit and pipes) uses a fundamentally different structure which indeed has the same problems as lists. They basically use a normal nested ADT, but the Stream I pointed to is a combination of a strict state and a step function. That step function prevents sharing.
Isn’t the [0..4] notation just syntactic sugar for enumFromTo (and also enumFromThenTo in the [0,2..4] case) and those methods are pretty much always lazy, so basically just a generator. Just like how the Foldable Range instance does it (without implying Monoid; using (:) instead of the MkRange constructor; and the rest of the list being a thunk instead of just a number like in Range)
So I don’t really know what the Range type/instance is trying to fix/improve?
EDIT: I reread a bit and found
[…], if [0 .. 1000 * 1000] occurs twice in our program then GHC may float it to the top level and materialise it. That’s a space leak!
Sorry, I didn’t click through. Still, your approach is also putting functions in the Range data type (in the form of the Enum type class). So I don’t expect that to be optimally performant either unless GHC can somehow optimize that away.
I have used a type like this before for work projects, inspired by Rust’s range support. I do dislike having to trust GHC to optimise things away that can be expressed directly very easily; at best it’s just adding to my already long compile-times. Relying on both stream-fusion and aggressive inlining to optimise away nested loops like forM_ [0..n] \x -> forM_ [0..m] \y -> _ just hasn’t worked that well in my experience, though it’s been a while since I last tried.
For Range you could use a pull array like data Range a = Range { length :: Int#, index :: Int# -> a } and get pretty good code regardless of how inlining/optimisation turns out. (It’s too bad using Int# and co is so cumbersome though.)
Looking at some GHC Core now. So to be fair to GHC (9.6.4), actually it does turn my original example into a nice loop. Using liftA2 unfortunately does seem to make it have to instantiate a list to match on. In any case I just don’t think the tradeoff is worth it, it’s not hard to make a simple range type or to write loops the old fashioned way . I think Rust has it right in making ranges their own type.
But I would have thought [first, step .. last] where first, step and last were all constants would be an exceedingly-rare scenario to worry about - at least one would be a parameter in the vast majority of cases…
…and one where said imperativity can also be encapsulated, if externally visible side-effects aren’t needed:
And then I assume you will point anyone who encounters this incredibly particular problem to that Range type instead of any other solution?
This is not going to change any time soon. Using unboxed datatypes in normal code is tying yourself to GHC internals, which will happily change without notifying you because PVP does not apply.
@mixphix’s answer is ultimately the correct one: the language has the tools for writing imperative code, you don’t need to rely on optimizations here.
Now that I think of it you can just write a custom loop:
loop :: Monad m => Int -> (Int -> m ()) -> m ()
loop m work = go 1
where
go n
| n > m = pure ()
| otherwise = do
work n
go (n + 1)
It’s trivial to test and it can be customized for the specific case that needs it.
The imperative approach with mutable state becomes a good idea when there are many variables that are updated sparsely, this case doesn’t even need it.