That’s a really cool trick and really well written. However, I am wondering if we really traverse things once, and what the benefits are.
As far as I understand, we are kind of computing two independent things in one go: the one on the left (the tree) and the one on the right (the min). The circular bit is that we use the right thing in the left one, but not the other way round. So even though it looks circular, the right bit can be (and, thanks to laziness, will be) computed while ignoring the left side, which is similar in effect to the straightforward solution: compute the min, then replicate the structure.
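For concreteness, here is a minimal sketch of the classic repmin knot-tying (the Tree type and names are just my illustration, not the code from the post):

data Tree = Leaf Int | Node Tree Tree

repMin :: Tree -> Tree
repMin t = t'
  where
    -- 'm' is the global minimum; it is only demanded lazily,
    -- after 'go' has rebuilt the whole tree around it.
    (m, t') = go t
    go (Leaf x)   = (x, Leaf m)  -- refer to the not-yet-computed minimum
    go (Node l r) =
      let (ml, l') = go l
          (mr, r') = go r
      in  (min ml mr, Node l' r')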
The advantage I can see is that the pattern matching on each node is done only once, hence “one traversal”, but we also have to pattern match on tuples now. We traverse once but pattern match twice, so have we actually gained anything?
One advantage I can see is that we wrote the traversal code only once, which guarantees that the final structure matches the initial one. The drawback is that the code seems a bit more complicated.
This reminds me of a common problem, which is equivalent to: given [1,2,5,10], return each element paired with ten times its value, i.e. [(1,10), (2,20), (5,50), (10,100)].
When I started writing Haskell (20 years ago), I liked list comprehensions a lot (I still do), so I would write:

usingComp n xs = [(x, x*n) | x <- xs]
I could also use map:

usingMap n = map (\x -> (x, x*n))

(which is pretty much the same).
Nowadays, I would probably use zip and do:

usingZip n xs =
  let ns = map (*n) xs
  in  zip xs ns

because, in a way, it best describes what I am doing, and I am not mixing the two operations.
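For what it’s worth, all three versions agree on the example above; in GHCi:

ghci> usingZip 10 [1,2,5,10]
[(1,10),(2,20),(5,50),(10,100)]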
Initially I didn’t like it because it felt like I was giving more work to the machine (I used to prefer doing two things in one loop instead of two loops). But thanks to laziness, it doesn’t change much. When I write that, nothing happens. When I need to access the values, some pattern matching will happen (in a different order) and what needs to be computed will be computed (maybe in a different order).
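In fact, laziness means the “two loops” interleave: zip only demands one element of ns at a time, so usingZip even streams over an infinite list:

ghci> take 3 (usingZip 10 [1..])
[(1,10),(2,20),(3,30)]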
What I still don’t like is that the two structures might differ, but I know they don’t, and I have learned to live with it.
The similarity with the repmin problem is that usingComp traverses the list once whereas usingZip traverses it twice, but does it really?
In both cases we can compute the result in one pass, which makes the code clearer, but we mix the result and the computation for performance(?) reasons which we are not sure of.
That’s a good question. I mostly wrote this because I found it cleaner this way, and the performance answer is probably “it depends on what you’re trying to do”.
For this blogpost, it probably doesn’t matter – the layout algorithm’s performance is completely insignificant compared to the time needed to read and
write images. We do traverse things only once, but as you noticed there is some (insignificant) overhead in returning tuples.
In more hypothetical scenarios:
- You could envision needing to do more traversals than just two. The technique isn’t really limited to two, and depending on your datatype it may yield a speedup once you hit N traversals (see the sketch after this list).
- Maybe there’s a way to combine it with other lazy patterns so that you can avoid traversing large parts of your datastructure.
- I suspect it could be combined with foldr/build fusion in certain cases to avoid building the entire tree…
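As a toy illustration of the first point (my own example, not code from the post), here is one pass returning a triple where the naive version would walk the list three times:

minMaxSum :: [Int] -> (Int, Int, Int)
minMaxSum []     = error "minMaxSum: empty list"
minMaxSum [x]    = (x, x, x)
minMaxSum (x:xs) =
  let (mn, mx, s) = minMaxSum xs
  in  (min x mn, max x mx, x + s)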
I think it probably helps more with writing clean code than with performance. If the two traversals share computations or patterns, this can help avoid repetition.
For the collage function I have, writing a two-pass version is sort of tricky, since the equivalent of findmin is something like:

measure :: Sized a => Collage a -> Size

But we’d need to call this traversal at every step of the recursion in collage, so we end up repeating a lot of work!
In order to avoid the extra work, we can store the Size info in all nodes, not just the leaves:

data Collage i a
  = S i a
  | H i (Collage i a) (Collage i a)
  | V i (Collage i a) (Collage i a)
And then we could have:

measure :: Sized a => Collage i a -> Collage (i, Size) a
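Roughly, such a measuring pass could look like this (a sketch only: the Size record, sizeOf, and the horiz/vert combiners below are simplified stand-ins, and as noted further down, the real sizing is more subtle):

data Size = Size { sizeWidth :: Double, sizeHeight :: Double }

class Sized a where
  sizeOf :: a -> Size

-- Simplified combiners: widths add side by side, heights add when stacked.
horiz, vert :: Size -> Size -> Size
horiz (Size w1 h1) (Size w2 h2) = Size (w1 + w2) (max h1 h2)
vert  (Size w1 h1) (Size w2 h2) = Size (max w1 w2) (h1 + h2)

-- Read back the size annotation at the root of an annotated collage.
sizeAt :: Collage (i, Size) a -> Size
sizeAt (S (_, s) _)   = s
sizeAt (H (_, s) _ _) = s
sizeAt (V (_, s) _ _) = s

-- Annotate every node with its size in a single bottom-up pass.
measure :: Sized a => Collage i a -> Collage (i, Size) a
measure (S i a)   = S (i, sizeOf a) a
measure (H i l r) =
  let (l', r') = (measure l, measure r)
  in  H (i, horiz (sizeAt l') (sizeAt r')) l' r'
measure (V i l r) =
  let (l', r') = (measure l, measure r)
  in  V (i, vert (sizeAt l') (sizeAt r')) l' r'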
I don’t like that it makes the datatype a bit more complex though… maybe I would even introduce a second datatype? I think it becomes clear at that point that the single pass can be cleaner…
Storing the size in the datatype is the “naive” approach and, unless I am wrong, it does indeed have a problem.
A sub-collage starts with a size of 1 and gets resized each time it is added to its parent, which looks bad indeed. It even looks like we are traversing the graph n^2 times…
The lazyLayout way doesn’t seem to have this issue, but it ends up internally with a chain of thunks of ltrans <> (ltrans <> ...), which in the end might be equivalent: we are traversing once to build up a chain of thunks which will be “traversed” too on evaluation.
Having said that, I think I misread the initial post and realized that calculating the size upfront is not that straightforward, as it depends on the size of the children, which also depends on the size of the parent.
I think lazyLayout is indeed a really clever way of doing it. I’m just trying to understand how it differs under the hood from a more naive way (which is lazy too).