Why I like guards and multi-way ifs

This may be off-topic to the original question, but I want to defend guards and multiway-if. My use case is weight-balanced BSTs, e.g., https://yoichihirai.com/bst.pdf. The branching structure goes like

if | 3*a < b ->
       if | c < 2*d -> single rotation right
          | otherwise -> double rotation right
   | a > 3*b ->
       if | e < 2*f -> single rotation left
          | otherwise -> double rotation left
   | otherwise -> no rotation

or the guard equivalent, or the if-then-else equivalent.

That cannot be boolean blindness. You will have to accuse me of numerical blindness: How dare I use integers instead of sticking to algebraic data types!

1 Like

Thanks, yes it is off-topic. I wasn’t commenting on Boolean guards; but on burying bindings from Algebraic DTs.

I think the blindness in your example is I have no idea where those variables are coming from – perhaps they’re fields in a larger data structure or tuple? Then I’d rather use the whole structure as the disciminant (for example if these are co-ordinates for a couple of objects in motion). And I have to say you’ll defeat the coverage/incompleteness checker. (I had to read the code quite hard to see how).

Is this from a genuine requirement? What does the range (b/3) <= a < (3*b) represent?

Note that there will always be control-flow that the coverage checker will be unable to prove total because it depends on subtle data structure invariants such as in the OP. IMO it is perfectly idiomatic to just use Multiway if in the example of the OP.

Please consult again for example the linked paper on weight-balanced binary search trees. But if you really don’t want to, we have

data T k = Nil | Node{left, right :: T a, key :: k, weight :: Int}

then a = weight of right subtree, b = weight of left subtree, c = weight of right subtree of left child, d = weight of left subtree of left child, etc etc

weight means 1 + number of nodes.

I would not replace those numerical checks (and caching of weights) by re-traversing the whole tree over and over again.

Thank you for making me look in more detail. (BTW that paper is not using the data decl you give.) Then I’m quite sure the form you gave is unhelpful:

So:

  • c is coming from a field in a data structure;
  • c is only meaningful when that data structure has a particular substructure;
  • To obtain c, the code would need to walk the tree to validate the substructure.

All that walking/validation can be expressed immediately:

... = go node
  where
    go Node left@(Node leftofleft@(Node _ _ _ d) rightofleft@(Node _ _ _ c) _ b) 
            right@(Node _ _ _ a)
            k
            w
          | ... = ...

So we get structural guarantees to prevent mis-using c. And other equations for go would enable the coverage checker to see if we’d missed any possible structures.

I’d prefer to use long_and_meaningful_names for the weights rather than easily confusable a, b, c, ... – see the code style at the link provided by @sgraf on the previous thread.

I note with approval [**] this example code in the paper

insert kx (Bin sz ky l r) = case compare kx ky of
    LT-> balanceR ky (insert kx l) r
    GT-> balanceL ky l (insert kx r)
    EQ-> Bin sz kx l r

[**] To explain what’s good there: o.p. example has three cases at top level:

  • a much less than b
  • a much greater than b
  • a, b fairly close (then do nothing)

So the code should call a comparison function, returning an Ordering as the scrutinee. Part of the reason for a dedicated comparison is that, as the paper points out, using Int in arithmetic (3*a) risks fixed-width overflow. For most purposes (including holding weight in the Nodes), Int is fine. It’s only in the comparison there’s need for arithmetic, so (say) converting to Integer can be localised there. Interestingly, the paper suggests converting to Float, I guess partly because an approximation 3*a < b is good enough, partly Float comparison is more performant than Integer.