This confusion appeared in the linked issue #15921: Data.List.maximumBy uses counter-intuitive ordering · Issues · Glasgow Haskell Compiler / GHC · GitLab
although I don’t agree waldmann there, maximumBy (compare 'on' snd) should work regardless if the laws of Ord need total ordering or just a preordering.
So, it’s correct that compare `on` snd doesn’t generally define a total ordering, given the Eq (a, b) instance likely to be in play (though it does if a is (), for example). But the documentation for maximumBy doesn’t (currently?) state that the comparison function provided needs to define a total ordering, nor should it. As long as the Ord (a, b) instance isn’t defined using compare `on` snd, no foul.
Oh, right, waldmann most certainly meant that compare 'on' snd equality is not the same as (Int,Int) equality. It seems I am alone with my confusion.
But if you asked me whether, given
(==) = \x y -> True,(<=) = \x y -> Truedefines a total order onInt(or on any other type), I’d have to say yes.
Is there a reason why should Haskell (==) play the role of math =? For me it is more logical for Haskell = to play the role of math =, and for Haskell (==) to play the role of math ==. For example in the case of Arg a b I would say that (==) = is an equivalence relation, and (<=) is a total preorder. That is my mathematical model, and it would be nice to use the same words, with the same meaning. Also nothing else cares about (==) on Arg a b and apart from ordering, everything else treats Arg 0 0 and Arg 0 1 differently, so it is weird to say that Arg 0 0 = Arg 0 1 in math, and antisymmetry holds.
edit: in summary, I think that that antisymmetry law is for (<=) to be a total preorder, and not a total order.
I think this is the crux of our disagreement, but there’s a lot of ambiguity smuggled into this statement. For me to go on this journey with you, I’d need you to define what you mean by ‘Haskell =’, ‘math =’, and ‘math ==’, the first two of which I think are overloaded concepts, and the last of which I haven’t seen without some prior definition.
This also might be better as a distinct thread if you want to continue the conversation; I don’t think we’d come back to the question of max, maximum, etc. from that direction.
I’ve been reflecting on this for a few days and I have to admit, it’s sounding more compelling the more I think about it. Maybe it’s best to consider all of these functions as using argument-order or structure-order as the tie-breaker if Ord-order doesn’t distinguish between elements, instead of thinking about ‘bias direction’ as I have been.
The reason for the different bias is simple: if you do (min x y, max x y) you know that the pair will contain either (x, y) or (y, x).
Maybe the reason is not spelt out in the report, but it was carefully designed that way.
Enough people have spoken up against changing max that I probably won’t continue to pursue it, but for my own curiosity (and for the documentation, if it makes sense to include it): to what end was it designed that way? Why is that an important or useful quality? If I knew that both min and max favored their left argument, and I wanted a pair that was either (x, y) or (y, x), I’d write (min x y, max y x), so it’s not like flipping max would prevent you from easily writing an expression with that property.
It would certainly work to flip the arguments to max. I find it slightly less esthetically pleasing.
But in either case the choice should be clearly documented.
Currently base does not “require min and max to return either of their arguments. The result is merely required to equal one of the arguments in terms of (==)”.
The Report is a document of historical interest only. But how does the documentation contradict the actual implementation? Each instance Ord can define its own.
Because, as you say, they’re dual. In my case, the stronger intuition is that a caveat like ‘in case of ties, the first argument is returned’ would apply consistently to both min and max. Expecting that, if I wanted a pair in sorted order, I would use min x y and max y x to construct that pair.
The intuitions from pure mathematics don’t hold much sway for me here because in pure mathematics, both min and max are symmetric functions and none of this matters. It’s equally true in pure mathematics that x == min x y <=> y == max y x, because in pure mathematics, max x y == max y x. If you favor one of these principles over the other, it can’t be because you’re drawing your intuition from pure mathematics; you’re getting that intuition from some other domain, where min and max are not symmetric.
There are many types for which semantical equality does not imply structural one. Arg is just one (and exotic) example, ByteString and Text are more common ones.
I guess I’ll never know for sure, because I only learned about max being right-biased in the context of investigating an inconsistency in the Foldable functions. But I was surprised to learn that max favored the opposite argument to min, and I think I would have been equally surprised to learn that in any other context. It just strikes me as an odd choice in the context of software engineering, where it’s generally preferred that similar things behave similarly. I think that maybe some of you all think that an intrinsic part of max-ness is reaching toward the right, where min-ness reaches toward the left—the argument bias is for some reason coupled to the ordering of the values being compared. I see it as an approximation of the ideal min and max—we can’t have symmetric min and max, so we have to compromise, and I would expect that compromise to be the same for both functions.
It’s all kind of moot; as I said, the support for changing this doesn’t seem consistent enough for me to pursue it, so I’m going to pivot toward making the documentation as clear as possible, and possibly campaign for making the Ord documentation stricter about what min and max should be allowed to return (without changing their default implementation).
Ah, as for that, I have these links:
Here David Feuer introduces the current wording seemingly off the top of his head:
And here’s the MR where the structural equality vs. Eq-equality question is discussed, but no mention of bias direction:
Given the decision to allow Ord instances to return things that are merely Eq-equal to either argument, the documentation is technically correct because there is no bias up to Eq-equality. The documentation merely suggests a left bias. So nothing really went wrong here. However, in order for maximum to be morally a special case of maximumBy, we really would need to make the requirements on max stronger than Eq-equality. I share your discomfort with the current documentation allowing implementations to return some sort of hybrid of the two arguments; that’s very much not what a function called max should do, in my opinion!
I don’t know the original motivation, but I know plenty of cases when min / max returning something different from their arguments is useful.
- Quite often the difference between semantical and structural equality is in some sort of annotation with a provenance / position. E. g.,
data Value = Value { getValue :: Int, getProvenance :: Position }
instance Eq Value where
(==) = (==) `on` getValue
Now what is the provenance of min x y :: Value? It might be sensible to set Position to <no location>, or maybe to a position of min itself, or maybe combine position ranges somehow. In all these cases min x y is structurally neither x nor y (but semantically equal to one of them).
- Another case is that maybe
min/maxdoes not evaluate anything, but just construct a correspondent expression. Say,
data Expr
= Const Int
| Add Expr Expr
| Mul Expr Expr
| Min Expr Expr
| Max Expr Expr
and then simply
min :: Expr -> Expr -> Expr
min = Min
- Further, if evaluation of
min/maxrequires some sort of normalisation of arguments, why would not we return the normalised value? For example, it makes a lot of sense formin (Add (Const 1) (Const 2)) (Const 5)to returnConst 3.
A variation of above is the following definition of min for ByteString:
min x y
| Data.ByteString.null x = mempty
| otherwise = ...
Even while x == mempty, it is not necessarily structurally equal to it (it might be a zero slice of a larger ByteString). And yet this is a sensible definition, useful to free up some over-retained data.
My issue with all three of these points is that they’re general arguments against ever having laws. Imagine, for example, extending Expr to include a primitive for semigroup append:
data Expr
= ...
| Append Expr Expr
and then
instance Semigroup Expr where
(<>) = Append
But this instance violates the associativity law of Semigroup. Is that an argument against having the law?
I don’t find this to be a problematic min. The details of how ByteString is implemented internally shouldn’t matter here, as long as the API for ByteString is explicit about which functions peek behind the curtain and which are respectful of the boundary. I’ve been using the phrase ‘observable equality’ rather than ‘structural equality’ to try to highlight this: the essence of the law I would propose is that for any ‘safe’ f :: a -> Bool, let z = f (min x y) in z == f x || z == f y is always true. Hopefully any functions that can distinguish two empty ByteStrings by looking at what memory regions they slice either are documented as unsafe or wrap their results in IO?
I don’t think so. The laws are just required to hold up to some observation function.
@tomjaguarpaw, I don’t understand the point you’re trying to make. Bodigrim’s argument was, it’s convenient not to require Ord instances to obey a stricter version of its laws, because that allows more instances. My counter is that it’s always convenient in that way not to require instances to obey laws, and yet we have laws. I have no idea what your statement means in that context.
Perhaps it would help if I elaborate. There’s a natural tension between having stronger laws (good because you can prove more) and having more instances (good becaues you can use the class in more situations). One way of having many of the benefits of both worlds is to require laws to hold only up to a fairly fine-grained observation function. I see Bodigrim’s points as general arguments in favour of laws holding up to fairly fine-grained observation functions. (Not that I necessarily agree with that, but I think that’s the strongest conclusion one can derive from his points.) However, you said
My issue with all three of these points is that they’re general arguments against ever having laws.
That seems like far too strong a conclusion to me! Perhaps it would help if you explained how you got there from Bodigrim’s points.
EDIT: Or to put it another way, my interpretation of what you’re saying is “You can’t believe Bodigrim’s points yet also believe in ever having laws”. Did I get that right? If so could you please explain? That’s not obvious to me!
Yes, exactly.
Such as (==), in the current iteration of the min/max laws? With you so far, I think, although I’m not aware of other classes making this sort of allowance. When the Semigroup law says that (<>) should be associative, that’s not up to some particular observation function, is it? I default to interpreting such laws as holding up to any pure, safe predicate—when the documentation says x <> (y <> z) = (x <> y) <> z, I read that as meaning that for any safe function f :: a -> Bool, f (x <> (y <> z)) == f ((x <> y) <> z) (where now == can be simple boolean equality). That’s the same standard I’d like to use with min and max.
Is there a practical difference between arguing for weaker laws (such as laws up to some observation function) and arguing for fewer laws? They seem like the same thing to me—a stronger law subsumes a weaker law by adding an additional restriction, which could be seen as a second law in its own right.
What strikes me as more important than the difference between weaker and fewer is the spectrum with strong laws and fewer instances on one end and few/no/weak laws and many instances on the other end.
Bodigrim says that:
That’s the argument for pushing toward the many-instances side of the spectrum—that it’d be useful. That’s what I see as the general argument against laws—more instances are always useful!
He then gives some specific examples of cases that he thinks are particularly useful for min and max. I draw a parallel case to one of those examples with Semigroup, and point out that this would be useful in the same scenario that his example would be useful—by Bodigrim’s reasoning, I would expect him to object to a strict interpretation of Semigroup’s laws in that scenario as well.
But I doubt that he does. Because ‘useful’ is not the deciding factor in determining what laws a class should have. Rather, what is most important is the intended meaning of the class abstraction. <> could represent some arbitrary binary operation—but we intend that the meaning of the Semigroup class is to represent, well, a semigroup. And to do that, <> must be associative.
Likewise, if min and max are to represent the abstract concepts of ‘take the lesser/greater of these two arguments’, it would be incorrect for them to (observably—under the hood I don’t care what they do) construct a third value. It might be ‘useful’, but it’s incorrect, and that’s that! If we let usefulness override our sense of what class operations are meant to represent, we would not have laws, because laws only get in the way of having more instances and being more useful.