Early feedback: left-biasing `max`

tl;dr: Looking for early feedback on making max and maximumBy left-biased; read this for more.

It is a little-known*, and possibly rarely-relevant, fact that the default implementation of max in the Ord class is right-biased, while the default implementation of min is left-biased. By this, I mean that if two terms of a type compare equal (according to (==)), applying default min to the terms will return the first and applying default max will return the second. (I don’t know the reason behind this choice, though the Haskell 2010 Language Report specifies it, as did Haskell 98.)

This convention—it’s only a convention, and an undocumented one at that—extends to the implementations of minimumBy and maximumBy (in both of the Data.Foldable and Data.Foldable1 modules), which can’t use the min and max functions from the relevant Ord instance and so need to assume their own bias direction. Consistent with default min and max, they are left- and right-biased, respectively.

It’s not often the case that two equal-up-to-(==) terms will be observably different from each other, so outside of performance-related concerns this bias direction doesn’t matter much. However, one type for which (==)-equality is intentionally not the same as observational equality is Data.Semigroup.Arg, which exists pretty much only for the purpose of finding a minimum or maximum over its first field and then extracting the second field, which doesn’t participate in the comparison. min and max for Arg are both declared with left-biased implementations, in what I assume was an intentional choice backed by a belief that, for the use case targeted by this data type (i.e., when the difference between (==)-equality and observational equality is relevant), a left bias is more useful than a right bias.

This state of affairs leads to minimumBy compare being practically equivalent to minimum, but maximumBy compare only being equivalent to maximum on types where the max member of Ord hasn’t been defined to be left-biased—not Arg, in other words.

In light of this, and bearing in mind that most similar functions in base are left-biased by convention, I’m putting forward a CLC proposal in its early stages to change the default implementation of max, and the implementations of maximumBy, to be left-biased; and to note this as the preferred bias direction in the Ord class documentation. This would be a wide-reaching change, and while I can hope that its practical impact on existing programs would be near-zero, it’s the sort of subtle change in behavior that could easily have all sorts of unexpected consequences.

If you might know of a situation that would be sensitive to this change, please weigh in. Start reading at this comment, please.

(Enthusiastic support is also welcome. :slight_smile:)

*: Wildly extrapolating from the fact that I didn’t know it until recently, and the reporter of this issue claims that all of their peers that they asked didn’t know it either.

7 Likes

I honestly find it way more surprising that

max (Arg 2 2) (Arg 2 3) /= max (Arg 2 3) (Arg 2 2)

but

compare (Arg 2 2) (Arg 2 3) == compare (Arg 2 3) (Arg 2 2) == EQ

Why is the preferred solution to change every other Ord instance to comport with this one rather than to document the very peculiar properties of Arg more loudly?

Are there other benefits? I’m sympathetic to the idea that it “ought to” be the case that maximumBy compare == maximum, but that’s by no means a law, and maximumBy doesn’t really know anything about Ord besides.

Is there an alternative where we can have an Arg that lets you pick the preferred side? You can do it dirty today with Flip, but there’s probably some silly way to use First and Last to specify what you mean.

1 Like

But this is False. Arg 2 2 == Arg 2 3; they just aren’t observationally identical.

That’s a good question. My thought is that if we’re going to tell users that Ord should behave a certain way, it’s less to expect them to remember to say that min and max and their associated functions are all left-biased than that min is left-biased and max is right-biased, particularly given how left biases crop up elsewhere in the prelude (I’m primarily thinking of (<|>), but @mixphix’s comment suggests there are others).

No matter what happens, the documentation should speak more clearly about the state of affairs that results. If changing max's bias is disruptive, I am personally okay with documenting the more complicated principle and any exceptions to it that we want to retain; but first I wanted to investigate if we actually have to settle for that.

Practically, I don’t know. Conceptually, I think of this as clarifying what an instance of Ord is allowed to express. The central spirit of Ord, I would argue, is compare (or (<=) if you prefer; either way what follows is the same). Everything else that is in Ord is a performance optimization. We have documented laws or expectations that require that the previous claim is true, for the comparison members. We have much weaker requirements about what min and max are permitted to do; they are only expected to return values equal to one of their arguments. That means that, de facto, an instance of Ord can currently express not just a compare method, but also the bias direction of minimum and maximum. Do those extra bits of information belong in Ord—do we want Ord instances to be free to vary in this way? Or do we want Ord to truly be just the behavior of compare plus performance optimizations? If the latter, we should specify the bias direction of min and max.

Bit of a digression, but I’m used to argmin and argmax from mathematical contexts where these things are assumed to be sets, not individual values. So what I’d expect to see filling this niche are:

argmax :: forall f cod dom. (Foldable1 f, Ord cod, Semigroup dom) => f (cod, dom) -> (cod, dom)
argmax = foldl1' max'
  where
  max' l@(a, b) r@(c, d) = case a `compare` c of
    LT -> r
    EQ -> (a, b <> d)
    GT -> l

argmin :: forall f cod dom. (Foldable1 f, Ord cod, Semigroup dom) => f (cod, dom) -> (cod, dom)
argmin = foldl1' min'
  where
  min' l@(a, b) r@(c, d) = case a `compare` c of
    LT -> l
    EQ -> (a, b <> d)
    GT -> r

and you’d fmap (or coerce, for the first two) the input into f (cod, First dom), f (cod, Last dom), or f (cod, [dom]) depending on the result you wanted.

1 Like

Sorry, yes, I was being cagey with syntax and meant observational rather than Eq equality.

The “left biases” I was thinking of were (&&) and (||).

I don’t know the official reason but it makes sense that even if a == b (which doesn’t mean they are actually identical), if min a b returns one, then max a b should return the other, so you can write sortTuple (a,b) = (min a b, max a b). If a == b the order doesn’t matter but I still want a and b not (a, a).

4 Likes

If we left-bias max, would it be a problem to implement that as sortTuple (a, b) = (min a b, max b a)?

I am playing advocate’s Devil here, I don’t need sortTuple, however it seems normal that min a b and max a b (in the same order) don’t return the same thing (if one is the min, then the other is the max).

the sort tuple example is nice!

1 Like

What I’m having trouble understanding is why one would expect (min a b, max a b) to reduce to (a, b) or (b, a), but one wouldn’t expect (min a b, max b a) to reduce to (a, b) or (b, a). Both seem equally justified to me, from normal mathematical intuition, but only one can be satisfied by a given implementation.

Whereas I think expecting that minimumBy and maximumBy should behave consistently with each other is quite reasonable, and so I think that should drive the question of which of the two equally-mathematically-justified implementations of sortTuple should be preferred.

i think the interval notation motivation is actually quite a nice one! On the other hand, I think a more substantive qualm is: will this change positively impact anyones code? Or will it result in possible changes to the time/space characteristics of lazy code out in the wild.

I do really like @maxigit 's adhoc notation example though :slight_smile:

1 Like

Basically, you are saying that opposite(?) of min should be flip max instead of simply max.

1 Like

Sort of? I think it’s more like there are two independent axes that can be reversed: the comparison and the bias. To make the (a, b) or (b, a) trick work, you need both axes inverted between the two sides of the pair. I’m proposing that the inverse of min on the comparison axis should be max, the inverse of min on the bias axis should be flip min, and the axes shouldn’t be coupled to each other.

But to make a vector algebra analogy, this is just one choice of basis; the basis of {flip bias, invert comparison and flip bias} spans the space just as well, and is more aligned with what we currently have. The reason I prefer the first scheme to this one is just that it seems simpler to think about in most cases, and more intuitive in its consequences for minimumBy and maximumBy.

I think we have to evaluate this proposal on presumed future gains, since it can’t improve any existing code. How do I know it won’t improve existing code? If this change can be implemented without breaking anybody’s code, then presumably the current behavior is fine. (But note, I doubt this is the case.)

But what future gains are to be had? I don’t want to misrepresent you, so perhaps you could clarify what you think they would be?

I have a couple other suggestions, as well.

To say, “I don’t know the reason behind this choice, [and I want to change it]” is a perfect example of Chesterton’s fence. If you could find out why the choice was made, you could make a much stronger argument for changing it by arguing against those original points.

Second, presuming that you do have a strong case for future gains that might come from aligning Arg with other types, could it be that the problem is with Arg itself? It sounds like Arg implements Ord in a way that is inconsistent with how most other instances are implemented in base, as well as how they are specified in the language reports. In other words, it sounds like a bug in Arg. So, should we instead fix the bug? (To be clear, I would argue against doing this as well, but let’s take one argument at a time…)

Finally, I do enthusiastically support updating the documentation! Ord should definitely be updated to point out the prescribed biases, and Arg should be updated to point out that it has strayed from the prescription. Although it’s a bit hard to find, here is the source for Ord, and here the source for Arg. (To be honest, the existing docs are too terse for me to properly understand: I can’t make out if they are trying (and failing?) to describe the bias at all.)

3 Likes

The motivation here is consistency and intuitiveness. My attention is primarily on maximum and maximumBy, and aligning their behavior with minimum and minimumBy; I’m a little surprised that the default binary min and max functions have the biases they do, but it’s more surprising to me when two parallel functions that operate on Foldable structures break ties by selecting from different ends of the structure. Per previous discussions elsewhere, I’m not alone in being surprised by this.

The more consistent that library functions are with each other, the easier it is to remember what they do. That’s the future gains, I suppose—making this small area of base easier to learn and use. Is that important enough to risk disrupting existing code? In the long run, the cost of a confusing API keeps growing but the cost of disrupting existing code is limited. (But sometimes constant factors are more important than asymptotics, yes.)

Once I went down the rabbit hole of why maximumBy and minimumBy have opposite biases, I learned about what’s going on with max and Arg. I’m only proposing changing those things to keep them consistent with what I want maximum and maximumBy to do, because again, consistency makes the library easier to keep in one’s head.

Of course. To be clear, anywhere in this thread or in the GitHub issue where I profess ignorance, it’s not meant to justify the change; it’s instead calling out an area where I tried to do the research and hit a dead end, and someone with more Haskell history than I have needs to help me out if the answer is important. I went spelunking through the Git history of GHC and read the two Haskell Language Reports available online to try to answer this question, and it appears to have been in place since before any of those artifacts. Someone who was involved with Haskell in the 90s might have to weigh in here, if conservatism demands a full accounting of the history of this choice.

If you read the proposal on GitHub, I outlined four possibilities in descending order of my personal preference (based on my incomplete knowledge of why things are as they are as well as my subjective intuition and taste):

My answer to all questions of the form, ‘Should we instead do something lower on the list?’ is going to be that I would prefer something higher on the list and I’m trying to determine if that is feasible first.

2 Likes

Can I vote for 2 then ?

It looks for me that point 7 and 8 in the Ord doc clearly says (yet in an obscure way) max should return x when min returns y and vice versa, then to be demolished by stating that any value can be returned if equality holds (allowing Arg to even create a new value by mixing both inputs).

Edit

I was refering to

which is right biased but I can’t find anywhere … even though it agrees with haskell2010

 -- Note that (min x y, max x y) = (x,y) or (y,x)  
    max x y | x <= y    =  y  
            | otherwise =  x  
    min x y | x <= y    =  x  
            | otherwise =  y

Right, the caveat below these lines makes it clear that this is only up to equality. But if you miss that caveat, don’t you think that these lines suggest that both min and max are left-biased? In both rules, if x == y then the result is expected to be equal to x.

This is a bit of a tangent; I don’t think anyone is likely to take the position that the docs don’t need clarification about all this, so anything they currently suggest is only relevant insofar as it supports or casts doubt on the idea that it’s generally known among Haskellers how these biases actually work.

1 Like

Where did you get those 7 and 8 ??? They are left biased and don’t correpond to the ones I’ve seen on the Ord doc.

Straight out of Hackage; are you possibly looking at docs for an older base version?

1 Like