Equality for priority queues

I want to get some feedback on Two equal priority queues with keys can compare as unequal · Issue #78 · lspitzner/pqueue · GitHub. MinPQueue k a is a priority queue where the elements are basically pairs of a priority k and a value a (and that supports extraction of the minimum element).

The current Eq instance checks if two pqueues produce the same elements when repeatedly extracting the minimum element, so two pqueues that produce the same elements, but in a different order, are considered to be not equal. Using this instance, insert does not respect equality, i.e. inserting the same element into two equal pqueues may produce pqueues that are not equal.

Another option is to consider two pqueues to be equal if they have the same elements regardless of the order. Doing this in a naive way is really slow though. A faster way is to convert the pqueues to lists (by repeatedly extracting the minimum element), sort them and then compare them. This however would require an additional Ord a constraint (before, there were only Ord k and Eq a constraints). This instance also has the problem that insert doesn’t respect equality. Additionally, not even extracting the minimum preserves equality.

A third option is to just consider structural equality, but I think that wouldn’t be very useful.

Another, more radical option would be to simply remove the Eq instance, for it being not obvious what the behaviour should be.

Which option would you expect and why?

I agree equality should be the same as Map a k or a -> Maybe k (the latter doesn’t have a Haskell Eq instance, but I mean the mathematical extensional equality).

1 Like

This seems like a really bad property! But why does it hold? I don’t get it. Can you give an example?

There is an example in the linked issue when you scroll down a bit: Two equal priority queues with keys can compare as unequal · Issue #78 · lspitzner/pqueue · GitHub

Having read a bit more of the discussion, I think this is a fairly unfortunate condition:

The priority queue abstraction per se says nothing about which entry you get when, and we don’t guarantee one in particular.

It seems to doom all hopes of a sensible equality.

We can’t just do it like Map, since that doesn’t allow duplicate keys, while MinPQueue does.

1 Like

The reason is that insert doesn’t make any guarantees about where exactly the new element will be inserted and priority queues that compare equal might have a different internal structure.

1 Like

Right, and therefore there are no guarantees about which element will be popped either, so I agree with you that equality is not a very helpful notion for these things.

If I understand the situation correctly the only issue is in (arguably) degenerate cases were there are multiple items with the same key that can be retrieved in arbitrary order right? So a sequence of extract min operations essentially returns a list of “groups”, each group consisting of a unique key together with a multiset of values (that all have this particular key). Defining an equality based on that seems perfectly reasonable to me. Since the multisets are “implemented” as lists you can indeed not hope for better than using some O(n_i^2) time quality test for a group of size n_i, but presumably these groups won’t be all that large anyway, so the total cost is O(n log n + \sum_{i=1}^k O(n_i^2)). In the case the groups are all of constant size that is just O(n log n). If you do have one big group then yes it would be O(n^2), but in that case the priority queue does not really seem to contribute much anyway.

(And if you do really care one could explicitly construct a queue where the values are of type MultiSet a).

My opinion is that the only notion of equality that’s both somewhat useful and (reasonably) well behaved is the multimap-like equality: that is,

instance (Ord k, Ord a) => Eq (MinPQueue k a) where
  (==) = (==) `on` (sort . toList)

In this regime, operations that extract a minimal element or that fold/traverse the queue must be viewed as nondeterministic relative to the equality—which I think is a very reasonable interpretation of the abstract datatype.

This is precisely the notion of equality that we use for a number of the property tests. Is it useful outside the context of property testing? I’m not sure, but every other option seems worse.

@konsumlamm has proposed a version that only requires Eq a, but its performance is pathological ($O(n^2)$) when keys occur many times.

I’m still puzzled why inserting an element e with key k and then retrieving an element with key k doesn’t always yield e. How does “non-determinism” arise here? I can’t think of an obvious implementation that would be “non-deterministic” in this fashion.

There can be multiple elements with key k. Also, in a priority queue, you don’t retrieve elements by their key, you can only retrieve the minimum (or maximum, depending on the type of priority queue) or iterate through all elements.

1 Like

There can be multiple elements with key k

Sure, I understand that.

in a priority queue, you don’t retrieve elements by their key, you can only retrieve the minimum

Fair enough.

But I still don’t understand why the behaviour is to always retrieve the most recently inserted, or least recently inserted, element with the minimal key. I can’t even conceive of an implementation that wouldn’t give this behaviour! (Maybe I am insufficiently imaginative.)

Yes, insufficient imagination. It’s deterministic in the sense that the exact same operations (starting with empty queues) will always give the same results, but it’s going to be quite difficult to predict the order in which two values associated with the same key may appear. Perhaps most obviously (given the structure of a binomial queue), the union operation will tend to interleave values from its arguments in an unpredictable way. But even extraction is pretty wild. For example,

 toAscList . fromList $ [(1,1), (1,2), (1,3), (2,1), (2,2), (2,3)]

produces

[(1,3),(1,2),(1,1),(2,1),(2,3),(2,2)]

That particular example behaves the way it does partly because of the way we optimize fromList, so here’s a simpler one:

 toAscList . foldl' (\acc (k,v) -> insert k v acc) empty $
  [(1,1), (1,2), (1,3), (1,4), (2,1), (2,2), (2,3), (2,4)]

yields

[(1,4),(1,3),(1,2),(1,1),(2,4),(2,1),(2,3),(2,2)]

Note: There are priority queue implementations that preserve insertion order. The one I know of on Hackage is stable-heap, which uses a variation on a pairing heap. Another way to do it would be to write a monoidal finger tree that annotates each subtree with its minimal key. But either of these pays for its order stability in code complexity and performance—maintaining insertion order isn’t free.

Thanks. In that case I’d be in favour of not providing ==. It’s very strange to me to have two equal queues, but extracting the minimal element yields different elements.

1 Like

Large groups can happen when confronting unusual inputs. A priority queue where the values are MultiSet a wouldn’t do you any good (unless you have the multisets to start off), because inserting a new entry with the same key doesn’t locate existing nodes with that key (which would be very expensive), but rather just tosses it on the heap.

The most realistic argument I personally have for including an Eq instance is to support property testing of code that uses MinPQueue. The alternative is to supply an independent equivalence function. That gets pretty inconvenient when MinPQueues are buried down in other structures, but it is a reasonable and certainly principled approach.

Ok I guess you are right that this particular priority queue implementation + multisets does not address that issue yet. However, I would then argue that if you really do care about having multiple copies of the same key while distinguishing the various elements maybe you should pick a priority queue implementation that can support key-lookups (for example a priority search tree). That seems a lesser evil to me than requiring an Ord a constraint instead of an Eq a constraint.

I’m having trouble understanding what your latest comment is suggesting. What do you think should be done, and why? Also, both you and @konsumlamm seem to think there’s something hideously wrong with requiring an Ord a constraint, even going so far as to suggest a quadratic algorithm. What makes that constraint so “evil” in your view?