Pairs ordered by largest sum

I ran into this problem whilst solving a puzzle, even though it isn’t the puzzle itself. Say you have some arbitrary set of numbers, [1,2,3,4,5], and you want to consider every pair of numbers in order of biggest sum first, i.e. (5,4), (5,3), (5,2), (4,3) and so on.

You can sort the original set – that’s ok – but solutions worse than O(n^2), like generating all pairs first and then sorting them, are intractable due to the size of the list.
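
For concreteness, the kind of thing I mean by that is roughly this sketch (allPairsBySum is just an illustrative name):

import Data.List (sortOn, tails)
import Data.Ord (Down (..))

-- naive baseline: build every pair, then sort by sum, largest first.
-- O(n^2 log n), and it has to materialise all ~n^2/2 pairs before it
-- can return even the first one.
allPairsBySum :: [Int] -> [(Int, Int)]
allPairsBySum xs = sortOn (Down . uncurry (+)) [(a, b) | a : rest <- tails xs, b <- rest]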

I need a Haskell snippet which will lazily generate all such pairs given a list, in order of largest sum first. I’m working on it at the moment, but I thought it sufficiently interesting to share.

Any thoughts?

Can you be a bit more specific? E.g.:

That seems like a list? Do you mean set or list? Can it have duplicates? Is it ordered?

This seems to exclude for example (5,5). Is that on purpose?

It seems impossible to me to lazily generate an ordered list given an unordered list as input.

I mean a list.

You can only use each entry from the list once in any given pair.

You can sort the original list (mentioned in original post).

Thanks for answering my questions.

Here’s how I would do it:

{-# LANGUAGE TupleSections #-}  -- needed for the (x,) sections below

import Data.Function (on)

-- assumes the input is sorted from large to small
-- e.g.
-- >>> f [5,4,3,2,1]
-- [(5,4),(5,3),(5,2),(4,3),(5,1),(4,2),(4,1),(3,2),(3,1),(2,1)]
f :: [Int] -> [(Int,Int)]
f [] = []
f (x:xs) = mergeBy (compare `on` uncurry (+)) (map (x,) xs) (f xs)

-- assumes the first element of `xs` is larger than the first element of `ys`
mergeBy :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
mergeBy _ [] ys = ys
mergeBy f (x:xs) ys = x : go xs ys where
  go (x:xs) (y:ys) =
    case f x y of
      LT -> y : go (x:xs) ys
      _  -> x : go xs (y:ys)
  go [] ys = ys
  go xs [] = xs
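
As a rough laziness check (hand-evaluated, so treat it as illustrative): taking a short prefix only forces the top couple of merges, e.g.

ghci> take 3 (f [100,99..1])
[(100,99),(100,98),(100,97)]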

Goes a bit nuts for this input (you get duplicate pairs in the output):

f [5,5,4,3,2,1]

I’m generally surprised not to have seen this as a puzzle problem before – I think it’s quite a good one.

Your example also excludes (3, 5), which should appear before (5, 2). Is that intentional—do you mean to produce a list of unordered pairs?

Yes, unordered pairs, sorted by greatest sum.

Ah, I guess you only want unique pairs in the output. Then you should be able to simply insert a dropWhile:

f (x:xs) =
  mergeBy 
    (compare `on` uncurry (+))
    (map (x,) xs)
    (f (dropWhile (== x) xs))

Such beautiful code sir.

But I just noticed it is not correct…

ghci> f [5,5,5,4,3,2,1]
[(5,5),(5,5),(5,4),(5,3),(5,2),(4,3),(5,1),(4,2),(4,1),(3,2),(3,1),(2,1)]

The fix will be slightly more complicated.
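
One possible shape for it, as a sketch: pair x only with each distinct later value (using group from Data.List), and skip the remaining copies of x when recursing. This still assumes the input is sorted descending, and it reuses the mergeBy from above.

{-# LANGUAGE TupleSections #-}

import Data.Function (on)
import Data.List (group)

-- e.g. f [5,5,5,4,3,2,1] should give
-- [(5,5),(5,4),(5,3),(5,2),(4,3),(5,1),(4,2),(4,1),(3,2),(3,1),(2,1)]
f :: [Int] -> [(Int, Int)]
f [] = []
f (x:xs) =
  mergeBy (compare `on` uncurry (+))
    (map (x,) (map head (group xs)))  -- one pair per distinct later value
    (f (dropWhile (== x) xs))         -- skip the duplicate copies of x

-- same mergeBy as above: the head of the first list must dominate
mergeBy :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
mergeBy _ [] ys = ys
mergeBy cmp (x:xs) ys = x : go xs ys
  where
    go (a:as) (b:bs)
      | cmp a b == LT = b : go (a:as) bs
      | otherwise     = a : go as (b:bs)
    go [] bs = bs
    go as [] = as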

Any chance you could make your code work on the basis of order rather than values? I.e. if a number has a higher index than x then it must be >= x. It’d make it more useful; then it could be used for anything in descending order.

Uses library functions, because that’s what they’re there for:

{-# LANGUAGE ScopedTypeVariables, TypeApplications #-}

import Data.Bifunctor
import Data.Coerce
import Data.Function
import Data.List.Extra
import Data.Ord

unorderedPairsBySum :: (Num a, Ord a) => [a] -> [(a, a)]
unorderedPairsBySum xs =
  foldr1 (mergeBy (compare `on` uncurry (+))) $
    sequence <$> xs' `zip` tail (tails xs')
  where
  xs' = nubSort xs

unorderedPairsBySumDesc :: forall a. (Num a, Ord a) => [a] -> [(a, a)]
unorderedPairsBySumDesc = coerce (unorderedPairsBySum @(Down a))

I don’t think you actually want this. On the input [10, 3, 2, 1], how is the function to know to emit (10, 1) before (3, 2) if it doesn’t actually compute the sum of the values?

Yeah, you’re totally right.

I believe that mergeBy isn’t lazy enough. Mine is lazy in the second list. (And nubSort is certainly not lazy.)

The actual list I’m working with has about 600,000 elements – so laziness really matters in this case, otherwise it just won’t work.

Looks lazy to me; all recursive calls are beneath a list constructor.

Edit: Oh, I think I see what you mean now. With my code, getting the first element out forces drawing an element from each inner mergeBy, but your code doesn’t force later mergeBys because it assumes that the first element of a mergeBy will always come from the left argument. That’s smart! (Though asymptotically it only matters if the nubSort can be replaced with nub.) Your inner go is basically (modulo ascending/descending order) just Data.List.Extra's mergeBy though, right?
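
E.g. (if I’m reading it right) the head of your mergeBy never even looks at its second argument:

ghci> head (mergeBy (compare `on` uncurry (+)) [(5,4)] undefined)
(5,4)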

nubSort is definitely not lazy, but that only operates on the input and the problem statement said that sorting the input is acceptable. If the input comes already in order then you can replace it with nub, which is lazy.

While this is a step in the right direction, it is not ideal because it will take O(n^3) to yield the full result. This is because mergeBy is just too expensive. Rough analysis: consider that the input is [n,n-1..1] and we are merging xs and ys in go. ys is a list of size O(n^2) going from value (the sum of the pair) 2n-3 down to 3. The minimum value in xs is n+1. To finish merging these two lists, we must traverse ys until we reach a value of n+1, which is around halfway. Probably someone can prove this more rigorously. So merging n such lists gets us O(n^3).

Ideally, we would not take more time than generating all pairs and sorting them, which is O(n^2 log n). Here’s one way to do it: instead of recursively merging the n sorted lists, put them in a heap! This yields every element of the result in O(log n) time, for a total of O(n^2 log n).

{-# LANGUAGE TupleSections #-}

import Data.List (sortBy, tails, unfoldr)
import Data.Function (on)
import Data.Ord (Down(..), comparing)
import Data.List.NonEmpty (NonEmpty(..))
import qualified Data.List.NonEmpty as NE
import qualified Data.PQueue.Max as PQ  -- from pqueue

newtype DescList a = DescList { unDescList :: NonEmpty (a,a) }

instance (Eq a, Num a) => Eq (DescList a) where
  (==) = (==) `on` (uncurry (+) . NE.head . unDescList)

instance (Ord a, Num a) => Ord (DescList a) where
  compare = comparing (uncurry (+) . NE.head . unDescList)

pairsSortedBySum :: (Num a, Ord a) => [a] -> [(a,a)]
pairsSortedBySum =
  unfoldr (fmap f . PQ.maxView) .
  PQ.fromDescList .
  descLists .
  sortBy (comparing Down)  -- unnecessary if already sorted
  where
    descLists xs = [DescList (fmap (x,) (y :| xs)) | x:y:xs <- tails xs]
    f (DescList xs, pq) = case NE.uncons xs of
      (x, Nothing)  -> (x, pq)
      (x, Just xs') -> (x, PQ.insert (DescList xs') pq)
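
A quick sanity check by hand (so treat the session as illustrative), using the [10, 3, 2, 1] example from earlier:

ghci> pairsSortedBySum [1,10,2,3]
[(10,3),(10,2),(10,1),(3,2),(3,1),(2,1)]

In particular, (10,1) comes out before (3,2).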

Empirically:

λ> length $ reverse [1..100]
100
(0.00 secs, 82,760 bytes)

λ> length $ f $ reverse [1..100]
4950
(0.03 secs, 46,665,936 bytes)

λ> length $ sortOn (uncurry (+)) [(i,j)|i<-[1..100],j<-[1..100],i>j]
4950
(0.01 secs, 6,674,208 bytes) 

So, like you said @meooow, it is much slower than just generating all the pairs and then sorting naively, for the full list. However, it is not without its advantages. Namely, even for very large inputs, you start to get the pairs in largest-sum order within O(N) cycles, whilst naively you’d have to wait for the whole list to be sorted first. It’s also very compact code. Here is a more neophyte-friendly (less general) implementation of @juror’s code:

{-# LANGUAGE TupleSections #-}  -- for the (x,) section

sortedPairs [] = []
sortedPairs (x:xs) = merge (map (x,) xs) (sortedPairs xs)
  where
    merge xs [] = xs
    merge [] ys = ys
    merge (x@(a,b):xs) (y@(c,d):ys)
      | a+b >= c+d = x : merge xs (y:ys)
      | otherwise  = y : merge (x:xs) ys
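
And it does start producing pairs early on large inputs; for example (hand-evaluated, so illustrative only):

ghci> take 3 $ sortedPairs [100000,99999..1]
[(100000,99999),(100000,99998),(100000,99997)]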

EDIT: Any method which produces all the pairs in the end will be at least O(N^2). If we are doing some sort of sorting, and presuming we are not beating the general O(N log N) bound for comparison sorting, I would expect the minimal complexity for this approach to be O(N^2 log N^2) = O(N^2 * 2 log N) = O(N^2 log N).

What order do the items go into the heap in? I.e. what would ensure that the largest-sum order is always respected, other than putting in all the pairs before starting to get results?