How to double iterator in FP style: pairing list into (n, n + k)

(What I don’t like about my via-Data.Set approach is that it needs to build the set and then delete elements, to ensure there are no duplicates in the result. Deletion is computationally expensive, in terms of rebalancing the tree.)

So here’s a more Listiferous approach, along the lines of the o.p.’s “secondary index”: scan down the input, maintaining a parallel/secondary list of wanted pairs. Assuming the input is strictly ascending, the wanteds will also be strictly ascending, so we only ever move tailward along them; there’s no destructive update.

pllPairs :: (Num a, Ord a) => a -> [a] -> [(a,a)]
pllPairs k xs = go xs []
    where 
          go []         _                   = []
          go (x:xs)     []                  = go xs [(x, x+k)]  -- want x+k in xs
          go xxs@(x:xs) ( yyk@(_, yk) : ys) = 
            case compare x yk of
              EQ -> yyk : go xs ys                              -- matched a wanted
              LT -> go xs  (yyk: ordInsert (x, x+k) ys)         -- yyk not yet matched
              GT -> go xxs ys                                   -- yyk not matched, drop it
          
          ordInsert :: Ord a => (a,a) -> [(a,a)] -> [(a,a)]
          ordInsert yyk         []                    = [yyk]
          ordInsert yyk@(_, yk) zzs@(zzk@(_, zk): zs) = 
            case compare yk zk of
              EQ -> error "duplicates in List"
              LT -> yyk: zzs
              GT -> zzk: ordInsert yyk zs

This has some (crude) protection against deviant lists: it copes with Lists that don’t wholly pair up; but it is fairly erratic when faced with non-ascending or non-unique elements.
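
For example (traces worked by hand, not from the original post):

pllPairs 3 [1, 3, 4, 6, 7, 10, 12, 15]

==> [(1,4),(3,6),(7,10),(12,15)]

pllPairs 3 [1,4,7]

==> [(1,4)]   -- the unmatched 7 is silently dropped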

Interesting problem!

My solve… took a few minutes of head-scratching, so I tried to capture my thought process along the way and how it translates into a functional algorithm. I’ve tacked on some ideas for generalizing the operation.

Rough intuition:

  • In any collection of integers, there is a minimum value, ‘n’
  • Given the rule of matching (n, n+k), there can only be one pairing available for the minimum value
  • Thus, this pair must be an output pair - generate it and hold for confirmation
  • Repeatedly pull the next-minimum value ‘m’, which must be at least as large as ‘n’
  • If m matches n+k, the pair is confirmed
  • If m exceeds n+k, then no future m could match, so the pair is invalid and the list cannot be paired completely, do whatever is appropriate in this case
  • Otherwise, m must start a new pair (m, m+k), which is added to the end of the list of pairs awaiting confirmation
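
A quick worked example of that intuition (mine, not from the original post): with k = 3 and sorted input [1,3,4,6], the minimum 1 can only pair as (1,4), so hold it; 3 is below 4, so hold (3,6) as well; 4 confirms (1,4); 6 confirms (3,6); the list pairs up completely.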

Declaratively:

  • Translating to a declarative algorithm using a Zipper:
    • Zipper is a single-pass, expected O(1)-work per element
    • Zipper holds the source list, the output list, and a filter list (and error list if desired)
    • Extract minimum element from the source list
    • If the minimum element matches the partner of the lowest item in the filter list
      • Pair is confirmed, move (n,n+k) to output list
    • If the minimum element is larger than the partner of the lowest item in the filter list
      • The list cannot be paired, handle error (e.g. move to error list)
    • Otherwise add (n,n+k) to the end of the filter list
      • This is the only non-O(1) step, as the filter list may be arbitrarily long for large values of k

Generalizations:

  • (+) - Plus is just one possibility; can this algorithm be generalized over the operator?
  • (Int) - Many useful properties of integers don’t apply to all types, might be interesting to explore using the polymorphic type [a] as input
    • Zipper structure remains the same, but the algorithm requires (Ord a)
    • Is there a single-pass algorithm that can use (Eq a), or is (Ord a) required?
  • Ambiguous - Some lists are ambiguous, such as [1,4,7] since there are two partial answers ([1,4],7) and (1,[4,7])
    • What different kinds of results might a client want in these cases?
  • Streaming - zippers work well for processing streams
    • What are the requirements on a stream for this algorithm to work?
    • Is there an algorithm that works for streams that does not have those requirements?
import Data.List

-- Zipper [Errors] [Output] [Filtering] [Source]
data Zipper = Zipper [Int] [(Int,Int)] [(Int,Int)] [Int] deriving Show

step :: Int -> Zipper -> Zipper
step k z@(Zipper _ _ _ []) = z
step k (Zipper ers ys [] (x:xs)) = Zipper ers ys [(x,x+k)] xs
step k (Zipper ers ys fs@(pr@(m,n) : prs) (x:xs))
    |   x == n  = Zipper   ers  (pr:ys)     prs         xs
    |   x >  n  = Zipper (m:ers)   ys  (prs++[(x,x+k)]) xs -- Error case; add unpaired value to error list
    | otherwise = Zipper   ers     ys   (fs++[(x,x+k)]) xs

walkList :: Int -> Zipper -> Zipper
walkList _ (Zipper e o f []) = Zipper (map fst f ++ e) o [] []  -- Add any remaining unpaired values to the error list
walkList k z = walkList k $ step k z

solve :: Int -> [Int] -> ([Int], [(Int,Int)])
solve k ls = let (Zipper ers out _ _) = walkList k $ Zipper [] [] [] (sort ls) in (ers, out)

Tests:

solve 3 [1, 3, 4, 6, 7, 10, 12, 15]

==> ([],[(12,15),(7,10),(3,6),(1,4)])

solve 3 [1,4,7]

==> ([7],[(1,4)])

solve 3 [4,1,4,7]

==> ([],[(4,7),(1,4)])

solve 3 [1,5,6,7,8,11,15]

==> ([11,15,7,6,1],[(5,8)])

Thanks Bryan, your approach is a fancier version of my most recent solution, which is in turn a fancier version of @jaror’s ‘second stab’.

We don’t usually reckon ‘work-per-element’, because that’s trivially O(1). Perhaps you mean O(n)? That is, the total work is proportional to the number of elements? But it’s worse than O(n), because your x > n and otherwise (x < n) cases use (++) to maintain the [Filtering] – as do those earlier two versions. Each append costs time proportional to the length of the list it appends to, so over the whole pass this can degrade to O(n²). Also your solve starts with (sort ls) – which isn’t strictly necessary by the initial assumptions, but is O(n log n) assuming a random initial sequence. If you’re going to sort, you might as well load into a Data.Set; then you can in effect filter by binary-chop lookup – see my earlier solution.
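
For reference, the shape of that Set-based filtering – a sketch of the idea, not my exact earlier solution:

import qualified Data.Set as Set

-- Sketch: repeatedly take the minimum; if its partner x+k is present,
-- emit the pair and delete the partner, otherwise discard the minimum.
-- (Being a Set, this silently collapses duplicate input values.)
setPairs :: (Num a, Ord a) => a -> [a] -> [(a,a)]
setPairs k = go . Set.fromList
  where
    go s = case Set.minView s of
      Nothing      -> []
      Just (x, s')
        | Set.member (x + k) s' -> (x, x + k) : go (Set.delete (x + k) s')
        | otherwise             -> go s'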

Sorry, I should have been more specific - the expected-O(1) is for the zipper step function. As you note, it concatenates elements to the filter list, so “expected” here covers the non-O(1) behavior under an assumption that k << n (and other non-degenerate case assumptions).

Of course using concatenation is idiomatic, and it could be replaced with an amortized-O(1) queue to eliminate that bottleneck (e.g. Efficient Amortised and Real-Time Queues in Haskell - Well-Typed: The Haskell Consultants).
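
To illustrate, a minimal two-list (“banker’s”) queue – my sketch, not the article’s code:

-- push is O(1); pop reverses the back list only when the front runs out,
-- which amortizes to O(1) per operation.
data Queue a = Queue [a] [a]

emptyQ :: Queue a
emptyQ = Queue [] []

push :: a -> Queue a -> Queue a
push x (Queue front back) = Queue front (x : back)

pop :: Queue a -> Maybe (a, Queue a)
pop (Queue []     [])   = Nothing
pop (Queue []     back) = pop (Queue (reverse back) [])
pop (Queue (x:fr) back) = Just (x, Queue fr back)

Replacing fs ++ [(x,x+k)] in step with push (x,x+k) would remove that worst case.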

Data.Set was my initial thought as well, until I saw one of your test cases containing a duplicate value: [4,1,4,7]. Since the OP did not link the original question’s source to verify the requirements, I went ahead with the assumption that duplicate values are allowed, so a Set would not work, but sort does.
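
If we wanted the logarithmic lookups back while still allowing duplicates, a Map of counts works as a multiset – a sketch of the idea, not code from this thread:

import qualified Data.Map.Strict as Map

-- Map each value to its multiplicity; deleting decrements the count
-- and removes the key when it reaches zero.
type Bag a = Map.Map a Int

bagFromList :: Ord a => [a] -> Bag a
bagFromList = Map.fromListWith (+) . map (\x -> (x, 1))

bagDelete :: Ord a => a -> Bag a -> Bag a
bagDelete = Map.update (\c -> if c > 1 then Just (c - 1) else Nothing)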

I frequently analyze the suitability of algorithms for distributed use cases. Here we look at the scope of resources used in processing each element individually - is the processing local, or does it require access to some global data or partially-computed results? This informs the batch size sent to each worker, the shape of the fan-out and fan-in graphs, and whether the workers must be horizontally or vertically scaled.

In this case the zipper state machine shows us that (after sorting or otherwise extracting the lowest element), the per-element processing is purely local with expected constant time and space. So this algorithm is suitable for domains such as stream-processing and analytics. I can imagine a looser variant of this algorithm being used for pairing together players in an online game, for example.

While this algorithm is not suitable for horizontal scaling using divide-and-conquer (since each division can create an unpaired element in an otherwise completely paired list), it is interesting to note that it can be run from both ends of a single list without introducing additional unpaired elements.

Great approaches, I greatly appreciate it!
For what it’s worth, it was a subproblem of this problem:

Find Array Given Subset Sums

You are given an integer n representing the length of an unknown array that you are trying to recover. You are also given an array sums containing the values of all 2^n subset sums of the unknown array (in no particular order).

Return the array ans of length n representing the unknown array. If multiple answers exist, return any of them.

An array sub is a subset of an array arr if sub can be obtained from arr by deleting some (possibly zero or all) elements of arr. The sum of the elements in sub is one possible subset sum of arr. The sum of an empty array is considered to be 0.

The test cases are generated to have one correct answer, but I think it wouldn’t be hard to error out if the constraint is not met. On the other hand, any answer that fits the bill is fine.

Example test cases:

Input: n = 3, sums = [-3,-2,-1,0,0,1,2,3]
Output: [1,2,-3]
Input: n = 2, sums = [0,0,0,0]
Output: [0,0]
Input: n = 4, sums = [0,0,5,5,4,-1,4,9,9,-1,4,3,4,8,3,8]
Output: [0,-1,4,5]

Could you produce nice code for the entire problem?
I’ve personally been struggling to achieve that.
And no, it is not a homework problem, I was curious how one would solve this in haskell.

Try something like this:

import Data.List as List ((\\), intersect, nub, sort)
import Data.Foldable (asum)

-- Search for an n-element array whose multiset of subset sums is exactly
-- `arr`. `sums` accumulates the subset sums of the elements chosen so far
-- (`cur`); choosing a new element x adds a copy of `sums` shifted by x.
bruteforce :: Int -> [Int] -> [Int] -> [Int] -> Maybe [Int]
bruteforce 0 []  _    cur = Just cur  -- every subset sum accounted for
bruteforce 0 _   _    _   = Nothing   -- sums left over: this branch fails
bruteforce n arr sums cur =
  let
    sub x =
      let
        nextsums = fmap (x +) sums  -- x added to every existing subset sum
      in
      if intersect nextsums arr == nextsums
        then bruteforce (n - 1) (arr \\ nextsums) (sums ++ nextsums) (x : cur)
        else Nothing
  in
  asum $ map sub (nub arr)  -- take the first candidate that succeeds

unsum :: Int -> [Int] -> Maybe [Int]
unsum n arr =
  bruteforce n (sort arr \\ [0]) [0] []  -- the empty subset contributes one 0

main :: IO ()
main =
  case unsum 4 [0,0,5,5,4,-1,4,9,9,-1,4,3,4,8,3,8] of
    Nothing  -> putStrLn "Failed"
    Just arr -> print arr
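
If I’ve traced it correctly, this prints [5,4,0,-1] – the same multiset as the expected [0,-1,4,5], and the problem accepts any valid answer.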