How to double iterator in FP style: pairing list into (n, n + k)

Yep, [1, 4, 7, 10] is definitely one of the possible inputs.

Here’s my second stab:

f :: (Eq a, Num a) => a -> [a] -> [(a, a)]
f k = go [] where
  go _ [] = []
  go [] (x:xs) = go [x] xs
  go ys@(y:ys') (x:xs)
    | x == y + k = (y, x) : go ys' xs   -- x pairs up with the oldest waiting element
    | otherwise = go (ys ++ [x]) xs     -- no match yet: x joins the end of the waiting list

You can speed this up by using a Data.Sequence for ys, which in particular turns that ys ++ [x] append (O(length ys) on lists) into an O(1) snoc.
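
For reference, here is roughly what that rewrite might look like (a sketch; fSeq is just an illustrative name, and the algorithm is unchanged):

import qualified Data.Sequence as Seq
import Data.Sequence (ViewL(..), (|>), viewl)

fSeq :: (Eq a, Num a) => a -> [a] -> [(a, a)]
fSeq k = go Seq.empty
  where
    go _ [] = []
    go ys (x:xs) = case viewl ys of
      EmptyL -> go (Seq.singleton x) xs
      y :< ys'
        | x == y + k -> (y, x) : go ys' xs
        | otherwise  -> go (ys |> x) xs    -- snoc is O(1), unlike ys ++ [x]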

Edit: (this assumes that the input is sorted)

Oh. I had roughly thought of this way, but I wasn’t acknowledging it as an FP solution; it looked too imperative to me. (Observe how y acts like a piece of state here; I think it is never duplicated.)
However, now I see that this counts as an FP solution.

I totally agree that this isn’t really very declarative. It is definitely still functional, but I would also prefer a higher-level solution.


Any idea on a nice declarative solution?

Here’s a higher level solution, but it is a lot slower:

import Data.List (nub, nubBy)
import Data.Function (on)

tupleToList (x, y) = [x, y]

f k xs = withoutDuplicates [(x,y) | x <- xs, y <- xs, y == x + k]
  where
    withoutDuplicates = nubBy ((\xs ys -> hasDuplicates (xs ++ ys)) `on` tupleToList)
    hasDuplicates xs = nub xs /= xs

I’d say this is almost a direct formalization of your “pairing a list into (n, n + k) without duplicates” description.


Personally I think that non-mutating “imperative” solutions are perfectly functional. I like using Haskell as a better imperative language.


Even after the edit to k = 3, I don’t think the problem is well-defined:

Why does the solution prefer (3, 6) in the answer, as opposed to rejecting that in favour of (6, 9) to avoid duplicate 6 somewhere?

Is there something about preferring the lower of a clashing pair? Is there something about expecting the given list to be in ascending sequence, and preferring pairs generated from elements more to the left?


Well, 9 is not in the input list, so that is not available for pairing. And in general I think it is impossible for an ambiguity to occur this way. E.g. if you have [1,4,7,10] and k = 3 you are forced to return [(1,4),(7,10)], because there is no valid solution that contains the pair (4,7). And the input [1,4,7] does not satisfy the precondition.

Maybe the piece of info you’re missing is that every element in the input must be used exactly once and no extra numbers may be added?
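
To make that constraint concrete: a candidate pairing is valid exactly when its pairs, flattened, give back the input as a multiset. A quick sketch of that check (isValidPairing is my own name for it):

import Data.List (sort)

-- Every input element is used exactly once, and nothing is added:
-- the flattened pairs are a permutation of the input.
isValidPairing :: Ord a => [a] -> [(a, a)] -> Bool
isValidPairing xs pairs = sort xs == sort (concatMap (\(a, b) -> [a, b]) pairs)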


I hate making assumptions about the validity of input data, especially input in Lists (which are not sets, and are not guaranteed ascending). Really, it’s a shame there are so many beginners’ exercises using only Lists; Haskell is no longer Lisp with bells on, and early in an intro you should be introduced to more appropriate data structures (Data.Set would be my go-to here).

However, I’ve assumed for the exercise:

  • the input list is not necessarily ascending;
  • it is not necessarily a set (duplicate elements allowed);
  • it is not necessarily possible to pair up all the elements exhaustively

See the deviant tests.

inList = [1, 3, 4, 6, 7, 10, 12, 15]

rawPairs k xs@(_: ys@(_:_)) = [(x, y) | x <- xs, y <- ys, x+k == y]
rawPairs _ _                = []                 -- fewer than two elements: no pairs

filterDups _ [] = []
filterDups xy@(_,y) xys@(xy'@(x',_): xys')
               | y == x'    = filterTail
--             | y <  x'    = xys               -- optimisation, assumes ascending
               | otherwise  = xy' : filterTail
           where filterTail = filterDups xy xys'


nonDupPairs k xs = go $ rawPairs k xs
    where go []         = []
          go (xy : xys) = xy : go (filterDups xy xys)


{-  -- tests
nonDupPairs 3 inList
===> [(1,4),(3,6),(7,10),(12,15)]

nonDupPairs 3 [1,4,7]
===> [(1,4)]

nonDupPairs 3 [4,1,4,7]
===> [(4,7),(1,4)]

nonDupPairs 3 [4,1]
===> []
nonDupPairs 3 [7,4,1]
===> [(1,4)]

nonDupPairs 3 [1,5,8,10,14]
===> [(5,8)]
nonDupPairs 3 [1,5,6,7,8,11,15]
===> [(5,8)]
nonDupPairs 3 [1,5,9,10,14]
===> []

-}

The test with [4,1] produces nothing, because of the y <- ys optimisation in the List comprehension. Changing that to y <- xs, as in @jaror’s List comprehension, does then yield [(1,4)].

“You sort the list, and then …”

Can’t speak to efficiency, but this is quite readable.

f _ [] = []
f n (x:xs)
  | v `elem` xs = (x, v) : f n (filter (/= v) xs)
  | otherwise = f n xs
  where v = n + x

I expect there’s a really clean recursion scheme formulation, but I’d have to go stare at the docs for an hour to produce it. :smiley: There’s probably some easy performance wins to wring out of some of the existing solutions with a little discrimination or witherable.
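
One hedged sketch in that direction, assuming a sorted input: Data.List.unfoldr (an anamorphism) can express the same loop, though it is probably not the clean formulation imagined above (fUnfold is an illustrative name):

import Data.List (delete, unfoldr)

fUnfold :: (Eq a, Num a) => a -> [a] -> [(a, a)]
fUnfold k = unfoldr step
  where
    step [] = Nothing
    step (x:xs)
      | (x + k) `elem` xs = Just ((x, x + k), delete (x + k) xs)  -- delete drops one occurrence
      | otherwise         = step xs                               -- skip an unmatched x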


Yeah: both your elem and filter are scanning the tail looking for the same condition – and filter scans the whole tail even after finding v. Similarly, the approaches generating the Cartesian product (via List comprehension) “and then” filtering are scanning by multiple passes and appending. But we know that for element x there is at most one k+x; and it’s at most k positions beyond x in the list; and if we’ve found it, there’s no need to scan beyond that point in the List. (Assuming the List is indeed a set in ascending sequence.)

import Data.Set as Set

viaSetPairs k xs = go $ Set.fromList xs        -- could be 'fromDistinctAscList', if we're sure it is
       where  go s
                | size s <= 1                 =             [] 
                | otherwise = case Set.deleteFindMin s of
                    (x, s') | member x_k s'   -> (x, x_k) : go (delete x_k s')
                            | otherwise       ->            go s'
                        where x_k = x+k

Set.member goes straight to x+k by binary chop (if it’s there); similarly Set.delete. (We could do better: deleteMemberMaybe :: Ord a => a -> Set a -> Maybe (Set a) (if it existed) could return in one pass Just s'' with the member deleted, or Nothing if it’s not there.)

I guess the o.p. won’t like the smell of imperativism, especially the nasty stench of destructive update.

Oh, that’s quite good. Do you think it could be done in one pass with alterF? I was trying to work out something nicer that leaned on Maybe to do the hard work and avoid the double scan, but I exhausted my before-bed nerd snipe time. :smiley:
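
Something along these lines seems plausible for the one-pass version; an untested sketch using the Maybe functor with Data.Set.alterF (this is essentially the deleteMemberMaybe hypothesised above):

import qualified Data.Set as Set
import Data.Set (Set)

-- One traversal: the callback sees whether x is present. Returning
-- Just False deletes x and succeeds; returning Nothing short-circuits
-- the whole result to Nothing when x is absent.
deleteMemberMaybe :: Ord a => a -> Set a -> Maybe (Set a)
deleteMemberMaybe x = Set.alterF (\present -> if present then Just False else Nothing) x

That would let the go loop above fuse its member test and delete into a single call.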

(What I don’t like about my via-Data.Set approach is that it needs to build the set and then delete elements, to ensure there are no duplicates in the result. Deleting is computationally expensive, in terms of rebalancing the tree.)

So here’s a more Listiferous approach, along the lines of the o.p. “secondary index”: scan down the input, maintaining a parallel/secondary list with wanted pairs. Assuming the input is strictly ascending, the wanteds will also be strictly ascending, so we only go tailward along them/there’s no destructive update.

pllPairs :: (Num a, Ord a) => a -> [a] -> [(a,a)]
pllPairs k xs = go xs []
    where 
          go []         _                   = []
          go (x:xs)     []                  = go xs [(x, x+k)]  -- want x+k in xs
          go xxs@(x:xs) ( yyk@(_, yk) : ys) = 
            case compare x yk of
              EQ -> yyk : go xs ys                              -- matched a wanted
              LT -> go xs  (yyk: ordInsert (x, x+k) ys)         -- yyk not yet matched
              GT -> go xxs ys                                   -- yyk not matched, drop it
          
          ordInsert :: Ord a => (a,a) -> [(a,a)] -> [(a,a)]
          ordInsert yyk         []                    = [yyk]
          ordInsert yyk@(_, yk) zzs@(zzk@(_, zk): zs) = 
            case compare yk zk of
              EQ -> error "duplicates in List"
              LT -> yyk: zzs
              GT -> zzk: ordInsert yyk zs

This has some (crude) protection against deviant lists: it copes with Lists that don’t wholly pair up; but is fairly erratic faced with non-ascending or non-unique elements.

Interesting problem!

My solve… took a few minutes of head scratching so I tried to capture my thought process along the way and how that translates into a functional algorithm. Tacked on some ideas for generalizing the operation.

Rough intuition:

  • In any collection of integers, there is a minimum value, ‘n’
  • Given the rule of matching (n, n+k), there can only be one pairing available for the minimum value
  • Thus, this pair must be an output pair - generate it and hold for confirmation
  • Repeatedly pull the next-minimum value ‘m’, which must be at least as large as ‘n’
  • If m matches n+k, the pair is confirmed
  • If m exceeds n+k, then no future m could match, so the pair is invalid and the list cannot be paired completely, do whatever is appropriate in this case
  • Otherwise, m must be a new pair (m,m+k), which is added to the end of the list of confirmation pairs

Declaratively:

  • Translating to a declarative algorithm using a Zipper:
    • Zipper is a single-pass, expected O(1)-work per element
    • Zipper holds the source list, the output list, and a filter list (and error list if desired)
    • Extract minimum element from the source list
    • If the minimum element matches the partner of the lowest item in the filter list
      • Pair is confirmed, move (n,n+k) to output list
    • If the minimum element is larger than the partner of the lowest item in the filter list
      • The list cannot be paired, handle error (e.g. move to error list)
    • Otherwise add (n,n+k) to the end of the filter list
      • This is the only non-O(1) step, as the filter list may be arbitrarily long for large values of k

Generalizations:

  • (+) - Plus is just one possibility, can this algorithm be generalized over the operator?
  • (Int) - Many useful properties of integers don’t apply to all types, might be interesting to explore using the polymorphic type [a] as input
    • Zipper structure remains the same, but the algorithm requires (Ord a)
    • Is there a single-pass algorithm that can use (Eq a), or is (Ord a) required?
  • Ambiguous - Some lists are ambiguous, such as [1,4,7] since there are two partial answers ([1,4],7) and (1,[4,7])
    • What different kinds of results might a client want in these cases?
  • Streaming - zippers work well for processing streams
    • What are the requirements on a stream for this algorithm to work?
    • Is there an algorithm that works for streams that does not have those requirements?

import Data.List (sort)

-- Zipper [Errors] [Output] [Filtering] [Source]
data Zipper = Zipper [Int] [(Int,Int)] [(Int,Int)] [Int] deriving Show

step :: Int -> Zipper -> Zipper
step k z@(Zipper _ _ _ []) = z
step k (Zipper ers ys [] (x:xs)) = Zipper ers ys [(x,x+k)] xs
step k (Zipper ers ys fs@(pr@(m,n) : prs) (x:xs))
    |   x == n  = Zipper   ers  (pr:ys)     prs         xs
    |   x >  n  = Zipper (m:ers)   ys  (prs++[(x,x+k)]) xs -- Error case; add unpaired value to error list
    | otherwise = Zipper   ers     ys   (fs++[(x,x+k)]) xs

walkList _ (Zipper e o f []) = Zipper (map fst f ++ e) o [] []  -- Add any remaining unpaired values to the error list
walkList k z = walkList k $ step k z

solve k ls = let (Zipper ers out _ _) = walkList k $ Zipper [] [] [] (sort ls) in (ers, out)

Tests:

solve 3 [1, 3, 4, 6, 7, 10, 12, 15]

==> ([],[(12,15),(7,10),(3,6),(1,4)])

solve 3 [1,4,7]

==> ([7],[(1,4)])

solve 3 [4,1,4,7]

==> ([],[(4,7),(1,4)])

solve 3 [1,5,6,7,8,11,15]

==> ([11,15,7,6,1],[(5,8)])

Thanks Bryan, your approach is a fancier version of my most recent solution, which is in turn a fancier version of @jaror’s ‘second stab’.

We don’t usually reckon ‘work-per-element’, because a single pass is trivially O(1) per element. Perhaps you mean O(n)? That is, the work is proportional to the number of elements? But it’s worse than O(n), because your x > n and otherwise (x < n) cases use (++) to maintain the [Filtering] – as do those earlier two versions. Each append costs time proportional to the length of the filter list, so the worst case is closer to O(n²). Also your solve starts with (sort ls) – which isn’t strictly necessary by the initial assumptions, but is O(n log n) assuming a random initial sequence. If you’re going to sort, you might as well load to a Data.Set; then you can in effect filter by binary-chop lookup, see my earlier solution.

Sorry, I should have been more specific - the expected-O(1) is for the zipper step function. As you note, it concatenates elements to the filter list, so “expected” here covers the non-O(1) behavior under an assumption that k << n (and other non-degenerate case assumptions).

Of course using concatenation is idiomatic and could be replaced with an amortized-O(1) queue to eliminate that bottleneck (e.g. Efficient Amortised and Real-Time Queues in Haskell - Well-Typed: The Haskell Consultants).
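
For illustration, a minimal two-list (“banker’s”) queue along those lines; this is just a sketch, not the Well-Typed implementation:

-- Front list plus reversed back list; each element is reversed at most
-- once on its way from back to front, so snoc and uncons are amortized O(1).
data Queue a = Queue [a] [a]

snocQ :: Queue a -> a -> Queue a
snocQ (Queue f b) x = Queue f (x : b)

unconsQ :: Queue a -> Maybe (a, Queue a)
unconsQ (Queue [] [])   = Nothing
unconsQ (Queue [] b)    = unconsQ (Queue (reverse b) [])
unconsQ (Queue (x:f) b) = Just (x, Queue f b)

Replacing fs ++ [(x,x+k)] with snocQ would remove the repeated full-list copies.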

Data.Set was my initial thought as well until I saw one of your test cases containing a duplicate value: [4,1,4,7]. Since the OP did not link the original question’s source to verify the requirements, I went ahead with the assumption that duplicate values would be allowed, so Set would not work, but sort does.
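
A possible middle ground (my own variant, not something from the thread): a Data.Map of element counts acts as a multiset, keeping the binary-chop lookup while tolerating duplicates like [4,1,4,7].

import qualified Data.Map.Strict as Map

bagPairs :: Int -> [Int] -> [(Int, Int)]
bagPairs k xs = go (Map.fromListWith (+) [(x, 1 :: Int) | x <- xs])
  where
    go bag = case Map.lookupMin bag of
      Nothing -> []
      Just (x, _)
        | Map.member (x + k) bag' -> (x, x + k) : go (removeOne (x + k) bag')
        | otherwise               -> go bag'
        where bag' = removeOne x bag
    -- Decrement a count, dropping the key once it reaches zero.
    removeOne = Map.update (\c -> if c > 1 then Just (c - 1) else Nothing)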

I frequently analyze the suitability of algorithms for distributed use-cases. Here we look for the scope of resources used in processing each element individually - is the processing local or does it require access to some global data or partially-computed results, etc. This will inform the batch size sent to each worker, the shape of the fan-out and fan-in graphs, and whether the workers must be horizontally or vertically scaled.

In this case the zipper state machine shows us that (after sorting or otherwise extracting the lowest element), the per-element processing is purely local with expected constant time and space. So this algorithm is suitable for domains such as stream-processing and analytics. I can imagine a looser variant of this algorithm being used for pairing together players in an online game, for example.

While this algorithm is not suitable for horizontal scaling using divide-and-conquer (since each division can create an unpaired element in an otherwise completely paired list), it is interesting to note that it can be run from both ends of a single list without introducing additional unpaired elements.

Great approaches, I greatly appreciate it!
For what it’s worth, it was a subproblem of this problem:

Find Array Given Subset Sums

You are given an integer n representing the length of an unknown array that you are trying to recover. You are also given an array sums containing the values of all 2^n subset sums of the unknown array (in no particular order).

Return the array ans of length n representing the unknown array. If multiple answers exist, return any of them.

An array sub is a subset of an array arr if sub can be obtained from arr by deleting some (possibly zero or all) elements of arr. The sum of the elements in sub is one possible subset sum of arr. The sum of an empty array is considered to be 0.

The test cases are generated to have one correct answer, but I think it wouldn’t be hard to error out if the constraint is not met. On the other hand, any answer that fits the bill is fine.

Example test cases:

Input: n = 3, sums = [-3,-2,-1,0,0,1,2,3]
Output: [1,2,-3]
Input: n = 2, sums = [0,0,0,0]
Output: [0,0]
Input: n = 4, sums = [0,0,5,5,4,-1,4,9,9,-1,4,3,4,8,3,8]
Output: [0,-1,4,5]

Could you produce nice code for the entire problem?
I’ve personally been struggling to achieve that.
And no, it is not a homework problem; I was curious how one would solve this in Haskell.


Try something like this

import Data.List as List ((\\), nub, sort)
import Data.Foldable (asum)

-- Try each distinct remaining value x as the next array element: x
-- contributes a copy of every known subset sum shifted by x, and all of
-- those shifted sums must still be present (as a multiset) in the pool.
bruteforce :: Int -> [Int] -> [Int] -> [Int] -> Maybe [Int]
bruteforce 0 [] _ cur = Just cur
bruteforce 0 _  _ _   = Nothing
bruteforce n arr sums cur =
  let
    sub x =
      let
        nextsums = map (x +) sums
      in
      if nextsums \\ arr == []   -- multiset inclusion: (\\) respects multiplicities
        then bruteforce (n - 1) (arr \\ nextsums) (sums ++ nextsums) (x:cur)
        else Nothing
  in
  asum $ map sub (nub arr)       -- first Just wins; Nothing backtracks to the next candidate

unsum :: Int -> [Int] -> Maybe [Int]
unsum n arr =
  bruteforce n (sort arr \\ [0]) [0] []   -- drop the empty subset's 0; known sums start as [0]

main :: IO ()
main =
  case unsum 4 [0,0,5,5,4,-1,4,9,9,-1,4,3,4,8,3,8] of
    Nothing  -> putStrLn "Failed"
    Just arr -> print arr
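
If I’m tracing it right, this prints [5,4,0,-1] for that n = 4 example: the expected [0,-1,4,5] up to order, and the problem accepts any valid ordering.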