# Apply-merge: Lift a binary, increasing function onto ordered lists and produce ordered output

I wrote this library to introduce the function

``````applyMerge :: Ord c => (a -> b -> c) -> [a] -> [b] -> [c]
``````

I haven’t seen the idea behind this function before. Some documentation from the README:

If `f` is a binary function that is non-decreasing in both arguments, and `xs` and `ys` are (potentially infinite) ordered lists, then `applyMerge f xs ys` is an ordered list of all `f x y`, for each `x` in `xs` and `y` in `ys`.

Producing n elements of `applyMerge f xs ys` takes O(n log n) time and O(√n) auxiliary space, assuming that `f` and `compare` take O(1) time.

## Examples

With `applyMerge`, we can implement a variety of complex algorithms succinctly. For example, the Sieve of Erastosthenes to generate prime numbers:

``````primes :: [Int]
primes = 2 : ([3..] `minus` composites)    -- `minus` from data-ordlist

composites :: [Int]
composites = applyMerge (*) primes [2..]
``````

3-smooth numbers (Wikipedia):

``````smooth3 :: [Integer]
smooth3 = applyMerge (*) (iterate (*2) 1) (iterate (*3) 1)
``````

Gaussian integers, ordered by norm (Wikipedia):

``````zs :: [Integer]
zs = 0 : concatMap (\i -> [i, -i]) [1..]

gaussianIntegers :: [GaussianInteger]      -- `GaussianInteger` from arithmoi
gaussianIntegers = map snd (applyMerge (\x y -> (norm (x :+ y), x :+ y)) zs zs)
``````

Square-free integers (Wikipedia):

``````squarefrees :: [Int]
squarefrees = [1..] `minus` applyMerge (*) (map (^2) primes) [1..]
``````

## Naming

The name `applyMerge` comes from the idea of applying `f` to each `x` and `y`, and merging the results into one sorted output. I’m still thinking of the ideal name for this function. Other options include `sortedLiftA2`/`orderedLiftA2`, from the idea that this function is equivalent to `sort (liftA2 f xs ys)` when `xs` and `ys` are finite. If you have any ideas on the naming, let me know!

See ALGORITHM.md for a full exposition of the `applyMerge` function and its implementation.

I’m open to questions/comments/feedback of any kind, especially about whether you’ve seen this idea before, and on the name of the function.

9 Likes

That’s very elegant! I wonder if `arithmoi` can benefit from it.

1 Like

That’s very elegant! I wonder if `arithmoi` can benefit from it.

Thanks! That would be great if it was useful in `arithmoi`, it is well-suited for a lot of number-theoretic sequences.

…whether you’ve seen this idea before…

I’m not aware of a name for this algorithm, but after some searching I found it described on the X + Y sorting Wikipedia page. It also resembles Dijkstra’s algorithm to an extent.

…and O(√n) auxiliary space…

While this is true, it does not include the space occupied by the lists. If the lists are generated lazily, more space may be required overall because the implementation has to hold on to the lists. As a simple example, `take n (applyMerge const [1 :: Int ..] [1 :: Int ..])` requires O(n) space. Just something to be aware of.

1 Like

I’m not aware of a name for this algorithm, but after some searching I found it described on the X + Y sorting Wikipedia page.

Thanks for the pointer! I found the original excerpt that link is referring to

“Got it!,” I said. “We will keep track of index pairs in a priority queue, with the sum of the fare costs as the key for the pair. Initially we put only pair (1, 1) on the queue. If it proves it is not feasible, we put its two successors on—namely (1, 2) and (2, 1). In general, we enqueue pairs (i + 1, j) and (i, j + 1) after evaluating/rejecting pair (i, j). We will get through all the pairs in the right order if we do so.”

The gang caught on quickly. “Sure. But what about duplicates? We will construct pair (x, y) two different ways, both when expanding (x−1, y) and (x, y −1).”

“You are right. We need an extra data structure to guard against duplicates. The simplest might be a hash table to tell us whether a given pair exists in the priority queue before we insert a duplicate. In fact, we will never have more than n active pairs in our data structure, since there can only be one pair for each distinct value of the first coordinate.”

So seems pretty similar, except that `applyMerge` has an optimization to check that we don’t add (x, y) to the priority queue when there is already an (x′, y) with x < x′ or (x, y′) with y < y′ in the queue.

While this is true, it does not include the space occupied by the lists. If the lists are generated lazily, more space may be required overall because the implementation has to hold on to the lists. As a simple example, `take n (applyMerge const [1 :: Int ..] [1 :: Int ..])` requires O(n) space. Just something to be aware of.

Yes this is true, thanks for pointing it out, I’ll add a note in the documentation about it. I might make a version of `applyMerge` that takes indexing functions instead of lists. Something like

``````applyMergeIndexed :: (Ord c) => (a -> b -> c) -> (Word -> a) -> (Word -> b) -> [c]
``````

Which would be useful when computing the input is cheaper than storing the input. Then we could write your example as

``````take n (applyMergeIndexed const (+1) (+1))
``````

which would take O(√n) space.

This is really fascinating! Just a comment: When reading your questions about naming & API, my knee-jerk reflex was “You should newtype ordered lists, and then make an applicative out of it via a Coyoneda embedding.” I tried it quickly, but I couldn’t get it to work.

Let me unwrap:

Define a newtype of ordered lists:

``````data OrderedList a where
OrderedList :: Ord a => [a] -> OrderedList a

orderedList :: Ord a => [a] -> OrderedList a
orderedList = OrderedList . sort

-- | Only use if list is already sorted
unsafeOrderedList :: Ord a => [a] -> OrderedList a
unsafeOrderedList = OrderedList
``````

Hide the constructor from the module API, then only ordered lists can be constructed under this type.

Now I was hoping that the Coyoneda construction `Coyoneda OrderedList a` would then be `Applicative`, with `applyMerge` as the heart piece of `<*>`. It is a `Functor` by construction (which `OrderedList` isn’t because a non-monotonous function, or more generally a function into a non-`Ord` type cannot be mapped onto it), so it would have made for a very convenient interface. See my failed attempt at GitHub - turion/apply-merge at dev_coyoneda.

2 Likes

This function is pivotal in McIlroy’s Enumerating the strings of regular languages. I gave it as an assignment a couple years ago (but simplified to just “generate all binary trees from small to large”).

(Appendum: Generally check out McIlroy for various Haskell goodies and Unix goodies.)

I think it is good to recognize that you get an applicative if you restrict to monotonic functions.

And also a good idea to make an ordered list type if you generally want to write instances intended for only ordered lists. (On an exam I newtyped it to help pose this question: What should `instance Semigroup (OrderedList a)` do such that you can use `foldMap` to sort a list?)

1 Like

Hey, sorry for the delay!

@turion I don’t know anything about Coyoneda but I’ll take a look at your fork and try to learn!

@treblacy Thanks for the reference! I’ll add it to the documentation. The function in the paper (`xprod :: (Ord a, Ord b, Ord c) => (a->b->c) -> [a] -> [b] -> [c]`) looks like it’s semantically the same, but a less efficient algorithm.

And also a good idea to make an ordered list type if you generally want to write instances intended for only ordered lists.

Yeah good idea. An `OrderedList` type is probably out of scope for this library for now, but I’ll keep it mind for the future (maybe for another standalone library).

1 Like