Apply-merge: Lift a binary, increasing function onto ordered lists and produce ordered output

I wrote this library to introduce the function

applyMerge :: Ord c => (a -> b -> c) -> [a] -> [b] -> [c]

I haven’t seen the idea behind this function before. Some documentation from the README:

If f is a binary function that is non-decreasing in both arguments, and xs and ys are (potentially infinite) ordered lists, then applyMerge f xs ys is an ordered list of all f x y, for each x in xs and y in ys.

Producing n elements of applyMerge f xs ys takes O(n log n) time and O(√n) auxiliary space, assuming that f and compare take O(1) time.

Examples

With applyMerge, we can implement a variety of complex algorithms succinctly. For example, the Sieve of Erastosthenes to generate prime numbers:

primes :: [Int]
primes = 2 : ([3..] `minus` composites)    -- `minus` from data-ordlist

composites :: [Int]
composites = applyMerge (*) primes [2..]

3-smooth numbers (Wikipedia):

smooth3 :: [Integer]
smooth3 = applyMerge (*) (iterate (*2) 1) (iterate (*3) 1)

Gaussian integers, ordered by norm (Wikipedia):

zs :: [Integer]
zs = 0 : concatMap (\i -> [i, -i]) [1..]

gaussianIntegers :: [GaussianInteger]      -- `GaussianInteger` from arithmoi
gaussianIntegers = map snd (applyMerge (\x y -> (norm (x :+ y), x :+ y)) zs zs)

Square-free integers (Wikipedia):

squarefrees :: [Int]
squarefrees = [1..] `minus` applyMerge (*) (map (^2) primes) [1..]

Naming

The name applyMerge comes from the idea of applying f to each x and y, and merging the results into one sorted output. I’m still thinking of the ideal name for this function. Other options include sortedLiftA2/orderedLiftA2, from the idea that this function is equivalent to sort (liftA2 f xs ys) when xs and ys are finite. If you have any ideas on the naming, let me know!

See ALGORITHM.md for a full exposition of the applyMerge function and its implementation.

I’m open to questions/comments/feedback of any kind, especially about whether you’ve seen this idea before, and on the name of the function. :slight_smile:

9 Likes

That’s very elegant! I wonder if arithmoi can benefit from it.

1 Like

That’s very elegant! I wonder if arithmoi can benefit from it.

Thanks! That would be great if it was useful in arithmoi, it is well-suited for a lot of number-theoretic sequences.

…whether you’ve seen this idea before…

I’m not aware of a name for this algorithm, but after some searching I found it described on the X + Y sorting Wikipedia page. It also resembles Dijkstra’s algorithm to an extent.

…and O(√n) auxiliary space…

While this is true, it does not include the space occupied by the lists. If the lists are generated lazily, more space may be required overall because the implementation has to hold on to the lists. As a simple example, take n (applyMerge const [1 :: Int ..] [1 :: Int ..]) requires O(n) space. Just something to be aware of.

1 Like

I’m not aware of a name for this algorithm, but after some searching I found it described on the X + Y sorting Wikipedia page.

Thanks for the pointer! I found the original excerpt that link is referring to

“Got it!,” I said. “We will keep track of index pairs in a priority queue, with the sum of the fare costs as the key for the pair. Initially we put only pair (1, 1) on the queue. If it proves it is not feasible, we put its two successors on—namely (1, 2) and (2, 1). In general, we enqueue pairs (i + 1, j) and (i, j + 1) after evaluating/rejecting pair (i, j). We will get through all the pairs in the right order if we do so.”

The gang caught on quickly. “Sure. But what about duplicates? We will construct pair (x, y) two different ways, both when expanding (x−1, y) and (x, y −1).”

“You are right. We need an extra data structure to guard against duplicates. The simplest might be a hash table to tell us whether a given pair exists in the priority queue before we insert a duplicate. In fact, we will never have more than n active pairs in our data structure, since there can only be one pair for each distinct value of the first coordinate.”

So seems pretty similar, except that applyMerge has an optimization to check that we don’t add (x, y) to the priority queue when there is already an (x′, y) with x < x′ or (x, y′) with y < y′ in the queue.

While this is true, it does not include the space occupied by the lists. If the lists are generated lazily, more space may be required overall because the implementation has to hold on to the lists. As a simple example, take n (applyMerge const [1 :: Int ..] [1 :: Int ..]) requires O(n) space. Just something to be aware of.

Yes this is true, thanks for pointing it out, I’ll add a note in the documentation about it. I might make a version of applyMerge that takes indexing functions instead of lists. Something like

applyMergeIndexed :: (Ord c) => (a -> b -> c) -> (Word -> a) -> (Word -> b) -> [c]

Which would be useful when computing the input is cheaper than storing the input. Then we could write your example as

take n (applyMergeIndexed const (+1) (+1))

which would take O(√n) space.

This is really fascinating! Just a comment: When reading your questions about naming & API, my knee-jerk reflex was “You should newtype ordered lists, and then make an applicative out of it via a Coyoneda embedding.” I tried it quickly, but I couldn’t get it to work.

Let me unwrap:

Define a newtype of ordered lists:

data OrderedList a where
  OrderedList :: Ord a => [a] -> OrderedList a

orderedList :: Ord a => [a] -> OrderedList a
orderedList = OrderedList . sort

-- | Only use if list is already sorted
unsafeOrderedList :: Ord a => [a] -> OrderedList a
unsafeOrderedList = OrderedList

Hide the constructor from the module API, then only ordered lists can be constructed under this type.

Now I was hoping that the Coyoneda construction Coyoneda OrderedList a would then be Applicative, with applyMerge as the heart piece of <*>. It is a Functor by construction (which OrderedList isn’t because a non-monotonous function, or more generally a function into a non-Ord type cannot be mapped onto it), so it would have made for a very convenient interface. See my failed attempt at GitHub - turion/apply-merge at dev_coyoneda.

2 Likes

This function is pivotal in McIlroy’s Enumerating the strings of regular languages. I gave it as an assignment a couple years ago (but simplified to just “generate all binary trees from small to large”).

(Appendum: Generally check out McIlroy for various Haskell goodies and Unix goodies.)

I think it is good to recognize that you get an applicative if you restrict to monotonic functions.

And also a good idea to make an ordered list type if you generally want to write instances intended for only ordered lists. (On an exam I newtyped it to help pose this question: What should instance Semigroup (OrderedList a) do such that you can use foldMap to sort a list?)

1 Like

Hey, sorry for the delay!

@turion I don’t know anything about Coyoneda but I’ll take a look at your fork and try to learn!

@treblacy Thanks for the reference! I’ll add it to the documentation. The function in the paper (xprod :: (Ord a, Ord b, Ord c) => (a->b->c) -> [a] -> [b] -> [c]) looks like it’s semantically the same, but a less efficient algorithm.

And also a good idea to make an ordered list type if you generally want to write instances intended for only ordered lists.

Yeah good idea. An OrderedList type is probably out of scope for this library for now, but I’ll keep it mind for the future (maybe for another standalone library).

1 Like