Linear Haskell QuickSort Performance

Why is linear haskell’s quicksort not quicker than the naive approach?
linear-base/examples/Simple/Quicksort.hs at master · tweag/linear-base (github.com)
I get the quicksort module above, and write the following test script.

import Quicksort
import Control.DeepSeq
import System.Random.SplitMix
import Control.Monad
import Control.Monad.State.Strict
import GHC.IO (evaluate)
import System.TimeIt (timeIt)
import Data.List (sort)

len = 5000000 :: Int

gen = mkSMGen 123546789

randomList :: [Int]
randomList = evalState (replicateM len (state nextInt)) gen

qs :: Ord a => [a] -> [a]
qs [] = []
qs (x:xs) = qs ltx ++ x : qs gex
  where
    ltx = [y | y <- xs, y < x]
    gex = [y | y <- xs, y >= x]

deepEval_ :: NFData a => a -> IO ()
deepEval_ = void . evaluate . force

main :: IO ()
main = do
    deepEval_ randomList
    timeIt $ deepEval_ (quickSort randomList)
    timeIt $ deepEval_ (qs randomList)
    timeIt $ deepEval_ (sort randomList)

-- CPU time:  29.16s
-- CPU time:  23.14s
-- CPU time:  34.23s

It turns out that the naive approach runs faster than the linear version. Why is that?
(By the way, why is sort in Data.List slowest?)

1 Like

Did you compile your program with -O2? Could you perhaps use something like tasty-bench?

Quicksort performance is notoriously sensitive to input. Perhaps you could try on a wider range of inputs, and perhaps some pathological cases (already sorted or reverse sorted).

  • As already mentioned: are you compiling with optimizations on?
  • For comparison, how much time does a deepEval_ randomList take?
  • The linear quicksort has a HasCallStack constraint in a frequently called function. Does removing it change anything?
  • The linear quicksort must first convert the input list into an Array. If we measure the conversion into Array and the sorting of the array separatedly, what do we get?

Also btw, GHC 9.10.1 now supports linear lets and wheres, so the code could be written in a more intuitive manner, like

go :: Int -> Int -> Array Int %1 -> Array Int
go lo hi arr
  | lo >= hi = arr
  | otherwise =
      let !(Ur pivot, arr1) = Array.read arr lo
          !(arr2, Ur ix) = partition arr1 pivot lo hi
          !arr3 = swap arr2 lo ix
          !arr4 = go lo (ix - 1) arr3
       in go (ix + 1) hi arr4
2 Likes

When sorting lists of 5 million elements, data locality becomes very important. This is where quicksort shines (even if applied to lists instead of arrays), but mergesort (as in Data.List) gets progressively worse. Say, for lists of 50 thousands elements Data.List.sort is faster than your quicksort. I’d say that sorting short and medium-sized lists is more important use case in practice, for millions of elements you likely want to employ an unboxed array anyway.

Also Data.List.sort is designed to benefit from partially sorted data (which necessarily increases constant factor, but does not improves anything if inputs are fully random), while deterministic quicksort degrades to quadratic behaviour.

Also GHC 9.12 is getting a faster Data.List.sort, roughly by 20%.

3 Likes

I did run with -O2 on. And today I tested it again. (It seems that the program runs faster because I just turned on my computer?)
CPU time: 0.53s generate data
CPU time: 15.95s linear quickSort
CPU time: 13.66s naive way
CPU time: 21.64s Data.List.sort

I didn’t run for several different lists because I think qs and QuickSort are using the very same algorithm (It’s sure that it is unfair to compare with Data.List.sort, which is always O(n logn)) . And I thought QuickSort should be 2x faster at least.

And thanks for the syntax of linear bindings.

With this strict pattern matching, the program is much faster.
CPU time: 0.55s generate
CPU time: 3.81s linear
CPU time: 11.97s naive
CPU time: 20.05s library

It’s interesting that the let version and the & \case version have very different performance profiles. Something must be preventing proper inlining in the & \case version. I’m not sure what, we’d have to look at the generated Core. Is is on the same version? Because I’ve done quite a bit of effort in GHC 9.10 to produce better Core. There will be some more in GHC 9.12, too (but I think that the most important is already in GHC 9.10). Because, other than that, these let !(…) should compile the same as a case (they didn’t use to, actually, but I had to add this special case to support linear lets).

3 Likes

Hello,
I’ve been investigating on this performance issue in the last 2 days.
First I looked at Core output for the linear array quicksort in 9.4, but couldn’t find anything obvious that wouldn’t be optimized. In particular, I didn’t notice any tuple left in the Core, they have probably been all optimized away. Of course, most Ur were still there, but that was to be expected.

I consolidated my experiments in a benchmark that I plan to integrate into linear-base, here: [WIP] Add LinArray vs naive Quicksort performance benchmark by tbagrel1 · Pull Request #477 · tweag/linear-base · GitHub. Results in 9.4 seems to be similar to yours, but we see a very noticiable improvement of the linear array quicksort when using 9.8. I don’t know precisely which GHC change is responsible for this though.

Let me know if you still need more info/investigation on this case.

8 Likes