Linear Haskell QuickSort Performance

Wxkj99 · September 3, 2024, 8:58am

Why is linear haskell’s quicksort not quicker than the naive approach?
linear-base/examples/Simple/Quicksort.hs at master · tweag/linear-base (github.com)
I get the quicksort module above, and write the following test script.

import Quicksort
import Control.DeepSeq
import System.Random.SplitMix
import Control.Monad
import Control.Monad.State.Strict
import GHC.IO (evaluate)
import System.TimeIt (timeIt)
import Data.List (sort)

len = 5000000 :: Int

gen = mkSMGen 123546789

randomList :: [Int]
randomList = evalState (replicateM len (state nextInt)) gen

qs :: Ord a => [a] -> [a]
qs [] = []
qs (x:xs) = qs ltx ++ x : qs gex
  where
    ltx = [y | y <- xs, y < x]
    gex = [y | y <- xs, y >= x]

deepEval_ :: NFData a => a -> IO ()
deepEval_ = void . evaluate . force

main :: IO ()
main = do
    deepEval_ randomList
    timeIt $ deepEval_ (quickSort randomList)
    timeIt $ deepEval_ (qs randomList)
    timeIt $ deepEval_ (sort randomList)

-- CPU time:  29.16s
-- CPU time:  23.14s
-- CPU time:  34.23s

It turns out that the naive approach runs faster than the linear version. Why is that?
(By the way, why is sort in Data.List slowest?)

Kleidukos · September 3, 2024, 12:34pm

Did you compile your program with -O2? Could you perhaps use something like tasty-bench?

tomjaguarpaw · September 3, 2024, 12:59pm

Quicksort performance is notoriously sensitive to input. Perhaps you could try on a wider range of inputs, and perhaps some pathological cases (already sorted or reverse sorted).

danidiaz · September 3, 2024, 6:06pm

As already mentioned: are you compiling with optimizations on?
For comparison, how much time does a deepEval_ randomList take?
The linear quicksort has a HasCallStack constraint in a frequently called function. Does removing it change anything?
The linear quicksort must first convert the input list into an Array. If we measure the conversion into Array and the sorting of the array separatedly, what do we get?

Also btw, GHC 9.10.1 now supports linear lets and wheres, so the code could be written in a more intuitive manner, like

go :: Int -> Int -> Array Int %1 -> Array Int
go lo hi arr
  | lo >= hi = arr
  | otherwise =
      let !(Ur pivot, arr1) = Array.read arr lo
          !(arr2, Ur ix) = partition arr1 pivot lo hi
          !arr3 = swap arr2 lo ix
          !arr4 = go lo (ix - 1) arr3
       in go (ix + 1) hi arr4

Bodigrim · September 3, 2024, 9:39pm

When sorting lists of 5 million elements, data locality becomes very important. This is where quicksort shines (even if applied to lists instead of arrays), but mergesort (as in Data.List) gets progressively worse. Say, for lists of 50 thousands elements Data.List.sort is faster than your quicksort. I’d say that sorting short and medium-sized lists is more important use case in practice, for millions of elements you likely want to employ an unboxed array anyway.

Also Data.List.sort is designed to benefit from partially sorted data (which necessarily increases constant factor, but does not improves anything if inputs are fully random), while deterministic quicksort degrades to quadratic behaviour.

Also GHC 9.12 is getting a faster Data.List.sort, roughly by 20%.

Wxkj99 · September 4, 2024, 12:26am

I did run with -O2 on. And today I tested it again. (It seems that the program runs faster because I just turned on my computer?)
CPU time: 0.53s generate data
CPU time: 15.95s linear quickSort
CPU time: 13.66s naive way
CPU time: 21.64s Data.List.sort

I didn’t run for several different lists because I think qs and QuickSort are using the very same algorithm (It’s sure that it is unfair to compare with Data.List.sort, which is always O(n logn)) . And I thought QuickSort should be 2x faster at least.

And thanks for the syntax of linear bindings.

Wxkj99 · September 4, 2024, 12:49am

With this strict pattern matching, the program is much faster.
CPU time: 0.55s generate
CPU time: 3.81s linear
CPU time: 11.97s naive
CPU time: 20.05s library

aspiwack · September 10, 2024, 3:04am

It’s interesting that the let version and the & \case version have very different performance profiles. Something must be preventing proper inlining in the & \case version. I’m not sure what, we’d have to look at the generated Core. Is is on the same version? Because I’ve done quite a bit of effort in GHC 9.10 to produce better Core. There will be some more in GHC 9.12, too (but I think that the most important is already in GHC 9.10). Because, other than that, these let !(…) should compile the same as a case (they didn’t use to, actually, but I had to add this special case to support linear lets).

tbagrel1 · September 11, 2024, 7:59am

Hello,
I’ve been investigating on this performance issue in the last 2 days.
First I looked at Core output for the linear array quicksort in 9.4, but couldn’t find anything obvious that wouldn’t be optimized. In particular, I didn’t notice any tuple left in the Core, they have probably been all optimized away. Of course, most Ur were still there, but that was to be expected.

I consolidated my experiments in a benchmark that I plan to integrate into linear-base, here: [WIP] Add LinArray vs naive Quicksort performance benchmark by tbagrel1 · Pull Request #477 · tweag/linear-base · GitHub. Results in 9.4 seems to be similar to yours, but we see a very noticiable improvement of the linear array quicksort when using 9.8. I don’t know precisely which GHC change is responsible for this though.

Let me know if you still need more info/investigation on this case.

Topic		Replies	Views
Improving performance of mutable arrays using linear types	21	2773	September 18, 2024
Quick Sort variants performance Learn	2	1552	July 7, 2020
Optimizing an simple evaluator	3	464	August 14, 2022
New blog post: Is Haskell fast? Links	10	3068	December 2, 2020
Erro of sintaxe in the creation of an array Learn	7	442	March 18, 2022

Linear Haskell QuickSort Performance

Related topics