Quicksort performance is notoriously sensitive to input. Perhaps you could try on a wider range of inputs, and perhaps some pathological cases (already sorted or reverse sorted).
As already mentioned: are you compiling with optimizations on?
For comparison, how much time does a deepEval_ randomList take?
The linear quicksort has a HasCallStack constraint in a frequently called function. Does removing it change anything?
The linear quicksort must first convert the input list into an Array. If we measure the conversion into Array and the sorting of the array separatedly, what do we get?
Also btw, GHC 9.10.1 now supports linear lets and wheres, so the code could be written in a more intuitive manner, like
go :: Int -> Int -> Array Int %1 -> Array Int
go lo hi arr
| lo >= hi = arr
| otherwise =
let !(Ur pivot, arr1) = Array.read arr lo
!(arr2, Ur ix) = partition arr1 pivot lo hi
!arr3 = swap arr2 lo ix
!arr4 = go lo (ix - 1) arr3
in go (ix + 1) hi arr4
When sorting lists of 5 million elements, data locality becomes very important. This is where quicksort shines (even if applied to lists instead of arrays), but mergesort (as in Data.List) gets progressively worse. Say, for lists of 50 thousands elements Data.List.sort is faster than your quicksort. I’d say that sorting short and medium-sized lists is more important use case in practice, for millions of elements you likely want to employ an unboxed array anyway.
Also Data.List.sort is designed to benefit from partially sorted data (which necessarily increases constant factor, but does not improves anything if inputs are fully random), while deterministic quicksort degrades to quadratic behaviour.
I did run with -O2 on. And today I tested it again. (It seems that the program runs faster because I just turned on my computer?)
CPU time: 0.53s generate data
CPU time: 15.95s linear quickSort
CPU time: 13.66s naive way
CPU time: 21.64s Data.List.sort
I didn’t run for several different lists because I think qs and QuickSort are using the very same algorithm (It’s sure that it is unfair to compare with Data.List.sort, which is always O(n logn)) . And I thought QuickSort should be 2x faster at least.
With this strict pattern matching, the program is much faster.
CPU time: 0.55s generate
CPU time: 3.81s linear
CPU time: 11.97s naive
CPU time: 20.05s library
It’s interesting that the let version and the & \case version have very different performance profiles. Something must be preventing proper inlining in the & \case version. I’m not sure what, we’d have to look at the generated Core. Is is on the same version? Because I’ve done quite a bit of effort in GHC 9.10 to produce better Core. There will be some more in GHC 9.12, too (but I think that the most important is already in GHC 9.10). Because, other than that, these let !(…) should compile the same as a case (they didn’t use to, actually, but I had to add this special case to support linear lets).
Hello,
I’ve been investigating on this performance issue in the last 2 days.
First I looked at Core output for the linear array quicksort in 9.4, but couldn’t find anything obvious that wouldn’t be optimized. In particular, I didn’t notice any tuple left in the Core, they have probably been all optimized away. Of course, most Ur were still there, but that was to be expected.