That code is not that slow for me. It takes about 200 ms on my computer (even in GHCi it takes only about 500 ms). You can make it slightly faster by compiling with optimisations ghc -O ... and using Int instead of the default Integer:
k = [2300000, 2299999.. 1 :: Int]
(You only need a change there, the rest will be inferred automatically)
But those two optimisations only speed it up from 220 ms to 170 ms.
That’s fine, but the variable names are not very descriptive. If you want to avoid names I’d just write it all in one expression:
main = print $ sum $ take 535 $ sort $ map (+ 100) [2300000, 2299999 .. 1]
I thought using unboxed vectors might be faster, so I tried this code:
import qualified Data.Vector as Vector
import qualified Data.Vector.Algorithms.Intro as Vector
main :: IO ()
main =
print a
where
k = Vector.enumFromStepN (2300000 :: Int) (-1) 2300000
b = Vector.map (+100) k
c = Vector.take 535 $ Vector.modify Vector.sort b
a = Vector.sum c
But that takes 2.5 seconds. So a big win for laziness I’d say.
(But note that these are still immutable vectors, so this code probably creates a few copies of the whole array. You could optimise that even further, but that takes some skill and effort.)
Edit: changing from Intro sort to Tim sort makes it a lot faster, only taking 410 ms.
I have read that the laziness of the language allows the compiler to rearrange the order of operations to get fast code.
However, I cannot get performance anywhere near that of julia. This code is 5 orders of magnitude faster.
# I am only using lazy functions and data structures,
# so you can compare it with the haskell version
function f()
k = 2300000:-1:1
b = sort(k.+100) # '.+' is vectorized '+'
c = Iterators.take(b, 535) # lazy version of take
sum(c)
end
@time f() # 196880 in 0.000002 seconds
The range 2300000:-1:1 is converted to 2300100:-1:101.
In haskell is it possible to take [2300000, 2299999.. 1] and turn it into [2300100, 2300099..101]?
Ah, I just realized I forgot to use unboxed vectors which are quite a bit faster for Int. Just changing the import to:
import qualified Data.Vector.Unboxed as Vector
makes Tim sort only take 170 ms. And I notice Intro sort has a special partial sorting version which is even faster:
c = Vector.take 535 $ Vector.modify (\v -> Vector.partialSort v 535) b
That takes only 50 ms.
You can improve it a bit more by using unsafeThaw and unsafeFreeze instead of modify:
c = Vector.take 535 $ runST $ do
v <- Vector.unsafeThaw b
Vector.sort v
Vector.unsafeFreeze v
Then I get to about 20 ms. Compiling with -O2 takes it down to about 10 ms. Interestingly, at this point Tim sort is faster again at 10 ms vs 20 ms for a partial Intro sort.
That’s what fusion does and I believe it should be done in both the list version and the vector version as long as you compile with optimisations.
Indeed it isn’t; which is sad because List is the first recursive data structure you’re taught on Intro courses. OTOH the results @jaror is getting are pretty nippy.
Most intros don’t at first explain the default Integer is probably not what you want for performant code; also don’t tell running direct at the GHCi prompt is interpreting, not compiling.
So how were you running this when you first posted/you said “extremely slow”?
Julia really does seem fast. It only takes a relatively long time (6 ms) for the first run where it is spending more than 99% of its time doing compilation. After that it is fast, even if I add some extra data dependencies in the code to rule out that it is caching the arrays. But the allocation statistics show that it is not allocating the full array any way.
I strongly suspect Julia is recognizing that the enumeration iterator is already sorted in descending order, so it simply has to reverse the enumeration.
With a random array it takes a more sensible 40-60 ms and 70MiB allocations. Part of that is of course the overhead of generating the random array, but that part should be faster than the sorting.
Another way to defeat it is to call map explicitly:
function f()
k = 2300000:-1:1
b = sort(map(x -> x + 100,k))
c = Iterators.take(b, 535)
print(sum(c))
end
That takes around 7 ms and 35 MiB, so still very fast.
How do you benchmark haskell code? I am getting inconsistent results.
If I compile and measure the whole program with “time ./Main” I get a result, if I use Criterion instead (as indicated in the official tutorial) I get a much better result.
Can someone show me how to benchmark this example?
Edit:
Example, if I have this file Main.hs:
module Main where
import Criterion.Main
import Data.List (sort)
funToTest2 :: Int -> Int
funToTest2 toTake =
sum $ take toTake (sort $ map (+100) [2300000, 2299999 .. 1])
main :: IO ()
main = defaultMain [
bench "funToTest" $ whnf funToTest2 535
]
Then I compile it with: ghc -O2 Main.hs
And run with: ./Main --output bench.html
I get:
time 2.349 ÎĽs (2.286 ÎĽs .. 2.445 ÎĽs)
0.993 R² (0.982 R² .. 0.999 R²)
mean 2.303 ÎĽs (2.277 ÎĽs .. 2.391 ÎĽs)
std dev 143.0 ns (61.01 ns .. 279.8 ns)
variance introduced by outliers: 74% (severely inflated)
Here’s how I would benchmark your original code:
import Test.Tasty.Bench
import Data.List (sort)
foo :: Int -> Int
foo n = a where
k = [n, n - 1 .. 1]
b = map (+100) k
c = take 535 $ sort b
a = sum c
main :: IO ()
main = defaultMain
[ bench "foo" $ whnf foo 2300000 ]
(This uses @Bodigrim’s tasty-bench as a modern and lightweight drop-in replacement for criterion. I just noticed the latest version has cool live updating measurements.)
On my machine, the result is:
All
foo: OK
222 ms ± 11 ms
The issue here is that the compiler will optimize it to something like this:
myBigList = sort $ map (+100) [2300000, 2299999 .. 1]
funToTest2 toTake =
sum $ take toTake myBigList
Which means the big list will only be computed (and sorted) in the very first benchmark run.
To solve this you can make that 2300000 a parameter of the function.
Ok, I tried to write a version that does operations directly on ranges, so I created my own version of ranges.
My problem is that it even seems to perform better than julia, and I don’t understand if it is a benchmark-related problem.
Here is the code:
{-# LANGUAGE InstanceSigs #-}
module Main where
import Test.Tasty.Bench
import Data.List (sort)
import Data.Maybe
data RangeFromThenTo a = RangeFromThenTo !a !a !a
-- show $ RangeFromThenTo 1 2 10 --> {1, 2 .. 10}
instance Show a => Show (RangeFromThenTo a) where
show :: Show a => RangeFromThenTo a -> String
show (RangeFromThenTo start firstStep end) =
"{" ++ show start ++ ", " ++ show firstStep ++ " .. " ++ show end ++ "}"
-- allow only ranges where
-- start < firstStep < end == True or
-- start > firstStep > end == True
rangeFromThenTo :: (Num a, Ord a) => a -> a -> a -> Maybe (RangeFromThenTo a)
rangeFromThenTo start firstStep end =
if (start < firstStep && firstStep < end) || (start > firstStep && firstStep > end)
then Just $ RangeFromThenTo start firstStep end
else Nothing
-- get the step of a range {1, 2 .. 10} --> 2 - 1 --> 1
rangeStep :: Num a => RangeFromThenTo a -> a
rangeStep (RangeFromThenTo start firstStep end) = firstStep - start
rangeToList :: Enum a => RangeFromThenTo a -> [a]
rangeToList (RangeFromThenTo start firstStep end) = [start, firstStep .. end]
isRangeSorted :: (Ord a) => RangeFromThenTo a -> Bool
isRangeSorted (RangeFromThenTo start firstStep end) =
start < firstStep && firstStep < end
reverseRange :: Num a => RangeFromThenTo a -> RangeFromThenTo a
reverseRange r@(RangeFromThenTo start firstStep end) =
RangeFromThenTo end (end - rangeStep r) start
sortRange :: (Ord a, Num a) => RangeFromThenTo a -> RangeFromThenTo a
sortRange r
| isRangeSorted r = r
| otherwise = reverseRange r
-- {1, 2 .. 10} + 1 --> {2, 3 .. 11}
addRangeNum :: Num a => RangeFromThenTo a -> a -> RangeFromThenTo a
addRangeNum (RangeFromThenTo start firstStep end) n =
RangeFromThenTo (start + n) (firstStep + n) (end + n)
funToTest :: Int -> Int -> Int -> Int -> Int -> Int
funToTest toTake st end toAdd begin =
a
where
k = fromJust $ rangeFromThenTo begin st end -- generate and unwrap a range
b = sortRange $ addRangeNum k toAdd -- add and sort the range
c = take toTake $ rangeToList b -- convert the range to list and apply take
a = sum c
main :: IO ()
main = defaultMain [
bench "range version" $ whnf (funToTest 535 2299999 1 100) 2300000
]
julia code on my pc: min 401ns, median 427ns, mean 498.489 ns ± 644.923 ns
my version:
All
range version: OK
336 ns ± 28 ns
Both criterion and and tasty agree on the result, but I’m not sure if there’s a problem in the benchmark and maybe it’s only calculating the function once.
One thing that I notice is that you’ve only made that 230000 an “parameter” of the benchmark. It would perhaps be slightly better to make all arguments of funToTest parameters of the benchmark like this:
whnf (\(a, b, c, d, e) -> funToTest a b c d e) (535, 2299999, 1, 100, 2300000)
But I don’t think that will make a big difference in this case.
I’m not that surprised GHC produces faster code in this case, because GHC spends quite a lot of time optimising whereas the Julia compiles a lot faster as far as I know.