Hi,
I’m currently reading the book “Get programming with Haskell”. As an exercise, we want to generate a list of prime numbers.
module Primes where
import Data.List (unfoldr)
import GHC.List (foldr')
divBy :: Int -> Int -> Bool
divBy a b = 0 /= mod b a
sieve1 :: [Int] -> [Int]
sieve1 [] = []
sieve1 (x : xs) =
x : sieve1 (filter (divBy x) xs)
sieve2 :: [Int] -> [Int]
sieve2 = foldr (\x xs -> x : filter (divBy x) xs) []
sieve3 :: [Int] -> [Int]
sieve3 = foldr' (\x xs -> x : filter (divBy x) xs) []
sieve4 :: [Int] -> [Int]
sieve4 = unfoldr step
where
step [] = Nothing
step (x : xs) = Just (x, filter (divBy x) xs)
{-
>>> take 5 (sieve1 [2..])
[2,3,5,7,11]
>>> take 5 (sieve4 [2..])
[2,3,5,7,11]
-}
{-
>>> take 10 primes
[2,3,5,7,11,13,17,19,23,29]
-}
primes :: [Int]
primes = sieve1 [2 ..]
sieve1
was my first (naive) implementation. So I was then surprised to find out that sieve1
performs much better than either sieve2
or sieve3
. I finally found a way to replicate the performance of sieve1
(wanting to use a higher-order function) with sieve4
.
Here’s a summary of the performance (ran with stack test
):
benchmarking sieve1/sum [2..10000]
time 6.596 ms
benchmarking sieve1/take 3 from [2..10000000]
time 11.33 ns
---
benchmarking sieve2/sum [2..10000]
time 408.8 ms
benchmarking sieve2/take 3 from [2..10000000]
time 10.29 ns
---
benchmarking sieve3/sum [2..10000]
time 447.6 ms
benchmarking sieve3/take 3 from [2..10000000]
time 1.256 s
---
benchmarking sieve4/sum [2..10000]
time 6.489 ms
benchmarking sieve4/take 3 from [2..10000000]
time 14.99 ns
Here’s my test:
{-# LANGUAGE NumericUnderscores #-}
import Criterion.Main
import Primes
benchSieve :: String -> ([Int] -> [Int]) -> Benchmark
benchSieve name sieveFunc =
bgroup
name
[ bench "sum [2..10000]" $ whnf (sum . sieveFunc) [2 .. 10_000]
, bench "take 3 from [2..10000000]" $ whnf (take 3 . sieveFunc) [2 .. 10_000_000]
]
main :: IO ()
main =
defaultMain
[ benchSieve "sieve1" sieve1
, benchSieve "sieve2" sieve2
, benchSieve "sieve3" sieve3
, benchSieve "sieve4" sieve4
]
I have a couple of questions.
Firstly, I encountered worse performance via GHCi. GHCi seems to run a bytecode interpreter. Is it possible to run compiled optimized code instead?
-- stack repl
ghci> :set +s
ghci> sum $ sieve2 [2..10000]
5736396
(3.93 secs, 5,636,113,056 bytes)
Secondly, I’d like to understand more about what’s happening under the hood exactly. Going from sieve1
to sieve4
wasn’t that obvious to me. I thought I understood that foldr
was generally to be favored by default in Haskell but there’s obviously more subtle things going on here.
So in summary, I’d like to understand what the evaluated code looks like. And I’d be interested to hear about any tips or “general rules” one could follow when using folds (or manual recursion!)
Thanks