As others, I have been obsessed with the Count Words article Performance comparison: counting words in Python, Go, C++, C, AWK, Forth, and Rust. In this article, the author tries to count the occurrence of words in a text file in different languages. It turns out that Haskell implementation is 6 times slower than the Python implementation!!
The original implementation (https://github.com/benhoyt/countwords/blob/master/haskell/src/CountWords.hs), by Adrien Glauser, is great, but there turned out to be some inefficiencies in the Text.toLower function (Why is Data.Text.toLower so slow in Haskell?), and it uses the containers Map instead of a hashmap.
The author of the count-words article no longer accepts new versions of programs, but I think it could be fun to try to optimize it.
First, my machine benchmarks the simple python version to:
$ hyperfine --warmup 2 'python3 simple.py < kjvbible_x10.txt' Benchmark 1: python3 simple.py < kjvbible_x10.txt Time (mean ± σ): 3.349 s ± 0.053 s [User: 3.295 s, System: 0.040 s] Range (min … max): 3.297 s … 3.480 s 10 runs
And the optimized C version is:
Time (mean ± σ): 215.4 ms ± 2.4 ms [User: 202.7 ms, System: 9.7 ms] Range (min … max): 211.3 ms … 218.9 ms 13 runs
module CountWords where -- base import Data.List (sortOn) import Data.Ord (Down (..)) import Data.Foldable (forM_) -- bytestring import Data.ByteString as B import Data.ByteString.Char8 as C -- unordered-containers import Data.HashMap.Strict as M countWords :: B.ByteString -> [(B.ByteString, Int)] countWords = sortOn (Down . snd) . count . C.words . B.map toLower where count = M.toList . M.fromListWith (+) . map (\x -> (x, 1)) toLower a | a >= 65 && a <= 90 = a + 32 | otherwise = a runCountWords :: IO () runCountWords = do contents <- B.getContents forM_ (countWords contents) $ \(w, i) -> do B.putStr w >> putStr " " >> print i
This instantly gets us in python territory, with a still very readable solution:
Time (mean ± σ): 2.214 s ± 0.018 s [User: 2.120 s, System: 0.195 s] Range (min … max): 2.182 s … 2.255 s 10 runs
But can we go faster? Building on Mutable Data structures? Why so hard to find - #54 by jaror, I have created the fastest version I can come up with:
module CountWords where -- base import Control.Monad (forM_, unless) import Data.Char (isSpace) import Data.Function (fix) import Data.List (sortOn) import Data.Ord (Down (..)) import System.IO (stdin) -- bytestring import qualified Data.ByteString as B import qualified Data.ByteString.Char8 as C -- vector-hashtable import qualified Data.Vector.Hashtables as H -- vector import qualified Data.Vector.Mutable as VM import qualified Data.Vector.Unboxed.Mutable as UM type HashTable k v = H.Dictionary (H.PrimState IO) VM.MVector k UM.MVector v runCountWords :: IO () runCountWords = do t <- H.initialize 10000 :: IO (HashTable B.ByteString Int) let incr = H.alter t (Just . maybe 1 (+ 1)) flip fix B.empty $ \rec b -> do bs <- B.hGet stdin (64 * 1024) if B.null bs then do mapM_ incr (C.words b) else do let (initial, last) = B.spanEnd (not . isSpace) . mappend b . B.map toLower $ bs mapM_ incr (C.words initial) rec last xs <- H.toList t forM_ (sortOn (Down . snd) xs) $ \(w, i) -> do B.putStr w >> putStr " " >> print i where toLower a | a >= 65 && a <= 90 = a + 32 | otherwise = a isSpace = (== 32)
Which is fast, but still 6 times slower than C.
Time (mean ± σ): 1.350 s ± 0.010 s [User: 1.278 s, System: 0.194 s] Range (min … max): 1.337 s … 1.367 s 10 runs
So here is my question, can we go faster? I hope this challenge can let some of the experienced haskellers out there show off their amazing skills :).