Hi all,
Like others, I have been obsessed with the Count Words article Performance comparison: counting words in Python, Go, C++, C, AWK, Forth, and Rust. In this article, the author counts the occurrences of words in a text file in different languages. It turns out that the Haskell implementation is 6 times slower than the Python implementation!!
The original implementation (https://github.com/benhoyt/countwords/blob/master/haskell/src/CountWords.hs), by Adrien Glauser, is great, but it turned out to hit some inefficiencies in Text.toLower (see Why is Data.Text.toLower so slow in Haskell?), and it uses the ordered Map from containers instead of a hash map.
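From memory, the core of that pipeline looks roughly like this (a paraphrase to show the shape of the approach, not the verbatim source):

-- Paraphrase of the original approach: Text everywhere,
-- plus the ordered Map from containers.
import Data.List (sortOn)
import Data.Ord (Down (..))
import qualified Data.Map.Strict as M
import qualified Data.Text as T

countWords :: T.Text -> [(T.Text, Int)]
countWords =
  sortOn (Down . snd)
    . M.toList
    . M.fromListWith (+)
    . map (\w -> (w, 1))
    . T.words
    . T.toLower

Both T.toLower and the Ord-based Map lookups show up as the hot spots.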
The author of the count-words article no longer accepts new program submissions, but I think it could be fun to try to optimize it anyway.
First, on my machine the simple Python version benchmarks at:
$ hyperfine --warmup 2 'python3 simple.py < kjvbible_x10.txt'
Benchmark 1: python3 simple.py < kjvbible_x10.txt
Time (mean ± σ): 3.349 s ± 0.053 s [User: 3.295 s, System: 0.040 s]
Range (min … max): 3.297 s … 3.480 s 10 runs
And the optimized C version is:
Time (mean ± σ): 215.4 ms ± 2.4 ms [User: 202.7 ms, System: 9.7 ms]
Range (min … max): 211.3 ms … 218.9 ms 13 runs
A simple optimization is to use bytestrings, since the article explicitly assumes the input is ASCII, and to use the hash map from unordered-containers:
module CountWords where

-- base
import Data.List (sortOn)
import Data.Ord (Down (..))
import Data.Foldable (forM_)

-- bytestring
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C

-- unordered-containers
import qualified Data.HashMap.Strict as M

countWords :: B.ByteString -> [(B.ByteString, Int)]
countWords =
  sortOn (Down . snd)
    . count
    . C.words
    . B.map toLower
  where
    count =
      M.toList . M.fromListWith (+) . map (\x -> (x, 1))
    -- ASCII-only case folding on the raw bytes: 65-90 is 'A'-'Z'.
    toLower a
      | a >= 65 && a <= 90 = a + 32
      | otherwise = a

runCountWords :: IO ()
runCountWords = do
  contents <- B.getContents
  forM_ (countWords contents) $ \(w, i) ->
    B.putStr w >> putStr " " >> print i
This instantly gets us into Python territory, with a still very readable solution:
Time (mean ± σ): 2.214 s ± 0.018 s [User: 2.120 s, System: 0.195 s]
Range (min … max): 2.182 s … 2.255 s 10 runs
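If you want to reproduce this, a minimal cabal stanza along these lines should do (the executable name and module layout here are just illustrative choices, and be sure to build with optimizations on):

-- illustrative setup, adjust names to your own project
executable countwords
  main-is:          Main.hs
  other-modules:    CountWords
  build-depends:    base, bytestring, unordered-containers
  ghc-options:      -O2
  default-language: Haskell2010

I benchmark it the same way as the Python version, e.g. hyperfine --warmup 2 './countwords < kjvbible_x10.txt'.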
But can we go faster? Building on Mutable Data structures? Why so hard to find - #54 by jaror, here is the fastest version I could come up with:
module CountWords where

-- base
import Control.Monad (forM_)
import Data.Function (fix)
import Data.List (sortOn)
import Data.Ord (Down (..))
import System.IO (stdin)

-- bytestring
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C

-- vector-hashtable
import qualified Data.Vector.Hashtables as H

-- vector
import qualified Data.Vector.Mutable as VM
import qualified Data.Vector.Unboxed.Mutable as UM

-- Boxed keys (ByteString), unboxed values (Int).
type HashTable k v =
  H.Dictionary (H.PrimState IO) VM.MVector k UM.MVector v

runCountWords :: IO ()
runCountWords = do
  t <- H.initialize 10000 :: IO (HashTable B.ByteString Int)
  let incr = H.alter t (Just . maybe 1 (+ 1))
  -- Read stdin in 64 KiB chunks. Any incomplete word at the end of a
  -- chunk is carried over (in b) and prepended to the next chunk.
  flip fix B.empty $ \rec b -> do
    bs <- B.hGet stdin (64 * 1024)
    if B.null bs
      then mapM_ incr (C.words b)
      else do
        let (initial, last) =
              B.spanEnd (not . isSpace)
                . mappend b
                . B.map toLower
                $ bs
        mapM_ incr (C.words initial)
        rec last
  xs <- H.toList t
  forM_ (sortOn (Down . snd) xs) $ \(w, i) ->
    B.putStr w >> putStr " " >> print i
  where
    -- ASCII-only helpers on the raw bytes.
    toLower a
      | a >= 65 && a <= 90 = a + 32
      | otherwise = a
    isSpace = (== 32)
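To see what the chunk-boundary handling does: B.spanEnd splits off the longest suffix whose bytes satisfy the predicate, so initial ends on a word boundary and the possibly-incomplete final word travels along to the next iteration. A quick GHCi illustration (my own example input):

ghci> import qualified Data.ByteString as B
ghci> import qualified Data.ByteString.Char8 as C
ghci> B.spanEnd (/= 32) (C.pack "in the beginni")
("in the ","beginni")

(Note that this version also needs vector and vector-hashtables added to build-depends.)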
This is fast, but still about 6 times slower than C:
Time (mean ± σ): 1.350 s ± 0.010 s [User: 1.278 s, System: 0.194 s]
Range (min … max): 1.337 s … 1.367 s 10 runs
So here is my question: can we go faster? I hope this challenge lets some of the experienced Haskellers out there show off their amazing skills :).