Even if I replace M.toList . countwords
by a magic function magic :: [T.Text] -> [(T.Text, Int)]
that just evaluates its argument deeply and then returns a precomputed answer then the Haskell solution is still slower than the unoptimized Python solution on my machine. Python (unoptimized): 1.78 s, Haskell: 2.85 s.
If I change magic
to return an empty list the whole program still takes 2.77 s, so the problem is definitely the mapping, splitting, and filtering.
If I subtract that overhead from the fastest I got using vector-hashtables
: 3.71, then we get 0.86~0.94 which is respectable. Maybe that is achievable without too much work?
Update: for some reason T.toLower
and T.words
do not improve the speed of the preprocessing much.