[Solved] Why is Data.Text.toLower so slow in Haskell?

Continuing the discussion from Mutable Data structures? Why so hard to find:

Consider this simple program:

import qualified Data.Text as T
import qualified Data.Text.IO as T
import System.IO
import Control.Monad

main :: IO ()
main = isEOF >>= \b -> unless b $
  -- 'foldr seq main' forces each word to WHNF, then loops back into 'main'
  foldr seq main . T.words . T.toLower =<< T.getLine

And the equivalent Python program:

import sys

for line in sys.stdin:
    words = line.lower().split()

Running both programs on the input kjvbible.txt repeated 10 times yields the following results:

Haskell: 3.51 s (in hindsight I think this is an outlier caused by file-system caching; it should be more like 1.4 s)
Python: 0.59 s

Why is the difference so large?

2 Likes

After some more testing I found that the culprit really is T.toLower. Presumably it is much slower in Haskell because it is not in-place, i.e. it doesn’t mutate the text. Without that extra step, the running times are comparable:

Haskell

import qualified Data.Text as T
import qualified Data.Text.IO as T
import System.IO
import Control.Monad

main :: IO ()
main = isEOF >>= \b -> unless b $ foldr seq main . T.words =<< T.getLine

Python

import sys

for line in sys.stdin:
    words = line.split()

Haskell: 0.53 s (slightly higher)
Python: 0.53 s (slightly lower)

Python strings are immutable too.
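For instance, `lower()` leaves the original untouched and allocates a new string:

```python
line = "Hello World"
lowered = line.lower()
print(lowered)  # hello world
print(line)     # Hello World -- the original string is unchanged
```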

1 Like

I am not sure we are benchmarking the same things here.

Running the benchmark tool hyperfine, I get the following:

$ ghc -O2 Words.hs
$ hyperfine "./Words < words.txt"
Benchmark 1: ./Words < words.txt
  Time (mean ± σ):     183.6 ms ±   2.2 ms    [User: 180.2 ms, System: 2.6 ms]
  Range (min … max):   181.3 ms … 189.5 ms    16 runs 
$ hyperfine "./words.py < words.txt"
Benchmark 1: ./words.py < words.txt
  Time (mean ± σ):      64.2 ms ±   1.7 ms    [User: 59.1 ms, System: 5.0 ms]
  Range (min … max):    61.5 ms …  69.2 ms    44 runs

Now the difference is only threefold. Maybe text < 2 spends a lot of time transcoding between UTF-16 and UTF-8? Changing it to use text-2.0 yields

$ cabal install --lib text --package-env=.  
$ ghc -O2 Words.hs
$ hyperfine "./Words < words.txt"
Benchmark 1: ./Words < words.txt
  Time (mean ± σ):     137.5 ms ±   2.1 ms    [User: 134.7 ms, System: 2.4 ms]
  Range (min … max):   133.9 ms … 142.3 ms    21 runs

which gives us closer performance already.
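To make the transcoding point concrete: for the mostly-ASCII KJV text, UTF-16 needs two bytes per character where UTF-8 needs one, so every line read under text < 2 presumably has to be widened on input and narrowed again on output. A quick Python sketch of the size difference:

```python
line = "In the beginning God created the heaven and the earth."
utf8 = line.encode("utf-8")
utf16 = line.encode("utf-16-le")
# for pure-ASCII text, UTF-16 takes exactly twice as many bytes as UTF-8
print(len(utf8), len(utf16))
```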

However, I agree that the toLower step seems less efficient than Python’s; removing toLower from both programs yields:

$ hyperfine "./Words < words.txt"
Benchmark 1: ./Words < words.txt
  Time (mean ± σ):      50.2 ms ±   1.5 ms    [User: 47.8 ms, System: 2.3 ms]
  Range (min … max):    48.1 ms …  53.8 ms    59 runs
 
$ hyperfine "./words.py < words.txt"
Benchmark 1: ./words.py < words.txt
  Time (mean ± σ):      54.3 ms ±   3.2 ms    [User: 50.6 ms, System: 3.6 ms]
  Range (min … max):    50.9 ms …  70.9 ms    56 runs

Maybe just a matter of a missed optimisation opportunity somewhere?

Moreover, avoiding text altogether and using ByteString without any decoding/encoding, like this:

{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BS8
import System.IO
import Control.Monad

main :: IO ()
main = isEOF >>= \b -> unless b $
  foldr seq main . BS8.words . BS.map go =<< BS.getLine
  where
    -- ASCII-only lowercasing: map bytes 65–90 ('A'–'Z') to 97–122 ('a'–'z')
    go !n
      | 65 <= n && n <= 90 = n + 32
      | otherwise          = n

yields better values:

$ hyperfine "./Words < words.txt"
Benchmark 1: ./Words < words.txt
  Time (mean ± σ):      49.3 ms ±   1.7 ms    [User: 46.4 ms, System: 2.8 ms]
  Range (min … max):    46.7 ms …  55.5 ms    59 runs

However, admittedly that’s probably not fair to Python now, since we are avoiding encodings altogether. (Also note that I did not check whether I introduced a mistake here, so take it with a grain of salt.)
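For comparison, Python’s bytes.lower() does the same ASCII-only mapping as the `go` function above, leaving non-ASCII bytes (here the UTF-8 encoding of ‘Ä’) untouched:

```python
data = "Hello WORLD Ä".encode("utf-8")
# bytes.lower() only maps A-Z; the two UTF-8 bytes of 'Ä' pass through unchanged
print(data.lower())  # b'hello world \xc3\x84'
```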

2 Likes

Thanks for pointing out hyperfine, that’s a very cool tool. I get these results:

Benchmark 1: ./simple-hs <kjvbible_x10.txt
  Time (mean ± σ):      1.385 s ±  0.029 s    [User: 1.358 s, System: 0.023 s]
  Range (min … max):    1.370 s …  1.466 s    10 runs
 
Benchmark 1: python3 simple.py <kjvbible_x10.txt
  Time (mean ± σ):     582.7 ms ±   3.5 ms    [User: 569.1 ms, System: 11.3 ms]
  Range (min … max):   576.9 ms … 588.7 ms    10 runs

Seems more reasonable, but indeed still more than 2x slower.

2 Likes

I was benchmarking with the file repeated 10 times because time ... is not very accurate. It seems you are benchmarking with the file just once, which is fine if you are using hyperfine. Here are my results with just that file once as input:
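In case anyone wants to reproduce the repeated input, a small helper sketch (the file names are the ones used in this thread; `repeat_file` is just an illustrative name):

```python
def repeat_file(src_path, dst_path, times=10):
    """Concatenate src_path `times` times into dst_path."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(data * times)
```

Used as `repeat_file("kjvbible.txt", "kjvbible_x10.txt")`.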

Benchmark 1: ./simple-hs <kjvbible.txt
  Time (mean ± σ):     156.7 ms ±   4.9 ms    [User: 141.2 ms, System: 6.9 ms]
  Range (min … max):   149.1 ms … 164.8 ms    19 runs
 
Benchmark 1: python3 simple.py <kjvbible.txt
  Time (mean ± σ):      81.0 ms ±   2.4 ms    [User: 73.5 ms, System: 5.4 ms]
  Range (min … max):    78.3 ms …  87.8 ms    36 runs
2 Likes

The docs for Python’s bytes.lower() say it only converts ASCII uppercase bytes, which is very fast but won’t work for Unicode in general. (str.lower(), which the Python program here actually uses, is Unicode-aware too, but it is a tight table-driven loop in C.) Data.Text.toLower attempts to process Unicode correctly, so that might explain some of the added overhead.

Changing case correctly across all languages is a lot more difficult and expensive than you might think.
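For a taste of why: full Unicode case mapping can change the number of code points, which rules out any simple one-to-one, in-place byte tweak:

```python
# one character can become two when changing case
print("ß".upper())                 # SS
# dotted capital I (U+0130) lowercases to 'i' plus a combining dot above
print("İ".lower())
print(len("İ"), len("İ".lower()))  # 1 2
```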

5 Likes

Oh, I should have said here too that the maintainers of the text package have very rapidly developed a fix that improves the performance of the Haskell function to match the speed of Python’s.

14 Likes