One Billion Row Challenge in Haskell

I don’t know if I am missing something, but doesn’t your code parse the map twice?

EDIT: I just realised that when debug is False, parse doesn’t get evaluated… is that right?

Since we are Haskelling, it feels (naively) like we should be able to:

  1. Prove that the optimal program is possible with a particular parallelisation scheme. E.g. “optimal results are possible with map/reduce, therefore the mapper/reducer requiring the fewest iterations will result in the optimal program”. Write a generic map/reduce wrapper (sketched after this list).

  2. Focus on the “inner loop”. E.g. that would be the mapper and reducer functions in the map/reduce scheme. Note it’s all single-core at this point.

  3. Measure the number of “steps” the “inner loop” takes irrespective of CPU.

  4. Compare solutions with each other purely in terms of “inner loop” steps.

Something like the above, or inspired by the above, strikes me as a “Haskell enabled” approach.
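
For concreteness, here is a minimal sketch of such a generic map/reduce wrapper (the names and the chunk type are illustrative, not taken from any posted solution):

import qualified Data.Map.Strict as M
import Data.List (foldl')

-- Generic map/reduce skeleton: summarise each chunk independently
-- with the mapper, then merge the partial summaries with the reducer.
-- Counting “inner loop” steps then means counting only the work done
-- inside mapper and reducer.
mapReduce
  :: Ord k
  => (chunk -> M.Map k v)  -- mapper: summarise one chunk
  -> (v -> v -> v)         -- reducer: combine two summaries
  -> [chunk]
  -> M.Map k v
mapReduce mapper reducer =
  foldl' (\acc chunk -> M.unionWith reducer acc (mapper chunk)) M.empty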

The main feature of such “challenges” is being able to compare different programming languages, so having well-defined input and output files is the most important prerequisite.

For example, Bodigrim’s Haskell solution prints negative zeros (-0.0) as mean values (the most common error). His (parallel) solution, at 64s, is about 10% (7s) faster than a single-threaded Go solution on my computer (which, for comparison, uses a map of indices into a record of arrays). His single-threaded version takes about 315s, which is about 95s (43%) more than the Java reference implementation needs (again, on my computer).
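
As an aside, the -0.0 issue is cheap to fix just before printing; a minimal sketch (not from Bodigrim’s code):

-- In IEEE 754 arithmetic, (-0.0) + 0.0 is +0.0 and every other value
-- is left unchanged, so a mean can never be printed as -0.0.
fixZero :: Float -> Float
fixZero x = x + 0.0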

Coincidentally, I started doing the 1BRC yesterday using Go (as I haven’t used it much so far). The repo could help with getting the data and the reference solution as easily as possible:

I guess, if that’s what you want, but personally I’m not wondering whether Haskell will beat C, Rust, Fortran, etc.; I know the answer to that :slight_smile: For me – as a noob – the main draw here is how to make things go fast in Haskell relative to Haskell, and how much slower it is when I do the naive thing.

Oh, I didn’t want to imply that concentrating on a single language is somehow bad (the “original” 1BRC was Java-only); I just wanted to express that “portable” solutions are preferable if they aren’t too much hassle.

You have one mutable data structure, and you’re updating it atomically. Your code is wasting a lot of time thrashing that IORef, and it is probably not even truly multi-threaded due to the memory barrier used to enforce atomicity. I would want to see a 100% pure, non-threaded implementation using just unionWith as a reference point. I suspect your code is actually slower than a single-core version. The 1TB heap allocation looks kinda sus, especially trying to sync that across multiple cores.
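
For reference, the pure baseline being asked for could look like this (a sketch; the (min, max, sum, count) summary shape is borrowed from the solutions later in this thread):

import qualified Data.Map.Strict as M
import Data.List (foldl')

type Stats = (Float, Float, Float, Float)  -- (min, max, sum, count)

-- Combine two summaries for the same station.
merge :: Stats -> Stats -> Stats
merge (a, b, c, d) (a', b', c', d') = (min a a', max b b', c + c', d + d')

-- Merge per-chunk maps purely: no IORef, no atomics, no threads.
combineChunks :: Ord k => [M.Map k Stats] -> M.Map k Stats
combineChunks = foldl' (M.unionWith merge) M.empty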

I have some ideas for a multicore version, but I need to sleep.

This is just my opinion, but I’d personally like to see the best possible solution in < 30 LOC. From my noob perspective, I don’t see the point of writing Haskell unless it’s going to be elegant. After all, I can fiddle bits in Fortran at a cost in lines of code, with much more bang for the buck.

I want to see this, I want to see that … how about we see some code instead :wink:

Challenge accepted :slight_smile: Here is a quickie: 12 lines of code.

import qualified Data.Map.Strict as M
import Data.List(foldl')

main :: IO ()
main = do
  str <- readFile "sample.txt"
  let ls = map ((\(a,_:b)-> (a,read b :: Float)) . break (== ';')) (lines str)
      f (a,b,c,d) (a',b',c',d') = (min a a', max b b', c+c', d+d')
      collect = foldl' (\m (k,v) -> M.insertWith f k (v,v,v,1) m) M.empty ls
  mapM_ (putStrLn . show) $ M.mapWithKey (\k (a,b,c,d)-> (k,a,b,c/d)) collect

EDIT: Now 10 LOC and no regex dependency. Uses the break function instead of a regex, as suggested by @jaror.

EDIT 2: More @jaror goodness added: 9 LOC!

I think you can save an import and shrink this by using break:

  let ls = map ((\(a,_:b)-> (a,read b :: Float)) . break (== ';')) (lines str)

(untested)

Edit: Oh, and you don’t need the regex machinery any more either.

It’s so damn pretty, @jaror … 10 LOC.

How about:

      f (a,b,c,d) (a',b',c',d') = (min a a', max b b', c+c', d+d')
      collect = foldl' (\m (k,v) -> M.insertWith f k (v,v,v,1) m) M.empty ls

Lovely, I’ve learned a new function: M.insertWith :slight_smile: 9 LOC now!

Admittedly I’m no expert in multithreading, but the data does not seem to support your claims.

A pure, non-threaded implementation is right there in parse, and it is several times slower than the multi-threaded one. atomicModifyIORef' happens once per 1M of input, so I doubt it is a bottleneck.
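
To illustrate that frequency: the pattern being defended amounts to folding a whole chunk into a local map first and touching the shared IORef once per chunk, roughly like this (a sketch with made-up names, not the actual code):

import Data.IORef
import qualified Data.Map.Strict as M

-- Each worker summarises its whole chunk into a local map, then merges
-- it into the shared map in one atomic step, so the atomic operation
-- happens once per chunk rather than once per row.
publish :: Ord k => (v -> v -> v) -> IORef (M.Map k v) -> M.Map k v -> IO ()
publish combine shared local =
  atomicModifyIORef' shared (\m -> (M.unionWith combine m local, ()))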

I had a quick tinker. Here is a corrected version: still 9 LOC, but it avoids accumulating thunks. Takes about 30 sec for 10M rows, so it would take about 50 minutes for 1B :slight_smile: It uses very little memory, and it’s running on my puny laptop from an SSD.

import qualified Data.HashMap.Strict as M

main :: IO ()
main = do
  str <- getContents --readFile "measurements.txt"
  let ls = map ((\(a,_:b)-> (a,read b :: Float)) . break (== ';')) (lines str)
      f (a,b,c,d) (a',b',c',d') = let (i,j,k,l) = (min a a', max b b', c+c', d+d') in
        i `seq` j `seq` k `seq` l `seq` (i,j,k,l)
      collect = foldl (\ m (k,v) -> M.insertWith f k (v,v,v,1) m) M.empty ls
  mapM_ (putStrLn . show) $ M.mapWithKey (\k (a,b,c,d)-> (k,a,b,c/d)) collect

You can get an upper bound of its parallel run time like this:

time cat input.txt | parallel --pipe -N 100000 --blocksize 1800000 './1brc'

EDIT: Just switching to HashMap improves performance to 25s, so about 17% faster.

In case anyone is interested, here is a quick AWK version (11 LOC). It runs in 3.6 sec for 10M rows (so about 6 minutes for 1B) on a single thread. I suspect that on 8 physical cores + GNU parallel it could be < 1 min.

#!/usr/bin/mawk -f

BEGIN { FS = ";"; }

{ k = $1; v = $2 + 0;

  if (!(k in min)) {
      min[k]=v; max[k]=v; sum[k]=v; count[k]=1;
  } else {
      if (v < min[k]) min[k] = v;
      if (v > max[k]) max[k] = v;
      sum[k] += v; count[k]++; } }

END { for (k in min) {
        avg = sum[k] / count[k];
        printf "%s;%.1f;%.1f;%.1f\n", k, min[k], max[k], avg; } }

You are not sorting the names when printing the results.

I made almost the same GAWK script, which takes about 10 minutes on my computer.

Try running yours with mawk if you haven’t already; it may make a difference. I like your Go write-up, by the way: very nice :slight_smile:

Indeed, but I don’t mind. It won’t affect the run time much.
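
For reference, sorting would only touch the final (small) result list, not the 10M-row fold; a minimal sketch against the HashMap version above:

import Data.List (sortOn)
import qualified Data.HashMap.Strict as M

-- Sort the handful of result rows by station name just before printing;
-- the per-row work is unaffected.
printSorted :: Show v => M.HashMap String v -> IO ()
printSorted = mapM_ print . sortOn fst . M.toList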

Maybe you can speed it up with bytestrings:

import qualified Data.Map.Strict as M
import qualified Data.ByteString.Lazy.Char8 as B

main :: IO ()
main = do
  str <- B.getContents --readFile "measurements.txt"
  let ls = map ((\(a,b)-> (a,read (B.unpack (B.tail b)) :: Float)) . B.break (== ';')) (B.lines str)
      f (a,b,c,d) (a',b',c',d') = let (i,j,k,l) = (min a a', max b b', c+c', d+d') in
        i `seq` j `seq` k `seq` l `seq` (i,j,k,l)
      collect = foldl (\ m (k,v) -> M.insertWith f (B.toStrict k) (v,v,v,1) m) M.empty ls
  mapM_ (putStrLn . show) $ M.mapWithKey (\k (a,b,c,d)-> (B.unpack (B.fromStrict k),a,b,c/d)) collect

Wow! Down from 25s to 15s (for 10M rows) using HashMap and ByteString at 10 lines of code:

import qualified Data.HashMap.Strict as M
import qualified Data.ByteString.Lazy.Char8 as B

main :: IO ()
main = do
  str <- B.getContents --readFile "measurements.txt"
  let ls = map ((\(a,b)-> (a,read (B.unpack (B.tail b)) :: Float)) . B.break (== ';')) (B.lines str)
      f (a,b,c,d) (a',b',c',d') = let (i,j,k,l) = (min a a', max b b', c+c', d+d') in
        i `seq` j `seq` k `seq` l `seq` (i,j,k,l)
      collect = foldl (\ m (k,v) -> M.insertWith f (B.toStrict k) (v,v,v,1) m) M.empty ls
  mapM_ (putStrLn . show) $ M.mapWithKey (\k (a,b,c,d)-> (B.unpack (B.fromStrict k),a,b,c/d)) collect
