Fastest method to generate N random outcomes with given distribution


#1

I’d like to know if I’m doing this in the most performant way. I’m pretty new to Haskell, so I’m not very familiar with the ecosystem or performance considerations yet.

I need to generate some random data representing how a marketing campaign is doing in terms of how successfully its content is being delivered. For example, there are push notifications, email, etc. For each of these channels, a campaign could succeed or fail to be delivered. So for any given campaign, there will be N emails sent, and some subset of them may fail. However, the randomness of success/failure is not evenly distributed: each campaign has a predefined probability of failure, and that probability is factored into the random outcomes. The way I’m currently generating those outcomes is like this:

module Main where

import qualified Control.Monad.Random as CMR
import Data.Function ( (&) )

genFailures :: Int -> IO Int
genFailures numSent =
    -- True is Success; False is Failure
    CMR.weighted [ (True, 8 / 10), (False, 2 / 10) ]
    & CMR.replicateM numSent
    & fmap (length . filter (== False))

main :: IO ()
main =
    genFailures 1000 >>= print

When numSent is large, this is very slow and I’m wondering if there is perhaps a more performant way of achieving what I want. Any pointers would be much appreciated!


#2

If you only want the number of failures you could sample a binomial distribution. I don’t know the specifics, but Wikipedia has some info on it: https://en.wikipedia.org/wiki/Binomial_distribution#Generating_binomial_random_variates.
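
Just to sketch the idea (untested, and not one of the fast algorithms from the article): for moderate N you can invert the CDF by walking the probability mass function. The code below is my own illustration, not taken from the Wikipedia page.

module Main where

import System.Random (randomRIO)

-- Sample Binomial(n, p) by inversion: draw u uniformly from [0, 1] and walk
-- the pmf until the cumulative probability passes u.  Assumes 0 < p < 1.
-- Expected work is O(n * p); the Wikipedia article describes faster
-- rejection-based methods (e.g. BTPE) for large n.
binomialInv :: Int -> Double -> IO Int
binomialInv n p = do
    u <- randomRIO (0, 1)
    let q    = 1 - p
        pmf0 = q ^ n  -- P(X = 0); underflows for very large n
        -- recurrence: pmf(k + 1) = pmf(k) * p * (n - k) / (q * (k + 1))
        go k pk cdf
            | cdf >= u || k >= n = k
            | otherwise =
                let pk' = pk * p * fromIntegral (n - k) / (q * fromIntegral (k + 1))
                in  go (k + 1) pk' (cdf + pk')
    return (go 0 pmf0 pmf0)

main :: IO ()
main = print =<< binomialInv 1000 0.2  -- failures out of 1000 sends, failure rate 0.2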


#3

A more immediate way to improve your code is to keep track of the number of failures while doing the random number generation:

{-# LANGUAGE BangPatterns #-}
module Main where

import qualified Control.Monad.Random as CMR

genFailures :: Int -> IO Int
genFailures n = genFailures' 0 n

genFailures' :: Int -> Int -> IO Int
genFailures' acc 0 = return acc
genFailures' !acc !n = do
  x <- CMR.weighted [ (True, 8 / 10), (False, 2 / 10) ]
  genFailures' (if x then acc else acc + 1) (n - 1)

main :: IO ()
main = genFailures 1000000 >>= print

For me this runs in 1.4 seconds.


#4

Just tried using mwc-probability (System.Random.MWC.Probability) for binomial sampling, but it appears to be even slower than my initial solution with MonadRandom (Control.Monad.Random). I’m testing both against 1_000_000 as input.

$ time stack exec monad-random-gen-failures-exe
> 1.83 real         2.28 user         0.31 sys
$ time stack exec mwc-prob-gen-failures-exe
> 1.97 real         5.95 user         1.38 sys

Looks like this:

module Main where

import qualified System.Random.MWC.Probability as Prob

binomialFailures :: Prob.GenIO -> Int -> IO Int
binomialFailures gen numSent =
    -- sample with p = 1/5 (the failure probability) so the result counts failures
    Prob.sample (Prob.binomial numSent (1 / 5)) gen

main :: IO ()
main = do
    gen <- Prob.createSystemRandom
    binomialFailures gen 1000000 >>= print

I figured it might be faster since MonadRandom is based on random (System.Random) and mwc-probability is based on mwc-random.


#5

@jaror thanks for your input. For some reason, when I tried your recursive version, it was slower than my original. My types are a little different, but I don’t see why that would affect the performance. Looks like this:

{-# LANGUAGE BangPatterns #-}

module Main where

import qualified Control.Monad.Random as CMR

genFailures :: Int -> CMR.Rand CMR.StdGen Int
genFailures =
    genFailures' 0

genFailures' :: Int -> Int -> CMR.Rand CMR.StdGen Int
genFailures' acc 0 =
    return acc
genFailures' !acc !n =
    do
        x <- CMR.weighted [ (True, 8 / 10), (False, 2 / 10) ]
        genFailures' (if x then acc else acc + 1) (n - 1)

main :: IO ()
main = CMR.evalRandIO (genFailures 1000000) >>= print

Here are the times I’m seeing:

$ time stack exec original-exe
> 1.81 real         2.35 user         0.34 sys
$ time stack exec recursive-exe
> 3.16 real         3.88 user         0.49 sys

#6

Yes, I noticed after writing it that my version was not that much more efficient than yours. I think the Rand StdGen Int type and evalRandIO are the culprits of the slowdown. But even without that overhead, it is probably not efficient enough anyway.
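
One way to check that claim (a sketch I have not benchmarked) is to thread a plain StdGen by hand, with randomR standing in for weighted so that a draw of 9 or 10 counts as a failure:

{-# LANGUAGE BangPatterns #-}
module Main where

import System.Random (StdGen, newStdGen, randomR)

-- The same recursive loop, but threading the generator explicitly so that
-- neither Rand StdGen nor evalRandIO is involved.
countFailures :: Int -> StdGen -> Int
countFailures = go 0
  where
    go !acc 0 _ = acc
    go !acc n g =
        let (x, g') = randomR (1 :: Int, 10) g
        in  go (if x <= 8 then acc else acc + 1) (n - 1) g'

main :: IO ()
main = do
    g <- newStdGen
    print (countFailures 1000000 g)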


#8

I’m reading the papers linked from the Wikipedia article. They use rejection sampling.

If you can read Java, I’m pretty sure this is an implementation of the algorithm in the paper: https://github.com/oracle/fastr/blob/fb6acbb04c8b1f50283df1a7f83a92613eb7b808/com.oracle.truffle.r.runtime/src/com/oracle/truffle/r/runtime/nmath/distr/Rbinom.java. Here is the original C code: https://github.com/SurajGupta/r-source/blob/master/src/nmath/rbinom.c.

This stackexchange answer provides a link to a book which contains an in-depth explanation of the algorithm: https://stats.stackexchange.com/questions/223702/writing-a-function-in-r-that-imitates-rbinom/223716#223716. The discussion about the binomial distribution starts at page 521 of Devroye’s book.
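
To illustrate the general idea (a toy sketch of my own, not the algorithm from the paper): propose a candidate uniformly over {0..n} and accept it in proportion to its probability mass. The papers essentially replace this uniform proposal with a much tighter envelope, which is what makes them fast for large n.

module Main where

import System.Random (randomRIO)

-- log of the Binomial(n, p) pmf at k, via sums of logs to avoid overflow.
-- O(n) per call, which is fine for a toy example; assumes 0 < p < 1.
logPmf :: Int -> Double -> Int -> Double
logPmf n p k =
    logFact n - logFact k - logFact (n - k)
        + fromIntegral k * log p
        + fromIntegral (n - k) * log (1 - p)
  where
    logFact m = sum [ log (fromIntegral i) | i <- [1 .. m] ]

-- Rejection sampling with a uniform proposal: accept candidate k with
-- probability pmf(k) / pmf(mode).  Correct, but it needs O(sqrt n)
-- proposals on average, which is why a tighter envelope matters.
rejBinomial :: Int -> Double -> IO Int
rejBinomial n p = loop
  where
    kMode  = floor (fromIntegral (n + 1) * p)
    logTop = logPmf n p kMode
    loop = do
        k <- randomRIO (0, n)
        u <- randomRIO (0, 1 :: Double)
        if log u <= logPmf n p k - logTop
            then return k
            else loop

main :: IO ()
main = print =<< rejBinomial 1000 0.2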


#9

mwc-random is also my go-to package if I need fast-ish random numbers. If we just use a naive tail-recursive version, we get:


{-# LANGUAGE BangPatterns #-}
module Main where

import           Control.Monad.Primitive (PrimMonad (..))
import qualified System.Random.MWC       as MWC

genFailures :: PrimMonad m => MWC.Gen (PrimState m) -> Int -> m Int
genFailures gen = go 0
  where
    go !acc n
        | n <= 0    = return acc
        | otherwise = do
            x <- MWC.uniformR (1, 10 :: Int) gen
            go (if x <= 8 then acc else acc + 1) (n - 1)

main :: IO ()
main = do
    gen <- MWC.createSystemRandom
    print =<< genFailures gen 1000000

Which gives us a decent speedup:

> time ./original
200431

real    0m2.254s
user    0m2.127s
sys     0m0.119s
> time ./mwc
200091

real    0m0.039s
user    0m0.034s
sys     0m0.005s

However, the code is very imperative! Two common problems with the solutions posted so far are:

  1. weighted seems slow and doesn’t get inlined properly.
  2. replicateM in IO is not lazy enough – it will generate the entire list before starting to count Falses.

We can write a more functional version using System.Random easily:

{-# LANGUAGE BangPatterns #-}

module Main where

import System.Random

genFailures :: Int -> StdGen -> Int
genFailures n = length . filter (> 8) . take n . randomRs (1, 10 :: Int)

main :: IO ()
main = do
    gen <- getStdGen
    print $ genFailures 1000000 gen

But it’s a bit slower again, due to the use of StdGen over mwc-random:

> time ./functional
199713

real    0m0.198s
user    0m0.191s
sys     0m0.007s

If you want to avoid the imperative style but still want the speed, you can probably use (unsafe) interleaving to generate a lazy list of random numbers using mwc-random.
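
Something along these lines, for example (a sketch only; unsafeInterleaveIO ties evaluation order to demand, so treat it as a demonstration rather than a recommendation):

module Main where

import           System.IO.Unsafe  (unsafeInterleaveIO)
import qualified System.Random.MWC as MWC

-- A lazy, infinite list of draws from an mwc-random generator: each cons
-- cell is only generated when it is demanded.
lazyUniforms :: MWC.GenIO -> IO [Int]
lazyUniforms gen = go
  where
    go = unsafeInterleaveIO $ do
        x  <- MWC.uniformR (1, 10 :: Int) gen
        xs <- go
        return (x : xs)

genFailures :: Int -> [Int] -> Int
genFailures n = length . filter (> 8) . take n

main :: IO ()
main = do
    gen <- MWC.createSystemRandom
    xs  <- lazyUniforms gen
    print (genFailures 1000000 xs)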


#10

Thanks for your help @jaspervdj! Does your last example using System.Random account for my requirement that each campaign has a predefined probability of success that should be used as a weight when generating the random outcomes? It doesn’t seem like it does, but I may just be missing something.

I’ve tried several different options and they’ve all been unacceptably slow for my purposes. I ended up using a hack to approximate a random sample from a binomial distribution. The basic idea is to randomly add or subtract a deviation of up to 1, 2, or 3 standard deviations from the average, weighting the choice of band by the 68–95–99.7 rule.

module Main where

import Data.Function ( (&) )
import qualified Control.Monad.Random as CMR
import qualified Statistics.Distribution as Stats
import qualified Statistics.Distribution.Binomial as Stats

main :: IO ()
main = CMR.evalRandIO (randErrCountFromTotal 1000000 0.5) >>= print

randErrCountFromTotal :: (CMR.RandomGen g) => Int -> Double -> CMR.Rand g Int
randErrCountFromTotal numEvts successRate = do
    -- add or subtract up to 1, 2, or 3 std deviations from the average,
    -- weighting the band by the 68–95–99.7 rule as a heuristic
    numStdDeviations <- CMR.fromList [ (1.0, 0.6827), (2.0, 0.2718), (3.0, 0.0455) ]
    let stdDeviation = Stats.stdDev $ Stats.binomial numEvts successRate
        deviationMin = round $ (numStdDeviations * stdDeviation) - stdDeviation
        deviationMax = round $ numStdDeviations * stdDeviation
        avgSuccessCount = round $ realToFrac numEvts * successRate

    CMR.fromList [ ((+), 1 / 2), ((-), 1 / 2) ]
        <*> pure avgSuccessCount
        <*> CMR.getRandomR (deviationMin, deviationMax)
         &  fmap (\successCount -> numEvts - successCount)

Not being a math person myself, I’m not sure if this is all that sound of an approach, but it seems to approximate the effect I want pretty well and it’s much faster than the alternatives I tried.

$ time stack exec hacky-exe
> 0.27 real         0.21 user         0.03 sys

#11

I think it really depends on what you will be using this for. It is pretty easy (i.e. not a lot of samples are required) to detect that your distribution is not actually a binomial distribution, but the samples are generally in the right place; it is like a really pixelated image of a nice continuous curve. As with most things in statistics, it is not perfect, but it might be good enough for your purpose.
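
For example (an illustrative check, not something I actually ran): compare the empirical variance of a few hundred samples against the binomial variance n * p * (1 - p). An exact sampler lands close to it, while the band-based approximation will typically come out noticeably different. The exactSample reference below is my own, just to keep the snippet self-contained.

module Main where

import Control.Monad (replicateM)
import System.Random (randomRIO)

-- Exact reference sampler: count failures among n Bernoulli trials with
-- failure probability q.  Swap in the approximation from the previous post
-- to compare.
exactSample :: Int -> Double -> IO Int
exactSample n q =
    length . filter (< q) <$> replicateM n (randomRIO (0, 1 :: Double))

-- Empirical mean and (unbiased) variance of k draws from a sampler.
meanVar :: Int -> IO Int -> IO (Double, Double)
meanVar k sampler = do
    xs <- map fromIntegral <$> replicateM k sampler
    let m = sum xs / fromIntegral k
        v = sum [ (x - m) ^ (2 :: Int) | x <- xs ] / fromIntegral (k - 1)
    return (m, v)

main :: IO ()
main = do
    let n = 10000
        q = 0.2
    putStrLn $ "binomial mean and variance: "
        ++ show (fromIntegral n * q, fromIntegral n * q * (1 - q))
    print =<< meanVar 500 (exactSample n q)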

In this case I really think that it would be worthwhile to translate the algorithm that R uses to Haskell or maybe even just bind the C file. I have modified the rbinom.c file for easier binding here: https://gist.github.com/noughtmare/ff3705c0906506911266b45d03c11d20.
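
For the binding route, the general shape would be something like the sketch below. The C-side name and signature are pure assumptions for illustration; the actual interface in the gist may well look different.

{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Foreign.C.Types (CDouble)

-- Hypothetical interface: assume the modified rbinom.c exports
--     double my_rbinom(double n, double p);
-- with the RNG state handled on the C side.  Neither the name nor the
-- signature is taken from the actual gist.
foreign import ccall unsafe "my_rbinom"
    c_rbinom :: CDouble -> CDouble -> IO CDouble

rbinom :: Int -> Double -> IO Int
rbinom n p = round <$> c_rbinom (fromIntegral n) (realToFrac p)

-- link the C file in directly, e.g.:  ghc Main.hs rbinom.c
main :: IO ()
main = print =<< rbinom 1000000 0.2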


#12

I have now finished the translation from C to Haskell: https://gist.github.com/noughtmare/0a8830f1ef69db9521c210e3ef5c2c02

I have not tested it extensively, but it seems to work.

Now you can simulate running a marketing campaign on the whole population of earth multiple times over in 0 seconds, using main = print =<< runRandIO (rbinom 10000000000 0.2):

$ cabal new-run rbinom.hs 
Resolving dependencies...
Build profile: -w ghc-8.6.5 -O1
In order, the following will be built (use -v for more details):
 - fake-package-0 (exe:script) (first run)
Configuring executable 'script' for fake-package-0..
Preprocessing executable 'script' for fake-package-0..
Building executable 'script' for fake-package-0..
[1 of 1] Compiling Main             ( Main.hs, /tmp/cabal-repl.-2130/dist-newstyle/build/x86_64-linux/ghc-8.6.5/fake-package-0/x/script/build/script/script-tmp/Main.o )
Linking /tmp/cabal-repl.-2130/dist-newstyle/build/x86_64-linux/ghc-8.6.5/fake-package-0/x/script/build/script/script ...
1999924842
          93,344 bytes allocated in the heap
           4,376 bytes copied during GC
          53,688 bytes maximum residency (1 sample(s))
          28,232 bytes maximum slop
               0 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0         0 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.000s   0.000s     0.0001s    0.0001s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.000s  (  0.000s elapsed)
  GC      time    0.000s  (  0.000s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.000s  (  0.000s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    462,099,009 bytes per MUT second

  Productivity  50.6% of total user, 51.6% of total elapsed

EDIT: I have tested it some more and it seems to be skewed toward smaller values.

EDIT2: I have fixed the bugs.