How lazy is this Kattis solution (Reseto)?

Yesterday, I solved the Reseto problem over at Kattis. Compared to the few C++ solutions I looked at afterwards, my solution seemed pretty compact!

But I’m wondering: how lazy is it? I suppose

  • the no-vector from (yes, no) won’t be evaluated unless the then-branch is reached
  • the yes ! k means that the elements beyond index k won’t be evaluated

The last assumption doesn’t seem plausible, as the elements past k would have had to be evaluated in order to even be grouped in yes.

What do you think? Also, have you solved it yourself and would like to show your solution?

{-# LANGUAGE TypeApplications #-}

module Reseto where

import Data.Vector (Vector, (!))
import qualified Data.Vector as V

main :: IO ()
main = interact (show . solve . map read . words)

solve :: [Int] -> Int
solve [n, k] = cross (V.enumFromTo @Int 2 n) (k - 1)

cross :: Vector Int -> Int -> Int
cross xs k =
  let (yes, no) = V.partition ((== 0) . flip mod (V.head xs)) xs
      len = V.length yes
   in if k > len - 1
        then cross no (k - len)
        else yes ! k

EDIT:

It hit me that I could use ghci’s :sprint to at least check the first bullet:

ghci> (yes, no) = V.partition ((== 0) . flip mod (V.head xs)) xs
ghci> :sp yes
yes = _
ghci> :sp no
no = _

And this surprised me:

ghci> ys = yes V.! 2
ghci> ys
6
ghci> :sp yes
yes = _

I believe both yes and no are always evaluated here. To be able to get the length of yes, you’re going to need to actually perform the V.partition operation, and that will leave you with two fully evaluated vectors. The items in those vectors will only be evaluated as far as the function passed to partition needed them to be evaluated, but in this case, that means they’re fully evaluated.
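
One way to see that (just a sketch with Debug.Trace, separate from your solution): tag the predicate with trace and watch how many elements get inspected when you force a single index of yes.

import qualified Data.Vector as V
import Debug.Trace (trace)

-- Sketch: the traced predicate shows that asking for one element of `yes`
-- still runs the whole partition over the input vector.
demo :: Int
demo =
  let xs         = V.enumFromTo 2 (10 :: Int)
      p x        = trace ("inspecting " ++ show x) (x `mod` 2 == 0)
      (yes, _no) = V.partition p xs
   in yes V.! 2

Forcing demo prints an "inspecting" line for every element of xs (2 through 10) before returning 6, even though we only asked for one element of yes.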

If you’re curious how I’d solve it (just the pure part): probably not much differently from you.

import Data.List (partition, unfoldr)

reseto :: Int -> Int -> Int
reseto n k = concat (unfoldr step [2..n]) !! (k-1)
  where
    step [] = Nothing
    step xs@(x:_) = Just $ partition ((== 0) . (`mod` x)) xs

This problem by its nature involves walking the entire list multiple times, so I don’t think there’s a huge benefit to be had from a vector unless you’re mutating it in place; I’ve stuck with lists.
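
If you did want the in-place route, a rough sketch with a mutable unboxed vector could look something like this (resetoST is just a made-up name; untuned, no early exit once the k-th crossing is found):

import Control.Monad (forM_, unless, when)
import Control.Monad.ST (runST)
import Data.STRef (modifySTRef', newSTRef, readSTRef, writeSTRef)
import qualified Data.Vector.Unboxed.Mutable as MVU

resetoST :: Int -> Int -> Int
resetoST n k = runST $ do
  crossed <- MVU.replicate (n + 1) False   -- crossed[i] = True once i is crossed out
  count   <- newSTRef (0 :: Int)
  result  <- newSTRef 0
  forM_ [2 .. n] $ \p -> do
    done <- MVU.read crossed p
    unless done $
      -- p is the smallest number still on the paper: cross out p and every
      -- not-yet-crossed multiple of p, in increasing order
      forM_ [p, 2 * p .. n] $ \m -> do
        c <- MVU.read crossed m
        unless c $ do
          MVU.write crossed m True
          modifySTRef' count (+ 1)
          i <- readSTRef count
          when (i == k) $ writeSTRef result m
  readSTRef result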

This sounds a bit strange to me: vectors take up less memory and are contiguous, which makes them cheaper to walk than lists, especially if you can use unboxed vectors.

unfoldr strikes again! Cool!

My solution used List at first, but it didn’t pass the time limit. Switching to Vector did make it pass!

I think the partitioning and indexing were the culprits in the List implementation.

This problem does give very interesting benchmark results. As long as K is less than about 6% of N the lazy version is faster (the lazy version is basically instant if K=1), but otherwise the strict version is faster (up to about 10x faster it seems):

  list 100000 99999:   OK
    973  ms ±  20 ms, 5.5 GB allocated, 218 MB copied,  14 MB peak memory
  list 100000 6000:    OK
    278  μs ± 8.1 μs, 2.6 MB allocated, 152 KB copied,  14 MB peak memory
  list 100000 1000:    OK
    37.6 μs ± 2.2 μs, 445 KB allocated, 3.9 KB copied,  14 MB peak memory
  list 100000 1:       OK
    11.3 ns ± 786 ps, 239 B  allocated,   0 B  copied,  14 MB peak memory
  vector 100000 99999: OK
    99.2 ms ± 4.5 ms, 354 MB allocated,  24 KB copied,  14 MB peak memory
  vector 100000 6000:  OK
    232  μs ±  15 μs, 1.5 MB allocated,  95 B  copied,  17 MB peak memory
  vector 100000 1000:  OK
    230  μs ±  16 μs, 1.5 MB allocated,  66 B  copied,  18 MB peak memory
  vector 100000 1:     OK
    231  μs ±  13 μs, 1.5 MB allocated,  85 B  copied,  18 MB peak memory

And interestingly, -fllvm is able to do a better job with the unboxed vectors:

  list 100000 99999:   OK
    931  ms ±  48 ms, 5.5 GB allocated, 218 MB copied,  14 MB peak memory
  list 100000 6000:    OK
    267  μs ±  23 μs, 2.6 MB allocated, 151 KB copied,  14 MB peak memory
  list 100000 1000:    OK
    34.6 μs ± 2.7 μs, 443 KB allocated, 3.8 KB copied,  14 MB peak memory
  list 100000 1:       OK
    9.56 ns ± 660 ps, 239 B  allocated,   0 B  copied,  14 MB peak memory
  vector 100000 99999: OK
    82.6 ms ± 4.8 ms, 354 MB allocated,  24 KB copied,  14 MB peak memory
  vector 100000 6000:  OK
    183  μs ±  17 μs, 1.5 MB allocated,  74 B  copied,  17 MB peak memory
  vector 100000 1000:  OK
    181  μs ±  16 μs, 1.5 MB allocated,  67 B  copied,  17 MB peak memory
  vector 100000 1:     OK
    181  μs ±  14 μs, 1.5 MB allocated,  64 B  copied,  18 MB peak memory

Full benchmark code:

import Data.Vector.Unboxed qualified as VU
import Data.List (partition, unfoldr)
import Test.Tasty.Bench

reseto :: Int -> Int -> Int
reseto n k = concat (unfoldr step [2..n]) !! (k-1)
  where
    step [] = Nothing
    step xs@(x:_) = Just $ partition ((== 0) . (`mod` x)) xs

reseto' :: Int -> Int -> Int
reseto' n k = VU.concat (unfoldr step (VU.enumFromTo 2 n)) VU.! (k-1)
  where
    step xs 
      | VU.null xs = Nothing
      | otherwise = Just $ VU.partition ((== 0) . (`mod` VU.head xs)) xs

main :: IO ()
main = defaultMain
  [ bench "list 100000 99999"   $ nf (uncurry reseto)  (100000, 99999)
  , bench "list 100000 6000"    $ nf (uncurry reseto)  (100000, 6000)
  , bench "list 100000 1000"    $ nf (uncurry reseto)  (100000, 1000)
  , bench "list 100000 1"       $ nf (uncurry reseto)  (100000, 1)
  , bench "vector 100000 99999" $ nf (uncurry reseto') (100000, 99999)
  , bench "vector 100000 6000"  $ nf (uncurry reseto') (100000, 6000)
  , bench "vector 100000 1000"  $ nf (uncurry reseto') (100000, 1000)
  , bench "vector 100000 1"     $ nf (uncurry reseto') (100000, 1)
  ]

Edit: We can improve the performance a bit using rem instead of mod:

  vector 100000 99999: OK
    86.6 ms ± 4.2 ms, 353 MB allocated,  20 KB copied,  14 MB peak memory
  vector 100000 6000:  OK
    198  μs ±  16 μs, 1.5 MB allocated,  71 B  copied,  19 MB peak memory
  vector 100000 1000:  OK
    195  μs ±  18 μs, 1.5 MB allocated,  67 B  copied,  20 MB peak memory
  vector 100000 1:     OK
    195  μs ±  17 μs, 1.5 MB allocated,  64 B  copied,  20 MB peak memory
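
The only change is the predicate inside step (sketched below); since everything here is non-negative, rem and mod agree, so the answer is unchanged:

    step xs
      | VU.null xs = Nothing
      | otherwise = Just $ VU.partition ((== 0) . (`rem` VU.head xs)) xs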

Super cool! I forgot rem was faster, too.

I really like the unfoldr solution. I thought I had gotten to the essence of the problem with my solution, but looking at it again:

solve [n, k] = cross (V.enumFromTo @Int 2 n) (k - 1)

it’s quite plain to see that I am, in effect, indexing into something, given the (k - 1) passed as the last parameter. I really need to grok unfoldr before I do more problems.
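
For what it’s worth, the same unfoldr shape seems to drop straight into the Vector version; something like this sketch (crossUnfold is just a name for illustration) could stand in for cross:

import Data.List (unfoldr)
import Data.Vector (Vector, (!))
import qualified Data.Vector as V

-- Unfold one partition step at a time, then index into the concatenation of
-- the "crossed out" groups; this is what the recursive cross was doing.
crossUnfold :: Vector Int -> Int -> Int
crossUnfold xs k = V.concat (unfoldr step xs) ! k
  where
    step ys
      | V.null ys = Nothing
      | otherwise = Just $ V.partition ((== 0) . (`mod` V.head ys)) ys

Then solve [n, k] = crossUnfold (V.enumFromTo @Int 2 n) (k - 1) keeps the same shape as before.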

My statement does seem a bit off for this problem. It makes some sense if the function passed to partition dominates compute time, which mod certainly doesn’t.