How does recursion in a do block work?


Here’s a simple program I wrote to test out if Haskell’s recursion within a do-block will stack overflow:

count :: Int -> IO Int
count it
 | it <= 0   = pure 0
 | otherwise = 
    n <- count (it - 1)
    pure (n + 1)

surprisingly it doesn’t seem to stack overflow even for largish numbers like 1000000.

The equivalent Scala code in a for-comprehension:

import cats.effect.IO

val count: Int => IO[Int] = {
  case 0 => IO.pure(0)
  case it =>
    for {
     n <- count (it - 1)
    } yield (n + 1)


stack overflows at numbers around 10,000.

My question is about how Haskell implements this recursion in a do block such that it doesn’t stack overflow. I presume it’s related to thunks, beyond that I am not sure how it works. Any help would be appreciated. I looked around Google but couldn’t find anything related… maybe I’m looking for the wrong keywords.



It does overflow. Compiling your code with:

ghc -Wall -O Test.hs -rtsopts

You can check that it overflows by changing the stack size: ./Test +RTS -K8M

If you dump the Cmm intermediate representations with:

ghc -Wall -O Test.hs -fforce-recomp -ddump-cmm -dsuppress-all

You can see that 8-bytes are used to store the return address on the stack:

 $wcount_entry() //  [R2]
         { info_tbls: [(c4ep,
                        label: block_c4ep_info
                        rep: StackRep []
                        srt: Nothing),
                        label: $wcount_info
                        rep: HeapRep static { Fun {arity: 2 fun_type: ArgSpec 4} }
                        srt: Nothing)]
           stack_info: arg_space: 8 updfr_space: Just 8
           if ((Sp + -8) < SpLim) (likely: False) goto c4eO; else goto c4eP; --stack allocation
           R2 = R2;
           R1 = $wcount_closure;
           call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8;
           if (%MO_S_Gt_W64(R2, 0)) goto c4eL; else goto c4eM;
           I64[Sp - 8] = c4ep; -- store the return address for the recursive call
           R2 = R2 - 1;
           Sp = Sp - 8;
           call $wcount_info(R2) returns to c4ep, args: 8, res: 8, upd: 8;
           Hp = Hp + 24;
           if (Hp > HpLim) (likely: False) goto c4eS; else goto c4eR;
           HpAlloc = 24;
           R1 = R1;
           call stg_gc_unpt_r1(R1) returns to c4ep, args: 8, res: 8, upd: 8;
           I64[Hp - 16] = sat_s4e3_info;
           P64[Hp] = R1;
           R1 = Hp - 16;
           Sp = Sp + 8;
           call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
           R1 = lvl_r4ab_closure+1;
           call (P64[Sp])(R1) args: 8, res: 0, upd: 8;

Thanks. Yes, for very large numbers it overflows. I was just wondering how it allows for such large numbers given that the Scala version can only support much smaller numbers.

I think, based on the previous comment, it’s related to GHC’s infinite stack.