Cryptonite - seeming thunk leak? how to proceed/debug

So, when running cryptonite’s test suite with -KmK (Neil Mitchell has a blogpost!) I get:

cabal run test-cryptonite -- -p ECDSA +RTS -K32K
cryptonite
  PubKey
    ECDSA
      SHA1
        signature
          0: OK
          1: OK
          2: OK
          3: OK (0.01s)
          4: OK (0.01s)
        verify
          0: OK
          1: OK (0.02s)
          2: OK (0.02s)
          3: OK (0.02s)
          4: OK (0.02s)
      SHA224
        signature
          0: OK
          1: OK
          2: OK
          3: OK
          4: OK
          5: OK
          6: OK
          7: OK
          8: OK
          9: OK
        verify
          0: OK
          1: OK
          2: OK
          3: OK
          4: OK
          5: OK
          6: OK
          7: OK
          8: FAIL
            Exception: stack overflow
            Use -p '/ECDSA/&&/SHA224.verify.8/' to rerun this test only.
          9: FAIL
            Exception: stack overflow
            Use -p '/ECDSA/&&/SHA224.verify.9/' to rerun this test only.
      SHA256
        signature
          0: OK
          1: OK
          2: OK
          3: OK
          4: OK
          5: OK
          6: OK
          7: OK
          8: OK
          9: OK
        verify
          0: OK
          1: OK
          2: OK
          3: OK
          4: OK
          5: OK
          6: OK
          7: OK
          8: FAIL
            Exception: stack overflow
            Use -p '/ECDSA/&&/SHA256.verify.8/' to rerun this test only.
          9: FAIL
            Exception: stack overflow
            Use -p '/ECDSA/&&/SHA256.verify.9/' to rerun this test only.
      SHA384
        signature
          0: OK
          1: OK
          2: OK
          3: OK
          4: OK
          5: OK
          6: OK
          7: OK
          8: OK
          9: OK
        verify
          0: OK
          1: OK
          2: OK
          3: OK
          4: OK
          5: OK
          6: OK
          7: OK
          8: FAIL
            Exception: stack overflow
            Use -p '/ECDSA/&&/SHA384.verify.8/' to rerun this test only.
          9: FAIL
            Exception: stack overflow
            Use -p '/ECDSA/&&/SHA384.verify.9/' to rerun this test only.
      SHA512
        signature
          0: OK
          1: OK
          2: OK
          3: OK
          4: OK
          5: OK
          6: OK
          7: OK
          8: OK
          9: OK
        verify
          0: OK
          1: OK
          2: OK
          3: OK
          4: OK
          5: OK
          6: OK
          7: OK
          8: FAIL
            Exception: stack overflow
            Use -p '/ECDSA/&&/SHA512.verify.8/' to rerun this test only.
          9: FAIL
            Exception: stack overflow
            Use -p '/ECDSA/&&/SHA512.verify.9/' to rerun this test only.
  ECDSA
    SHA1:    FAIL
      Exception: stack overflow
      Use -p '/ECDSA/&&/cryptonite.ECDSA.SHA1/' to rerun this test only.
    SHA224:  FAIL
      Exception: stack overflow
      Use -p '/ECDSA/&&/cryptonite.ECDSA.SHA224/' to rerun this test only.
    SHA256:  FAIL
      Exception: stack overflow
      Use -p '/ECDSA/&&/cryptonite.ECDSA.SHA256/' to rerun this test only.
    SHA384:  FAIL
      Exception: stack overflow
      Use -p '/ECDSA/&&/cryptonite.ECDSA.SHA384/' to rerun this test only.
    SHA512:  FAIL
      Exception: stack overflow
      Use -p '/ECDSA/&&/cryptonite.ECDSA.SHA512/' to rerun this test only.

13 out of 95 tests failed (0.43s)

(basically, stack overflows in ECDSA stuff). Here is the stack trace:

      Exception: stack overflow
      Use -p '/ECDSA/&&/cryptonite.ECDSA.SHA384/' to rerun this test only.
    SHA512:  *** Exception (reporting due to +RTS -xc): (IND_STATIC), stack trace:
  Crypto.ECC.Simple.Prim.pointAddTwoMuls,
  called from Crypto.PubKey.ECDSA.verifyDigest,
  called from Crypto.PubKey.ECDSA.verify,

(i.e. maybe pointAddTwoMuls, otherwise verifyDigest)

Looking at the heap profile, it looks like a large share of the space is dedicated to thunks! Which makes it more suspect!

How do I confirm this is a real thunk leak? Is this a bug?

1 Like

Well, this looks highly suspect [EDIT: turns out this is not the cause of the problem]

propertyHold :: [PropertyTest] -> Bool
propertyHold l =
    case foldl runProperty [] l of
        []     -> True
        failed -> error (intercalate "\n" failed)
  where
    runProperty acc (EqTest name a b)
        | a == b    = acc
        | otherwise =
            (name ++ ": expected " ++ show a ++ " but got: " ++ show b) : acc

https://github.com/haskell-crypto/cryptonite/blob/309abe378da29bb0d840d1c76fd7cde6424f9968/tests/Utils.hs#L141-L150

but I can’t actually build the tests to check:

cabal run test-cryptonite -- -p ECDSA +RTS -K32K
_______________________________________________________________________________________________________Thu 28 Oct 2021 07:43:52 PM BST
Build profile: -w ghc-8.10.7 -O1
In order, the following will be built (use -v for more details):
 - cryptonite-0.29 (test:test-cryptonite) (first run)
Preprocessing test suite 'test-cryptonite' for cryptonite-0.29..
Building test suite 'test-cryptonite' for cryptonite-0.29..
Linking /home/tom/Haskell/cryptonite/dist-newstyle/build/x86_64-linux/ghc-8.10.7/cryptonite-0.29/t/test-cryptonite/build/test-cryptonite/test-cryptonite ...
/home/tom/Haskell/cryptonite/dist-newstyle/build/x86_64-linux/ghc-8.10.7/cryptonite-0.29/build/libHScryptonite-0.29-inplace.a(cryptonite_blake2s.o):cryptonite_blake2s.c:function cryptonite_blake2s_init: error: undefined reference to '_cryptonite_blake2s_init'
/home/tom/Haskell/cryptonite/dist-newstyle/build/x86_64-linux/ghc-8.10.7/cryptonite-0.29/build/libHScryptonite-0.29-inplace.a(cryptonite_blake2s.o):cryptonite_blake2s.c:function cryptonite_blake2s_finalize: error: undefined reference to '_cryptonite_blake2s_final'
/home/tom/Haskell/cryptonite/dist-newstyle/build/x86_64-linux/ghc-8.10.7/cryptonite-0.29/build/libHScryptonite-0.29-inplace.a(blake2sp.o):blake2sp.c:function blake2sp_init_root: error: undefined reference to '_cryptonite_blake2s_init_param'
/home/tom/Haskell/cryptonite/dist-newstyle/build/x86_64-linux/ghc-8.10.7/cryptonite-0.29/build/libHScryptonite-0.29-inplace.a(blake2sp.o):blake2sp.c:function blake2sp_init_leaf: error: undefined reference to '_cryptonite_blake2s_init_param'
/home/tom/Haskell/cryptonite/dist-newstyle/build/x86_64-linux/ghc-8.10.7/cryptonite-0.29/build/libHScryptonite-0.29-inplace.a(blake2sp.o):blake2sp.c:function _cryptonite_blake2sp_final: error: undefined reference to '_cryptonite_blake2s_final'
/home/tom/Haskell/cryptonite/dist-newstyle/build/x86_64-linux/ghc-8.10.7/cryptonite-0.29/build/libHScryptonite-0.29-inplace.a(blake2sp.o):blake2sp.c:function _cryptonite_blake2sp_final: error: undefined reference to '_cryptonite_blake2s_final'
/home/tom/Haskell/cryptonite/dist-newstyle/build/x86_64-linux/ghc-8.10.7/cryptonite-0.29/build/libHScryptonite-0.29-inplace.a(blake2sp.o):blake2sp.c:function _cryptonite_blake2sp: error: undefined reference to '_cryptonite_blake2s_final'
collect2: error: ld returned 1 exit status
`gcc' failed in phase `Linker'. (Exit code: 1)

(I can, however, build 955f94b784c8a34c834d5b76d9116d0dab19f6bf and I see the same stack overflows)

In conclusion, this isn’t a space leak. The cause of the stack overflow is a recursive function that isn’t tail recursive. There’s no strong reason it should be tail recursive, I don’t think. The following avoids overflowing the stack (at the cost of building a stack on the heap!)

index d87672b..1fc1f6d 100644
--- a/Crypto/PubKey/ECC/Prim.hs
+++ b/Crypto/PubKey/ECC/Prim.hs
@@ -13,6 +13,7 @@ module Crypto.PubKey.ECC.Prim
     , isPointValid
     ) where
 
+import Data.List (foldl')
 import Data.Maybe
 import Crypto.Number.ModArithmetic
 import Crypto.Number.F2m
@@ -123,19 +124,21 @@ pointAddTwoMuls c n1 p1     _  PointO = pointMul c n1 p1
 pointAddTwoMuls c n1 p1     n2 p2
     | n1 < 0    = pointAddTwoMuls c (-n1) (pointNegate c p1) n2 p2
     | n2 < 0    = pointAddTwoMuls c n1 p1 (-n2) (pointNegate c p2)
-    | otherwise = go (n1, n2)
+    | otherwise = foldl' (\point app -> case app of
+                             Nothing -> pointDouble c point
+                             Just p -> pointAdd c p (pointDouble c point)
+                             ) PointO (go [] (n1, n2))
 
   where
     p0 = pointAdd c p1 p2
 
-    go (0,  0 ) = PointO
-    go (k1, k2) =
-        let q = pointDouble c $ go (k1 `div` 2, k2 `div` 2)
-        in case (odd k1, odd k2) of
-            (True  , True  ) -> pointAdd c p0 q
-            (True  , False ) -> pointAdd c p1 q
-            (False , True  ) -> pointAdd c p2 q
-            (False , False ) -> q
+    go l (0,  0 ) = l
+    go l (k1, k2) =
+      case (odd k1, odd k2) of
+            (True  , True  ) -> go (Just p0:l) (k1 `div` 2, k2 `div` 2)
+            (True  , False ) -> go (Just p1:l) (k1 `div` 2, k2 `div` 2)
+            (False , True  ) -> go (Just p2:l) (k1 `div` 2, k2 `div` 2)
+            (False , False ) -> go (Nothing:l) (k1 `div` 2, k2 `div` 2)
1 Like