Optimiser performance problems

tomjaguarpaw · February 28, 2024, 12:49pm

OK, I now understand what’s going on.

Right, and the reason the problem doesn’t appear on your original example but it does in a higher-dimensional version is because of @rhendric’s observation:

The space leak occurs if the original example is changed from V.fromList [-2000] to V.fromList [-2000, -2000] (so the second coordinate can leak), plus abs x < 1e-8 to abs x < 0 (so that it runs long enough to see the leak).

Yes, make invalid laziness unrepresentable. I disagree with others who are saying that there are too many methods, that you should sprinkle bang patterns, or that you should use force. There is no need to read a book, there is no need to apply the blunt hammer of deepseq, you just need to choose the correct data structure. Here is your original code, fixed by simply choosing the correct data structure:

{-# LANGUAGE GHC2021 #-}

import Numeric.AD (grad)
import qualified Data.Vector as V
import qualified Data.Vector.Generic as G
import qualified Data.Strict.Vector as SV

-- With lazy V.Vector it leaks, with strict SV.Vector it does not
-- leak.
type AdaVector = SV.Vector
-- type AdaVector = V.Vector

adaMax f' start alpha beta1 beta2 = adaMax' start 0 zeros zeros
  where
    zeros = G.fromList (replicate n 0)
    n = length start
    adaMax' th t m u
      | all (\x-> abs x < 0) g = th'
      | otherwise    = adaMax' th' (t+1) m' u'
      where g   = f' th
            m'  = G.map (\(m,g)->beta1 * m + (1-beta1) * g) (G.zip m g)
            u'  = G.map (\(u,g)->max (beta2*u) (abs g)) (G.zip u g)
            th' = G.map (\(th,m,u)->th - (alpha / (1-beta1^(t+1))) * m/u) $ G.zip3 th m' u'

fTest  = grad (\xs -> (G.head xs)^2 - (G.head xs))

test1 = adaMax fTest (G.fromList @AdaVector [-2000, -2000]) 0.002 0.9 0.999

main = test1 `seq` pure ()

All I have done is generalized your code to the Data.Vector.Generic interface so we can compare what happens when we use Data.Vector to what happens when we use Data.Strict.Vector (from strict-containers). With Data.Vector.Strict there is no space leak. No need for !, seq, force, or any form of careful analysis. That is the benefit of making invalid states unrepresentable through a powerful type system!

In practice you probably wouldn’t even bother with this generalization, you’d just write your code against Data.Strict.Vector directly.

Why is this the correct way to write the code? Because in your particular use case there is no purpose to being able to store unevaluated thunks as vector elements. Therefore, let’s statically ensure that the vector cannot possibly leak space by choosing the correct vector data type. (I would actually guess that in 99% of practical use cases there’s no need to store unevaluated thunks either.)