State monad - memory exhausted

Doing Advent of Code this year I experienced memory exhaustion on my machine running some code in which the accumulated output of a Writer monad appears to be out of scope, yet does not appear to be freed. [edit: this hypothesis is incorrect, read the replies below]

A simple reproduction of the issue is as follows, bearing in mind that this code makes no sense other than to demonstrate the memory usage issue I experienced:

import Control.Monad.State.Strict
import Control.Monad.Trans.Writer.CPS
...

-- Enumerates attempts at brute-forcing the answer.
-- The output of the Writer is not used here. 
usesUpMemory :: State String Int
usesUpMemory = loopM (\i -> do
                 (a,_) <- runWriterT test
                 pure $ if a then Right i
                        else Left (i+1)) 1

test :: WriterT [Bool] (State String) Bool
test = do
  -- real version does some logic here and returns a True when conditions
  -- are correct, otherwise False. It also accumulates some characteristics of the solution attempt into the Writer monad.
  tell (replicate 1000 True)
  pure False

The question is: why does usesUpMemory behave this way? Shouldn’t it just continue forever using a constant amount of memory, or have I missed something here? Compiled with ghc -O3.

2 Likes

What happens if you change it to i `seq` Left (i+1)?
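That is, forcing the accumulator before wrapping it in the Left, so only the else branch changes:

usesUpMemory :: State String Int
usesUpMemory = loopM (\i -> do
                 (a,_) <- runWriterT test
                 pure $ if a then Right i
                        else i `seq` Left (i+1)) 1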

1 Like

Thanks. The thunks of unevaluated Left building up, I’m presuming; I did read about this once. Are there some ground rules for avoiding this situation? Perhaps hobbyists like me ought to just turn off laziness entirely with that extension…

1 Like

I largely recommend avoiding all flavours of Writer altogether. Use Control.Monad.State.Strict (as you do) with a record with strict fields as the state.
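For example, a minimal sketch (the field names are illustrative):

data Search = Search
  { attempts :: !Int    -- the ! forces this field whenever a Search is constructed
  , found    :: !Bool
  }

Combined with modify' there is then no room for a thunk chain to build up in the state.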

3 Likes

A good rule of thumb is that, whenever you are accumulating something, use your own data type in combination with the StrictData extension. I have yet to experience a space leak after following this simple rule.
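For instance (an illustrative accumulator type):

{-# LANGUAGE StrictData #-}

-- With StrictData, every field of every type declared in this module
-- is strict by default, as if written with a leading !.
data Tally = Tally { hits :: Int, misses :: Int }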

5 Likes

Specifically, the 1 + 1 + 1 + ... + 1 thunk that you return inside the Left.

Are there some ground rules for avoiding this situation?

I’ve written an article about this: Make Invalid Laziness Unrepresentable. Its advice is also in line with what @lorenzo is suggesting. However, that won’t help you here, because loopM is just a risky function. You just have to know how to use it correctly, by always forcing the payload of the Left (or the argument). That is, it has a dangerous sharp edge, just like foldl (i.e. the lazy version, to which foldl' is the strict analogue) and Control.Monad.Trans.State.Strict.modify (as opposed to the strict version, modify').

There should probably be a strict version loopM' which forces the state every iteration.
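Something like this, say (a hypothetical sketch, not in any library that I know of):

loopM' :: Monad m => (a -> m (Either a b)) -> a -> m b
loopM' act = go
  where
    go x = x `seq` do      -- force the state before every iteration
      res <- act x
      case res of
        Left x'  -> go x'
        Right v  -> pure v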

Perhaps hobbyists like me ought to just turn off laziness entirely with that extension…

Strict would have worked here, but that is a very blunt hammer …
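(For reference, enabling it is a single pragma:)

{-# LANGUAGE Strict #-}
-- Bindings in this module become strict by default: function arguments,
-- let/where bindings and pattern matches all get implicit bangs.
-- Note it only affects code compiled in this module, not your dependencies.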

4 Likes

FYI, GHC -O2 is the maximum optimization level. All higher numbers are currently treated the same as -O2.

I’d write that as Left $! i + 1.


Also, I’d write:

  pure $! if a then Right i else Left $! i + 1

That makes the Core look better (Edit: only for -O1. -O2 does get the good core without that extra $!), but I don’t know if that was actually causing a leak.

1 Like

I thought CPS Writer basically was State under the hood? :thinking:
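(For reference, the internal representation in Control.Monad.Trans.Writer.CPS, in transformers >= 0.5.6, is roughly this state-passing shape:)

newtype WriterT w m a = WriterT { unWriterT :: w -> m (a, w) }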

1 Like

If loopM were defined like this:

loopM :: Monad m => (t -> m (Either t b)) -> t -> m b
loopM act = go where
  go x = do
    res <- act x
    case res of
        Left x' -> go x'    -- recurse through the local helper
        Right v -> pure v

Then you’d need no extra strictness annotations at all. And this improves the performance further by allowing the Int to be unboxed, so no allocations happen during the loop (except for the allocations in test). And even crazier, if you don’t add a NOINLINE pragma to test then GHC sees that there is an infinite loop and removes the branching altogether.

3 Likes

You mean there’s a substantial difference between using a local definition (go) and how loopM is currently defined:
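i.e., roughly, the self-recursive version shipped in Control.Monad.Extra:

loopM :: Monad m => (a -> m (Either a b)) -> a -> m b
loopM act x = do
    res <- act x
    case res of
        Left x' -> loopM act x'
        Right v -> pure v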

…or are you referring to some other definition?

1 Like

Ah yes, that would be better.

Also, I’d write:

 pure $! if a then Right i else Left $! i + 1

Indeed, as you found, that will make no difference. loopM always forces the Either to know whether it should go for another iteration, so there’s no benefit in forcing it before return.

If loopM were defined like this … Then you’d need no extra strictness annotations at all.

What forces the argument to act? The following doesn’t crash under either version.

loopM (\_ -> pure (Left undefined)) ()

Do you mean that from strictness analysis (and knowledge of the definition of +) GHC can infer that the argument to Left would always be better off evaluated (and can never crash when evaluated), so it specialises it for that case?

Tangentially, I never understood why GHC can’t rewrite functions into the go form by itself.

1 Like

Is it statically known where thunks can build up indefinitely in a given block? A compiler warning would have a huge impact, would it not: you would just find the spot and fix it, by forcing with ! or using a strict record field etc., as suggested here.

It doesn’t make a difference for the leak, but it does make a difference in terms of total allocations. If you use $! in both places then you get this STG (this is all with -O1):

usesUpMemory2
  :: Int -> [Char] -> Identity (Either Int Int, [Char]) =
    \r [i eta]
        case usesUpMemory_m1 eta of { -- m1 is essentially 'test' in the source
        (,) a1 s' ->
        case a1 of {
        (,) a _ ->
        case a of {
          False ->
              case i of {
              I# x ->
              case +# [x 1#] of sat {
              __DEFAULT ->
              let { sat :: Int = I#! [sat]; } in
              let { sat :: Either Int Int = Left! [sat]; } in  (,) [sat s'];
              };
              };
          True ->
              let { sat :: Either Int Int = Right! [i]; } in  (,) [sat s'];
        };};};

This is nicely flat and only has two allocations for the I# and Left constructors. But if you leave out that outermost $! then you get:

usesUpMemory2
  :: Int -> [Char] -> Identity (Either Int Int, [Char]) =
    \r [i eta]
        case usesUpMemory_m1 eta of {
        (,) a1 s' ->
        case a1 of {
        (,) a _ ->
        let {
          sat :: Either Int Int =
              \u []
                  case a of {
                    False ->
                        case i of {
                        I# x ->
                        case +# [x 1#] of sat {
                        __DEFAULT -> let { sat :: Int = I#! [sat]; } in  Left [sat];
                        };
                        };
                    True -> Right [i];
                  };
        } in  (,) [sat s'];
        };};

This creates an updatable closure (the \u []), a.k.a. a thunk. That thunk is put into the (,) that is used by the Writer. Of course this thunk will be forced immediately in the next iteration, but I can’t imagine that’s good for performance.

Yes

You can enable it manually with -fstatic-argument-transformation (although for some reason it doesn’t seem to work in this case), but it is not always an optimisation. If the function is not inlined then it will have to allocate a closure for that go function. @sgraf is working on improving it, though. I believe the latest idea was to apply this transformation only if it makes it possible to inline the function.

Yes. Compare the above listings with the result of using the better loopM that uses the recursive helper function go (to get this result you either have to use one of the two $!’s, it doesn’t matter which, or you have to compile with -O2):

usesUpMemory_$s$wgo :: Int# -> [Char] -> (# Int, [Char] #) =
    \r [sc eta]
        case m1 eta of {
        (,) a1 s' ->
        case a1 of {
        (,) a _ ->
        case a of {
          False ->
              case +# [sc 1#] of sat {
              __DEFAULT -> usesUpMemory_$s$wgo sat s';
              };
          True -> let { sat :: Int = I#! [sc]; } in  (#,#) [sat s'];
        };
        };
        };

Now GHC can see the loop instead of only a single iteration and it can remove the Either completely and unbox the Int. Now there is only one allocation left, but that is in the base case so that only happens once.

Kind of, but I think in most cases the thunks are forced rather quickly and no leak occurs. So you’d get a lot of false positives. Edsko de Vries from Well-Typed has written the nothunks library which can give warnings if there are thunks in your code: Being lazy without getting bloated - Well-Typed: The Haskell Consultants.
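A minimal sketch of using it (the Acc type is illustrative; the class comes with generic deriving):

{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric  #-}
import GHC.Generics (Generic)
import NoThunks.Class (NoThunks, unsafeNoThunks)

data Acc = Acc { count :: Int, flags :: [Bool] }
  deriving (Generic, NoThunks)

-- unsafeNoThunks returns Just a ThunkInfo if a thunk lurks anywhere inside:
fullyEvaluated :: Acc -> Bool
fullyEvaluated acc = case unsafeNoThunks acc of
  Nothing -> True    -- no hidden thunks
  Just _  -> False   -- a thunk was found somewhere in the structure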

And all thunks are really potential memory leaks, because if GHC could predict that a thunk is always forced quickly then it wouldn’t have to create the thunk in the first place.

Personally, I’m also excited by ghc-debug (also from Well-Typed) which is a run time memory analysis tool. That tool can be used to find actual memory leaks at run time without all the false positives.

3 Likes

You mean there’s a substantial difference between using a local definition (go) and how loopM is currently defined?

Yes.

Then it’s a problem for GHC and its implementors: no-one else should have to go and rewrite all their existing code in some “special style” just because that’s what makes the implementation produce the best result. Haskell is meant to be a declarative language…


And all thunks are really potential memory leaks, because if GHC could predict that a thunk is always forced quickly then it wouldn’t have to create the thunk in the first place.

…and all evaluations are really potential “time leaks”, because if GHC could reliably predict that an evaluation wasn’t needed then it wouldn’t need thunks at all.


Personally, I’m also excited by ghc-debug […]. That tool can be used to find actual memory leaks at run time without all the false positives.

Personally, I would be a lot more interested in the appearance of a heapless (or perhaps just GC-less) call-by-need implementation for GHC.

At least then (Glasgow) Haskell would be much closer to parity with Rust…

1 Like

I don’t know, it just seems too important to be relegated to some third party tool.

The fundamental rule is that a part of your programming process is reasoning about laziness. Every time you apply a pure operation to a piece of data, the application is lazy and will only be evaluated once that piece of data is demanded. You don’t have to be painstaking about it, unless you want the maximum bang for your buck, but if you do no work in this department you’re destined for situations like the one you described.
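A classic instance of this: even with the “strict” state monad, modify applies its function lazily (modify' is the strict variant), so a loop of pure updates builds a thunk chain:

import Control.Monad.State.Strict (State, execState, modify)

-- Each modify (+1) merely wraps the state in another (+1) thunk;
-- nothing is evaluated until execState demands the final state,
-- at which point a chain a million additions deep is forced at once.
leaky :: State Int ()
leaky = mapM_ (\_ -> modify (+ 1)) [1 :: Int .. 1000000]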

On a broader scale, try to think about the problems you solve through a functional lens. Instead of writing a sequence of operations that perform side effects in place, figure out what the inputs to your system are, what the state is, and what output you need. You do not have to use monad cakes to solve problems; I would even argue they’re harmful because they promote imperative programming in a place that does not require it (and everything that does could instead use IO directly).

1 Like

On a broader scale, try to think about the problems you solve through a functional lens, i.e. from a (mathematically) functional point of view:

  • by breaking the original problem down into sub-problems,

  • then repeating that process on those sub-problems,

    • until those (sub-sub-sub-…) problems are simple enough to solve directly.

… Top Down Design (2017)


You do not have to use monad [layer] cakes to solve problems […]

…unless:

  • someone else gives you a serving of monadic “layer-cake” as a dependency,

  • and there’s no way to contain it (in the way runST can for monadic ST s a terms).

Then you can expect much of your code to also be passing along servings of monadic “layer-cake”. If you find that annoying… you are not alone.


I would even argue they’re harmful because they promote imperative programming in a place that does not require it (and everything that does could instead use IO directly).

…which most tend to do anyway, having no great interest in the “slicing” most effects-systems offer: ultimately such systems also serve monadic “layer-cakes” (with 2^N combinations, for N individual “layers”).

Thank you, I see your point that it is a mindset, rather than a rule.

But consider this. Taking heavy inspiration from the Well-Typed blog post linked above, say I am in an organisational setting, and a PR lands on my desk with a small change. In reviewing this change I am guilty of not doing a thorough top-down analysis of the problem, and approve the seemingly innocuous change, which passes CI but which, as you’ve no doubt guessed by now, leads to a horrible server-crashing memory leak.

Do you think it is acceptable that GHC provides no warning (even a noisy one would do) of this situation occurring?

We’re all about type safety, but I wonder whether we have paid enough attention to memory safety in GHC.

1 Like

Relative to how often this issue arises, I would say the status quo is fine. I would prefer seeing this problem solved from the ground up (being able to just say a certain piece of data is always strict), instead of bringing in a whole host of speculative tools.

I do not advocate for production-grade Haskell, unless the company knows what it’s doing. My answer to “what if someone makes a stupid change and it sets the production environment ablaze” is “just don’t use Haskell in production”. There has to be a better way to structure code that doesn’t require sledgehammer solutions.

And, of course, it’s not just a production problem, the entire ecosystem is like this. Libraries choose to hide laziness from the end user, so the only time you come across it is when you find out your little program takes 8 GB of memory to run, at which point your forum question yields a bunch of “that’s why I use this cool library to solve the language for me” answers.

Yes, that’s right.

nothunks was the inspiration for my article Make Invalid Laziness Unrepresentable. After I learned about that library I thought “after we’ve checked a value for thunks it would be nice to indicate that in the type, so that we know we don’t have to ever check it again” (I was imagining introducing a ThunkFree newtype wrapper). Then I realised that the simpler approach was to forbid thunks from occurring in the first place, through a better definition of the type in question!

Yes, that’s why you should make invalid laziness unrepresentable. Then you don’t have to rely on any tool. The impossibility of thunks just becomes part of your program.

I don’t think this situation is acceptable, but it’s not GHC’s fault. The author of loopM chose to allow thunks to build up in the state that is passed between iterations (similar to foldl). Why should GHC object to that? Maybe that’s necessary for some applications. If the author wanted different behaviour he should have implemented it differently. If the user wanted different behaviour she shouldn’t have used loopM. It’s not GHC’s job to decide! (This is why I have opened the discussion Why shouldn't I make my monads "value strict"?)

So what exactly do I find unacceptable? Our ecosystem has 100 laziness footguns: foldl, modifyIORef, Control.Monad.Trans.State.Strict.modify, all of Control.Monad.Trans.State (i.e. not .Strict), all of Data.Map.Lazy, … It is so easy to create catastrophic space leaks using them that they should only be used by experts in very specific circumstances (generally to eke out performance). But we don’t do a very good job of educating users about this in general, nor of finding ways to discourage use of the footguns through other means.
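For the foldl entry on that list, the contrast is the standard example:

import Data.List (foldl')

leakySum, okSum :: [Int] -> Int
leakySum = foldl  (+) 0   -- builds a (((0+1)+2)+...) thunk chain
okSum    = foldl' (+) 0   -- forces the accumulator at every step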

7 Likes