The Haskell Unfolder Episode 31: nothunks

tomjaguarpaw · August 16, 2024, 8:25am

Regarding dealing with space leaks, my advice is the following:

1. Make invalid laziness unrepresentable. That is, design your types to be free of space leaks in the first place. In the same way you simply wouldn’t use strings "TRUE" and "FALSE" to represent booleans, don’t use data MyPair = Pair Int Int to represent a pair of fixed-precision integers. When evaluated (to weak head normal form) it’s not a pair of evaluated fixed-precision integers! It’s a pair of (either a fixed-precision integer or a thunk (potential space leak)). Instead, use data MyPair = Pair !Int !Int.

   Similarly, don’t use data MyPair2 = Pair !Int !(Maybe Int). There’s a thunk (potential space leak) hiding in that Maybe. Instead use data MyPair2 = Pair !Int !(Strict (Maybe Int)). (See the strict-wrapper library.)

2. Use th-deepstrict to confirm that the data types that you are defining don’t hide space leaks.

3. Only use the space-leak-free versions of various library functions. This is a bit more awkward, because you have to know which to avoid. For example, you should only ever use foldl' not foldl, Data.IORef.modifyIORef' not Data.IORef.modifyIORef, and Control.Monad.Trans.State.modify' not Control.Monad.Trans.State.modify.

   (Maybe one day this knowledge will be encoded into stan or some other static analyser, so everyone doesn’t have to just remember it.)

4. If you come across a space leak nonetheless, use GHC’s heap profiler with retainer profiling. That should give you a good idea of which data type the space leak occurs in. Then, if it’s your data type, you can go back to step 1 to fix it, perhaps using nothunks to help diagnose. Once it’s fixed, use th-deepstrict to ensure that the data type doesn’t regress. On the other hand, if the space leak is in a library you’re using, then it’s trickier. I guess file a bug report upstream (see, for example, my patch to megaparsec).
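The advice to make invalid laziness unrepresentable can be sketched in a small self-contained program. The names LazyAcc, StrictAcc, sumCountLazy, sumCountStrict, MyPair2Lazy and forcesOk are hypothetical, chosen for illustration. StrictMaybe is a hand-rolled stand-in for strict-wrapper’s Strict (Maybe Int), not that library’s API.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import Data.List (foldl')

-- Lazy fields: foldl' forces the accumulator only to weak head normal form
-- (the constructor), so each field can grow into a chain of (+) thunks.
data LazyAcc = LazyAcc Int Int

-- Strict fields: once the constructor is evaluated, so are both Ints.
data StrictAcc = StrictAcc !Int !Int

-- Hypothetical helpers: running sum and count of a list in one pass.
sumCountLazy :: [Int] -> (Int, Int)
sumCountLazy xs = case foldl' step (LazyAcc 0 0) xs of
  LazyAcc s c -> (s, c)
  where
    step (LazyAcc s c) x = LazyAcc (s + x) (c + 1)

sumCountStrict :: [Int] -> (Int, Int)
sumCountStrict xs = case foldl' step (StrictAcc 0 0) xs of
  StrictAcc s c -> (s, c)
  where
    step (StrictAcc s c) x = StrictAcc (s + x) (c + 1)

-- Strict outer fields, yet the Int inside the Maybe can still be a thunk.
data MyPair2Lazy = PairL !Int !(Maybe Int)

-- Hypothetical stand-in for strict-wrapper's Strict (Maybe Int):
-- the payload is forced whenever an SJust is evaluated.
data StrictMaybe a = SNothing | SJust !a

data MyPair2 = Pair !Int !(StrictMaybe Int)

-- True if forcing the value to weak head normal form raises no ErrorCall.
forcesOk :: a -> IO Bool
forcesOk x = fmap succeeded (try (evaluate x))
  where
    succeeded :: Either ErrorCall b -> Bool
    succeeded = either (const False) (const True)

main :: IO ()
main = do
  print (sumCountStrict [1 .. 1000000])
  -- The error thunk hides inside the Just, so forcing the pair succeeds.
  forcesOk (PairL 1 (Just (error "hidden thunk"))) >>= print
  -- The strict payload is forced along with the pair, so the error surfaces.
  forcesOk (Pair 1 (SJust (error "hidden thunk"))) >>= print
```

Both accumulator versions compute the same answer; the difference is that only the StrictAcc type guarantees that no thunk chain can build up in its fields.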
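The advice to stick to the space-leak-free library functions mostly comes down to picking the primed variants. Here is a minimal sketch; sumStrict, countTo and sumState are hypothetical helper names. One assumption worth flagging: it takes modify' from Control.Monad.Trans.State.Strict rather than the lazy Control.Monad.Trans.State module, because with the lazy State monad the state-passing itself can still pile up thunks even when each new state is forced.

```haskell
import Control.Monad.Trans.State.Strict (execState, modify')
import Data.IORef (modifyIORef', newIORef, readIORef)
import Data.List (foldl')

-- foldl' forces the accumulator at every step; foldl can instead build
-- a chain of (+) thunks as long as the list before evaluating any of it.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

-- modifyIORef' forces the new value before writing it back; modifyIORef
-- would store an ever-growing chain of (+ 1) thunks in the IORef.
countTo :: Int -> IO Int
countTo n = do
  ref <- newIORef 0
  mapM_ (\_ -> modifyIORef' ref (+ 1)) [1 .. n]
  readIORef ref

-- modify' forces each new state; modify would leave it as a thunk.
sumState :: [Int] -> Int
sumState xs = execState (mapM_ (\x -> modify' (+ x)) xs) 0

main :: IO ()
main = do
  print (sumStrict [1 .. 100000])   -- 5000050000
  countTo 100000 >>= print          -- 100000
  print (sumState [1 .. 100000])    -- 5000050000
```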
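For the diagnosis step, here is a minimal sketch of nothunks, assuming the nothunks package is installed. It derives NoThunks generically and checks values with unsafeNoThunks from the NoThunks.Class module. The types and the mkLazy and mkStrict helpers are hypothetical; the helpers are marked NOINLINE so that the n + 1 field is built at run time rather than constant-folded away. The check on the lazy pair should report a thunk, while the strict pair should come back clean.

```haskell
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}

import Control.Exception (evaluate)
import Data.Maybe (isJust)
import GHC.Generics (Generic)
import NoThunks.Class (NoThunks, unsafeNoThunks)

-- Hypothetical example types; NoThunks instances come from Generic.
data LazyPair = LazyPair Int Int
  deriving (Generic, NoThunks)

data StrictPair = StrictPair !Int !Int
  deriving (Generic, NoThunks)

-- NOINLINE keeps GHC from evaluating (n + 1) at compile time.
mkLazy :: Int -> LazyPair
mkLazy n = LazyPair (n + 1) n
{-# NOINLINE mkLazy #-}

mkStrict :: Int -> StrictPair
mkStrict n = StrictPair (n + 1) n
{-# NOINLINE mkStrict #-}

main :: IO ()
main = do
  -- Force the outer constructor first; otherwise nothunks would simply
  -- report that the whole, still unevaluated pair is a thunk.
  lazyP <- evaluate (mkLazy 41)
  strictP <- evaluate (mkStrict 41)
  -- Did nothunks find a thunk inside each pair?
  print (isJust (unsafeNoThunks lazyP))
  print (isJust (unsafeNoThunks strictP))
```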

I don’t think I really follow this. It’s not a question of “functions benefitting from correct types”. It’s a question of enforcing invariants on your data types (as @kosmikus explains in the linked video). If there’s no need for laziness in your data type, then enforce its absence by making invalid laziness unrepresentable, and it will be free of space leaks! The point of making invalid laziness unrepresentable is that deepseq becomes simply the same as seq: there is no longer any deep laziness to force! deepseq is a massive anti-pattern. If you find yourself using it, then something has likely gone terribly wrong. (For a discussion of the boundary between legitimate deepseq use and anti-pattern use, see Deepseq versus "make invalid laziness unrepresentable".)
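To illustrate the claim that deepseq collapses to seq once laziness is unrepresentable, here is a minimal sketch using the NFData class and force from the deepseq package. In this sketch, evaluate plays the role of seq (it forces to weak head normal form) and force plays the role of deepseq. The types and the throws helper are hypothetical, and the hand-written NFData instances stand in for whatever instances you would normally derive.

```haskell
import Control.DeepSeq (NFData (..), force)
import Control.Exception (ErrorCall, evaluate, try)

data LazyPair = LazyPair Int Int

instance NFData LazyPair where
  -- Has to reach inside, because either field may be a thunk.
  rnf (LazyPair a b) = rnf a `seq` rnf b

data StrictPair = StrictPair !Int !Int

instance NFData StrictPair where
  -- Weak head normal form already is normal form, so seq is enough.
  rnf p = p `seq` ()

-- True if running the action raises an ErrorCall.
throws :: IO a -> IO Bool
throws act = fmap failed (try act)
  where
    failed :: Either ErrorCall b -> Bool
    failed = either (const True) (const False)

main :: IO ()
main = do
  let lazyP = LazyPair (error "hidden thunk") 1
      strictP = StrictPair (error "hidden thunk") 1
  -- Lazy pair: seq-depth forcing misses the thunk, deepseq finds it.
  throws (evaluate lazyP) >>= print          -- False
  throws (evaluate (force lazyP)) >>= print  -- True
  -- Strict pair: seq and deepseq agree.
  throws (evaluate strictP) >>= print          -- True
  throws (evaluate (force strictP)) >>= print  -- True
```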